vPivot

Scott Drummonds on Virtualization

Virtual Storage Design: Application Consolidation


Fixed recommendations for consolidation ratios are cancerous.  Whether we are talking about vCPUs per core, virtual machines per host, or VMDKs per LUN, there is no single number that represents the “right” ratio.  Accurate guidance requires workload characterization and fine tuning using vSphere’s performance counters.  Today I want to highlight one experiment that shows application choice impacting VMDK-to-LUN consolidation.  The inescapable conclusion is that sequential access data must be separated from random access files!

In a 2009 VPACT paper, VMware engineers showed application performance when storage is consolidated.  This paper is a bit academic for the average virtualization nut, but it does contain insights into choosing which VMDKs to consolidate into a single VMFS volume.  It does this by running applications that generate random or sequential IO and comparing performance on isolated (dedicated) storage to performance on consolidated storage.

The first experiment tested DVD Store against Microsoft SQL Server and Swingbench OLTP against an Oracle database.  These OLTP workloads result in random IO on the data disks.  In the isolation experiment each virtual machine ran with its VMDK on a three-disk RAID 5 LUN (2+1).  In the consolidation experiment, both virtual machines’ VMDKs were put on a common six-disk RAID 5 LUN (5+1).  Here are the results from table 1 of the paper:

[Figure: Consolidating Random Access Virtual Disks]

The “application metric”, transactions per minute, is the most important indicator of the end user’s observed performance.  You can see from the results that consolidating (sharing) storage of random workloads does not harm performance at all.  In fact, SQL Server performance increased by 25%, reflecting the relative increase in data disks per stripe.
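The arithmetic behind that 25% is worth spelling out: in isolation each virtual machine had two data spindles (2+1 RAID 5), while after consolidation the two virtual machines shared five data spindles (5+1 RAID 5), or 2.5 data spindles apiece, a 25% increase per workload.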

The second experiment again used DVD Store against SQL Server as the random IO workload.  But instead of a second random workload, an Oracle database was driven by the Swingbench Decision Support System (DSS) workload, which produces highly sequential access.  Here are the results of that experiment, taken from table 2 of the paper:

[Figure: Sequential Workload Consolidation]

The random workload, DVD Store on SQL Server, again improved as the relative percentage of data disks in the RAID volume increased.  But the DSS workload, so heavily dependent on sequential storage performance, suffered greatly.  DSS performance dropped 30% when measured by IO throughput and 50% when measured by completed transactions.

Applications with a sequential storage access pattern can be heavily dependent on the array’s ability to coalesce IO requests and complete large numbers of IOs very rapidly.  But when a VI admin puts a random access VMDK on the same LUN, the aggregate, interleaved LUN access is no longer sequential.  This degrades the array’s sequential performance, and the effects are profound at the application level.
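To make the interleaving effect concrete, here is a toy Python sketch (my illustration, not from the paper) comparing the average seek distance the array sees for a dedicated sequential stream versus the same stream interleaved with a random one:

    import random

    def mean_seek_distance(lbas):
        # Average absolute jump between consecutive logical block addresses;
        # small jumps let the array coalesce requests and read ahead.
        return sum(abs(b - a) for a, b in zip(lbas, lbas[1:])) / (len(lbas) - 1)

    sequential = list(range(0, 80000, 8))                      # DSS-like VMDK: ordered requests
    random_io = [random.randrange(10**7) for _ in sequential]  # OLTP-like VMDK: random addresses

    print(mean_seek_distance(sequential))    # dedicated LUN: 8.0, perfectly sequential

    # Shared LUN: the array sees both VMDKs' requests interleaved.
    interleaved = [lba for pair in zip(sequential, random_io) for lba in pair]
    print(mean_seek_distance(interleaved))   # millions of blocks, sequentiality is gone

Once the streams mix, every other request lands millions of blocks away, so the read-ahead and coalescing that sequential applications depend on never get a chance to engage.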

There are two summary recommendations from this experiment:

  • A smaller number of RAID 5 volumes using many disks will outperform a larger number of RAID 5 volumes that use fewer disks.  This is due to the relative decrease in parity overhead: two 3-disk groups spend two of six disks on parity, while a single 6-disk group spends only one of six.
  • VMDKs with random access patterns can safely be consolidated onto a single VMFS volume, but files with sequential access patterns should be separated onto their own LUNs.

The VMware performance team has a lot more to say about storage design to maximize application performance in virtual environments.  I will have more blog articles as the weeks progress and a white paper to share in the second quarter of 2010.  I expect it to be ready no later than EMC World 2010.

5 Responses

Hi Scott, great post, looking forward to the others…

What’s the impact on backups, which are heavily sequential, of those random access VMDKs?

What tools are available to find out which VMDKs are random access and which are sequential?

In my environment the VMDKs are so dynamic that some of them may appear random but turn sequential at month or quarter ends, while for others it’s just the opposite.  Anyway, at any given time the users need their data without delays!

What I would need is storage that evolves dynamically along with my VMDKs, something behind the scenes that moves the VMFS to the proper metavolumes… Does that exist?

Thx,
Didier

  • Didier,

    The paper details the impact of sequential and random access workloads on each other and its basic conclusion is that only sequential workloads are harmed. That means that backup behavior, initiated by the guest and resulting in sequential IO, would be slow when mixed with random workloads. The random workloads would be unaffected, assuming adequate spindle count.

    vscsiStats can be used to collect the access profile of a VMDK, but data collection and analysis can take many minutes per VMDK. So, it is difficult to do for hundreds of VMDKs. Here is information on vscsiStats:

    http://communities.vmware.com/docs/DOC-10095
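    For reference, a typical collection session looks something like this (flags per the document linked above; <worldGroupID> is a placeholder reported by the -l listing):

        vscsiStats -l                                  # list VMs and their worldGroupIDs
        vscsiStats -s -w <worldGroupID>                # start collection for one VM's disks
        vscsiStats -p seekDistance -w <worldGroupID>   # histogram: a spike near zero means
                                                       # sequential IO, a wide spread means random
        vscsiStats -x -w <worldGroupID>                # stop collection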

    VMware and its storage partners are each working on tools that will automate the placement of VMDKs and hot blocks on faster storage. I have heard that EMC’s V-Max will do this but I will have to refer you to EMC (or the other storage vendors) for more information. It will take some time before VMware includes this type of feature in vSphere.

    Scott

  • Thanks for sharing another thought-provoking post.

    “sequential access pattern files should be separated to their own LUNs”

    Is this assuming one LUN per RAID group?
    -Nakoosa

  • [...] post on application consolidation by Scott Drummonds is an old post (from January 2010), but it’s still a good one. In this [...]
