Fixed recommendations for consolidation ratios are cancerous. Whether we are talking about vCPUs per core, virtual machines per host, or VMDKs per LUN, there is no single number that represents the “right” ratio. Accurate guidance requires workload characterization and fine-tuning using vSphere’s performance counters. Today I want to highlight one experiment that shows how application choice impacts VMDK-to-LUN consolidation. The inescapable conclusion is that sequential access data must be separated from random access files!
In a 2009 VPACT paper, VMware engineers showed how application performance changes when storage is consolidated. The paper is a bit academic for the average virtualization nut, but it contains real insights into choosing which VMDKs to consolidate onto a single VMFS volume. It does this by running applications with random or sequential IO patterns and comparing their performance on isolated (dedicated) storage to their performance on consolidated storage.
The first experiment tested DVD Store against Microsoft SQL Server and Oracle Swingbench OLTP against an Oracle database. These OLTP workloads produce random IO on the data disks. In the isolation experiment, each virtual machine ran with its VMDK on a dedicated three-disk RAID 5 LUN (2+1). In the consolidation experiment, both virtual machines’ VMDKs were placed on a common six-disk RAID 5 LUN (5+1). Here are the results from Table 1 of the paper:
The “application metric,” transactions per minute, is the most important indicator of the performance the end user actually observes. You can see from the results that consolidating (sharing) storage for random workloads does not harm performance at all. In fact, SQL Server performance increased by 25%, reflecting the relative increase in data disks per stripe.
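The arithmetic behind that 25% is easy to check. Here is my own back-of-the-envelope reading of the two configurations, not code from the paper:

```python
# Isolated: each VM owned a 3-disk RAID 5 LUN (2 data + 1 parity),
# so each VM had 2 data disks behind its VMDK.
isolated_data_disks_per_vm = 2

# Consolidated: two VMs shared a 6-disk RAID 5 LUN (5 data + 1 parity),
# so each VM effectively had 5 / 2 = 2.5 data disks behind it.
consolidated_data_disks_per_vm = 5 / 2

# Relative increase in data disks per VM matches the observed gain.
improvement = consolidated_data_disks_per_vm / isolated_data_disks_per_vm - 1
print(f"{improvement:.0%}")  # prints 25%
```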
The second experiment again used DVD Store against SQL Server as a random IO workload. But instead of a second random workload, an Oracle database was tested with the Swingbench Decision Support System (DSS) workload, which produces highly sequential access. Here are the results of that experiment, taken from Table 2 of the paper:
The random workload, DVD Store against SQL Server, again improved as the relative percentage of data disks in the RAID volume increased. But the Decision Support System workload, so heavily dependent on sequential storage performance, suffered greatly: DSS performance dropped 30% measured by IO throughput and 50% measured by completed transactions.
Applications with a sequential storage access pattern can be heavily dependent on the array’s ability to coalesce IO requests and complete large numbers of IOs very rapidly. But when a VI admin places a random access VMDK on the same LUN, the aggregate, interleaved LUN access is no longer sequential. This slows the array’s sequential throughput, and the effects are profound at the application level.
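To make the interleaving effect concrete, here is a toy Python sketch (my own illustration, not from the paper) that merges a sequential request stream with a random one and measures how much sequentiality survives at the LUN:

```python
import random

def sequential_fraction(requests):
    """Fraction of requests whose LBA immediately follows the previous one."""
    seq = sum(1 for a, b in zip(requests, requests[1:]) if b == a + 1)
    return seq / (len(requests) - 1)

random.seed(42)

# One purely sequential stream (think: a DSS table scan) ...
seq_stream = list(range(10_000))
# ... and one random stream (think: OLTP) on a disjoint LBA range.
rand_stream = [random.randrange(100_000, 200_000) for _ in range(10_000)]

# Interleave them the way a shared LUN would see the combined traffic.
merged = [lba for pair in zip(seq_stream, rand_stream) for lba in pair]

print(f"isolated:     {sequential_fraction(seq_stream):.0%} sequential")
print(f"consolidated: {sequential_fraction(merged):.0%} sequential")
# prints:
# isolated:     100% sequential
# consolidated: 0% sequential
```

Real arrays reorder and prefetch more cleverly than this toy suggests, but the core problem is the same: the sequential stream the array could have streamed from disk is now shredded by unrelated random requests.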
There are two summary recommendations from this experiment:
- A smaller number of RAID 5 volumes built from many disks will outperform a larger number of RAID 5 volumes built from fewer disks. This is because a smaller fraction of the configuration’s disks is consumed by parity.
- VMDKs with random access can be consolidated to a single VMFS volume safely but sequential access pattern files should be separated to their own LUNs.
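A quick illustration of the first point (my own arithmetic, not figures from the paper): given six disks to spend, one large RAID 5 volume leaves more spindles doing data IO than two small ones:

```python
# Six physical disks, two ways to carve them into RAID 5 volumes.

# Two 3-disk RAID 5 volumes (2+1 each): one disk's worth of parity per volume.
data_disks_two_small = 2 * (3 - 1)   # 4 data disks

# One 6-disk RAID 5 volume (5+1): a single disk's worth of parity overall.
data_disks_one_large = 6 - 1         # 5 data disks

print(data_disks_two_small, data_disks_one_large)  # prints 4 5
```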
The VMware performance team has a lot more to say about storage design to maximize application performance in virtual environments. I will have more blog articles as the weeks progress and a white paper to share in the second quarter of 2010. I expect it to be ready no later than EMC World 2010.