Scott Sauer recently asked me a tough question on Twitter. My roaming best practices talk includes the phrase “do not use PVSCSI for low-IO workloads”. When Scott saw a VMware KB echoing my recommendation, he asked the obvious question: “Why?” It took me a couple of days to get a sufficient answer.
One technique for storage driver efficiency improvements is interrupt coalescing. Coalescing can be thought of as buffering: multiple events are queued for simultaneous processing. For coalescing to improve efficiency, interrupts must stream in fast enough to create large batches. Otherwise the coalescing timeout window expires with no additional interrupts arriving, and the lone interrupt is handled exactly as it would have been without coalescing, only after a useless delay.
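To make the timeout problem concrete, here is a minimal Python sketch of timer-based coalescing. It is a toy model, not ESX driver code: the function `simulate_coalescing`, the `window_us` timeout and the `max_batch` limit are assumptions chosen for illustration. The first queued completion opens a window, and the batch is delivered when the window expires or when `max_batch` completions have queued.

```python
from dataclasses import dataclass


@dataclass
class CoalesceStats:
    interrupts: int        # batches delivered to the (hypothetical) driver
    completions: int       # IO completions handled in total
    total_delay_us: float  # extra waiting time added by coalescing


def simulate_coalescing(arrivals_us: list[float], window_us: float, max_batch: int) -> CoalesceStats:
    """Toy timer-based coalescing model (hypothetical, not ESX code).

    The first queued completion opens a window of window_us. The batch is
    delivered when max_batch completions have queued or when the window expires.
    """
    stats = CoalesceStats(0, 0, 0.0)
    i, n = 0, len(arrivals_us)
    while i < n:
        deadline = arrivals_us[i] + window_us
        batch = [arrivals_us[i]]
        i += 1
        # Keep queuing completions that land inside the window until the batch is full.
        while i < n and arrivals_us[i] <= deadline and len(batch) < max_batch:
            batch.append(arrivals_us[i])
            i += 1
        # A full batch fires on its last arrival; otherwise it waits for the timeout.
        fire_at = batch[-1] if len(batch) == max_batch else deadline
        stats.interrupts += 1
        stats.completions += len(batch)
        stats.total_delay_us += sum(fire_at - t for t in batch)
    return stats


# Fast storage: a completion every 10 us. Slow storage: one every 1,000 us.
fast = simulate_coalescing([t * 10.0 for t in range(100)], window_us=100.0, max_batch=8)
slow = simulate_coalescing([t * 1000.0 for t in range(10)], window_us=100.0, max_batch=8)
print(fast.interrupts, fast.total_delay_us / fast.completions)
print(slow.interrupts, slow.total_delay_us / slow.completions)
```

In this model the fast stream handles 100 completions with 13 interrupts at an average added wait of 37 us. The slow stream needs one interrupt per completion, and every completion waits out the full 100 us window: no savings, only delay.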
An intelligent storage driver will therefore coalesce at high IO but not low IO. In the years we have spent optimizing ESX’s LSI Logic virtual storage adapter, we have fine-tuned the coalescing behavior to give fantastic performance on all workloads. This is done by tracking two key storage counters:
- Outstanding IOs (OIOs): Represents the virtual machine’s demand for IO.
- IOs per second (IOPS): Represents the storage system’s supply of IO.
The robust LSI Logic driver increases coalescing as OIOs and IOPS increase. No coalescing is used with few OIOs or low throughput. This produces efficient IO at large throughput and low latency IO when throughput is small.
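The exact LSI Logic algorithm is not spelled out here, so the following is only a Python sketch of the idea under assumed thresholds. The function `adaptive_coalesce_depth` and its parameters `oio_step`, `iops_step` and `max_depth` are hypothetical. It returns a batch size of 1 (no coalescing) unless both OIOs and IOPS are high, and grows the batch with the weaker of the two signals.

```python
def adaptive_coalesce_depth(outstanding_ios: int, iops: float,
                            oio_step: int = 4, iops_step: float = 2000.0,
                            max_depth: int = 32) -> int:
    """Completions to batch per interrupt; 1 means no coalescing.

    Hypothetical policy in the spirit of the LSI Logic behavior in the post:
    coalesce only when BOTH demand (OIOs) and supply (IOPS) are high.
    oio_step, iops_step and max_depth are illustrative values, not ESX tunables.
    """
    if outstanding_ios < oio_step or iops < iops_step:
        return 1  # low demand or low throughput: deliver every completion immediately
    # Scale with the weaker of the two signals, so neither can drive batching alone.
    by_demand = outstanding_ios // oio_step
    by_supply = int(iops // iops_step)
    return max(1, min(by_demand, by_supply, max_depth))


for oios, iops in [(2, 40_000.0), (64, 300.0), (8, 40_000.0), (64, 40_000.0)]:
    print(oios, iops, adaptive_coalesce_depth(oios, iops))
```

Taking the minimum of the two signals is the design choice that matters: a guest with 64 outstanding IOs against storage delivering 300 IOPS still gets no coalescing.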
Currently the PVSCSI driver coalesces based on OIOs only, not throughput. This means that when the virtual machine is requesting a lot of IO but the storage is not delivering, the PVSCSI driver is still coalescing interrupts. But without the storage supplying a steady stream of IO completions, there are no interrupts to coalesce. The result is slightly higher latency with little or no efficiency gain for PVSCSI in low-throughput environments.
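A toy Python model of an OIO-only policy shows the latency side of this trade. It assumes evenly spaced completions and a "fire after N completions or when the timer expires" scheme; `oio_only_depth`, `avg_added_delay_us` and their parameters are hypothetical, and the numbers say nothing about real PVSCSI timings.

```python
def oio_only_depth(outstanding_ios: int, oio_step: int = 4, max_depth: int = 32) -> int:
    """Hypothetical OIO-only policy: batch size follows guest demand and ignores IOPS."""
    if outstanding_ios < oio_step:
        return 1
    return min(outstanding_ios // oio_step, max_depth)


def avg_added_delay_us(depth: int, iops: float, window_us: float) -> float:
    """Mean extra wait per completion for evenly spaced completions when a batch
    fires after `depth` completions or when `window_us` expires, whichever is first."""
    if depth <= 1:
        return 0.0  # no coalescing, no added delay
    gap_us = 1_000_000.0 / iops  # time between completions
    arrivals_in_window = int(window_us // gap_us) + 1
    batch = min(depth, arrivals_in_window)
    if batch == depth:
        # Batch filled before the timeout: it fires on the last arrival.
        return (batch - 1) * gap_us / 2
    # Timeout fired first: the j-th completion waits window_us - j * gap_us.
    return window_us - (batch - 1) * gap_us / 2


depth = oio_only_depth(64)  # high guest demand gives depth 16
print(avg_added_delay_us(depth, iops=500.0, window_us=100.0))     # slow storage
print(avg_added_delay_us(depth, iops=40_000.0, window_us=100.0))  # fast storage
```

With 64 outstanding IOs against 500 IOPS storage, the policy asks for batches of 16, yet each interrupt still carries one completion and every completion waits the full 100 us window. Against 40,000 IOPS storage the same setting delivers 5 completions per interrupt for an average 50 us wait.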
LSI Logic is so efficient at low throughput levels that there is no need for a special device driver to improve efficiency. The CPU utilization difference between LSI Logic and PVSCSI at hundreds of IOPS is insignificant. But at massive IO rates, where 10-50K IOPS stream over the virtual SCSI bus, PVSCSI can save a large number of CPU cycles. Because of that, our first implementation of PVSCSI was built on the assumption that customers would only use the technology when they had backed their virtual machines with world-class storage.
But VMware’s marketing engine (me, really) started telling everyone about PVSCSI without the right caveat (“only for massive IO systems!”), so everyone started using it as a general solution. This meant that in one condition, slow storage (low IOPS) with a demanding virtual machine (high OIOs), PVSCSI has been inefficiently coalescing interrupts, resulting in performance slightly worse than LSI Logic.
But now VMware’s customers want PVSCSI as a general solution and not just for high IO workloads. As a result we are including advanced coalescing behavior in PVSCSI for future versions of ESX. More on that when the release vehicle is set.
PVSCSI In A Nutshell
If you plodded through the above technical explanation of interrupt coalescing and PVSCSI, I applaud you. If you just want a summary of what to do, here it is:
- For existing products, only use PVSCSI against VMDKs that are backed by fast (greater than 2,000 IOPS) storage.
- If you have installed PVSCSI in low IO environments, do not worry about reconfiguring to LSI Logic. The net loss of performance is very small. And clearly these low IO virtual machines are not running your performance-critical applications.
- For future products*, PVSCSI will be as efficient as LSI Logic for all environments.
(*) Specific product versions not yet announced.
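For anyone scripting new VM builds, the rules above reduce to a couple of lines. This Python sketch is hypothetical: `recommended_adapter`, its `pvscsi_has_adaptive_coalescing` flag and the string labels are illustrative names, and you still need to measure what your backing storage actually delivers. Per the second rule, there is no need to run it against existing low-IO virtual machines.

```python
from typing import Literal

Adapter = Literal["pvscsi", "lsilogic"]

# Threshold from the summary; the IOPS figure describes the backing storage.
FAST_STORAGE_IOPS = 2000


def recommended_adapter(backing_storage_iops: float,
                        pvscsi_has_adaptive_coalescing: bool = False) -> Adapter:
    """Encode the nutshell rules (hypothetical helper, not a VMware tool)."""
    if pvscsi_has_adaptive_coalescing:
        # Future products: PVSCSI should be as efficient as LSI Logic everywhere.
        return "pvscsi"
    # Existing products: PVSCSI only for storage faster than 2,000 IOPS.
    return "pvscsi" if backing_storage_iops > FAST_STORAGE_IOPS else "lsilogic"


print(recommended_adapter(500))
print(recommended_adapter(20_000))
print(recommended_adapter(500, pvscsi_has_adaptive_coalescing=True))
```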
Update: February 16
The simple, almost austere KB on this rare occurrence raised more questions than answers. You may notice that the KB has been updated with text from this blog since the blog’s original publication. A white paper on PVSCSI that had been under construction for quite some time has also been released, along with the VROOM! article we usually pair with such a white paper.
Hi Scott, knowing that interrupt coalescing also exists for paravirtualized network cards, does the new VMXNET3 have the same ‘problem’? I mean that the driver was designed for 10GbE infrastructure, with techniques like LRO (Large Receive Offload) to achieve higher throughput, but is it also the preferred card for gigabit networks?
Thx,
Didier
Comment by deinoscloud — February 4, 2010 @ 11:42 am
VMXNET has been in use for many, many years and it is highly optimized for all environments. Always use VMXNET.
Comment by Scott — February 4, 2010 @ 12:19 pm
So you recommend NOT using VMXNET3 at all?
Comment by cw — February 23, 2010 @ 3:25 pm
[...] A technical explanation can be found at the Pivot Point blog [...]
Pingback by Best practise for selecting guest disk controller in vSphere « UP2V — February 4, 2010 @ 12:55 pm
[...] 04/02/2010: Scott explains why this driver should be used ONLY for VMs with disk I/O [...]
Pingback by Migration PVSCSI automatisée – MAJ - Hypervisor.fr — February 4, 2010 @ 3:27 pm
Thanks for the spotlight Scott !
Comment by hypervizor — February 4, 2010 @ 3:28 pm
[...] read on Scott Lowe’s that in a future release, the PVSCSI adapter will be good for both low-io and high-io virtual [...]
Pingback by PVSCSI deepdive « vFrank — February 5, 2010 @ 1:39 am
Thanks again Scott, BTW do you know if they are working on supporting SCSI3? We would love to use pvSCSI for a SQL cluster but can’t as Windows 2008 requires SCSI3 and pvSCSI uses SCSI2.
Comment by Scott Sauer — February 8, 2010 @ 10:46 am
No, I don’t know. I continue to run down a longer answer on this but it seems that the array implementations of SCSI3 are no better (from a perf perspective) than SCSI2. So, we are not doing much work with it. Still poking around, though.
Comment by Scott — February 10, 2010 @ 5:01 pm
[...] article can be found here at Scott Sauer’s site [...]
Pingback by Vmware Vsphere PVSCSI white paper and article on interrupt coalescing « Raj2796's Blog — February 18, 2010 @ 3:51 am
If I am in a situation where pvscsi is “slightly worse” than the other bus types, it means that I am in an environment with only a small demand for storage performance, right? So why would I care about losing a handful of ms for my disk access?
Comment by FP — February 25, 2010 @ 7:02 am
Exactly! That is my point in the summary. If your environment matches the case where pvscsi is slightly worse, by definition you have an application where performance is not important.
Comment by Scott — February 25, 2010 @ 11:12 am
[...] can read about it at pivot point. [...]
Pingback by Future Imperfect » Blog Archive » PVSCSI or LSI Logic: that is the question — February 26, 2010 @ 3:43 am
[...] Citrix release virtual appliances [...]
Pingback by RTFM Education » Blog Archive » vNews – Mar, 2010 — March 4, 2010 @ 11:49 am
Quite the contrary: I recommend that you always use vmxnet. Use the most recent version available to you.
Comment by Scott — February 23, 2010 @ 4:05 pm