I have been reading Chad Sakac’s article on IO queues and micro-bursting for months now. Chad is wicked technical for a manager type and after reading this post a dozen times I think I finally have it internalized. Let me put my own spin on this tome, embedded in which are several jewels of wisdom.
The article describes a phenomenon common to consolidated workloads called micro-bursting. Micro-bursting occurs over periods so short that it goes unnoticed within the sampling window of monitoring tools. As Chad put it:
Remember that every metric has a timescale. IOps is in seconds. Disk service time is in ms (5-20ms for traditional disk, about 1ms for EFD). If an I/O is served from cache, it’s in microseconds. Switch latencies are in microseconds. Here, the I/O periods were so short that they filled up the ESX LUN queues instantly, causing a “back-off” effect for the guest. These were happily serviced by the SAN and the storage array, which had no idea anything bad was going on.
When these bursts happen, queues overflow, messages back up, and service times briefly skyrocket. These rapid overflows come and go within a fraction of esxtop's multi-second window and vCenter's 20-second window.
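To see why a sampling window hides a micro-burst, consider a toy model (all numbers here are hypothetical, chosen only for illustration): 20 seconds of steady 1 ms IOs with a single 50 ms burst of 200 ms latencies buried inside. A 20-second average, like vCenter's window, reports a healthy number while the peak tells the real story.

```python
# Hypothetical numbers: one IO per millisecond for 20 seconds, with a
# 50 ms micro-burst where per-IO latency spikes to 200 ms.
latencies = [1.0] * 20000
for i in range(10000, 10050):      # the burst, at the 10-second mark
    latencies[i] = 200.0

window_avg = sum(latencies) / len(latencies)   # what a 20 s sample reports
peak = max(latencies)                          # what actually happened

print(f"window average: {window_avg:.4f} ms")  # 1.4975 ms -- looks healthy
print(f"peak latency:   {peak:.1f} ms")        # 200.0 ms -- the hidden burst
```

The averaged view is indistinguishable from a quiet disk; only per-IO data exposes the burst.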
So, what buffers are we talking about? Take a look at Chad’s hand-drawn picture of the storage path, which is only slightly less complicated than the Republican view of Obamacare:
If you are a VI admin, you care about the LUN queue in ESX. ESX creates one of these queues for each HBA+LUN pair. So multipathing to a LUN increases the effective queue depth available to that LUN, and a single HBA talking to multiple LUNs still gets a separate queue for each LUN. An instance of this queue will overflow if many VMs on a single server issue commands to a single LUN at once. As Chad says:
In VMware land – this is usually the fact that the default LUN queue (and corresponding Disk.SchedNumReqOutstanding value) are 32 – which for most use cases is just fine, but when you have a datastore with many small VMs sitting on a single LUN, the possibility of microbursting patterns becomes more likely.
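A rough sketch of the scenario in that quote (a toy model, not VMkernel code): a queue with the default depth of 32 is hit by a burst from many small VMs, completions lag behind arrivals, and late arrivals find the queue full. The VM counts, burst sizes, and completion rate below are made up for illustration.

```python
import random

QUEUE_DEPTH = 32  # ESX default LUN queue depth, per the quote

def overflows_during_burst(num_vms, ios_per_vm, seed=1):
    """Count IOs that arrive while the queue is already full."""
    rng = random.Random(seed)
    outstanding = 0
    overflows = 0
    for _ in range(num_vms * ios_per_vm):
        if outstanding >= QUEUE_DEPTH:
            overflows += 1           # arrival finds a full LUN queue
        else:
            outstanding += 1
        if rng.random() < 0.5:       # roughly one completion per two arrivals
            outstanding = max(0, outstanding - 1)
    return overflows

print("4 small VMs bursting :", overflows_during_burst(4, 8))
print("40 small VMs bursting:", overflows_during_burst(40, 8))
```

With a handful of VMs the burst drains before the queue fills; pile many small VMs onto the same LUN and the same burst pattern overflows it, which is exactly the "many small VMs on a single LUN" risk the quote describes.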
So, when will the queues overflow? Not often:
In the example [Vaughn] used, [multi-pathing] would not help materially if there were more than 3 ESX hosts, as it would be a likely case of “underconfigured array” – not host-side queuing.
The message here is that only a small window of configurations will result in LUN queue overflow: many VMs on very few hosts talking to a common LUN. This is a perfect use case for vscsiStats, which I have talked about in various forums now. vscsiStats avoids sampling windows by recording precise information on every IO. This means that microburst statistics will not be averaged away (and lost) across a time period.
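The per-IO bookkeeping behind such a histogram is easy to sketch. This is not vscsiStats code, just a minimal Python illustration assuming a made-up trace of "outstanding IOs at each arrival" and power-of-two bucket limits like those vscsiStats reports:

```python
import bisect

# Power-of-two bucket limits, the same shape vscsiStats uses
BUCKET_LIMITS = [1, 2, 4, 8, 16, 32, 64]

def histogram(outstanding_at_arrival):
    """For each IO arrival, bucket how many IOs were already outstanding."""
    counts = [0] * len(BUCKET_LIMITS)
    for n in outstanding_at_arrival:
        counts[bisect.bisect_left(BUCKET_LIMITS, n)] += 1
    return counts

# Hypothetical trace: mostly quiet, plus one burst driving the queue to 40.
trace = [1] * 100 + [2] * 20 + list(range(3, 41)) + [5] * 10

for limit, count in zip(BUCKET_LIMITS, histogram(trace)):
    print(f"<= {limit:2d} outstanding: {count} IOs")
```

Because every arrival increments a bucket, the burst that pushed the queue to 40 shows up in the 33-64 bucket no matter how briefly it lasted; there is no window for it to be averaged into.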
Consider the following data I pulled from a sample session on my office system:
| Frequency | Histogram Bucket Limit |
|-----------|------------------------|
This table shows the number of outstanding IOs as each new IO arrives in the VMkernel. The first row means that during the collection period only two IOs arrived to a queue with one outstanding IO. Row two says that two IOs entered when there were two outstanding IOs. The third row states that 50 IOs arrived while the queue held 3-4 outstanding IOs. And so on.
This table represents a fairly healthy access pattern: only 433 out of 357,402 IOs arrived while the queue had 33-64 outstanding IOs (shown in the last row). With ESX's default LUN queue depth of 32, vscsiStats shows that a very small number of IOs arrived to an overflowing queue.
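For a sense of scale, here is the arithmetic behind "a very small number," using the two figures from the table above:

```python
# Figures from the vscsiStats session above
overflow_ios = 433        # arrivals that saw 33-64 outstanding IOs
total_ios = 357_402       # all IOs in the collection period

share = overflow_ios / total_ios
print(f"{share:.4%} of IOs saw a queue past the default depth of 32")
```

Roughly 0.12% of IOs arrived past the default queue depth, so bursts were present but rare on this system.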
In summary, some storage performance issues appear and disappear too rapidly to be visible in sampling-based tools, even one as fine-grained as esxtop. As a VI admin you should keep this in mind in your most challenging troubleshooting cases. And remember to use vscsiStats when all else has failed.