Micro-bursting and Storage Performance

I have been reading Chad Sakac’s article on IO queues and micro-bursting for months now.  Chad is wicked technical for a manager type, and after reading this post a dozen times I think I finally have it internalized.   Let me put my own spin on this tome, embedded in which are several jewels of wisdom.

The article describes a phenomenon common to consolidated workloads called micro-bursting.  Micro-bursting occurs in such short periods as to go unnoticed in the sampling window of monitoring tools.  As Chad put it:

Remember that every metric has a timescale.   IOps is in seconds.   Disk service time is in ms (5-20ms for traditional disk, about 1ms for EFD).  If an I/O is served from cache, it’s in microseconds.   Switch latencies are in microseconds.    Here, the I/O periods were so short that they filled up the ESX LUN queues instantly, causing a “back-off” effect for the guest.   These were happily serviced by the SAN and the storage array, which had no idea anything bad was going on.

When these bursts happen, queues overflow, messages back up, and service times briefly skyrocket.  These rapid overflows happen in a fraction of esxtop's multi-second window and vCenter's 20-second window.
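To see why a coarse sampling window hides this, here is a minimal sketch with made-up numbers (not Chad's data): a 200-IO burst that briefly queues up inside an otherwise healthy 20-second collection interval all but disappears once the interval is averaged.

```python
# Hypothetical per-IO service times (ms) within one 20-second window:
# 10,000 IOs at a healthy 2 ms, plus a 200-IO microburst that briefly
# saw 50 ms each while the queues drained.
normal_ios = [2.0] * 10_000
burst_ios = [50.0] * 200
window = normal_ios + burst_ios

avg = sum(window) / len(window)
worst = max(window)

print(f"20-second average latency: {avg:.2f} ms")   # ~2.9 ms -- looks fine
print(f"worst per-IO latency:      {worst:.1f} ms") # 50 ms -- the burst
```

The averaged number is what esxtop or vCenter shows you; the per-IO worst case is what the guest actually felt.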

So, what buffers are we talking about?  Take a look at Chad’s hand-drawn picture of the storage path, which is only slightly less complicated than the Republican view of Obamacare:

Chad Sakac's image showing the numerous storage queues at every location from the VM to the platter.

If you are a VI admin, you care about the LUN queue in ESX.  ESX creates one of these queues for each HBA+LUN pair.  So, multipathing to a LUN increases the effective LUN queue depth, and a single HBA talking to multiple LUNs still gets a separate queue for each LUN.  Instances of this queue will overflow if many VMs on a single server issue commands to a single LUN.  As Chad says:

In VMware land – this is usually the fact that the default LUN queue (and corresponding Disk.SchedNumReqOutstanding value) are 32 – which for most use cases is just fine, but when you have a datastore with many small VMs sitting on a single LUN, the possibility of microbursting patterns becomes more likely.
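A back-of-the-envelope sketch of that arithmetic; the VM counts and per-VM IO counts below are made up for illustration, and only the 32-deep default comes from the quote:

```python
# Default ESX LUN queue depth (and Disk.SchedNumReqOutstanding) per the quote above.
LUN_QUEUE_DEPTH = 32

def outstanding_at_burst(vms: int, ios_per_vm: int) -> None:
    """Rough picture of a microburst: every VM on the host fires its IOs
    at the same datastore/LUN in the same instant."""
    total = vms * ios_per_vm
    queued_behind = max(0, total - LUN_QUEUE_DEPTH)
    print(f"{vms} VMs x {ios_per_vm} IOs = {total} outstanding; "
          f"{queued_behind} wait behind the {LUN_QUEUE_DEPTH}-slot LUN queue")

outstanding_at_burst(vms=4,  ios_per_vm=4)   # 16 outstanding, fits
outstanding_at_burst(vms=20, ios_per_vm=4)   # 80 outstanding, 48 back up toward the guests
```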

So, when will the queues overflow?  Not often:

In the example [Vaughn] used, [multi-pathing] would not help materially if there were more than 3 ESX hosts, as it would be a likely case of “underconfigured array” – not host-side queuing.

The message here is that only a small window of configurations will result in LUN queue overflow: many VMs on very few hosts talking to a common LUN.  This is a perfect use case for vscsiStats, which I have talked about in various forums now.  vscsiStats avoids sampling windows by recording precise information on every IO.  This means that microburst statistics will not be averaged away, and lost, across a time period.
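Conceptually, that per-IO recording is just histogram bucketing at the moment each IO is issued.  Here is a small sketch of the idea; the bucket limits are borrowed from the table below, but the record helper and the sample values are my own illustration, not vscsiStats code:

```python
from bisect import bisect_left

# Bucket limits as they appear in the vscsiStats output below.
BUCKET_LIMITS = [1, 2, 4, 6, 8, 12, 16, 20, 24, 28, 32, 64]

def record(histogram: dict, value: int) -> None:
    """Bump the bucket whose limit is the first one >= value.
    Every IO lands in a bucket the instant it is seen, so a microburst
    is never smoothed away by a later averaging pass."""
    idx = bisect_left(BUCKET_LIMITS, value)
    key = BUCKET_LIMITS[idx] if idx < len(BUCKET_LIMITS) else f">{BUCKET_LIMITS[-1]}"
    histogram[key] = histogram.get(key, 0) + 1

hist = {}
for outstanding in [1, 2, 2, 14, 14, 15, 33, 70]:  # made-up samples
    record(hist, outstanding)
print(hist)  # {1: 1, 2: 2, 16: 3, 64: 1, '>64': 1}
```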

Consider the following data I pulled from a sample session on my office system:

Frequency   Histogram Bucket Limit
2           1
2           2
50          4
879         6
6588        8
82830       12
161362      16
79802       20
18080       24
5377        28
1997        32
433         64
0           64
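
A quick pass over those numbers, assuming (my reading of the bucket limits, which line up with the 32-deep queue discussion above) that this histogram tracks outstanding IOs.  The script below just re-tallies the table and shows how small the tail past the default 32-deep queue is:

```python
# (frequency, bucket limit) pairs copied from the table above.
HISTOGRAM = [
    (2, 1), (2, 2), (50, 4), (879, 6), (6588, 8), (82830, 12),
    (161362, 16), (79802, 20), (18080, 24), (5377, 28), (1997, 32),
    (433, 64), (0, 64),
]

total = sum(freq for freq, _ in HISTOGRAM)
# IOs that landed in buckets beyond the 32 limit, i.e. the tail that
# could push past a 32-deep LUN queue if they all targeted one LUN.
beyond_32 = sum(freq for freq, limit in HISTOGRAM if limit > 32)

print(f"total IOs recorded: {total}")
print(f"IOs in buckets past 32: {beyond_32} "
      f"({100.0 * beyond_32 / total:.3f}% of the total)")
```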
