Scott Drummonds on Virtualization



In today’s post I want to update and amplify thoughts from an old post on Storage IO Control (SIOC).  VMware customers who use SIOC may sometimes see the following vCenter alarm:

Non-VI workload detected on the datastore

Or you may see the following warning in the vSphere client:

An external I/O activity is detected on datastore …

This message prompted a bunch of smart questions from a former colleague of mine.  The bright guys at VMware–including Joey Dieckhans, a man I am very proud to have brought into VMware–provided a lot more detail about this situation.  There were so many interesting questions and answers that I will present a summary of our conversation in FAQ form.

What does SIOC do?

When VMFS volume latency passes a user-definable threshold, SIOC throttles throughput to the datastore.  Decreasing storage throughput should alleviate congestion on the datastore, hopefully resulting in decreased latency.

What does this alarm mean?

If SIOC reduces throughput but latency does not decrease, SIOC assumes there is another workload driving storage IO to the volume and impacting latency.  This alarm informs the administrator that latency is not decreasing as virtual machine throughput is reduced.
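The check described above can be pictured with a minimal Python sketch. This is an illustration of the idea only, not VMware's actual algorithm: the function name, the `min_improvement_ms` parameter, and the sample numbers are all assumptions made for the example.

```python
def external_workload_suspected(latency_before_ms, latency_after_ms,
                                threshold_ms, min_improvement_ms=1.0):
    """Return True when throttling VM throughput did not bring latency down.

    All names and the min_improvement_ms default are hypothetical; the real
    SIOC heuristics are internal to the VMkernel.
    """
    # The datastore is still congested if latency remains above the threshold.
    still_congested = latency_after_ms > threshold_ms
    # Throttling "worked" if latency fell by at least a small margin.
    improved = (latency_before_ms - latency_after_ms) >= min_improvement_ms
    return still_congested and not improved


# Latency stayed high after throttling: something else is driving IO.
print(external_workload_suspected(40.0, 41.0, threshold_ms=30.0))
# Latency fell noticeably after throttling: no outside workload suspected.
print(external_workload_suspected(40.0, 32.0, threshold_ms=30.0))
```

The point of the sketch is that the alarm is an inference from two observations (throughput went down, latency did not), not a direct measurement of the other workload.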

How does behavior change once this alarm has been raised?

This alarm is effectively a sign that SIOC has given up and is no longer trying to throttle virtual machine access to the storage device.  This was deemed the proper behavior to guarantee that the virtual machines do not starve as non-virtualized workloads continue to access the shared datastore.

Has this alarm’s behavior changed in vSphere 5?

In Joey’s words, in vSphere 5 the SIOC alarm is no longer on a “hair trigger”.  It will make sure that there are several observed anomalies before raising the alarm.  This should reduce the alarm’s frequency, which was usually the result of a false positive.  Also, the alarm is no longer enabled by default in vSphere 5.  If you want to see it–and I recommend that you all do–you should enable the “Unmanaged workload detected on SIOC-enabled Datastore” alarm.
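To show what moving off a “hair trigger” could look like, here is a rough Python sketch that raises an alarm only after several anomalies within a recent window. The class name, the `required` and `window` values, and the windowing scheme itself are assumptions for illustration; the post does not say how vSphere 5 actually counts anomalies.

```python
from collections import deque


class DebouncedAlarm:
    """Raise only after `required` anomalies in the last `window` samples.

    Hypothetical illustration; not the vSphere implementation.
    """

    def __init__(self, required=3, window=5):
        self.required = required
        # Keep only the most recent `window` observations.
        self.recent = deque(maxlen=window)

    def observe(self, anomaly):
        """Record one observation and report whether the alarm should fire."""
        self.recent.append(bool(anomaly))
        return sum(self.recent) >= self.required


alarm = DebouncedAlarm(required=3, window=5)
samples = [True, False, True, False, True, False, False, False]
results = [alarm.observe(s) for s in samples]
print(results)
```

With these sample values a single anomalous reading never fires the alarm; it fires only on the fifth sample, once three anomalies have landed inside the five-sample window.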

What type of configurations would raise this alarm?

This alarm can be raised in any environment where VMware shares storage with non-VMware workloads. Examples include two partitions on a LUN, two LUNs in a RAID group, two LUNs in a storage pool, or even two LUNs on different arrays that share a common interconnect. The degree to which competing workloads will impact each other on a single array is highly specific to the workload and storage architecture. But since the alarm’s sensitivity was decreased in vSphere 5, the latter possibilities in the above list are much less likely to raise it.

What does EMC recommend with respect to this alarm?

I do not want to speak for all storage vendors, but I think there should be no vendor-specific recommendation for this alarm.  Administrators should enable SIOC and they should be informed when SIOC is unable to provide its advertised value.  Administrators need not act on the alarm but they should be aware that SIOC has stopped trying. If SIOC functionality is required and this alarm is frequently raised, consider separating physical workloads to different pools or even arrays.

Where does SIOC run?

SIOC runs in the VMkernel using values configured by vCenter.  As a result, if vCenter goes offline SIOC will continue to function normally.

What exactly is done to throttle virtual machines?

Using the parameters passed to it by vCenter, SIOC can calculate the relative priority of each of the virtual machines on that ESX host.  SIOC can then parcel out queue slots based on relative priority of each virtual machine that shares the VMFS volume.
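As a rough picture of parceling out queue slots by relative priority, the Python sketch below divides a fixed number of slots among virtual machines in proportion to their shares. The function name, the VM names, the share values, and the 32-slot queue are assumptions for the example, and the rounding scheme (largest remainder) is my choice, not something confirmed for SIOC.

```python
def allocate_queue_slots(shares, total_slots):
    """Split total_slots among VMs in proportion to their shares.

    Hypothetical sketch: uses largest-remainder rounding so the
    allocations always add up to total_slots.
    """
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("at least one VM must have positive shares")

    # Exact (fractional) entitlement of each VM.
    exact = {vm: total_slots * s / total_shares for vm, s in shares.items()}
    # Start from the whole-number part of each entitlement.
    slots = {vm: int(x) for vm, x in exact.items()}

    # Hand any leftover slots to the VMs with the largest fractional parts.
    leftover = total_slots - sum(slots.values())
    by_remainder = sorted(shares, key=lambda vm: exact[vm] - slots[vm],
                          reverse=True)
    for vm in by_remainder[:leftover]:
        slots[vm] += 1
    return slots


# Three VMs sharing one VMFS volume; share values are illustrative.
print(allocate_queue_slots({"db": 2000, "web": 1000, "batch": 1000}, 32))
```

In this example the VM with twice the shares receives twice as many queue slots as each of the others, which is the sense in which relative priority determines access under contention.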

Is there a cool video that would put this feature to demonstration with clever graphics and a catchy song?

Why, yes!  See the video Joey created around SIOC’s release.

I will update this entry as questions arise.

3 Responses

This is a very annoying alarm, since the majority of storage arrays today are “pooled”, so it is not unexpected that the same disks would be utilized by non-VI workloads. This notification simply clutters the vSphere event logs and increases the size of the vCenter database unnecessarily.

    • Well, if you are pooling your storage between VMware and other workloads then SIOC isn’t doing you any good, and that is what the alarm is letting you know. Basically, VMware set up SIOC so that it wouldn’t completely strangle your VMs just so that a workload with an unknown priority can use up the freed capacity (where capacity is bandwidth or IOPS).

  • If you’re using pooled storage and you get this alarm, you aren’t providing enough headroom for the IOPS required from the virtual infrastructure.

    More spindles!