vPivot

Scott Drummonds on Virtualization

Storage IO Control

Last year at VMworld 2009, Irfan Ahmad and Ajay Gulati presented a preview of an unreleased technology VMware is calling Storage IO Control (SIOC). SIOC is a feature aimed squarely at the number one cause of VMware performance problems: underperforming storage. Year after year I see misconfigured storage slowing virtualized applications, with VMware blamed for the problem. Now VMware hopes to add a new tool to our administrators’ toolboxes to help them identify and mitigate underperforming storage.

Storage problems are most easily identified by high device latency. When storage takes a long time to service IOs (over 20ms by my definition), application owners will soon start complaining. The goal of SIOC is to identify this trend at the VMFS volume level and take corrective action to protect high priority virtual machines. This requires two key innovations in vSphere:

  • VMFS volume latency calculation
  • Throughput reduction through device queue resizing

SIOC calculates latency at the VMFS volume level as a weighted average of the latencies of all virtual disks on the volume. Each disk’s latency is weighted by its number of IOs (IOPS), with slight modifications to account for the expected difference in latency for different IO sizes. When the volume’s weighted average latency crosses a user-defined threshold, SIOC acts: it reduces the virtual machines’ aggregate throughput to protect mission critical virtual machines.
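To make the weighting concrete, here is a minimal sketch of an IOPS-weighted volume latency. It assumes hypothetical per-disk samples and leaves out the IO-size adjustment, which is not detailed here. The names `DiskSample`, `volume_latency_ms` and `should_throttle` are mine, not VMware's, and the 20ms default simply reuses my own rule of thumb from above.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Hypothetical measurement for one virtual disk over a sampling window."""
    iops: float        # IOs per second issued to this virtual disk
    latency_ms: float  # average device latency seen by those IOs


def volume_latency_ms(samples):
    """IOPS-weighted average latency across all virtual disks on a volume."""
    total_iops = sum(s.iops for s in samples)
    if total_iops == 0:
        return 0.0  # an idle volume has no latency to report
    return sum(s.iops * s.latency_ms for s in samples) / total_iops


def should_throttle(samples, threshold_ms=20.0):
    """True when the weighted latency crosses the (assumed) threshold."""
    return volume_latency_ms(samples) > threshold_ms


# A busy, slow disk dominates the result; a plain average would give 17.5 ms.
samples = [DiskSample(iops=900, latency_ms=30.0),
           DiskSample(iops=100, latency_ms=5.0)]
print(volume_latency_ms(samples))  # 27.5
```

Weighting by IOPS means a nearly idle disk with good latency cannot hide the fact that most IOs on the volume are slow.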

The fundamental change provided by SIOC is volume-wide resource management. With vSphere 4 and earlier versions of VMware virtualization, storage resource management is performed at the server level. This means a virtual machine on its own ESX server gets full access to the device queue. The result is unfettered access to the storage bandwidth regardless of resource settings, as the following picture shows.

Queues before SIOC

A virtual machine gets full access to its device queue when SIOC is not present.

SIOC will throttle virtual machine throughput once a volume’s normalized latency crosses a threshold. The throughput is limited by decreasing each virtual machine’s access to the queue to an amount defined by its relative shares. The following figure shows this in action.

Storage queues with SIOC enabled

When storage is under contention SIOC controls throughput by assigning each virtual machine a portion of the queue defined by its relative number of shares.

The net effect of SIOC is that, under contention, virtual machines with higher shares are given prioritized access to the storage. This allows administrators to protect important virtual machines.
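The share-based split can be sketched as simple proportional arithmetic. This is only an illustration under stated assumptions: the share values, the 64-slot queue and the function name `queue_slots_by_shares` are all hypothetical, and the real mechanism may divide the queue differently.

```python
def queue_slots_by_shares(shares, total_slots):
    """Split total_slots across VMs in proportion to their shares.

    shares: dict mapping VM name -> integer share value (hypothetical).
    Slots are rounded down, so a few may stay unassigned in this sketch.
    """
    total_shares = sum(shares.values())
    if total_shares == 0:
        return {vm: 0 for vm in shares}
    return {vm: total_slots * s // total_shares for vm, s in shares.items()}


# Three VMs on the same volume, possibly on different hosts.
shares = {"A": 1500, "B": 500, "C": 500}
print(queue_slots_by_shares(shares, total_slots=64))
# {'A': 38, 'B': 12, 'C': 12}
```

Note that the split depends on every VM using the volume, not just the VMs that happen to share a host.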

SIOC is going to provide a major benefit to VMware’s customers that I am sure will be appreciated by everyone. But I want to give everyone an important warning: SIOC is not a storage panacea. Poorly performing storage before SIOC remains poorly performing after SIOC has taken action. But with SIOC enabled, VMware’s administrators will know that mission critical virtual machines will be protected from the fluctuating demands on shared storage.

Disclaimer

This feature has so far only been demonstrated by VMware in our lab environments. While we hope to make this tool available in production versions of vSphere as soon as possible, we have not yet committed SIOC to any specific launch date or to any specific version of vSphere.

24 Responses

How about alerts on slow IO perf. via vCenter? There is NO default alerting for slow IO .. so how do I even know I need to look at enabling this storage IO control?

    • It is not true that there is no alerting for slow IO in vSphere 4. You can set an alarm based on storage response time on a per-VM basis.

      I will defer an answer as to the VC alarms for SIOC. This SIOC work is a preview and specific details on the feature will be provided when VMware commits this feature to release.

      Scott

  • Please take into account VMkernel actions: provisioning new VMs from templates and svmotion are the two things that will really pound the snot out of our array, causing latency to increase. Also please allow for an absolute cap, not just a share-based cap, so we can pre-plan things rather than just letting things get bad enough that the auto-queueing mechanism kicks in.

  • Andrew,

    >> Also please allow for an absolute cap not just a share based cap so we can pre-plan things rather than just letting things get bad enough that the auto-queueing mechanism kicks in.

    Apart from setting a disk share, you can also limit the number of IOs from a VM to a virtual disk by setting an upper limit for the IOPs. SIOC will also honor this setting along with the disk shares. In this case, VM’s IOPs will be limited to the lower of the two values.

    Both shares and limits can be set in the ‘Resources’ tab of the VM properties screen. To reach this screen, right-click on a VM –> Edit Settings –> Resources –> Disk.

  • Will SIOC also be available for NFS volumes or is it VMFS only?

    • This is a preview of an unreleased feature. We will provide better detail on the feature during its official announcement.

      Scott

  • >>Apart from setting a disk share, you can also limit the number of IOs from a VM to a virtual disk by setting an upper limit for the IOPs. SIOC will also honor this setting along with the disk shares. In this case, VM’s IOPs will be limited to the lower of the two values.

    So if the background processes like template deployment and svmotion take this into account it should mean just setting these values on the templates or on the vm’s before they are operated on and then changing them if need be.

  • [...] Storage I/O Control (SOIC) deals with the server centric issue by introducing I/O latency monitoring at a VMFS volume level. SOIC reacts when a VMFS volume’s latency crosses a pre-defined level, at this point access to the host queue is throttled based on share value assigned to the VM.  This prevents a single VM getting an unfair share of queue resources at volume level as shown in the before and after diagrams Scott posted in his article. [...]

  • I thought shares only were used for contention? Your diagram shows VM C with 500 shares, but it’s the only VM on that ESX server. Why would it be subjected to shares since it’s not contending with other VM’s on that server?
    Are you saying that with SIOC we will have to define shares for all VMs across all ESX hosts, regardless of whether they are the only VM on that ESX server?

    • I am saying that contention should be defined at the volume (VMFS) level, not the host level. VM C’s isolation on its own host plays no part in identifying contention at the volume level. If there are 10 other VMs accessing the same volume, the LUN’s performance may be suffering. If that is the case, the throughput of all VMs that share this LUN will be throttled to improve latency.

      Scott

  • [...] to properly shape access to back-end storage resources. The inimitable Scott Drummonds discussed it on Pivot Point (his blog), and Craig Stewart also recently published an article about Storage IO [...]

  • I don’t think I have seen this depicted in such a way before. You really have clarified this for me. Thank you!

  • [...] one of the coolest version coming to a vSphere version in the near future. Scott Drummonds wrote a cool article about it which shows the strength of SIOC when it comes to fairness. One might say that there [...]

  • [...] Scott Drummonds Post at vPivot.com: Storage IO Control [...]

  • [...] the feature. The best write up on the feature was by Scott Drummonds last month that you can read here.  With SIOC you will be able to have almost a quality of service for VM’s in regards to storage [...]

  • [...] Scott Drummonds has a good article about how this all works, and below is a movie which explains Storage I/O Control (SIOC) in action! [...]

  • [...] me quote Scott Drummonds article to explain the difference between previous disk share control and SIOC. The fundamental [...]

  • I just found your blog, I actually bookmarked it and I am looking at the content. I without a doubt really like it. Useful issue in any event . people look at this. I come as a result of this view which discover remarks as akin of listening.

  • [...] Article on the vPivot blog: “Storage IO Control” [...]

  • [...] a year and a half ago I previewed VMware’s unreleased feature, Storage IO Control (SIOC).  SIOC creates new intelligent latency metrics to evaluate the health of VMFS volumes. [...]
