Scott Drummonds on Virtualization

Designing VMs with Performance SLAs


Consolidation amplifies the uncertainty of application performance. Still, VI administrators need a means of guaranteeing performance SLAs to their applications’ users. But the best VMware has been able to offer is resource controls, which are only an indirect mechanism for sustaining application performance. With the acquisition of B-hive, now AppSpeed, VMware moved a step closer to allowing VI administrators to guarantee a performance SLA. As an application-aware latency measurement tool, AppSpeed may eventually provide feedback to vCenter to guarantee throughput levels. But it does not today. So how are VI administrators to guarantee application performance?

It was during discussions with advanced VMware customers in Melbourne that a solution to this problem occurred to me. I have reasoned it through and I think it holds water. I have socialized it with more customers and my colleagues and we think it stands. So I want to introduce a system for implementing virtual machines with a better assurance of a performance SLA.

The key to this process is that minimum performance can be measured using limits and that performance can be assured using reservations. You can develop and document virtual machines with performance SLAs using the following procedure:

  • First, as always, define a small number of strictly-sized virtual machines to be used by all applications in your environment. Often these look something like small VMs of 1 vCPU and 4 GB RAM, medium VMs of 2 vCPUs and 8 GB of RAM, and large VMs of 4 vCPUs and 16 GB of RAM. Tune these numbers for your environment, as needed.
  • For any application, benchmark its maximum performance against each of these virtual machine configurations on an unloaded system. Choose an ISV-supplied benchmark or a well-known third-party tool. This sets your high-water mark for throughput for each application in its virtual machine.
  • For each configuration, set a CPU limit at 50% of the available CPU and a memory limit of 50% of the available memory. Retest the application against this smaller, limited configuration.
  • During the applications’ deployment, change the limits to reservations. That is, remove limits and set reservations equal to the limits’ previous values, in this case 50%.
  • Your application now has a maximum performance measured in bullet two and a “guaranteed” performance measured in bullet three. This is your application’s performance SLA.
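
The arithmetic behind the procedure above can be sketched in a few lines of Python. This is only an illustration: the 3,000 MHz core clock is an assumed value, and the names `VmSize`, `ResourceSettings`, `benchmark_settings` and `production_settings` are hypothetical helpers, not part of any VMware API. Applying the resulting numbers to real VMs is left to whatever tooling you already use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VmSize:
    """One of the standard VM sizes (hypothetical helper type)."""
    name: str
    vcpus: int
    memory_mb: int


@dataclass(frozen=True)
class ResourceSettings:
    """Per-VM resource controls; None stands for 'no limit' in this sketch."""
    cpu_limit_mhz: Optional[int]
    cpu_reservation_mhz: int
    memory_limit_mb: Optional[int]
    memory_reservation_mb: int


def sla_floor(size: VmSize, core_mhz: int, fraction: float = 0.5):
    """Return (cpu_mhz, memory_mb) equal to `fraction` of the VM's configured size."""
    cpu_mhz = int(size.vcpus * core_mhz * fraction)
    memory_mb = int(size.memory_mb * fraction)
    return cpu_mhz, memory_mb


def benchmark_settings(size: VmSize, core_mhz: int, fraction: float = 0.5) -> ResourceSettings:
    """Bullet three: limits at the floor, no reservations, for the retest."""
    cpu, mem = sla_floor(size, core_mhz, fraction)
    return ResourceSettings(cpu, 0, mem, 0)


def production_settings(size: VmSize, core_mhz: int, fraction: float = 0.5) -> ResourceSettings:
    """Bullet four: limits removed, reservations set to the old limit values."""
    cpu, mem = sla_floor(size, core_mhz, fraction)
    return ResourceSettings(None, cpu, None, mem)


# The example sizes from the first bullet.
SIZES = [
    VmSize("small", 1, 4096),
    VmSize("medium", 2, 8192),
    VmSize("large", 4, 16384),
]

if __name__ == "__main__":
    ASSUMED_CORE_MHZ = 3000  # assumption: substitute your hosts' actual clock speed
    for size in SIZES:
        print(size.name, "test:", benchmark_settings(size, ASSUMED_CORE_MHZ))
        print(size.name, "prod:", production_settings(size, ASSUMED_CORE_MHZ))
```

The same `sla_floor` value feeds both phases, so the level you measured under limits is exactly the level you later reserve.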

The concept is simple: limits can be used to measure the performance of an application in the presence of that degree of contention. Reservations ensure that those resource amounts are always present. Here are some notes on this process:

  • This is not a true guarantee since network and storage throughput may drop. No tool can eliminate this risk entirely, but Storage I/O Control (SIOC) and Network I/O Control (NetIOC) can reduce the risk of a network- or storage-induced performance failure.
  • The memory test is going to be highly dependent on the working set created by your load generation tool. Your mileage will vary depending on your application owners’ use of the virtual machine.
  • vCenter will guarantee that the reservations are always available through a process called admission control, which checks the cluster to ensure that enough CPU or memory is available to run the virtual machine immediately and in the event of a server failure.
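
To make the admission control idea in the last note concrete, here is a deliberately simplified sketch. It is not how vCenter implements admission control: it only sums reservations against the capacity that would survive the loss of the largest hosts, and it ignores per-host placement, virtualization overhead and the actual HA policies. The names `Host`, `Reservation` and `can_admit` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Host:
    cpu_mhz: int
    memory_mb: int


@dataclass(frozen=True)
class Reservation:
    cpu_mhz: int
    memory_mb: int


def surviving_capacity(values: List[int], host_failures: int) -> int:
    """Sum capacity after pessimistically removing the largest `host_failures` hosts."""
    keep = max(len(values) - host_failures, 0)
    return sum(sorted(values)[:keep])


def can_admit(hosts: List[Host],
              existing: Iterable[Reservation],
              new_vm: Reservation,
              host_failures: int = 1) -> bool:
    """Return True if all reservations, including the new VM's, still fit
    in the cluster after `host_failures` hosts are lost."""
    existing = list(existing)
    cpu_capacity = surviving_capacity([h.cpu_mhz for h in hosts], host_failures)
    mem_capacity = surviving_capacity([h.memory_mb for h in hosts], host_failures)
    cpu_needed = sum(r.cpu_mhz for r in existing) + new_vm.cpu_mhz
    mem_needed = sum(r.memory_mb for r in existing) + new_vm.memory_mb
    return cpu_needed <= cpu_capacity and mem_needed <= mem_capacity


if __name__ == "__main__":
    # Assumed cluster: three identical hosts, ten small VMs already reserved.
    cluster = [Host(10000, 32768)] * 3
    running = [Reservation(1500, 2048)] * 10
    print(can_admit(cluster, running, Reservation(3000, 4096)))
    print(can_admit(cluster, running, Reservation(6000, 8192)))
```

The point of the sketch is only that a reservation is a promise the cluster has to be able to keep, so every reserved VM reduces what can be powered on later.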

As I said above, this is not a true guarantee of application performance. But it is as close as we can get until AppSpeed or a replacement evolves into universal application latency measurement that is fed into vCenter. And this is another in a growing list of reasons why CPU and memory reservations should be part of all VMware deployments.

10 Responses

In the “Increasing contention in the face of growth” section of Scott Lowe’s “Mastering vSphere” he gives the example of creating a template VM with a reservation of 300 MHz and a limit of 350 MHz (he doesn’t use the term SLA but that’s essentially what it is). Your technique uses limits only for benchmarking, with reservations-only as the goal. I’m curious how you see the pros and cons of these two different approaches?

Also, while clearly there is a need for SLAs, wouldn’t resource pools still be a better way to go (as opposed to setting things on a VM-by-VM basis)? Your technique is applicable in either case but I’ve always been led to believe that resource pools (or even vApps) are a more flexible approach than customizing individual VMs.

    • I do not understand the value of using limits in production environments. At worst they do nothing and at best they override the scheduler’s algorithm for assigning resources based on need. But I respect Scott very much and would enjoy having him change my mind on this.

      Reservations are important to attempt to guarantee minimums. Shares are important for setting priority. I find limits valuable only for temporary tests that evaluate performance under contention.


  • Jeremy, regarding the use of limits: I have seen many people fighting a lot of performance riddles when using limits in production. The CPU scheduler is not as efficient when limits are applied (I think; you may need to find a better expert to confirm this). Especially when multiple VMs fight for resources, you get more power per MHz when not using limits. From my VCP trainings I remember that limits were originally meant for troubleshooting or testing, not for production use.

    On resource pool usage: This depends on what you sell to the end customer: single VMs or a larger unit.

  • The cons are really simple: if a CPU runs at 3 GHz and you limit the VM to 350 MHz, it will have to wait a while before it is scheduled again. Limits do exactly what you expect them to do: they limit your VMs in terms of clock cycles. Definitely not the approach I would take.

  • @Jeremy @Frank it seems that the post is geared toward squeezing out performance and guaranteeing that performance for the application. While I agree with the method for that purpose, I have not seen it commonly used. I actually shy away from limits, reservations, and resource pools but will use them if necessary. While it’s great to have these tools at hand, the flip side of using them is extra complexity and management. But if it has to be done then it has to be done. No two environments are the same or have the same SLAs.

  • An interesting philosophical debate: limits are obviously “bad” from the point of view of an individual VM but potentially “good” from the point of view of other VMs.

    On a lightly-loaded system, that 350 MHz limit would clearly be a poor decision (per Duncan’s point). One might even end up throwing away 2.65 GHz. But on a heavily-loaded system, the question still remains as to whether the VMkernel scheduler does better on its own or with administrative guidance (read: explicit limits).

    There doesn’t seem to be any real agreement in the industry as to best practices. In the VCP4 course I took recently, a suggested best practice for general purpose VMs was to set a reservation of 5% of the physical CPU and a limit of 25% of the physical CPU. This is where I got the idea that limits (mainly implemented via resource pools) were how one definitively averted any possibility of a tragedy-of-the-commons situation.

    The practical wisdom of the folks in this thread is invaluable and appreciated, but it would also be nice to know what the VMkernel scheduling guys think. As is, VMware’s public documentation and training courses are rather ambiguous on the proper use of limits.

    • Jeremy,

      Forgive me if this comes off as bragging, but I do not know if there is anyone in the world that can answer this question better than I. I was responsible for communicating performance best practices at VMware for over three years. And while I am now just a dumb storage guy, I still have a pretty good grip on this issue.

      I can tell you that the answer to this is not documented anywhere and is probably beyond the abilities of VMware’s engineers. This is not a slight on them (they are certainly brilliant and capable), but the answer lies in vast application experimentation. And, on the whole, VMware’s application experiments are somewhat limited and focused on supporting new features.

      For my last 12 months at VMware I had on my working list a task to write a whitepaper on resource management with real application data. Because of the complexity of setting up a large, heterogeneous application framework, the work never got off the ground. But perhaps we can nudge the VMware performance team to take up the mantle on this. Perhaps with VMmark?


  • […] Drummonds posting a very interesting article on the idea of Designing VMs with Performance SLAs. The article is a very good one and is one that will encourage a very good debate on the positives […]

  • In the networking world, this approach has been used with success on WAN links for a number of years – only apply reservations when the capacity of resources consumed exceeds a configured threshold. Only reserve the minimum resources for specified consumers that require a minimum – in this case VMs. Apply hysteresis – only start to apply the reservations if a burst of demand goes over the threshold for a duration of 30 (configured) seconds. Disable the reservations once demand drops below the threshold for a duration of 30 seconds.

    Again, this is situational: if you are servicing stock market applications that need to accelerate from 5% of resources to 100% when the stock market opens and run at 100% for the first 30 to 60 minutes of the market being open, then you might not want to apply reservations in this manner…
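
The hysteresis rule described in this comment can be sketched as a small state machine. This is an illustration only: `ReservationToggle` is a hypothetical name, the threshold and the 30-second hold are the configurable values mentioned above, and actually enabling or disabling reservations on VMs is left out.

```python
class ReservationToggle:
    """Turn reservations on only after demand stays above `threshold` for
    `hold_seconds`, and off only after it stays below for `hold_seconds`."""

    def __init__(self, threshold: float, hold_seconds: float = 30.0):
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self.active = False   # are reservations currently applied?
        self._since = None    # when demand first moved to the "flip" side

    def update(self, now: float, demand: float) -> bool:
        """Feed one demand sample taken at time `now` (seconds); return the state."""
        if self.active:
            wants_flip = demand < self.threshold
        else:
            wants_flip = demand > self.threshold

        if not wants_flip:
            # Demand is consistent with the current state: reset the timer.
            self._since = None
            return self.active

        if self._since is None:
            self._since = now
        if now - self._since >= self.hold_seconds:
            self.active = not self.active
            self._since = None
        return self.active


if __name__ == "__main__":
    toggle = ReservationToggle(threshold=0.8)
    # (time in seconds, fraction of capacity in use)
    for t, demand in [(0, 0.9), (10, 0.9), (30, 0.9), (31, 0.5), (50, 0.9), (60, 0.5), (90, 0.5)]:
        print(t, demand, toggle.update(t, demand))
```

A short dip or burst resets the timer rather than flipping the state, which is what keeps the reservations from flapping on and off.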

    • I think the takeaway here is that there is no silver-bullet or white-knight configuration that is best for all situations of varying applications and hardware. But this is not much different from any technology. In my opinion, the default settings are best in most cases.