Scott Drummonds on Virtualization

VKernel’s Virtualization Management Index


Back in early December the folks at VKernel posted the results of a survey of their customers’ virtualization environments.  They named this work and its summary paper the Virtualization Management Index (VMI) [registration required].  I talked with VKernel CMO Bryan Semple when the first version was released and he recently sent me an update.  Some of VKernel’s observations are pretty interesting.  And I come to a similar conclusion as VKernel based on VMware’s data in the space: the market for datacenter optimization is very big and growing by the day.

There are two graphs from the most recent VMI that are quite interesting.  The first, reproduced here, shows the degree to which VKernel’s customers are overcommitting memory and CPU:

VKernel's customers averaged very light CPU and memory commitment.

This figure is a histogram counting customers that achieve certain commitment ratios.  For memory, the bulk of customers (more than 80%, by my visual estimate) maintain a memory commitment ratio under 1.0, which represents an under-commitment of physical memory.  Nearly all customers are over-committing CPU, but with an average ratio I estimate to be about 2.5.  This means VKernel's customers are seeing only 2-3 vCPUs per CPU core.  That is light for desktops and the average workload but about right for denser databases and mail servers.
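For concreteness, here is how these two commitment ratios are computed. The host configuration below is purely illustrative and is not taken from the VMI data:

```python
# Commitment ratios as discussed above. The sample host is a
# hypothetical illustration, not a figure from the VMI survey.

def cpu_commitment(total_vcpus, physical_cores):
    """vCPU:pCPU ratio -- how many virtual CPUs share each physical core."""
    return total_vcpus / physical_cores

def memory_commitment(configured_vm_memory_gb, physical_memory_gb):
    """Configured VM memory over physical RAM; below 1.0 = under-committed."""
    return configured_vm_memory_gb / physical_memory_gb

# An 8-core host with 64 GB RAM running ten 2-vCPU VMs of 4 GB each:
print(cpu_commitment(10 * 2, 8))       # 2.5 vCPUs per core
print(memory_commitment(10 * 4, 64))   # 0.625 -> memory under-committed
```

By these definitions, a host matching the survey's typical profile over-commits CPU by roughly 2.5x while leaving physical memory under-committed.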

One year ago I saw a similar study produced quarterly by VMware’s Technical Account Manager (TAM) team.  The so-called TAM Dashboard summarized a wide variety of metrics from customers including host core count, ESX version, application virtualization, VM sizes, and much, much more.  While there is a public version of the TAM Dashboard, I think that my choice to pursue employment with another company deprives me of the right to publish details of that study.  However, I will share a few high level observations.

While the TAM Dashboard did not report vCPUs per core, it reported a few other metrics from which I can estimate an equivalent of VKernel's measurement.  And while VMware is seeing higher densities than VKernel, they are not higher by much.  The TAM Dashboard also surprised me by showing VMware's customers to be averaging around eight cores per host.  This means nearly 50% of the servers in existence have fewer than eight.  I consider those systems unfit for tier-1 virtualization.

VKernel’s VMI also produced a VM density graph that shows VM count by host count.  That graph has been reproduced here.

VKernel sees fewer VMs per host on large deployments.

VKernel concludes in the VMI text that VKernel’s (and possibly VMware’s) larger customers are deriving less value from virtualization.  It is probably safe to say that larger customers are packing fewer VMs into their servers.  But there are many characteristics of very large customers that can explain this:

  • Larger customers also tend to use older systems for a longer period of time. I have known this from versions of the TAM dashboard that go back years.
  • Larger customers upgrade vSphere less frequently, and newer versions have improved features for consolidated environments.
  • Larger customers are more conservative by nature, which explains both of the above as well as suggesting the presence of larger guardbands or “buffers” of unused space to protect their applications.

Also, VKernel’s data may suffer from self-selection bias. There are likely common characteristics of their customers that are not shared by the entire industry. Results will be skewed to match that sample and not necessarily the whole pool of VMware’s customers.

In any case, I agree with VKernel’s observation that large environments are much more in need of intelligent capacity assessment and optimization than small.  In fact, I will have a lot more to say on cross-cluster datacenter optimization in an upcoming blog.

9 Responses


  • Hi Scott — This post brought to the surface again something that we’ve been discussing for a while at my university. When we attended training for vSphere 4, the instructor made a point to say we should never over-allocate CPUs on a host because of thread contention. Since then, we’ve always allocated n-1 cores to VM guests, leaving the remaining core for VMware overhead. However, I’ve seen a number of posts like this that talk about packing many more guests on each host. So I guess what I’m wondering is if it’s better to use total MHz for the host or total cores for the host as a measure of how many guests it can handle. Thanks in advance for any insights you can share.

    • Your instructor’s comment is generally correct, but I will make some minor changes. First, the reason not to over-allocate vCPUs is wasted CPU, not thread contention. Recent versions of vSphere have ways of skewing the scheduling of different vCPUs and de-scheduling idle vCPUs, so making VMs too large does not cause the contention issues it used to. However, every vCPU requires management work by the kernel, so it is wasteful to assign vCPUs to a VM if the application is not deriving value from them.

      Second, overhead on VMware’s products has dropped continuously since the earliest versions. We now estimate virtualization overhead to be less than 10% on nearly all workloads. So, to suggest that an entire core should be set aside for overhead is a bit over-zealous.

      Choosing an optimal consolidation ratio is more art than science these days. A variety of factors, spanning CPU, memory, networking, and storage utilization, need to be considered, not to mention licensing concerns and application latency, which can be difficult to measure in real time. Because this problem is so difficult, companies like VKernel and CiRBA are making money by helping customers solve it.

      But the general answer of how many VMs per host is answered through resource management: keep CPU utilization below some threshold (say, 80%), keep active memory below another threshold (70%), network and storage utilization lower than maximums, etc. As long as these resources are underutilized, add more VMs!
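That rule of thumb can be sketched as a simple headroom check. The threshold values beyond the two named above (80% CPU, 70% active memory) and the sample host metrics are illustrative assumptions, not VMware guidance:

```python
# A minimal sketch of the threshold rule described above: a host can
# accept more VMs as long as every resource stays below its ceiling.
# The network/storage ceilings and the sample metrics are assumptions
# for illustration only.
THRESHOLDS = {
    "cpu_util": 0.80,      # keep CPU utilization below 80%
    "active_mem": 0.70,    # keep active memory below 70%
    "net_util": 0.90,      # stay under network maximums
    "storage_util": 0.90,  # stay under storage maximums
}

def has_headroom(host_metrics, thresholds=THRESHOLDS):
    """True if every measured resource is below its threshold,
    i.e. the host can safely take more VMs."""
    return all(host_metrics[name] < limit for name, limit in thresholds.items())

host = {"cpu_util": 0.55, "active_mem": 0.60,
        "net_util": 0.30, "storage_util": 0.40}
print(has_headroom(host))  # True -> safe to add more VMs
```

A real capacity tool would of course look at peaks and trends rather than point-in-time utilization, but the gating logic is the same.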


  • Hi Scott,
    “larger customers are deriving less value from virtualization”
    I also see two other possible explanations:
    – Larger customers tend to run larger VMs
    – Customers with more host servers in their datacenters may be pursuing more of a “scale-out” approach.

  • Hi Scott –

    One clarification: the data in the VMI is from our free tools, not from our paid commercial product customers.

    The next step for us is to go out and get some benchmark data from customers using the commercial products. Our expectation is that consolidation ratios will be significantly higher.

    More to come.

    Bryan Semple


  • We contributed to those VKernel free stats, and it looks like we fall right in the middle. We are 18:1 VM:host, 2.6:1 vCPU:pCPU, and 0.75 allocated:available memory. These are new hosts (hex-core Nehalems with 144GB of RAM). We tried to get higher consolidation ratios, but performance for our XenApp environment suffered, so we added cores per host (upgrading from quads to hexes) and added hosts to provide comfortable failover capacity. Since we’re seeing over a 5x capex reduction (20x if you consider that we put off a datacenter expansion) with these “low” consolidation numbers, we have little incentive to push further.