Scott Drummonds on Virtualization

ESX Memory Management: Ballooning Rules


[This article, taken from my communities blog, shows why you should “Love Your Balloon Driver”.]

Earlier this month we finally published one of my favorite papers from the ongoing vSphere launch activities. This paper on ESX memory management, written by Fei Guo of performance engineering, has three graphs that are absolute gems. They show balloon driver memory savings next to throughput numbers for three common benchmarks. The conclusion is inescapable: the balloon driver reclaims memory from over-provisioned VMs with virtually no impact on performance. This holds for every workload tested save one: Java.

Example 1: Kernel Compile

Linux kernel compilation models a common developer environment. This process is CPU- and IO-intensive but uses very little memory.

[Figure 1: Kernel compile, memory reclaimed and performance, ballooning vs. host swapping]

Results of two experiments are shown on this graph: in one configuration memory is reclaimed only through ballooning and in the other memory is reclaimed only through host swapping. The bars show the amount of memory reclaimed by ESX and the line shows the workload performance. The steadily falling green line reveals a predictable deterioration of performance due to host swapping. The red line holds steady, demonstrating that as the balloon driver inflates, kernel compile performance is unchanged.

Kernel compilation performance remains high with ballooning because this workload needs very little memory, so the guest OS can easily reclaim pages the application is not using. Performance falls with swapping because ESX selects virtual machine pages for swapping at random, whether or not the application is using them. The guest OS is better at selecting pages for reclamation than ESX is.
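To make the difference between the two selection strategies concrete, here is a toy simulation. It is a sketch under simplifying assumptions, not ESX's or any guest OS's actual algorithm: the hypothetical function `hot_pages_reclaimed` models host swapping as uniform random page selection (as described above) and models ballooning as an idealized guest that gives up idle pages before touching its working set. The VM size and working-set size are made-up numbers chosen to resemble a small-footprint workload like a kernel compile.

```python
import random


def hot_pages_reclaimed(total_pages, working_set, n_reclaim, policy, seed=0):
    """Count how many actively used (hot) pages a reclamation policy takes.

    Each hot page taken must be faulted back in later, which is what
    costs performance. Toy model only.
    """
    pages = list(range(total_pages))
    if policy == "random":
        # Host-swapping stand-in: the hypervisor cannot see which guest
        # pages are in use, so victims are chosen uniformly at random.
        victims = random.Random(seed).sample(pages, n_reclaim)
    elif policy == "idle_first":
        # Ballooning stand-in (idealized): the guest OS knows which pages
        # are idle and surrenders those before touching the working set.
        idle = [p for p in pages if p not in working_set]
        busy = [p for p in pages if p in working_set]
        victims = (idle + busy)[:n_reclaim]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return sum(1 for p in victims if p in working_set)


if __name__ == "__main__":
    # Hypothetical VM: 1024 pages, of which only 128 are hot.
    hot = set(range(128))
    for n in (256, 512, 768):
        print(n,
              hot_pages_reclaimed(1024, hot, n, "random"),
              hot_pages_reclaimed(1024, hot, n, "idle_first"))
```

In this model the idle-first policy takes no hot pages until every idle page is gone, while random selection starts hitting the working set immediately, which mirrors the flat red line and falling green line in the kernel compile graph.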

Example 2: Oracle/Swingbench

Oracle’s database is best tested with Swingbench, the OLTP load-generation tool provided by Oracle. Database workloads use all system resources but depend on memory non-linearly: memory can be safely reclaimed from an OS running a database until the database cache becomes smaller than the workload needs. The following figure shows this.

[Figure 2: Oracle/Swingbench, memory reclaimed and performance, ballooning vs. host swapping]

As before, the virtual machine using only ballooning maintains higher performance under memory pressure than the virtual machine whose memory is being swapped out by the host. Performance remains high as the balloon driver inflates, until the balloon encroaches on the 2 GB SGA (Oracle's System Global Area, which holds the database buffer cache). Again, ESX’s host swapping selects pages to send to disk at random, which degrades performance at every swap amount.
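The "non-linear dependence on memory" can be sketched with a deliberately simple model. Everything here is an assumption for illustration: the function names are hypothetical, the 4 GB VM size and 512 MB guest OS allowance are made-up, and the rule that throughput scales with the fraction of the cache that still fits is a toy curve, not a measured one. Only the 2 GB SGA comes from the experiment above.

```python
def balloon_headroom_mb(vm_mb, guest_os_mb, db_cache_mb):
    """How far the balloon can inflate before it starts shrinking the database cache."""
    return max(0, vm_mb - guest_os_mb - db_cache_mb)


def relative_throughput(balloon_mb, vm_mb, guest_os_mb, db_cache_mb):
    """Toy model (assumption): throughput is flat while the cache fits,
    then scales with the fraction of the cache that still fits."""
    room_for_cache = vm_mb - guest_os_mb - balloon_mb
    return max(0.0, min(1.0, room_for_cache / db_cache_mb))


if __name__ == "__main__":
    # Hypothetical 4 GB VM, 512 MB for the guest OS, 2 GB SGA.
    VM, OS, SGA = 4096, 512, 2048
    print("headroom:", balloon_headroom_mb(VM, OS, SGA), "MB")
    for balloon in (0, 1024, 1536, 2048, 2560):
        print(balloon, relative_throughput(balloon, VM, OS, SGA))
```

With these numbers the balloon can take 1536 MB at no cost; beyond that knee, each further megabyte comes out of the SGA and throughput drops.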

As with kernel compilation, the balloon driver safely reclaims memory from over-provisioned VMs with little impact on application performance.

Example 3: Java/SPECjbb

Java poses a special challenge in virtual environments because the JVM introduces a third level of memory management. For other workloads, the balloon driver reclaims memory from the virtual machine without hurting throughput because the guest OS efficiently selects pages that its processes are not using. With Java, however, the guest OS is unaware of how the JVM is using its memory and is forced to select pages as arbitrarily and inefficiently as ESX’s swap routine.

[Figure 3: Java/SPECjbb, memory reclaimed and throughput, ballooning vs. host swapping]

Neither ESX nor the guest OS can take memory from the JVM without significantly degrading performance. Java memory is managed inside the JVM, so any attempt by the host or the guest to remove pages hurts the Java application. In these environments it is wise to set the JVM’s heap size manually and to configure an ESX memory reservation for the virtual machine that accounts for the heap, the JVM itself, and the OS.
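The sizing advice above is simple arithmetic, sketched below. The class `JavaVmSizing` is hypothetical, and the JVM non-heap overhead (code cache, thread stacks, GC bookkeeping) and guest OS allowances are placeholder assumptions you would replace with measured values. Setting `-Xms` equal to `-Xmx` (standard HotSpot flags) is one common way to pin the heap at a fixed size so the reservation has a stable target.

```python
from dataclasses import dataclass


@dataclass
class JavaVmSizing:
    heap_mb: int          # fixed heap size: -Xms and -Xmx set to the same value
    jvm_overhead_mb: int  # assumed allowance for non-heap JVM memory
    guest_os_mb: int      # assumed allowance for the guest OS and other processes

    def jvm_flags(self):
        """HotSpot flags that pin the heap at heap_mb megabytes."""
        return [f"-Xms{self.heap_mb}m", f"-Xmx{self.heap_mb}m"]

    def reservation_mb(self):
        """Memory reservation that covers the heap, the JVM, and the OS."""
        return self.heap_mb + self.jvm_overhead_mb + self.guest_os_mb


if __name__ == "__main__":
    # Hypothetical sizing: 3 GB heap, 512 MB JVM overhead, 512 MB guest OS.
    sizing = JavaVmSizing(heap_mb=3072, jvm_overhead_mb=512, guest_os_mb=512)
    print(" ".join(sizing.jvm_flags()))
    print("reservation:", sizing.reservation_mb(), "MB")
```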

Conclusions and Scott’s Special Recommendation

Love your balloon driver. Your application owners always ask for more memory than they need. You can comfortably over-provision memory somewhat and rely on ESX and the balloon driver to reclaim what is not in use. Without the balloon driver, ESX is forced to fall back on its last-resort technique for managing memory over-commit: host swapping. And host swapping always hurts performance.

So here is my special recommendation: never, ever disable the balloon driver. Disabling it takes efficient ballooning away from ESX as an option for that virtual machine. Should any of that virtual machine’s memory be needed, ESX can only swap it out. Whereas ballooning usually does not hurt performance, swapping always does. If you must protect an application from memory reclamation caused by memory over-commitment, use reservations. They make admission control more effective, they self-document the needs of the VM, and they are easy to configure.
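Why reservations help admission control can be shown with a heavily simplified model. This is an illustrative sketch, not ESX's admission control algorithm: the `Host` class and its method are hypothetical, and real ESX also reserves per-VM overhead memory, which is ignored here. The idea is that a VM powers on only if the host can still guarantee its reservation, while unreserved memory remains free to be over-committed and reclaimed by ballooning.

```python
class Host:
    """Toy host that admits VMs only if their reservations can be guaranteed."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.reservations = {}  # VM name -> reserved MB

    def reserved_mb(self):
        return sum(self.reservations.values())

    def try_power_on(self, name, reservation_mb):
        """Admit the VM if its reservation fits in the unreserved capacity."""
        if self.reserved_mb() + reservation_mb > self.capacity_mb:
            return False
        self.reservations[name] = reservation_mb
        return True


if __name__ == "__main__":
    host = Host(capacity_mb=8192)               # hypothetical 8 GB host
    print(host.try_power_on("db", 4096))        # reserved: admitted
    print(host.try_power_on("java", 4096))      # reserved: admitted
    print(host.try_power_on("batch", 1024))     # would overrun guarantees: refused
    print(host.try_power_on("web", 0))          # no reservation: admitted, memory is over-committed
```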

4 Responses

What happens is that in 90% of cases, customers oversize their VMs.
Also, it is pure statistics: given a large number of VMs, not all of them will need all their memory at the same time (with very, very few exceptions).

  • […] in challenging discussions on host memory swapping and its impact to performance.  If you read my article on host swapping and the whitepaper it summarized, you know the deleterious effect on performance caused by host […]

  • […] be a more severe performance impact. Here a somewhat recent post by Scott Drummonds on that topic: http://vpivot.com/2009/09/25/esx-memory-management-ballooning-rules/” Citrix, VMware, Whitepapers […]