vPivot

Scott Drummonds on Virtualization

Maximum Hosts Per Cluster


I just returned from a one-week vacation to a warm, sunny beach on a small island not too far from Singapore.  Even on vacation my conversations often migrate to technology, and my travel mate was an old friend and current VMware employee, Dave Korsunsky.  Sitting by the pool of a fantastic hotel, cocktail in hand, I asked my friend, “what is the right number of hosts per DRS/HA cluster?”  Great conversation for a vacation, right?

I started thinking about this topic at Sydney’s vForum a month ago.  VMware’s Dan Anderson suggested that designs implementing the maximum cluster size (32 hosts per cluster) were the result of misguided reasoning.  Dan insisted that clusters need never be larger than eight hosts.  We bantered on this subject for a few minutes.  Dan convinced me that there are few compelling reasons to implement large clusters, and we could think of many reasons to avoid them.  I do not think it easy to assign one number as the “right” cluster size.  But several principles suggest that small to medium-sized clusters are the better choice.

First, the argument for the largest clusters: DRS efficiency.  This was my primary claim in favor of 32-host clusters.  My reasoning is simple: with more hosts in the cluster there are more CPU and memory resource holes into which DRS can place running virtual machines to optimize the cluster’s performance.  The more hosts, the more options available to the scheduler.

But in retrospect I think this is a weak argument.  It is not backed by data, and in practice I cannot imagine a 16-host cluster being much more efficient than an eight-host cluster.  Once vCenter is managing hundreds or more virtual machines per cluster, it already has an astronomical number of combinations for VM placement.  So, doubling the host count (and the virtual machine count) should have little impact on cluster efficiency.
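To put a rough number on “astronomical”: if you naively model placement as every VM independently landing on any host (ignoring capacity constraints, so this is only a scale illustration of my own, not anything DRS actually computes), the count is hosts raised to the number of VMs.  A quick sketch:

```python
# Naive count of VM-to-host placements: every one of the M VMs could, in
# principle, land on any of the N hosts.  This ignores capacity constraints,
# so it only illustrates the scale of the search space DRS works within.
vms = 200
for hosts in (8, 16, 32):
    combos = hosts ** vms
    print(f"{hosts:>2} hosts, {vms} VMs: ~10^{len(str(combos)) - 1} placements")
```

Even the eight-host cluster is already far beyond exhaustive search, which is why doubling the host count does not meaningfully improve the scheduler’s options.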

More importantly, with respect to the efficiency argument, maximum CPU and memory utilization will be bounded either by the failover capacity or by the target utilization, which is usually about 80%.  With 20% reserved for resource spikes, the failover capacity equals that reserved headroom in a 4+1 HA cluster.  In any cluster larger than this, the failover capacity is less than 20%, which means that only the target utilization bounds resource efficiency.
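As a back-of-the-envelope check on that claim: with the equivalent of one host reserved for failover, HA sets aside 1/N of an N-host cluster, and that reservation only drops to the 20% headroom once the cluster reaches five hosts (4+1).  A quick sketch:

```python
# Compare the HA failover reservation (one host out of N) with the ~20%
# headroom already implied by an 80% utilization target.  The larger of the
# two is what actually limits usable capacity.
headroom = 0.20  # 1 - 0.80 target utilization
for hosts in range(2, 9):
    ha_reserve = 1 / hosts
    binding = "HA failover capacity" if ha_reserve > headroom else "utilization target"
    print(f"{hosts} hosts: HA reserves {ha_reserve:.0%}; binding limit: {binding}")
```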

The efficiency calculation is a little trickier if you want to size your cluster for a target resource utilization after a host failure.  In this case each additional host provides some incremental value to the cluster’s utilization.  To size a 4+1 cluster for 80% utilization after a host failure, you will want to restrict CPU usage on the five hosts to 64%.  Going to a 5+1 cluster raises the pre-failure CPU utilization target to about 66%.  These targets slowly approach 80% as the cluster gets larger and larger.  But the incremental improvement is never more than a few percentage points, and it shrinks as the cluster grows.  So, growing a cluster slightly provides very little value in terms of resource utilization.
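For those who want to see the arithmetic, here is a quick sketch under a simple model: size the cluster so that the full load still hits the post-failure target on one fewer host, which makes the pre-failure target U × N / (N + 1) for an N+1 cluster.

```python
# Pre-failure CPU utilization target for an N+1 cluster, sized so that after
# losing one host the survivors run at the post-failure target (80% here).
def pre_failure_target(total_hosts: int, post_failure_target: float = 0.80) -> float:
    survivors = total_hosts - 1
    return post_failure_target * survivors / total_hosts

previous = None
for total_hosts in range(5, 13):              # 4+1 up through 11+1
    target = pre_failure_target(total_hosts)
    gain = f" (+{(target - previous) * 100:.1f} pts)" if previous else ""
    print(f"{total_hosts - 1}+1 cluster: {target * 100:.1f}% pre-failure target{gain}")
    previous = target
```

Under this model, doubling the cluster from 4+1 to 9+1 only buys about eight points of pre-failure utilization.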

Now why might you want to keep a cluster small?  I can think of a few reasons.

It is generally wise to avoid mixing different classes of servers in a single pool.  DRS does not make scheduling decisions based on the performance characteristics of the server, so a new, powerful server in a cluster is just as likely to receive a mission-critical virtual machine as an older, slower host.  This would be unfortunate in a cluster containing servers with radically different–although EVC-compatible–CPUs like the Intel Xeon 5400 and Xeon 5500 series.  On the Xeon 5400, ESX would be using its software memory management unit, which can perform as much as 40% worse than the hardware MMU in the Xeon 5500.

(I will momentarily digress to answer a question I often get in my performance talks: what is the impact of Enhanced vMotion Compatibility (EVC) on virtual machine performance?  Briefly: very little to none.  The instructions that EVC disables on newer processors only benefit applications that were compiled to use them, and those applications are rare in the enterprise.)

Given my recommendation that servers in a cluster be of a similar performance class, you will soon find that your purchasing patterns influence your cluster size.  If you are one of the few people lucky enough to work at a company that buys servers by the truckload, you can size your clusters however you want.  But the vast majority of VMware’s customers make smaller purchases of anywhere from four to 16 servers at a time.  These make nice, homogeneous clusters of moderate size.

One more argument Dave offered for keeping clusters small is to use clusters for logical separation of applications of different classes.  By putting your mission-critical applications in a cluster of their own, your “server huggers” will sleep better at night.  They will be able to keep one eye on the iron that can make or break their job.  In my opinion, using physical separation in a virtual world resists the complete, hardware-independent, cloud-style virtualization that we are all striving for.  But I cannot begrudge an administrator who wants to hold onto some semblance of physical-hardware best practice while traveling the multi-year journey to the private cloud.

Another of Dan’s arguments against large clusters is the cumbersome nature of their change control.  Clusters have to be managed to a consistent state, and the complexity of this process depends on the number of items being managed.  A very large cluster presents unique challenges when managing change.

So, have I given a recommendation?  I am not sure.  If anything, I feel that Dave, Dan and I believe that a minimum cluster size should be set to guarantee that the CPU utilization target, and not the HA failover capacity, is what defines the amount of reserved (wasted) resources.  This means a minimum cluster of something like four or five hosts.  And while none of us can name a specific problem that will occur with very large clusters, we cannot imagine the value of a 32-host cluster.  So, we think the right cluster size is somewhere shy of 10.

I am quite interested to hear your thoughts on this.  Perhaps the best guidance will grow out of the crucible of debate.

23 Responses

Scott…

Great post and great discussion material; sounds like you had a great vacation! I have had this same discussion with other colleagues and myself 🙂 many times. I agree that a “32 host” cluster is not really a prime number, or needed. I think anything over 12 hosts causes management overhead to erode the benefits of an HA/DRS cluster. In my deployments, again as you stated, depending on CPU requirements, I do not go larger than 8 hosts in a cluster. For smaller SMB deployments I keep the cluster size around 3-5 hosts depending on business requirements. I have come across a lot of businesses that require strict application “separation” across clusters, for reasons above my pay grade. But these points confirm one of the basic key points of this argument: what is your business requirement?

I also want to reiterate your point on CPU mix-matching. I, for one, am completely against mixing CPU generations within a cluster UNLESS absolutely needed. I don’t like mixing Xeon 5400-class processors with Xeon 5500; that is just too much of a “jump” in hardware. I have deployed many mixed Xeon 5500 and Xeon 5600 clusters in SMBs with no issues, and since the Nehalem-to-Westmere architecture changes are not as substantial, I don’t have an issue with that deployment configuration.

Couple of my experiences and opinions on the topic…a great one at that!!

Jonathan

    • With respect to mixing e.g. 5400 and 5500 CPUs, I completely agree. They are too different from one another, and mixing them in the same cluster can result in unpredictable performance behavior of a bunch of applications.

  • My answer has to do with the number of connections to storage. This, to me, is the deciding factor. In an active/passive environment the number of paths to storage will, in a robust design, be doubled, and so with today’s versions of ESX and large numbers of LUNs you’ll have issues. Storage companies are taking these issues into account, as storage becomes further virtualized and LUNs become thinly provisioned. But my concerns are less along the lines of the number of hosts per HA/DRS cluster.

    As ESX matures, the LUN limitation will become less of an issue, I’ve been told, but as it stands this is a critical issue in an active/passive environment.

    I’ve never built clusters of the 32-host size for any reason except once, and that was an experiment that did not work particularly well. It was, as stated above, a management and strategic difficulty. At these sizes the only thing really differentiating the VMs is IP schema, or maybe profile type, as a key distinguishing factor. In those cases I’d be happy to isolate them anyway, so dividing the clusters is no limiting factor.

    My real-world experience has been that a 15-node cluster is my largest production cluster, and 3 or 4 hosts is about right for a smaller cluster.

    • The 255-LUN limitation is one side of it; the more severe one is SCSI reservations (I am assuming FC or iSCSI here). Big clusters mean lots of VMs. To achieve full HA redundancy, each host will see every LUN. Whenever SCSI reservations happen, they affect the whole cluster.

      In our experience, I/O thrashing becomes a real threat in a 4Gig FC environment if a cluster connected to an active/passive array exceeds 8 hosts. In such a scenario one heavy-I/O VM is enough to get into real trouble.

      8 hosts per cluster is definitely the limit.

      Regards,
      Thorsten


  • Scott,

    I think you have some good points. I would argue that your cluster size should be aligned on application and host boundaries. In other words, don’t put unlike hosts in a cluster. Also, if we really need application boundaries, then perhaps your cluster needs to be limited (although that argument is a bit weak to me – regulations like FDA and HIPAA might be good reasons).

    Beyond that, I have to say that I see no downside to a large 32-node cluster, especially when the applications don’t need to be isolated for whatever reason and the hardware is the same. I may have been in a unique boat because I was at one of those places buying servers 100+ at a time and running a homogeneous application set.

  • Scott,

    I think your maximum cluster size has another dependency, which is the number of blades in your enclosure and the number of enclosures (if you are using blades, that is). I always want to be able to empty a blade enclosure of VMs. However, now that I have more load on my clusters, as soon as a cluster has more blades in one enclosure than the N-X ratio for HA allows, I cannot do this.

    For example, if I had 8 enclosures with 16 blades each, I would have 16 clusters of 9 hosts, with an N-1 HA policy. I realize not many customers have this kind of hardware, but I still think it is something to take into account.
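    Roughly sketched (my own arithmetic, nothing vCenter enforces): spreading a cluster evenly across E enclosures, you can only empty an enclosure if the hosts it holds fit within the HA failover capacity X, which caps the cluster at E × X hosts.

    ```python
    # Rough sketch of the enclosure-evacuation constraint: with hosts spread
    # evenly across `enclosures`, any one enclosure holds ceil(N / enclosures)
    # hosts, and it can only be emptied if that is within the HA failover
    # capacity X (the X in an N-X policy).  The largest N satisfying that:
    def max_cluster_size(enclosures: int, tolerated_host_failures: int) -> int:
        return enclosures * tolerated_host_failures

    print(max_cluster_size(enclosures=8, tolerated_host_failures=1))  # 8 hosts with N-1
    print(max_cluster_size(enclosures=8, tolerated_host_failures=2))  # 16 hosts with N-2
    ```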

    Regards,

    Ad

  • “9 hosts” is obviously a typo for 8 hosts

  • […] Scott Drummonds (former VMware Performance Guru). Scott posted a question on his blog about what the size of a cluster should be. Scott discussed this with Dave Korsunksy and Dan Anderson, both VMware employee, and more or less […]

  • @Thorsten: How about adopting VAAI and SIOC?

    • Duncan,

      you are right. VAAI and SIOC are steps in the right direction. There are drawbacks though.

      SIOC sounds good, but it is Enterprise Plus only. Since our biggest CPUs are hexacores and our VLAN management is pretty well scripted via vMA, we had no need to upgrade from Enterprise. SIOC is the first feature we could really benefit from. This is not a technical issue, but even as a sysadmin I cannot ignore the monetary ones :).

      VAAI looks really promising, I just need the array supporting it to give it a try.

      I completely agree with your arguments, especially where you say “Yes there are exceptions” :). Most infrastructures have grown over time. You do not throw out all your 4G FC, even if 16G is around the corner. I do not know many virtualization admins that have the opportunity to design from scratch.

      I guess we can agree that cluster size is a design decision one should not make by following VMware’s marketing slides. It will probably become less important in the near future.

  • Some companies, like Oracle, still require licensing for all the hosts/CPUs/cores in a VMware cluster regardless of the number of instances running. A small, dedicated cluster running only the VMs for these applications may be more desirable to avoid unnecessary cost.

  • One other area I have seen this come up is certain ISV licensing policies. There are companies out there (Oracle among others) who will make you license every CPU in the cluster even if you only have a small number of VMs running that software. So, for example, if you have 32 CPUs in your cluster but only 10 vCPUs actually allocated to virtual machines running the ISV’s software, you will still need 32 licenses. Creating smaller clusters can sometimes be used as a workaround in these situations.

    • I would argue that mandatory sub-lun policies should count as a hard partition but who knows if Oracle would agree since they don’t see 8 cores as a hard boundary and make you license all the cores that the hosts contain.

  • I don’t agree with some of the reasoning in this post. My detailed response is posted on yellow-bricks:
    http://www.yellow-bricks.com/2010/11/29/re-maximum-hosts-per-cluster-scott-drummonds/comment-page-1/#comment-13818

    Some additional comments:
    Re: the ISV licensing issue – have you guys who mention this considered DRS VM-host affinity rules in vSphere 4.1 to create a shared sub-cluster inside a cluster?

    Ulana (DRS product manager)

    • Hi Ulana. The shared sub-cluster with DRS host-affinity rules in vSphere 4.1 may address some ISV licensing policies but is unfortunately still not honored by Oracle….

  • […] virtualization performance guru Scott Drummonds posted a collection of thoughts regarding maximum hosts per cluster. Duncan Epping—who along with Frank Denneman recently published their HA and DRS Technical […]

  • […] other factors such as LUN paths per host (only 255 LUNs per host, 64 NFS datastores). See this great post at Scott Drummonds Pivot Point blog and Duncan Epping’s followup […]

  • Just curious if your thoughts/findings have changed with vSphere 5’s overhauled HA?

    • It’s funny that you mention that. HA design played no role in my previous recommendations, but I now realize that it should have been a minor reason for keeping cluster sizes small. With the primary-node system removed in the new HA, that reason goes away. But since I did not use that reason in my previous recommendation, I think my arguments are unchanged.

  • […] Drummond’s Maximum Hosts Per Cluster is a must […]