<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pivot Point &#187; vcenter</title>
	<atom:link href="http://vpivot.com/tag/vcenter/feed/" rel="self" type="application/rss+xml" />
	<link>http://vpivot.com</link>
	<description>Scott Drummonds on Virtualization</description>
	<lastBuildDate>Wed, 08 Sep 2010 08:37:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Performance Troubleshooting Made Simple</title>
		<link>http://vpivot.com/2010/05/10/performance-troubleshooting-made-simple/</link>
		<comments>http://vpivot.com/2010/05/10/performance-troubleshooting-made-simple/#comments</comments>
		<pubDate>Mon, 10 May 2010 13:27:05 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[troubleshooting]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[vscsistats]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=525</guid>
		<description><![CDATA[I have struggled for years to give VMware&#8217;s customers a framework for diagnosing performance problems.  People want a simple system to troubleshoot the unknown sources of poorly performing applications.  The best attempt at documenting such a flow is Hal Rosenberg&#8217;s document on vSphere performance troubleshooting. Elegant as it may be, Hal&#8217;s document remains [...]]]></description>
			<content:encoded><![CDATA[<p>I have struggled for years to give VMware&#8217;s customers a framework for diagnosing performance problems.  People want a simple system to troubleshoot the unknown sources of poorly performing applications.  The best attempt at documenting such a flow is <a href="http://communities.vmware.com/docs/DOC-10352">Hal Rosenberg&#8217;s document on vSphere performance troubleshooting</a>. Elegant as it may be, Hal&#8217;s document remains complex for the novice VI administrator.  And it is because that document is so complex that performance people maintain their job security.  <img src='http://vpivot.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   But in an effort to further obviate my own job, I will try to generalize the troubleshooting flow to add more clarity to the process.</p>
<p><span id="more-525"></span>The first tool in the VI administrator&#8217;s toolbox should always be vCenter.  Through the vSphere client you can use vCenter&#8217;s performance counters to confirm a problem with any of the four resources (storage, CPU, memory, network).  vCenter&#8217;s 20 second sample window impedes its ability to eliminate a resource as a problem.  This is because a three second spike in any resource will be smoothed and missed over the 20 second window.  But when vCenter confirms a sustained resource bottleneck, it is sure to be the performance problem&#8217;s cause.</p>
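<p>The smoothing effect is easy to see with a little arithmetic.  The sketch below is only an illustration: it assumes one-second samples, a 30% baseline, and a window that reports a simple average, none of which is taken from vCenter itself.</p>

```python
# Hypothetical one-second utilization samples inside one 20-second window:
# 17 seconds at a 30% baseline plus a 3-second spike to 100%.
WINDOW_SECONDS = 20
SPIKE_SECONDS = 3
BASELINE_PCT = 30.0
SPIKE_PCT = 100.0

samples = [BASELINE_PCT] * (WINDOW_SECONDS - SPIKE_SECONDS) + [SPIKE_PCT] * SPIKE_SECONDS


def window_average(values):
    """Return the mean of the samples, the single value an averaging window would report."""
    return sum(values) / len(values)


reported = window_average(samples)
print(f"peak inside window: {max(samples):.0f}%")
print(f"value reported for the window: {reported:.1f}%")
```

<p>Under these assumptions the resource was saturated for three seconds, yet the window reports about 40%, which looks perfectly healthy.</p>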
<p>If vCenter fails to confirm an obvious performance problem, the administrator must next go to more precise, more time-intensive, and more knowledge-intensive tools such as esxtop and vscsiStats.  esxtop takes more skill and time than vCenter but provides better resolution and more visibility into the system.  vscsiStats is the most time-intensive tool and has limits with ESXi hosts but can uncover a world of detail invisible to esxtop and vCenter.</p>
<p>I estimate each tool&#8217;s chance of identifying a random performance problem as follows:</p>
<ul>
<li>vCenter: used in 90% of performance problems</li>
<li>esxtop: used in 9% of problems</li>
<li>vscsiStats: used 0.9% of the time</li>
</ul>
<p>The remaining 0.1% of the time is when you engage your account team or your local VMware performance expert.</p>
<p>Even within each tool&#8217;s usage there is a hierarchy of investigation: storage, CPU, memory and network.  My experience with troubleshooting has informed this ordering.  Storage causes the most problems, then CPU, then memory, and lastly (and rarely) network. After each resource level is inspected in vCenter, a repeat of the inspection should occur on esxtop.  Guest tools may be a third option for memory, CPU, and network, but vscsiStats should always be consulted if the performance problem persists.</p>
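<p>One way to read this ordering is as a checklist.  The sketch below assumes a tool-first reading (walk all four resources in vCenter, repeat in esxtop, then fall back to vscsiStats for storage); the function name and the tuple layout are hypothetical and only illustrate the flow.</p>

```python
# Tools and resources in the order described above.
SAMPLING_TOOLS = ["vCenter", "esxtop"]
RESOURCES = ["storage", "CPU", "memory", "network"]


def triage_steps():
    """Return (tool, resource) pairs in an 'easy first, precise later' order."""
    steps = [(tool, resource) for tool in SAMPLING_TOOLS for resource in RESOURCES]
    # vscsiStats only looks at storage, so it is the last stop in this sketch.
    steps.append(("vscsiStats", "storage"))
    return steps


for number, (tool, resource) in enumerate(triage_steps(), start=1):
    print(f"{number}. check {resource} with {tool}")
```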
<p>VMware&#8217;s growing array of performance management tools will change this flow somewhat.  AppSpeed, for instance, adds the ability to make very educated guesses about resource bottlenecks based on inside information into the application execution.  Hyperic can provide in-guest process visibility and Ionix ADM will map application interdependencies to focus the investigation.  But I will abstain from providing best practices on these tools until I have used them more.  In all cases, however, the fundamental relationship of &#8220;easy first, precise later&#8221; remains.</p>
<p>VMware continues to work towards integrating all of these tools into a single view within the vSphere client.  I expect that integration will improve the success rate of the performance layman in troubleshooting these problems.  But I am sure that even into the distant future performance people will find their jobs secure.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/05/10/performance-troubleshooting-made-simple/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How Many Virtual CPUs Per VM?</title>
		<link>http://vpivot.com/2010/04/30/how-many-virtual-cpus-per-vm/</link>
		<comments>http://vpivot.com/2010/04/30/how-many-virtual-cpus-per-vm/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 04:22:42 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cpu]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[scheduler]]></category>
		<category><![CDATA[vcenter]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=403</guid>
		<description><![CDATA[Virtual machine sizing is a tricky issue for many VMware administrators.  It is important to find the right number of virtual CPUs to maximize application performance and minimize wasted CPU cycles.  The optimal number of vCPUs can never be easily identified.  But I can offer a few suggestions to help get this [...]]]></description>
			<content:encoded><![CDATA[<p>Virtual machine sizing is a tricky issue for many VMware administrators.  It is important to find the right number of virtual CPUs to maximize application performance and minimize wasted CPU cycles.  The optimal number of vCPUs can never be easily identified.  But I can offer a few suggestions to help get this number right.</p>
<p><span id="more-403"></span><br />
ESX must expend CPU cycles to maintain running virtual CPUs whether they are being used by an application or not.  This means that host efficiency drops as more vCPUs are put on the server.  But applications that scale well with CPUs will deliver greater performance when their virtual machines have been given more CPUs.  The administrator must therefore balance the desires of an individual application&#8217;s owner with the needs of the entire cluster of applications.</p>
<p>There are several resources that VI administrators can use to inform their decisions in virtual machine sizing.  I have listed some of them below.</p>
<h2>Bruce Herndon&#8217;s Cost-of-SMP Article</h2>
<p>Last summer the VMmark team&#8217;s Bruce Herndon published <a href="http://blogs.vmware.com/performance/2009/06/measuring-the-cost-of-smp-with-mixed-workloads.html">an article on the cost of SMP</a>.  I summarized his findings in <a href="http://vpivot.com/2009/09/29/four-things-you-should-know-about-esx-4s-scheduler/">a vPivot article I wrote on the ESX 4 scheduler</a>.  There are two key messages that you can take away from these posts to inform your decisions on virtual machine sizing:</p>
<ul>
<li>Over-sized virtual machines only hurt system performance when the server&#8217;s CPUs are saturated.  When utilization is low, unneeded vCPUs only penalize the system&#8217;s CPU utilization, not the applications&#8217; performance.</li>
<li>Unneeded 2-way virtual machines are not very harmful to the environment.  But administrators should be very careful with 4-way virtual machines and larger.</li>
</ul>
<h2>Co-stop and Ready Time</h2>
<p>Ready time indicates a vCPU waiting for an available core when it has work to perform.  Co-scheduling stop time (or co-stop time) indicates a vCPU being paused by the scheduler to allow its sibling vCPUs to catch up.  These two counters can help administrators recognize a certain kind of stress due to limited CPU resources.</p>
<p>Ready time is generally a sign of the unavailability of CPU.  Correction usually requires the administrator to reduce work on the host (migrating virtual machines, decreasing vCPU count, etc.) or increase CPU capacity (more hosts or faster CPUs).  Co-stop time is a sign that the scheduler is allowing vCPUs to develop skew while it runs portions of virtual machines on available cores.  Values worth investigating for these counters are 10% ready time and 3% co-stop time.  There is no guarantee that application performance is suffering if these thresholds are crossed, but a problem may be present.</p>
<p>The important thing about ready time and co-stop time is that they are signs that you are using all of the CPU you have available to you.  This could be a Good Thing.  But it could also be a surprise to you.  When these counters get high it is a good time to start asking yourself if your capacity usage meets your expectations.  If not, you should inspect your virtual machines to be sure that the applications are using the vCPUs you have given them.  If your guest tools show poor in-guest utilization then decrease those VM sizes.  That will free up resources in the cluster for more virtual machines.</p>
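<p>If you export these counters, a check against the 10% and 3% figures takes only a few lines.  This is a minimal sketch: the function name is hypothetical, the inputs are assumed to be percentages you have already collected from esxtop or vCenter, and a flag means &#8220;take a look&#8221;, not &#8220;there is a problem&#8221;.</p>

```python
# Rule-of-thumb thresholds from the text above, in percent.
READY_THRESHOLD_PCT = 10.0
COSTOP_THRESHOLD_PCT = 3.0


def cpu_contention_flags(ready_pct, costop_pct):
    """Return the names of the counters that reach their rule-of-thumb threshold."""
    flags = []
    if ready_pct >= READY_THRESHOLD_PCT:
        flags.append("ready")
    if costop_pct >= COSTOP_THRESHOLD_PCT:
        flags.append("co-stop")
    return flags


# Hypothetical readings for three virtual machines.
readings = {"vm-a": (12.5, 1.0), "vm-b": (4.0, 4.0), "vm-c": (1.0, 1.0)}
for name, (ready, costop) in readings.items():
    print(name, cpu_contention_flags(ready, costop) or "no flags")
```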
<h2>Application Scalability Information</h2>
<p>I wish we lived in a world where every ISV published data showing their applications&#8217; abilities to scale with cores.  Unfortunately for us, many software vendors have for years allowed their customers to assume that each doubling of cores would double the performance of the application.  VMware has chosen to provide some scalability information so our customers know <a href="http://www.vmware.com/pdf/Perf_ESX40_Oracle-eval.pdf">how well</a> or <a href="http://www.vmware.com/files/pdf/consolidating_webapps_vi3_wp.pdf">how poorly</a> applications scale.  But every customer of a software company deserves to have the vendor provide guidance on sizing the server.  And those vendors deserve the right to put these results out on their own products.  Go talk to your ISV to get the information you need to size your virtual machines.</p>
<h2>CPU Usage Calculations and CapacityIQ</h2>
<p>I am belatedly updating this post with a fourth way of identifying oversized virtual machines: mathematical calculation or CapacityIQ.</p>
<p>When a virtual machine consistently uses only a fraction of its vCPU resources it is possible that the virtual machine can be downsized and still deliver the same application performance.  The calculation to determine this is simple: multiply the vCPU count by utilization and round up.  Set the virtual machine&#8217;s vCPU count to the result of that calculation.</p>
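<p>Here is that calculation as a short sketch.  The function name is hypothetical, utilization is assumed to be a sustained average expressed as a fraction between 0 and 1, and the floor of one vCPU is my own addition so the result is always a usable size.</p>

```python
import math


def suggested_vcpus(current_vcpus, avg_utilization):
    """Multiply the vCPU count by utilization and round up, never going below one vCPU."""
    if not 0.0 <= avg_utilization <= 1.0:
        raise ValueError("avg_utilization must be a fraction between 0 and 1")
    return max(1, math.ceil(current_vcpus * avg_utilization))


print(suggested_vcpus(4, 0.30))  # 4 * 0.30 = 1.2, rounds up to 2
print(suggested_vcpus(8, 0.12))  # 8 * 0.12 = 0.96, rounds up to 1
print(suggested_vcpus(2, 0.90))  # 2 * 0.90 = 1.8, stays at 2
```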
<p>If you own CapacityIQ it will make this calculation for you for every virtual machine in your data center.  Here is a screenshot of its recommendations based on virtual machine CPU and memory utilization.  Click for a clearer picture.</p>
<div id="attachment_512" class="wp-caption alignnone" style="width: 310px"><a href="http://vpivot.com/wp-content/uploads/2010/04/capiq_vm_size_recs.png"><img src="http://vpivot.com/wp-content/uploads/2010/04/capiq_vm_size_recs-300x102.png" alt="" title="Capacity IQ Recommending VM Resize" width="300" class="size-medium wp-image-512" /></a><p class="wp-caption-text">CapacityIQ monitors CPU and memory utilization to recommend VM downsizing.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/04/30/how-many-virtual-cpus-per-vm/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Optimizing Memory Utilization</title>
		<link>http://vpivot.com/2010/01/06/optimizing-memory-utilization/</link>
		<comments>http://vpivot.com/2010/01/06/optimizing-memory-utilization/#comments</comments>
		<pubDate>Wed, 06 Jan 2010 21:52:45 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[ssd]]></category>
		<category><![CDATA[swap]]></category>
		<category><![CDATA[vcenter]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=198</guid>
		<description><![CDATA[My recent series of blog articles has discussed ESX memory management and the performance specter of host swapping.  My last article attempts to correct the misconception that VMware recommends against over-committing memory.   In that article I suggested that memory over-commit is a requirement in optimizing memory utilization. Today I want to provide a specific [...]]]></description>
			<content:encoded><![CDATA[<p>My recent series of blog articles has discussed ESX memory management and the performance specter of host swapping.  My last article attempts to <a href="http://vpivot.com/2010/01/04/misunderstanding-memory-management/">correct the misconception that VMware recommends against over-committing memory</a>.   In that article I suggested that memory over-commit is a requirement in optimizing memory utilization. Today I want to provide a specific example to show why this is true.   I have also included tips for identifying host swapping in your environments.<br />
<span id="more-198"></span></p>
<h2>Understanding the Bottleneck</h2>
<p>Let me show the value of over-commit and danger of swapping by way of an example.  I will choose the following typical values to demonstrate my point:</p>
<ul>
<li>All virtual machines are on a single host which has <strong>32 GB of RAM</strong> installed.</li>
<li>Each virtual machine is sized to <strong>8 GB of RAM</strong>.</li>
<li>Each virtual machine has <strong>25% active memory</strong> (%ACTV in esxtop and &#8220;Active&#8221; in vCenter).</li>
</ul>
<table id="newspaper-a">
<tbody>
<tr>
<th>VM Count</th>
<th>Active Memory in Host</th>
<th>Comments</th>
</tr>
<tr>
<td>3</td>
<td>3 * 8 GB * 25% = <strong>6 GB</strong></td>
<td>Without memory over-commit, <em>only about 19% of the host&#8217;s memory is actively in use</em>.   What a waste!</td>
</tr>
<tr>
<td>12</td>
<td>12 * 8 GB * 25% = <strong>24 GB</strong></td>
<td>Memory is over-committed by 200% but only 75% is actively being used.  In this aggressive consolidation <em>virtual machines will run at full speed</em> until usage exceeds 100% of host memory.</td>
</tr>
<tr>
<td>18</td>
<td>18 * 8 GB * 25% = <strong>36 GB</strong>, limited to <strong>32 GB</strong> by host</td>
<td>These virtual machines want 36 GB of RAM but are limited to the 32 GB that is installed on the host.  ESX must swap to allow these machines to run and <em>performance will suffer greatly</em>.</td>
</tr>
</tbody>
</table>
<p>A virtual machine&#8217;s active memory is dictated by the application and its usage.  But the VI admin has complete control over the number of virtual machines in the environment which means host active memory can be influenced by adding or removing virtual machines.  Because virtual machine active memory is always equal to or less than 100% the only way to drive the host active memory to 100% is to over-commit memory.   <em>This is why hypervisors that do not support memory over-commit are simply not viable for data centers where memory optimization is a priority.</em></p>
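<p>The rows of the table can be reproduced with a short calculation.  This is a deliberately simple model that uses only the three example values above; it ignores page sharing, ballooning and virtualization overhead, and the function and field names are hypothetical.</p>

```python
# Example values from the list above.
HOST_RAM_GB = 32
VM_RAM_GB = 8
ACTIVE_FRACTION = 0.25


def summarize(vm_count):
    """Return configured and active memory for vm_count identical virtual machines."""
    configured_gb = vm_count * VM_RAM_GB
    active_gb = configured_gb * ACTIVE_FRACTION
    return {
        "configured_gb": configured_gb,
        "active_gb": active_gb,
        # Over-commit is the configured memory beyond what the host has installed.
        "overcommit_pct": max(0.0, (configured_gb - HOST_RAM_GB) / HOST_RAM_GB * 100),
        "active_pct_of_host": active_gb / HOST_RAM_GB * 100,
        # In this simplified model swapping starts once active memory exceeds host RAM.
        "swapping_expected": active_gb > HOST_RAM_GB,
    }


for count in (3, 12, 18):
    print(count, summarize(count))
```

<p>With 12 virtual machines the model shows 200% over-commit at 75% active memory and no swapping; at 18 the active memory of 36 GB exceeds the 32 GB installed and swapping is expected.</p>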
<h2>Identifying and Correcting the Bottleneck</h2>
<p>The ongoing occurrence of swapping is identified by a non-zero swap rate in either esxtop or vCenter.  In addition to swap rate, esxtop provides a swap wait time in its CPU panel.  When swap rate exceeds hundreds of kilobytes per second or swap wait time exceeds a couple percentage points, it is time for corrective action.</p>
<p>There are three possible solutions to this problem:</p>
<ol>
<li>Balance the virtual machines&#8217; memory usage by moving virtual machines from hosts with higher amounts of memory usage to hosts with lower amounts of memory usage.</li>
<li>Run fewer virtual machines.</li>
<li>Buy more memory.</li>
</ol>
<h2>Designing Your Infrastructure to Simplify Memory Management</h2>
<p>Ultimately I owe you a full white paper on memory management to provide a sufficient answer.  But I want to give you two ideas of the tools and techniques that I will be describing in this future paper.  First, place <a href="http://vpivot.com/2009/12/24/solid-state-disks-and-host-swapping/">host swap files on solid state disk (SSD) stores</a> to improve their performance.  With the right SSD device it may be possible to eliminate swap penalties.  Second, even if SSDs are unavailable, consider consolidating multiple swap files onto a single store.  This will make swap rate monitoring very easy but may compound the performance penalties of swapping.</p>
<p>Stay tuned and VMware will provide more documentation on memory management in 2010.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/01/06/optimizing-memory-utilization/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Micro-bursting and Storage Performance</title>
		<link>http://vpivot.com/2009/09/23/micro-bursting-and-storage-performance/</link>
		<comments>http://vpivot.com/2009/09/23/micro-bursting-and-storage-performance/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 22:49:55 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[vmkernel]]></category>
		<category><![CDATA[vscsistats]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=47</guid>
		<description><![CDATA[I have been reading Chad Sakac&#8217;s article on IO queues and micro-bursting for months now.  Chad is wicked technical for a manager type and after reading this post a dozen times I think I finally have it internalized.   Let me put my own spin on this tome, embedded in which are several jewels of [...]]]></description>
			<content:encoded><![CDATA[<p>I have been reading <a href="http://virtualgeek.typepad.com/virtual_geek/2009/06/vmware-io-queues-micro-bursting-and-multipathing.html">Chad Sakac&#8217;s article on IO queues and micro-bursting</a> for months now.  Chad is wicked technical for a manager type and after reading this post a dozen times I think I finally have it internalized.   Let me put my own spin on this tome, embedded in which are several jewels of wisdom.</p>
<p><span id="more-47"></span>The article describes a phenomenon common to consolidated workloads called micro-bursting.  Micro-bursting occurs in such short periods as to go unnoticed in the sampling window of monitoring tools.  As Chad put it:</p>
<blockquote><p>Remember that every metric has a timescale.   IOps is in seconds.   Disk service time is in ms (5-20ms for traditional disk, about 1ms for EFD).  If an I/O is served from cache, it’s in microseconds.   Switch latencies are in microseconds.    Here, the I/O periods were so short that they filled up the ESX LUN queues instantly, causing a “back-off” effect for the guest.   These were happily serviced by the SAN and the storage array, which had no idea anything bad was going on.</p></blockquote>
<p>When these bursts happen, queues overflow, messages back up, and service times briefly skyrocket.  These rapid overflows happen in a fraction of <a href="http://communities.vmware.com/docs/DOC-9279">esxtop</a>&#8217;s multi-second window and <a href="http://communities.vmware.com/docs/DOC-5600">vCenter</a>&#8217;s 20 second window.</p>
<p>So, what buffers are we talking about?  Take a look at Chad&#8217;s hand-drawn picture of the storage path, which is only slightly less complicated than <a href="http://www.advanceusa.org/blog/content/binary/Obamacare%20Diagram.jpg">the Republican view of Obamacare</a>:</p>
<div class="wp-caption alignnone" style="width: 650px"><a href="http://virtualgeek.typepad.com/virtual_geek/2009/06/vmware-io-queues-micro-bursting-and-multipathing.html"><img title="Queues in the Storage Path" src="http://virtualgeek.typepad.com/.a/6a00e552e53bd2883301157135b4ae970b-pi" alt="Chad Sakacs image showing the numerous locations of storage queues in all locations from the VM to the platter." width="640" height="480" /></a><p class="wp-caption-text">Chad Sakac&#39;s image showing the numerous locations of storage queues in all locations from the VM to the platter.</p></div>
<p>If you are a VI admin, you care about the LUN queue in ESX.  ESX creates one of these queues for each HBA+LUN pair.  So, multipathing to a LUN increases the effective LUN queue and using a single HBA to multiple LUNs will guarantee a queue to each LUN.  Instances of this queue will overflow if many VMs on a single server issue commands to a single LUN.  As Chad says:</p>
<blockquote><p>In VMware land – this is usually the fact that the default LUN queue (and corresponding Disk.SchedNumReqOutstanding value) are 32 – which for most use cases is just fine, but when you have a datastore with many small VMs sitting on a single LUN, the possibility of microbursting patterns becomes more likely.</p></blockquote>
<p>So, when will the queues overflow?  Not often:</p>
<blockquote><p>In the example [Vaughn] used, [multi-pathing] would not help materially if there were more than 3 ESX hosts, as it would be a likely case of “underconfigured array” – not host-side queuing.</p></blockquote>
<p>The message here is that only a small window of configurations will result in LUN queue overflow: many VMs on very few hosts talking to a common LUN.  This is a perfect use case for <a href="http://communities.vmware.com/docs/DOC-10095">vscsiStats</a>, which I have talked about in various forums now.  vscsiStats avoids sampling windows by recording precise information on every IO.  This means that microburst statistics will not be averaged&#8211;and lost&#8211;across a time period.</p>
<p>Consider the following data I pulled from a sample session on my office system:</p>
<table border="0">
<tbody>
<tr>
<td><strong>Frequency</strong></td>
<td><strong>Histogram Bucket Limit</strong></td>
</tr>
<tr>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>50</td>
<td>4</td>
</tr>
<tr>
<td>879</td>
<td>6</td>
</tr>
<tr>
<td>6588</td>
<td>8</td>
</tr>
<tr>
<td>82830</td>
<td>12</td>
</tr>
<tr>
<td>161362</td>
<td>16</td>
</tr>
<tr>
<td>79802</td>
<td>20</td>
</tr>
<tr>
<td>18080</td>
<td>24</td>
</tr>
<tr>
<td>5377</td>
<td>28</td>
</tr>
<tr>
<td>1997</td>
<td>32</td>
</tr>
<tr>
<td>433</td>
<td>64</td>
</tr>
<tr>
<td>0</td>
<td>&gt; 64</td>
</tr>
</tbody>
</table>
<p>This table shows the number of outstanding IOs as each new IO arrives in the VMkernel.  The first row means that during the collection period only two IOs arrived to a queue with one outstanding IO.  Row two says that two IOs entered when there were two outstanding IOs.  The third row states that 50 IOs arrived while the queue had 3-4 IOs.  And so on.</p>
<p>This table represents a fairly healthy access pattern, showing that only 433 out of 357,402 IOs arrived while the queue had 33-64 outstanding IOs (shown on the second-to-last row).  With ESX&#8217;s default LUN queue depth at 32, vscsiStats shows that a very small number of IOs arrived to an overflowing queue.</p>
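<p>The same conclusion can be checked with a few lines of arithmetic over the histogram.  The sketch below types in the frequencies from the table by hand; treating the final bucket as open-ended (more than 64 outstanding IOs) is my assumption about the layout, and the variable names are hypothetical.</p>

```python
# (bucket upper limit, frequency) pairs copied from the table above.
# The final bucket is assumed to be open-ended, i.e. more than 64 outstanding IOs.
HISTOGRAM = [
    (1, 2), (2, 2), (4, 50), (6, 879), (8, 6588), (12, 82830),
    (16, 161362), (20, 79802), (24, 18080), (28, 5377), (32, 1997),
    (64, 433), (float("inf"), 0),
]
DEFAULT_LUN_QUEUE_DEPTH = 32

total_ios = sum(freq for _, freq in HISTOGRAM)
# IOs that arrived when more commands were outstanding than the default queue depth.
over_depth = sum(freq for limit, freq in HISTOGRAM if limit > DEFAULT_LUN_QUEUE_DEPTH)
share_pct = over_depth / total_ios * 100

print(f"total IOs: {total_ios}")
print(f"IOs beyond queue depth {DEFAULT_LUN_QUEUE_DEPTH}: {over_depth} ({share_pct:.2f}%)")
```

<p>The totals match the figures quoted above: 433 of 357,402 IOs, or roughly a tenth of one percent.</p>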
<p>In summary, some storage performance issues appear and disappear so rapidly as to not be visible with sampling-based tools, even ones as fine-grained as esxtop.  As a VI admin you should consider this in your most challenging troubleshooting cases.  And remember to use vscsiStats if all else has failed.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2009/09/23/micro-bursting-and-storage-performance/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Performance Troubleshooting: No PhD Required!</title>
		<link>http://vpivot.com/2009/09/18/performance-troubleshooting-no-phd-required/</link>
		<comments>http://vpivot.com/2009/09/18/performance-troubleshooting-no-phd-required/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 18:42:22 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[tier-1]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[vmworld]]></category>
		<category><![CDATA[vscsistats]]></category>
		<category><![CDATA[vsphere]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=41</guid>
		<description><![CDATA[A couple of weeks ago at VMworld in San Francisco I squeezed a few press meetings in between the 19 sessions of the performance lab I led.  In one of those meetings I talked with David Vellante and two of his colleagues to discuss vSphere performance and performance monitoring.  David and company asked some [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago at VMworld in San Francisco I squeezed a few press meetings in between the 19 sessions of the performance lab I led.  In one of those meetings I talked with <a href="http://www.internetevolution.com/profile.asp?piddl_userid=13982">David Vellante</a> and two of his colleagues to discuss vSphere performance and performance monitoring.  David and company asked some hard questions about our performance work but my knowledge of this area runs deep, so the conversation was fruitful and interesting.</p>
<p>A few days after the conference a coworker of mine shared the following quote with me, courtesy of <a href="http://www.internetevolution.com/author.asp?section_id=654&amp;doc_id=181395">an article by David on Internet Evolution</a>:</p>
<blockquote><p>The fact is, most data center managers wouldn’t trust VMware to manage their Tier 1 applications because if something goes wrong performance-wise, you still need to roll in the VMware PhDs to solve it.</p></blockquote>
<p>Let me respond to a few of the suggestions from this quote.</p>
<h2><span id="more-41"></span>&#8220;Customers Do Not Trust VMware for Tier-1 Apps&#8221;</h2>
<p>The following chart presents data collected from 676 VMware customers in July and August of 2008.</p>
<div id="attachment_58" class="wp-caption alignnone" style="width: 619px"><img class="size-full wp-image-58" title="Application Virtualization Rates (2008)" src="http://vpivot.files.wordpress.com/2009/09/picture-11.png" alt="Percentage of application instances virtualized by VMware customers." width="609" height="307" /><p class="wp-caption-text">Percentage of application instances virtualized by VMware customers.</p></div>
<p>This graph shows the large rates of virtualization of the most well-known enterprise applications.  By any definition of &#8220;Tier-1 Application&#8221;, at least one tier-1 application is mostly virtualized by this customer sample.  And the survey date bears repeating: summer, 2008.  Virtualization acceptance has greatly increased in the past 12 months.</p>
<p>Concerns about performance management aside, VMware customers <em>are</em> virtualizing their tier-1 apps <em>today</em>.  So let&#8217;s talk about the process of performance troubleshooting.</p>
<h2>&#8220;A PhD From VMware Is Required to Fix Performance Problems&#8221;</h2>
<p>I think that David must have inferred from my confident and detailed talk on a great number of performance-related technical topics that I am the cream of the crop of America&#8217;s educational system.  For the record, I went to a state school in Alabama and spent far more time drinking beer than going to class.  Nonetheless, I am sure what he meant to say was&#8230;</p>
<h2>&#8220;A Highly-skilled Performance Expert Is Required to Fix Performance Problems&#8221;</h2>
<p>VMware now boasts over 150,000 customers, and I only interact with a relative handful a year.  If I count the experts in our small performance community I can conclude that our performance experts touch a very small percentage of our customer base each year.  That means that the great majority of our customers are solving their performance problems without engaging us.</p>
<p>Customers are fixing their problems using a variety of tools that I continue to document:</p>
<ul>
<li>The vSphere client interface to vCenter is known to everyone and <a href="http://communities.vmware.com/docs/DOC-5600">its counters</a> operate with 20s granularity and are effective at fixing about 90% of performance problems.</li>
<li>esxtop, with its finer granularity and <a href="http://communities.vmware.com/docs/DOC-9279">larger counter list</a>, can be used to fix 95% of problems, most of which could have been fixed with vCenter statistics.</li>
<li><a href="http://communities.vmware.com/docs/DOC-10095">vscsiStats</a> is extraordinarily useful for a small percentage of problems, perhaps 10-20% of those I see.</li>
</ul>
<p>We are currently working on collecting all of these views into the client and adding a framework, <a href="http://communities.vmware.com/community/developer/forums/vprobes?view=overview">vProbes</a>, that will enable unprecedented visibility into operating systems and applications.  But even as things stand today, we have provided documentation and tools that all of our customers can use to fix any problem.  There is always room for improvement, but no PhD is required.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2009/09/18/performance-troubleshooting-no-phd-required/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
