vPivot

Scott Drummonds on Virtualization

Windows Guest Defragmentation, Take Two


I have received questions about guest defragmentation tools for years.  Until today I could only offer theories about the value of guest defragmentation.  But those theories spawned new research, and one of VMware’s partners is now putting data behind the argument that file systems in Windows virtual machines require defragmentation.  This partner, Raxco Software, shared early results of this investigation with me.  Raxco used their NTFS defragmentation tool, PerfectDisk, to evaluate the impact of guest defragmentation on a single virtual machine.

Before I describe the test and its results, I want to share an important point on guest defragmentation.  Most computer-literate people are aware that file fragmentation–the separation of logically contiguous pieces of a file–can hurt storage performance.  But many may not realize that free space fragmentation is just as big an issue.  When free space is fragmented, writes take longer and files are re-fragmented rapidly.  PerfectDisk defragments both files and free space, and the results below benefit from both of these improvements.
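
To see why fragmented free space slows writes, consider a toy first-fit allocator.  This is a simplification with made-up numbers, not NTFS’s actual allocation logic: the same total free space, held in one run versus many small slivers, splits an identical new file into very different numbers of fragments.

```python
# Toy model (not NTFS's allocator): first-fit allocation of a new file
# over contiguous vs. fragmented free space. All numbers are made up.

def allocate(free_extents, file_clusters):
    """Consume free extents in order until the file fits.
    Returns how many fragments the new file was split into."""
    fragments, remaining = 0, file_clusters
    for extent in free_extents:
        if remaining == 0:
            break
        fragments += 1
        remaining -= min(extent, remaining)
    return fragments

contiguous = [10_000]      # one 10,000-cluster free run
fragmented = [16] * 625    # the same free space in 16-cluster slivers

print(allocate(contiguous, 2_000))  # -> 1 fragment
print(allocate(fragmented, 2_000))  # -> 125 fragments
```

A file written into the fragmented free space is born in 125 pieces, so every later read of it pays for many extents (and seeks) instead of one.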

The following steps were used to set up the test environment:

  1. A new virtual machine was constructed with Windows Server 2008 on a single 50 GB virtual disk.  29 GB (58%) of the disk was populated with the OS and miscellaneous user files.  21 GB of free space remained.
  2. A tool was run that simulated months of user activity by reordering blocks to fragment files and free space (a sketch of one way to induce similar fragmentation follows this list).
  3. The fragmented virtual machine was cloned.
  4. PerfectDisk was run on the second virtual machine to produce a defragmented virtual disk.
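
Raxco’s block-reordering tool is not public, but as a rough illustration of step 2, here is a minimal sketch of one common way to shred a test volume’s free space: fill it with small files, then delete every other one.  The directory, file size, and file count below are hypothetical; size them to your disk.

```python
# Minimal sketch (hypothetical paths and sizes) of inducing file and
# free-space fragmentation for testing. Filling a volume with small files
# and deleting every other one leaves free space shredded into small
# holes, so subsequent large writes land fragmented.

import os

TARGET_DIR = r"D:\fragtest"   # hypothetical scratch directory on the test disk
FILE_SIZE = 64 * 1024         # 64 KB filler files
FILE_COUNT = 100_000          # ~6 GB of filler; adjust to your volume

os.makedirs(TARGET_DIR, exist_ok=True)
for i in range(FILE_COUNT):
    with open(os.path.join(TARGET_DIR, f"filler_{i:06d}.bin"), "wb") as f:
        f.write(os.urandom(FILE_SIZE))

# Deleting every other file raises total free space, but scatters it
# across the disk in 64 KB holes.
for i in range(0, FILE_COUNT, 2):
    os.remove(os.path.join(TARGET_DIR, f"filler_{i:06d}.bin"))
```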

The Raxco team then compared the performance of the fragmented virtual machine with the defragmented one by measuring the installation time of Microsoft SQL Server 2008. This workload was chosen because it produces a bounded, write-intensive test that can easily be monitored with vscsiStats.  Only one virtual machine was running on the host.
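
For readers who want to reproduce this kind of measurement, here is a minimal sketch of the vscsiStats collection workflow.  The world-group ID below is a placeholder, and the flags should be verified against your build’s vscsiStats help before relying on them.

```python
# Sketch of driving vscsiStats around a workload run. Assumes it runs
# where the vscsiStats binary is reachable (the ESX console); the
# world-group ID below is hypothetical -- list real IDs with: vscsiStats -l

import subprocess

WORLD_GROUP_ID = "1234"  # hypothetical; find yours with vscsiStats -l

def vscsistats(*args):
    return subprocess.run(["vscsiStats", *args], capture_output=True,
                          text=True, check=True).stdout

vscsistats("-s", "-w", WORLD_GROUP_ID)   # start collection for the VM

input("Run the workload (e.g., the SQL Server install), then press Enter...")

# Dump the three histograms discussed below, one file each.
for histo in ("ioLength", "seekDistance", "latency"):
    with open(f"{histo}.txt", "w") as f:
        f.write(vscsistats("-p", histo, "-w", WORLD_GROUP_ID))

vscsistats("-x", "-w", WORLD_GROUP_ID)   # stop collection
```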

Let’s take a look at a few key graphs.

Comparing IO size

The number of IOs during the application install, organized by size.

This first histogram shows IO counts by size.  You can see that IO counts for all but the largest bucket have decreased because PerfectDisk reorders files to minimize small IOs and maximize large IOs.  For the storage hardware this means greater efficiency in processing IOs.  But it also means two things to ESX:

  • More host throughput, as the fixed-depth HBA queue now holds larger commands.
  • Fewer virtual storage stack traversals, resulting in lower CPU utilization.

However, PerfectDisk not only consolidates small IOs into large IOs, it also makes files logically contiguous as they are seen by the NTFS file system. This means less work for disk controllers when mapping logical to physical clusters on disk.
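
The first bullet is easy to see with back-of-the-envelope numbers.  The queue depth and per-command service time below are hypothetical, but with both held fixed, the achievable throughput ceiling scales directly with command size.

```python
# Back-of-the-envelope throughput ceiling (hypothetical numbers): with a
# fixed queue depth and per-command service time, bigger commands mean
# more bytes in flight and a higher ceiling.

QUEUE_DEPTH = 32         # assumed per-LUN HBA queue depth
SERVICE_TIME_S = 0.005   # assumed 5 ms per command in both cases

for io_size_kb in (4, 64):
    iops = QUEUE_DEPTH / SERVICE_TIME_S          # commands completed per second
    throughput_mb_s = iops * io_size_kb / 1024
    print(f"{io_size_kb:>3} KB IOs -> ~{throughput_mb_s:,.0f} MB/s ceiling")

#  4 KB IOs -> ~25 MB/s ceiling
# 64 KB IOs -> ~400 MB/s ceiling
```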

Next we have the vscsiStats seek distance histogram which shows the shift from random to sequential IO.

Distance between successive IOs

The seek distance histogram shows the number of logical blocks between each successive IO.

The seek distance histogram shows a clear increase in the number of IOs that began exactly one logical block after the previous IO.  This pattern, visible as the growth of bucket "1", reflects the increased sequential nature of accesses to the defragmented virtual disk.  In this case the fragmented disk’s access was 15% sequential while the defragmented disk’s was 30% sequential.
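
For concreteness, a sequentiality percentage like those above can be read straight off the seekDistance histogram as the share of IOs in the distance-1 bucket.  The bucket counts below are illustrative, not Raxco’s data.

```python
# Reading "percent sequential" from a vscsiStats seekDistance histogram:
# IOs in the distance-1 bucket started exactly one logical block after
# the previous IO. Counts here are illustrative only.

seek_histogram = {   # {bucket (logical blocks): io_count}
    -64: 9_000, -1: 8_000, 0: 2_000, 1: 31_000, 64: 12_000, 4_096: 43_620,
}

total = sum(seek_histogram.values())
sequential = seek_histogram[1]
print(f"{100 * sequential / total:.0f}% sequential")  # ~29% with these counts
```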

Let us next look at latency.

Number of IOs by Latency

This histogram counts IOs by latency.

The latency histogram shows a decrease in IOs across the board and a near elimination of IOs that took more than 30,000 microseconds (30 ms).  Those very slow IOs accounted for 15% of all the commands in the fragmented case.  By eliminating the 15% slowest IOs, you can imagine that total IO performance and application execution time would greatly improve.  That is exactly what happened, as the following data show:

| Metric | Fragmented Disk | Defragmented Disk | Comment |
|---|---|---|---|
| Total IOs | 166,412 | 105,620 | The decrease in total IOs is a result of Windows making fewer requests for larger IOs in the defragmented case. |
| Mean IO Response Time | 58.5 ms | 3.5 ms | A roughly 94% reduction in average latency. |
| SQL Server 2008 Install Time | 45 minutes | 30 minutes | The best application metric for this test showed a 33% reduction in execution time. |

Let me repeat one of those amazing data points: the average IO latency dropped from 58.5 ms to 3.5 ms.  While this is a phenomenal number, the improvement depends on characteristics of the storage system.  Since these improvements are configuration dependent, your results may vary considerably.
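
As an aside on methodology, a mean like this can be estimated from vscsiStats’ bucketed latency output by weighting each bucket’s midpoint by its IO count.  A minimal sketch, with illustrative bucket bounds and counts:

```python
# Estimating mean latency from a bucketed histogram: weight each bucket's
# midpoint by its IO count. Bounds and counts below are illustrative,
# not the actual test data.

latency_histogram = [   # (lower_us, upper_us, io_count)
    (0, 1_000, 20_000),
    (1_000, 5_000, 55_000),
    (5_000, 15_000, 25_000),
    (15_000, 30_000, 4_000),
    (30_000, 100_000, 1_620),
]

total_ios = sum(n for _, _, n in latency_histogram)
mean_us = sum((lo + hi) / 2 * n for lo, hi, n in latency_histogram) / total_ios
print(f"mean latency ~= {mean_us / 1000:.1f} ms over {total_ios:,} IOs")
```

The midpoint weighting is only as good as the bucket granularity, which is one reason a handful of very slow IOs (the 30 ms+ bucket) can drag the mean up so dramatically.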

As Raxco continues its investigation I remain cautiously optimistic about the value of guest defragmentation.  I think the exchange of small IOs for large IOs is indisputably a Good Thing.  However, virtual environments are very complex, and I harbor some concerns about guest defragmentation that must be considered.  For instance:

  • Defragmentation in virtual machines backed by linked clones can dramatically increase those VMs’ VMDK consumption of their VMFS volumes.
  • The value of increased sequential access in a single virtual machine will diminish somewhat in consolidated environments. This is because multiple virtual machines’ sequential accesses get interleaved at the array, increasing the randomness of the IO from the array’s point of view.
  • Guest block reordering may have negative consequences to your array, as Vaughn Stewart argued in a comment to my first entry on the subject.
  • The value of file defragmentation may be limited when applications produce small random block access to data files, as some databases tend to do.

Raxco is continuing to investigate guest defragmentation to respond to some of these concerns and provide best practices for PerfectDisk’s usage.  I will update you as the research continues.

Test Details

ESX Server Configuration

  • ESX Version: 3.5.0 Update 1
  • Motherboard: Intel S5000PSL
  • CPU Type: Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
  • Number of CPUs: 2
  • Cores per CPU: 4
  • Logical Processors: 8
  • Memory: 4 GB

Storage Configuration

  • RAID controller: Adaptec RAID 3805
  • Number of Drives: 4
  • Drive Type: WD1001FALS 1TB 7200 RPM 32MB Cache
  • Total Capacity: 4.0 TB
  • Number of LUNS: 2
  • LUN 1 RAID level: 5
  • LUN 1 Capacity: 2.00 TB
  • LUN 1 Partitions: 1
  • LUN 1 Name: IOTesting
  • LUN 2 RAID level: 5
  • LUN 2 Capacity: 744.75 GB
  • LUN 2 Partitions: 1

Datastore Configuration

  • Number of Datastores: 2
  • Datastore 1 Name: IOTesting
  • Number of VMs: 2
  • Capacity: 2.00 TB
  • Target LUN: LUN 1 (from Storage configuration)
  • Datastore 2 Name: ISO
  • Number of VMs: 0
  • Capacity: 744.75 GB
  • Target LUN: LUN 2 (from Storage configuration above)

VM Configuration

  • Number of VMs: 2
  • Operating System: Windows Server 2008 R2 (64-bit)
  • Memory: 2GB
  • Number of CPUs: 2
  • SCSI Controller: LSI Logic (no SCSI bus sharing)
  • Number of Disks: 1
  • Size of Disk: 50 GB
  • Provisioning Type: Thick
  • Backing Datastore: IOTesting
  • Virtual Memory: none (pagefile disabled)
  • Network: disabled

28 Responses

Scott,
Thanks for posting this, this is very interesting data.
As an aside, I’ve had customers inquire about tools like this, specifically around VDI workloads (as some customers use Diskeeper or PerfectDisk on their desktops), and I’ve recommended they steer clear because any defragmentation actions would increase the size of a thin-provisioned disk. Especially when using View Composer, this would largely negate the disk savings you would get by storing master gold images.
Would you concur that when using Thin-Provisioned VM disks that defragmentation tools would serve to undermine the storage savings offered by thin-provisioning?
With thick disks I would say go for it.

    • Thin disks are actually not a problem. It is true that PerfectDisk will increase the size of the thin provisioned VMDKs. However, if you perform a storage vMotion of the VMDK then vSphere will reclaim the unused blocks and pack the VMDK to a new, optimal size.

      But note that sparse disks (linked clones) are different. Right now we think that the golden image should be defragged but no defragmentation should be performed on linked clones.

  • Scott…

    I am wondering the same thing as ermac318 is asking about Thin Provisioned volumes. I have always understood that you should NOT defragment thin provisioned volumes as there is no benefit and it really could cause problems with your storage. On top of that question, if you are doing thin provisioned volumes on your storage AND thin provisioning on VMware would this cause problems if you fire off a defragmentation job?

    Defragmenting does move blocks together, and having been a Raxco PerfectDisk user for almost 4 years, I can attest that their product is solid, better than the others. But I am very curious about your thoughts or Raxco’s thoughts on defragging TP volumes and TP VM’s.

    Jonathan

    • As I mentioned in my reply to ermac318, the increased VMDK size due to guest defragmentation can be corrected through storage vMotion. But the thin-disks-on-thin-disks issue is too complicated for me to answer completely. I will have to defer a response to that for another article at a later time.

      • Scott,
        The storage array itself would have to have a way of performing a disk “shrink” like VMware does when you storage VMotion. As far as I know no arrays have this feature (at least for block-based storage).

    • I can provide some insight into this as I work with TP VM’s day in and day out.

      First I should say that I’m not surprised by the findings in the piece above – our test environment showed wonderful improvements using the PerfectDisk ESX solution a year ago. They have a new version now and I’m looking forward to seeing what’s new.

      Regarding TP VM’s, our results are consistent. It seems the PerfectDisk engine specifically allocates some free space to use as scratch when moving clusters; that scratch is usually between 500 MB and 4 GB but varies with the amount of movement that needs to be performed.

      On average, we see a predictable size increase of 1.5 GB to 2.5 GB for TP VM’s. This is really not a problem for us and works great when you factor in Scott’s comment – a Storage vMotion will resize the disk.

      Of particular note: another claim made by PerfectDisk is that you’ll get better shrinks afterward using “consolidate free space”. I have found that to work wonderfully. I would argue that, if anything, using something like PerfectDisk just for better shrinks is worth it. The licensing is very reasonable.

      I should note that I did not like the new Diskeeper 2010 for this; it blows TP VM’s out. It does not have a consolidate option, so I imagine that has to be the reason why.

  • RedDuke,

    I work for an enterprise financial company, and like you we’ve found defrag resolved some SQL latency (Avg Disk Sec/Read over 100ms) we were seeing. My storage team, our SQL team, and our SAN vendor improved the bottleneck with some tweaks but we could not get it under 50ms. Our SAN guy actually recommended trying Diskeeper, and we got the latency to under 20ms. Not ideal, but better until we can implement more cache.

    We use thin-on-thin and found that with those defrags (we only tested Diskeeper) our storage requirements increased quite a bit. We reported this to Diskeeper and an engineer helped us with our configuration. He sent us a link to this paper which has a section on TP: http://downloads.diskeeper.com/pdf/Best_Practices_Eliminating_Fragmentation.pdf

    They have a feature in the product, called iFaast, that was unknowingly moving data to the back of our guest volumes and causing this storage increase side effect in our VMs. In my opinion, it’s incorrectly set to ON by default. Anyway, I digress: once we turned that off, the growth issue mostly went away, down to a predictable and acceptable amount (like you have now). So, I think you may have run into the same issue we did, and a config change would resolve it.

    We just started testing their new release with a feature called IntelliWrite that writes files without fragmentation as they are being written, not requiring a scheduled defrag and therefore supposedly avoiding TP growth. Their engineer is pitching this as fully compatible with TP and recommending we leave this feature on and then schedule an occasional defrag once every few weeks to defrag the free space. I can’t confirm that this will resolve storage growth or make it less frequent, but it is looking promising so far.

    PS: Scott, while I can confirm general benefits from defrag, these tests would be more compelling done in an enterprise environment rather than on DAS 7200 RPM SATA drives. That would probably be more applicable to most of us following your blog. Can you pass that request along?

    • In case anyone might ask why a thin disk for SQL – it’s a long story… (not my choice), but the DB is not a mission-critical one, so the 20ms latency is acceptable.

      • WebDev,

        I’m not sure how you can actually “create” free space fragmentation unless you’re talking about the scattering effect we experienced. Instead of having one large block of data we ended up with a mostly consolidated block of data and lots of little files scattered everywhere.

        Beyond it looking odd, I’m not sure there were any negative effects. I dismissed it as an oddity until we needed to resize the partition. We used PerfectDisk to consolidate the data rather than wait an indeterminate amount of time, but we didn’t tell the Diskeeper folks that. I think it would have stung a bit. That, and the fact that we don’t normally reveal what other solutions are being evaluated. ;)

    • Marcus,

      I believe that vscsiStats measures high up the stack and therefore the underlying hardware is not a huge issue (except maybe latency measurements). Scott should be able to confirm/deny this.

      As for IntelliWrite, doesn’t it create free space fragmentation in order to defrag the files? The second paragraph of this blog entry mentions that free space fragmentation is just as bad as file fragmentation. If your drives are thin provisioned, wouldn’t the free space fragmentation cause larger storage requirements than if free space was not fragmented? Not sure if I’m 100% correct here, but logically it makes sense.

      • Oops that was meant as a reply to Marcus.

      • You are right, vscsiStats is high up the stack, which means it would encompass more storage layers, including the HW. We’ve used it ourselves. True that some of the stats don’t have anything to do with HW, but some, like the latency you noted, do.

        I don’t have a whole lot of info on IntelliWrite, so I can’t comment on free space fragmentation, just that it seems to be handling our TP issue with defrag. The jury is still out though. I’m not putting DK into production until I’ve seen a complete evaluation, not after the iFaast debacle.

    • Now that’s an excellent paper. Very comprehensive – except! There is one thing it seems to not cover / ignore.

      Or actually I guess it’s a few things. If you go over to the perfectdisk website they have a marketing bit called:

      “Virtual Infrastructure Awareness Technology (VIA Tech™)”

      It is the solution to the one real issue we found when evaluating Diskeeper; it’s also the same issue that we ran into with our anti-virus: I/O saturation.

      Trust me it took us 3 weeks before we figured out that it was numerous simultaneous anti-virus scans that were causing serious performance issues off and on again.

      So of course it makes sense that you don’t want PerfectDisk to run all the manual or background defrags at the same time. The people at PerfectDisk seem to recognize this as a problem. Diskeeper did not.

      That was a huge deciding factor for us. The “VIA Tech” stuff is just marketing to say that the agent running on your guest knows whether or not you have the physical resources to support a defrag – or if a defrag will slow the performance of another guest. You provide the agent with read-only credentials to vCenter and it gets statistics that way. The nice thing is that if your guest migrates it continues to track the correct physical host, which is nice for vMotion.

      This seems very logical to me and makes me wonder when anti-virus vendors will take the time to adopt the same solution. Actually I believe McAfee is doing this now. Too bad we didn’t go with them.

      The folks at Diskeeper were very nice and had tons of references, but they seemed to want to focus on the idea that using Diskeeper produces zero overhead, and so it’s not necessary to monitor physical host resources. But there is no such thing as invisible resource usage, so we didn’t find that very convincing. We basically rolled our eyes, particularly since we can see its I/O usage right from Resource Monitor, and it runs at normal priority, not background priority. Diskeeper would do well to focus less on the marketing and more on the technical.

      It also costs a lot more, so that’s like 3 strikes. Despite the high recommendations we received for Diskeeper, the fact that they don’t have a solution specifically designed for ESX hurts them in my opinion. They do have one for Hyper-V though. If we were using Hyper-V I’d be willing to give it a second look but I highly doubt that would ever be the case.

      The IntelliWrite stuff seemed to work wonderfully though. One thing that took it down a notch was it seemed to scatter data a bit. We ran into a problem resizing a partition because of it. When we reported it to diskeeper they suggested that we turn it off or just wait, because apparently the engine would solve the problem on its own. Well we didn’t want to wait. Not actually a problem because I can’t think of the last time we resized a partition except to make it bigger, not smaller.

      • Hi RedDuke

        We are very much aware of background processes in VMs. We have to be – we run 40-50 VMs per host.

        We’re a 100% VMware shop so their Hyper-V Diskeeper version doesn’t help. However, over 80% of our ESX infrastructure is SAN attached and all these “VIA Tech” solutions don’t understand the SAN fabric, and we have several. Until someone fixes that, we’re disabling background tasks, or scheduling them very carefully.

        I don’t handle security solutions, but we do run McAfee. It’s not popular here, especially today ;-). Your comment is interesting though. Can you pass on a link about their virtual aware solution? Thanks.

  • [...] There have been some recent comments on Scott Drummonds’ site (and others) regarding defragging of virtual servers.  What do you think?  Do you defrag [...]

  • The answer is a definite – defrag both the guest OS disks and the system’s host disks. We had a severe problem with performance and doing on-line and off-line defrags of the guest’s boot drive as well as an on-line defrag of the host storage was the only way we could bring the system back up and stabilize it.

    The reason you need both is very simple – head movement. When the physical disk is fragmented the heads have to move further. When the virtual disk is fragmented, the physical heads still have to move further to do their job. Combine this with multiple virtual disks and you can very quickly get thrashing of the physical disk heads.

    The concern about guest disk file size is a red herring – over-committing your physical disk space is a recipe for disaster, usually at the worst possible time.

  • I don’t have hard evidence, but when we defragged our VM’s it made the world of difference with performance. I am a firm believer in defragmentation. On virtual machines absolutely. On Hosts hosting virtual machines? That is the bigger question. I would think that if you build a VM and use all of the space for that VM when you create it, there is lesser of a need to defragment the host, but if you allow the VM to consume the space as needed I could see that the host could become very fragmented quickly.

    Just my two cents.

  • I too feel that the benefit of guest defragmentation is there for many environements, but consider this warning CAREFULLY:

    If you defrag guest VMs in an environment that has a block-level de-duping storage system, one thing is for sure: you WILL EXPLODE YOUR SAN DISK USAGE TO FULL SIZE!

    Once you touch and move every block of every VMDK file, the blocks are now all unique again as far as your de-duping storage system sees things.

    This causes the SAN to store each block again, as unique data and it must de-dupe all over again. (Assuming it can at this point).

    For those that are counting on overbooking storage (like NetApp users do) you will fill your volumes and crash your environment at the SAN level.

    For those that use thin provisioning, there are other considerations there too.

    Defragmentation of guest VMDK files, in any environment that uses thin provisioning or block level de-duping strategies is a disaster in the making.

    Be careful.

  • Thank you for your article and the information, it does give some guidance.

    I would be interested to see what the statistics look like in a larger-scale environment with more HDDs in the RAID array, to see how much is affected by HDD latency vs. fragmentation. Four HDDs in RAID 5 will behave differently than eight HDDs in RAID 5, or an iSCSI device with 14 HDDs in a RAID 10 array supporting the VHDs.

    Consider running 15+ VHDs per host with (3) Hyper-V hosts sharing space in a CSV environment. Do the hosts communicate with each other? I can see where a single host could keep check on its guests pretty easily, but when you have multiple hosts sharing storage and guests can transition between hosts due to failover, does it keep track of who is doing what to keep throttling of defrag activity in check?

  • I’ve used Diskeeper and PerfectDisk for years. I switched exclusively to PerfectDisk in 2007 and haven’t looked back. I wondered about using v11 on our new virtual server about a month ago and took the plunge. It made a very noticeable difference immediately. I haven’t tried the boot-time scan, didn’t even think about it until just now, but I am happy with the results and will continue to use it on all of our servers and PCs.

  • I too would like to see some more in-depth analysis with datastores on more “intelligent” SANs, such as NetApp, EMC, Equallogic. What impact would this have on de-dupe, load balancing, auto-tiering, etc.?

  • [...] software vendors have appeared who want to apply defragmentation in virtual machines as well. There are certainly counterarguments, but in general I think that defragmenting virtual machines is [...]

  • [...] Another option to perform disk defragmentation is to use a third-party product such as Diskeeper, which has a new offering for virtual machine specific defragmentation and optimization with the V-locity product. Defragmentation in virtual machines is a topic that should not be ignored; in fact, former VMware employee Scott Drummonds wrote on the vPivot blog that a defragmented virtual machine can provide an amazing …. [...]

  • [...] Drummonds - Windows Guest Defragmentation, Take TwoBefore I describe the test and its results, I want to share an important point on guest [...]

  • [...] space that is available. I learned this from my friend Bob Nolan at PerfectDisk (Raxco) during our joint work on guest defragmentation over a year ago. This lazy placement produces fragmented files and free space, both of which harm [...]

  • [...] optimizations. The world of guest OS defragmentation is rich in this space with companies like my friends at PerfectDisk. But I’ve not yet seen anyone monitor and modify guest OS settings dynamically. While it [...]
