Scott Drummonds on Virtualization

The Flash Storage Revolution: Part III


In this final installment of the series, I will provide some detail behind flash storage sizing.  My previous entry presented an analytical, theoretical approach to sizing flash in today’s storage.  When I first worked through the ideas I introduced in that post, I thought the flash sizing exercise was hopeless.  After all, how are customers to measure data cooling?  How could a storage admin quantify skew?

As it turns out, familiarity with these abstract concepts is not needed to size flash in your environment.  The same principles that Intel and AMD apply in sizing microprocessor cache can be applied to storage.  There are generalizations that will suit the majority of deployments.

First, a little background.  In building EMC’s Fully Automated Storage Tiering Virtual Pools (FAST VP), EMC studied the access patterns of over 3,500 arrays.  We measured skew and performance, capacity and footprint.  We experimented with storage layouts and FAST VP block sizes.  We tried two-tier and three-tier configurations and sized each to find the best fit for the average case.  The results are summarized in the following figure.

The EMC study that preceded the launch of FAST VP identified three basic tier configurations to improve performance and footprint in 94% of environments.

Of the 3,500 arrays we analyzed, 12% of the workloads met criteria that we describe as “heavy skew”.  This means 95% of the IO occurred on 5% of the data.  In these configurations nearly all the hot blocks can be stored in flash when it is sized to 3% of the storage footprint.  In “moderate skew” environments, the addition of 15% Fibre Channel maintained performance with a footprint only slightly larger than optimal.  “Low skew” environments still showed improvement over flash-less configurations in both performance and footprint, and at the same cost.
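As a rough illustration of what these skew figures mean, consider a toy two-segment model in which a “hot” fraction of capacity uniformly receives a fixed share of all IO.  The 95/5 heavy-skew split comes from the numbers above; the uniform two-segment model and the function below are my simplifying assumptions, not EMC’s measurements.

```python
# Toy skew model: a "hot" fraction of capacity receives a fixed share
# of all IO, uniformly; the remaining IO spreads over the cold data.
# The 95/5 heavy-skew split is from the article; the uniform
# two-segment model is an illustrative assumption.

def io_captured_by_flash(flash_frac, hot_frac=0.05, hot_io=0.95):
    """Fraction of IO served from flash sized at `flash_frac` of total
    capacity, assuming flash is filled hottest-blocks-first."""
    if flash_frac >= hot_frac:
        # All hot data fits; spare flash absorbs cold IO pro rata.
        cold_share = (flash_frac - hot_frac) / (1.0 - hot_frac)
        return hot_io + (1.0 - hot_io) * cold_share
    # Only part of the hot segment fits on flash.
    return hot_io * flash_frac / hot_frac

# Flash sized to the full 5% hot set captures 95% of IO in this model.
print(io_captured_by_flash(0.05))   # 0.95
```

Real skew curves are concave within the hot segment rather than uniform, which is why a flash tier at only 3% of the footprint can hold nearly all of the hot blocks rather than the fraction this linear toy predicts.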

It was this analysis that led us to recommend the low skew configuration for unknown environments.  This has the following benefits:

  • The cost of storage is the same as the flash-less configuration.
  • The footprint is half the size of the flash-less configuration.
  • Storage will be at least 20% faster for 94% of workloads.  Because this measurement was taken at low skew, and higher-skew environments exercise flash more heavily, performance will exceed non-flash deployments by more than 40% under some workloads.

EMC’s Tier Advisor can help you produce a more precise guide to sizing your storage tiers.  But it is not strictly necessary.  Deploying a three-tier architecture will improve your existing array by reducing footprint and improving performance.  And if your environment has anything above low skew, adding rotating disks will add capacity and improve efficiency.  This works because you will be approaching the more precise tier mapping for your environment’s workloads.

This ends my three-part series on flash in the enterprise.  I will conclude the series where it began.  I fell in love with flash when I installed an SSD in my MacBook Pro.  The impact on my user experience was so dramatic that it revolutionized my thinking about the nature of storage.  If your mind has not yet been similarly transformed, go get an SSD for your consumer computers right away.  And know that everything we experience on our own equipment we can deliver in the enterprise, too.

11 Responses

Are those performance measurements just for 5% FAST VP? If so do you have any numbers for FAST Cache?

    • These numbers are just for FAST VP. You raise a good point, though. I’ve only provided estimates for improvements when committing flash to FAST VP. I’ll need to add an addendum that includes FAST Cache performance. Will research and post soon.


  • Hi,

    What this algorithm does not tell us is the IOPS required for each tier; it simply states the capacity for each tier.

    We could size the EFD tier using 100GB or 200GB drives, the 15K tier using 300GB or 600GB drives, and the 7.2K tier using 1TB, 2TB or 3TB drives.

    So if we just size by capacity rather than capacity and IOPS we will have a huge difference in IOPS depending on the drive sizes we choose.

    Your comments would be appreciated.

    Many thanks

    • Hi, Mark.

      Your observation is correct. This analysis is based on capacity, not IOPS. And generally, sizing based on capacity alone leads to performance problems.

      However, what I think is insightful about the study I referenced is that you do not need to know specific IOPS demands to properly design storage. The reasons for this are the following:
      (1) Most consolidated workloads produce skew curves that match one of the above three examples.
      (2) You can design for the low skew environment and provide a solution that is better, in speed, footprint, and cost, than all-disk designs.
      (3) With auto-tiering, and a storage design that allows enough hot data to reside on flash, the right data will get the high throughput flash and the other data will get the cheap capacity disk.

      In other words, by following my guidance above you do not need to design for throughput. Let FAST figure it out.
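The auto-tiering idea in point (3) can be sketched crudely: track per-block access counts over a measurement window and keep the hottest blocks on the flash tier each relocation cycle.  This is only a toy illustration of the principle; it is not FAST VP’s actual algorithm, whose internals are not public at this level of detail.

```python
# Toy auto-tiering sketch: each relocation cycle, place the hottest
# blocks on flash and everything else on disk.  Illustrates the idea
# in point (3) only; this is not FAST VP's actual algorithm.
from collections import Counter

def relocate(access_counts, flash_slots):
    """Given per-block IO counts for one window, return the sets of
    blocks placed on flash and on disk."""
    ranked = [block for block, _ in Counter(access_counts).most_common()]
    return set(ranked[:flash_slots]), set(ranked[flash_slots:])

# One measurement window: blocks 7 and 3 are hot.
counts = {7: 900, 3: 600, 1: 20, 5: 5, 9: 1}
flash, disk = relocate(counts, flash_slots=2)
print(sorted(flash))   # [3, 7] -- the two hottest blocks get flash
```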

      • Hi,

        Surely as part of EMC’s analysis they have determined the IOPS that have gone into each tier.

        My concern is that if we used 3TB 7.2K drives for tier 3 or even 900GB 10K for tier 2, then we could significantly undersize the performance of these tiers.

        In the future as we get larger SAS and NL-SAS drives this problem will only get worse.

        I would have thought for each tier we need to define % of overall capacity and % of overall IOPS.

        So if we had 10TB and 10,000 IOPS we might have:

        EFD 3% capacity 80% IOPS (300GB and 8,000 IOPS)
        15K 27% capacity 15% IOPS (2.7TB and 1,500 IOPS)
        7.2K 70% capacity 5% IOPS (7TB and 500 IOPS)

        This would then allow us to easily size each tier.

        Are you therefore suggesting using the largest or smallest size drives for tier 2 and 3?

        This decision will have a significant impact on performance, cost and free drive bays in the array.

        Many thanks

        • Mark,

          You are right that IOPS still needs to be considered at the macro level: the array or the storage pool. Because FAST manages blocks and not LUNs, we no longer need to size IOPS for individual LUNs. But we still have to consider the aggregate performance capabilities of the storage pool that contains the FAST VP LUNs.

          The example you give is accurate enough for discussion. If a storage administrator needs 10TB of storage and 10K IOPS of throughput, we can build the storage pool as you specified. But if the IOPS or capacity requirements change then the ratio is going to change, too.

          The easiest way to add IOPS is to add SSD. The easiest way to add capacity is with SATA or SAS. Any time we add storage we increase both the performance capability and the capacity, which implies that one will be over-provisioned. But starting from a good baseline, like the 3/27/70 “low skew” configuration, and making small modifications will keep the storage design as close to optimal as possible.
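Working Mark’s numbers through makes the capacity math concrete and shows why drive size matters.  In this sketch the 3/27/70 capacity split is from the post and the 80/15/5 IOPS split is Mark’s example; the per-drive IOPS figures are rough rules of thumb I am assuming for illustration, not EMC-published numbers.

```python
# Sketch: size each tier by capacity, then check the IOPS the
# resulting drive count can deliver.  The 80/15/5 IOPS split and
# per-drive IOPS figures are illustrative assumptions.
import math

TOTAL_GB = 10_000        # 10 TB pool
TOTAL_IOPS = 10_000

tiers = {
    # name:  (capacity fraction, IOPS fraction, assumed IOPS per drive)
    "EFD":  (0.03, 0.80, 2500),
    "15K":  (0.27, 0.15, 180),
    "7.2K": (0.70, 0.05, 80),
}

def size_tier(name, drive_gb):
    """Drives needed to reach the tier's capacity, the IOPS the tier
    must serve, and the IOPS those drives can deliver."""
    cap_frac, iops_frac, drive_iops = tiers[name]
    n_drives = math.ceil(TOTAL_GB * cap_frac / drive_gb)
    need_iops = TOTAL_IOPS * iops_frac
    return n_drives, need_iops, n_drives * drive_iops

# Mark's point: the same 7 TB tier built from 1 TB vs. 3 TB drives
# delivers very different IOPS.
for gb in (1000, 3000):
    n, need, got = size_tier("7.2K", gb)
    print(f"7.2K tier, {gb} GB drives: {n} drives, "
          f"need {need:.0f} IOPS, can deliver {got}")
```

Under these assumptions the 1 TB build meets the tier’s 500 IOPS while the 3 TB build falls short at 240, which is exactly the undersizing risk Mark raises; the fix is to check delivered IOPS at the pool level after sizing by capacity.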

          But remember that part of my “design-by-capacity” approach implies the workload meets characteristics we identified as high to low skew. We saw this in 94% of environments. If an admin knows that his workload is one of the strange 6%, then we cannot use my general approach for storage design. Zero skew or very high IOPS environments would clearly fall into that rare 6%. I neglected to point this out in my article so thank you for bringing it up.

          Hope this helps and thanks for the great questions.

  • Hi Scott,

    Is there a detailed paper that goes into the analysis EMC performed on the 3,000+ sample of arrays?  I am sure it would list the IOPS split for each tier.

    I am sorry to say this, but in my book this still does not make sense!!!

    I cannot believe that FAST has been out for 18 months now and there has been no detailed analysis published by EMC on how to size it.

    I asked about sizing of FAST tiers at EMC World 2010 and was told that tools would be released before the software was available.

    18 months later there is no sign of the tools, just sizing algorithms based either on capacity percentages or on the customer knowing the locality-of-reference percentage, with no indication of how to calculate it (how is the customer possibly going to know this!!!)

    I agree with you that it is almost impossible to size this scientifically; instead we need to use statistical analysis and the rules of probability as per your graph above.

    But the IOPS per tier and therefore disk sizes is missing!!!

    I am concerned that FAST is becoming a marketing exercise and EMC’s silence on how to size it is worrying.

    At the end of the day if a customer wants a low risk solution then just size based on a single tier and add FAST Cache for acceleration – sounds a bit like NetApp’s approach!!!

    Please reply directly if you prefer.

    Many thanks
    EMC Partner

    • Hi, Mark,

      The point of my article was that most configurations can be satisfied with a general solution. But there is always a place for analysis to build the right solution the first time. I mentioned above that EMC has a tool that will perform detailed analysis of workloads. It is called Tier Advisor. We run Tier Advisor in pre-sales and services engagements. I am not sure if we make it available to our partners.

      I will share with my colleagues that you want Tier Advisor made available publicly, or at least to the partner community.

      For what follows I am speaking for myself and not EMC.

      I think deterministic storage design is core intellectual property of the consultants that deliver it. EMC offers storage design for multi-tiered environments through tools and services. Many consultants in EMC’s partner community will perform the task.

      The process of creating a precise storage architecture is straightforward, although very involved. Capacity usage and IOPS can be measured, assumptions about peakiness and burstiness of workloads can be made, and tiers can be designed to guarantee needs are met. I do not think EMC is the only company that could perform this service. Nor do I think that any special knowledge about FAST VP is needed to design a better solution.

      If you have more comments about this process please email me at scott dot drummonds at emc dot com. I will help find the people in EMC that can answer your questions and get you the information you need to do your own detailed designs.

  • […] Drummonds wraps up his 3 part series on flash storage in part 3, which contains information on sizing flash storage. If you haven’t been reading this series, I’d recommend giving it a […]

  • I have a question about the FAST architecture in general. Is there a way to track what chunks are moving into and out of the flash tier? The reason I’m curious … I imagine a situation where the flash tier is too small and there’s contention for it, with 3 or 4 related sets of chunks continually elbowing each other out of the flash tier. I realize your sizing rules should keep most installations from seeing this, but how could it be detected if it did start happening?

    Thanks for the posts — I appreciate the time and knowledge you put into them.

    • Gary, I do not believe there is any tool (internal or external) to track a block’s migrations over a period of time. I will check with the experts and correct myself here if I am wrong.