In this final installment of the series, I will provide some detail behind flash storage sizing. My previous entry contained an analytical and theoretical approach to sizing flash in today’s storage. When I first studied the ideas I introduced in that post, I thought the flash sizing exercise was hopeless. After all, how are customers to measure data cooling? How could a storage admin quantify skew?
As it turns out, familiarity with these abstract concepts is not needed to size flash in your environment. The same principles that Intel and AMD apply in sizing microprocessor cache can be applied to storage. There are generalizations that will suit the majority of deployments.
First, a little background. In building EMC’s Fully Automated Storage Tiering Virtual Pools (FAST VP), EMC studied the access patterns of over 3,500 arrays. We measured skew and performance, capacity and footprint. We experimented with storage layout and FAST VP block sizes. We tried two-tier and three-tier configurations and sized each to find a best fit for the average case. The results are summarized in the following figure.
Of the 3,500 arrays we analyzed, 12% of the workloads met criteria that we describe as “heavy skew”. This means 95% of the IO occurred on 5% of the data. In these configurations nearly all the hot blocks can be stored in flash when it is sized to 3% of the storage footprint. In “moderate skew” environments, the addition of 15% Fibre Channel maintained performance with a footprint only slightly larger than optimal. “Low skew” environments still showed improvement over flash-less configurations in both performance and footprint, at the same cost.
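To make the skew figure concrete, here is a minimal sketch of how "X% of the IO on Y% of the data" could be computed from per-extent IO counts. It is not how FAST VP or Tier Advisor measures skew; the function name, the equal-sized-extent assumption, and the sample numbers are all hypothetical and chosen only for illustration.

```python
def io_share_of_hottest(io_counts, data_fraction):
    """Fraction of total IO that lands on the hottest `data_fraction`
    of extents. Assumes every extent is the same size (an illustrative
    simplification, not a property of any particular array)."""
    if not 0 < data_fraction <= 1:
        raise ValueError("data_fraction must be in (0, 1]")
    total = sum(io_counts)
    if total == 0:
        return 0.0
    # Rank extents from busiest to idlest.
    ranked = sorted(io_counts, reverse=True)
    # Number of extents in the hottest slice (always at least one).
    hot_extents = max(1, round(len(ranked) * data_fraction))
    return sum(ranked[:hot_extents]) / total


# Hypothetical workload: 100 extents, 5 of them hot.
sample = [361] * 5 + [1] * 95
share = io_share_of_hottest(sample, 0.05)
print(f"{share:.0%} of IO hits the hottest 5% of data")  # prints 95%
```

With these made-up counts, the workload would meet the "heavy skew" description above: 95% of the IO falls on 5% of the data.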
It was this analysis that led us to recommend the low skew configuration for unknown environments. This has the following benefits:
- The cost of storage is the same as the flash-less configuration.
- The footprint is half the size of the flash-less configuration.
- Storage will be at least 20% faster for 94% of workloads. Because this measurement was taken at low skew, and higher-skew environments exercise flash more heavily, performance will exceed non-flash deployments by more than 40% under some workloads.
EMC’s Tier Advisor can help you produce a more precise guide for sizing your storage tiers, but it is not strictly necessary. Deploying a three-tier architecture will improve your existing array by reducing footprint and improving performance. And if your environment has anything above low skew, adding rotating disks will add capacity and improve efficiency. This works because you will be approaching the more precise tier mapping for your environment’s workloads.
This ends my three-part series on flash in the enterprise. I will conclude the series where it began. I fell in love with flash when I installed an SSD in my MacBook Pro. The impact on my own user experience was so dramatic that it revolutionized my thinking about the nature of storage. If your mind has not yet been similarly transformed, go get an SSD for your consumer computers right away. And know that everything we can experience on our own equipment we can deliver in the enterprise, too.