Scott Drummonds on Virtualization

The Flash Storage Revolution: Part II


In the previous entry in this ongoing series covering the flash storage revolution, I concluded that flash is now an essential part of enterprise storage. But its value proposition hinges on high utilization. High utilization cannot be sustained without efficient auto-tiering or accurate sizing of flash-based cache.

This article will describe the theory behind optimal cache sizing.  Practical guidance will follow in part three, the last entry in this series. I will again lean heavily on Denis Vilfort’s presentation that I offer for download on my blog.

Every performance discussion starts with an “it depends”.  I will kick off this discussion with the characteristic on which flash sizing depends: skew (a topic I discussed once before). Skew describes how unevenly an environment spreads its accesses across its data. High skew environments will access 1% of their data 80% of the time. Low skew environments will access up to 10% of their data 80% of the time. This is depicted in the following picture.
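To make the definition concrete, here is a minimal sketch of how skew could be measured if you had per-block access counts. The function name, the 80% threshold parameter, and the sample counts are all hypothetical and chosen only to mirror the 1% and 10% figures above; real arrays may not expose this data in this form.

```python
def working_set_fraction(access_counts, coverage=0.80):
    """Return the smallest fraction of blocks that receives `coverage`
    of all accesses. `access_counts` holds one access count per block."""
    total = sum(access_counts)
    if total == 0:
        raise ValueError("no accesses recorded")
    running = 0
    # Walk the blocks from hottest to coldest until enough accesses are covered.
    hottest_first = sorted(access_counts, reverse=True)
    for blocks_used, count in enumerate(hottest_first, start=1):
        running += count
        if running >= coverage * total:
            return blocks_used / len(access_counts)
    return 1.0


# Hypothetical 100-block environments (made-up numbers for illustration).
high_skew = [800] + [2] * 99        # one block takes most of the I/O
low_skew = [80] * 10 + [2] * 90     # ten blocks share most of the I/O

print(working_set_fraction(high_skew))  # 0.01
print(working_set_fraction(low_skew))   # 0.1
```

A smaller result means higher skew: fewer blocks account for 80% of the activity.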

Cache, flash or otherwise, can be thought of as a FIFO for data. As your users interact with their applications, they generate new data and touch old data. This places those blocks at the head of the flash FIFO, keeping them in SSD cache for a longer period of time. These blocks remain in that cache until newer data pushes them out of this logical FIFO.
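The behavior described above can be sketched in a few lines. Because touching old data moves it back to the head, the queue in this sketch behaves like a least-recently-used list rather than a strict FIFO. The class and method names are hypothetical, and real array caches are certainly more sophisticated than this.

```python
from collections import OrderedDict


class FlashCacheSketch:
    """Toy model of the logical queue: touched blocks move to the head,
    and the block at the tail is pushed out when the cache is full."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        # Ordered from tail (coldest) to head (most recently touched).
        self.blocks = OrderedDict()

    def touch(self, block_id):
        """Record a read or write. Returns the evicted block id, or None."""
        if block_id in self.blocks:
            # Old data touched again: move it back to the head.
            self.blocks.move_to_end(block_id)
            return None
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity_blocks:
            # Newer data pushes the coldest block out to cheaper storage.
            evicted, _ = self.blocks.popitem(last=False)
            return evicted
        return None


cache = FlashCacheSketch(capacity_blocks=2)
cache.touch("a")
cache.touch("b")
cache.touch("a")          # "a" is warm again, so "b" is now the coldest
print(cache.touch("c"))   # b
```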

Skew is a somewhat abstruse concept. But it is more easily understood when you think of it as a data cooling rate. The rate at which applications stop touching data can be described as cooling. High skew environments have high cooling rates, because the applications spend most of their time on a small amount of data.  This means a great deal of data becomes lightly used, or cool. Low skew environments have low cooling rates, because they frequently touch more data, keeping it warm. The following figure shows cooling in action.

At some rate, which we are leaving purely theoretical at this point, applications slough off a certain amount of data into the rarely used, or “cool”, category. This information should fall out of the flash FIFO and be relegated to lower cost storage. The above figure shows that data centers with a low cooling rate of 1.4% will take 120 days for 80% of their data to become cool. This means slowly cooling environments need a larger flash-backed cache.
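The 120-day figure can be roughly reproduced with a simple compounding model. This is only a sketch under two assumptions that the text above does not state explicitly: that 1.4% is a daily rate, and that each day the same fraction of the still-warm data cools. The function names are hypothetical.

```python
import math


def fraction_cool(daily_cooling_rate, days):
    """Fraction of data that has cooled after `days`, assuming the same
    share of the remaining warm data cools every day (an assumption)."""
    return 1.0 - (1.0 - daily_cooling_rate) ** days


def days_until_cool(daily_cooling_rate, target_fraction):
    """Days until `target_fraction` of the data is cool under the same model."""
    if not 0.0 < daily_cooling_rate < 1.0:
        raise ValueError("daily_cooling_rate must be between 0 and 1")
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    return math.log(1.0 - target_fraction) / math.log(1.0 - daily_cooling_rate)


print(round(fraction_cool(0.014, 120), 3))   # share of data cool after 120 days
print(round(days_until_cool(0.014, 0.80)))   # days for 80% of data to cool
```

Under these assumptions about 82% of the data is cool after 120 days, and the 80% mark is crossed at roughly day 114, which is in line with the figure quoted above.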

Cooling rate dictates the amount of flash needed in enterprise storage. Because flash for cache purchased today will be used by the array until the next storage purchase, environments with rapid growth of data require a higher percentage of flash on day one. But because cooling is non-linear, the amount of flash needed does not scale linearly with the data growth. In non-engineering jargon, this means that an environment with 100% year-over-year growth of data does not need twice the flash of an environment with 50% year-over-year growth.

This is better shown in the following figure.

You can now see that, assuming you have efficient flash usage via some technology like cache, you can predict your flash needs with only three variables:

  1. The amount of data in your environment.
  2. Your yearly growth rate of data.
  3. Your cooling rate.

Two of these numbers are easy to find, but cooling rate is not. In fact, it is so difficult that I am not sure it is even obtainable in most environments. But without it, how are we to recommend flash purchases? The answer is simpler than you think and is the subject of the final article in this series.

4 Responses

[…] my three part series on flash I interchangeably used the terms “flash” and “SSD”.  In a […]

  • Hi Scott,

    I think I may have misunderstood, but this paragraph:

    “Low skew environments have high cooling rates, because the applications are spending so much time on a small set of data. High skew environments have low cooling rates, because they are frequently touching so much data. The following figure shows cooling in action.”

    Seems to contradict this paragraph:

    “High skew environments will access 1% of their data 80% of the time. Low skew environments will access up to 10% of their data 80% of the time.”

    Not trying to be picky, just interested in understanding this better so I can explain to my customers. High Skew = small working set, Low Skew = larger working set. Right?



    • Ben, your understanding is correct. You caught an error in my article. I have updated the text, swapping “low skew” and “high skew” in the first section you quoted. The presentation I link is correct, of course. Check slide 17 where the presentation puts example numbers showing that low skew environments keep data hotter longer.

      Excellent catch and thank you for bringing it to my attention.

  • No, thank you. This kind of stuff makes my job easier, and my more tech-savvy customers love it.