Deduplication & Compression Extend SSD Lifespans

Tegile uses FlashVols to balance endurance and performance, both of which remain challenges as NAND flash scales down.

NEWARK, Calif. — The higher costs of SSDs (relative to spinning disks) and their lifespan limitations generally keep enterprises from fully embracing them for storage. The ongoing challenge is how to optimize them to get the best bang for the buck, while also addressing the endurance problem.

Being selective about which data reside on flash through tiering is one way to use it strategically. Compression and deduplication can also be employed to minimize the number of writes to an SSD, thereby prolonging its life. Tiering does have drawbacks, however, said Rob Commins, Tegile Systems’ VP of marketing: it runs after the fact, sweeping an entire pool of storage to decide which data should be moved to the faster flash storage, and this process creates delays.

Tegile recently announced the addition of FlashVols to its Metadata Accelerated Storage System architecture to optimize SQL performance on its Zebi storage arrays. In this case, inline deduplication and compression are also used to reduce overhead for highly repetitive SQL Server workloads.
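To make the write-reduction idea concrete, here is a minimal Python sketch of inline deduplication followed by compression. It is not Tegile’s implementation: the fixed 4 KiB chunk size, SHA-256 fingerprints, zlib compression, and the hypothetical `inline_reduce` helper are illustrative assumptions, and a dictionary simply stands in for flash.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


def inline_reduce(data: bytes, store: dict) -> int:
    """Deduplicate fixed-size chunks by SHA-256, then compress only new chunks.

    `store` maps a chunk fingerprint to its compressed bytes and stands in
    for flash. Returns the bytes that would be written to flash for `data`
    (metadata for duplicate references is not counted).
    """
    written = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint in store:
            continue  # duplicate: no data write, only a reference
        compressed = zlib.compress(chunk)
        store[fingerprint] = compressed
        written += len(compressed)
    return written


if __name__ == "__main__":
    # A highly repetitive workload: the same 4 KiB page written 1,000 times.
    page = (b"INSERT INTO orders VALUES (42, 'widget');\n" * 100)[:CHUNK_SIZE]
    workload = page * 1000
    flash = {}
    written = inline_reduce(workload, flash)
    print(f"logical bytes: {len(workload):,}  bytes written to flash: {written:,}")
```

Because the reduction happens inline, before data reach the drive, the 999 repeated pages never cost a flash write at all, and the single unique page is written in compressed form.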

Zebi storage arrays are composed of eMLC NAND flash and 7,200 RPM spinning disks. FlashVols are volumes pinned in SSD so that applications run at maximum performance, Commins explained. Rather than relying on tiering policies or caching algorithms, pinned volumes remain in DRAM or flash close to the SQL Server without any involvement from the database administrator. The data in these volumes are known in advance to be accessed frequently enough to benefit from flash-grade performance. “It puts a little English on the ball,” said Commins.

Deduplication and compression technologies not only help improve performance but also play a role in lengthening the lifespan of the SSD. Commins said each SSD in the system can withstand 3.5 petabytes of writes before it exhibits signs of wear.
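A rough back-of-the-envelope calculation shows why data reduction matters for that 3.5 PB figure. The sketch below assumes the rating applies to physical writes per drive, ignores write amplification inside the SSD, and uses a hypothetical 2 TB/day host write rate and a hypothetical 3:1 reduction ratio; `drive_lifetime_years` is an illustrative helper, not a vendor formula.

```python
ENDURANCE_TB = 3_500  # 3.5 PB per drive (decimal units), as cited by Commins
DAYS_PER_YEAR = 365


def drive_lifetime_years(host_tb_per_day: float, reduction_ratio: float = 1.0) -> float:
    """Years until a drive's rated write endurance is used up.

    host_tb_per_day: logical data the hosts write per day (hypothetical input).
    reduction_ratio: logical bytes / physical bytes after deduplication and
    compression, e.g. 3.0 for 3:1. Write amplification inside the SSD is ignored.
    """
    if host_tb_per_day <= 0 or reduction_ratio <= 0:
        raise ValueError("write rate and reduction ratio must be positive")
    physical_tb_per_day = host_tb_per_day / reduction_ratio
    return ENDURANCE_TB / (physical_tb_per_day * DAYS_PER_YEAR)


if __name__ == "__main__":
    print(f"{drive_lifetime_years(2.0):.1f} years without data reduction")   # 4.8
    print(f"{drive_lifetime_years(2.0, 3.0):.1f} years with 3:1 reduction")  # 14.4
```

Under these assumptions, lifetime scales linearly with the reduction ratio: every byte that deduplication or compression keeps off the drive is a byte of endurance left in reserve.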

Jeff Janukowicz, IDC’s research director for solid state storage and enabling technologies, said that while SSDs can help accelerate application workloads by providing low, consistent latency and high IOPS, their higher costs compared to spinning disks and their limited endurance are inherent challenges for enterprise storage adoption. “Compression and deduplication both improve data efficiency rates, and this helps mitigate some of these challenges,” he said. Limiting writes to the drive also helps improve endurance.

More specifically, extending the life of the SSD is accomplished by minimizing the number of erasures in relation to the writes, said Gartner analyst Stanley Zaffos. “When you’re doing a write, you are generally having two operations occurring. You have the erasure and you have the actual write,” he said. “Reading is a more benign operation, because you’re essentially measuring something as opposed to changing it.”
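Zaffos’s distinction between writes and reads can be illustrated with a toy model of a single NAND erase block. This is a deliberate simplification and the `NandBlock` class is hypothetical: a programmed page cannot be overwritten in place, so an overwrite forces an erase of the whole block, while reads leave the erase count untouched.

```python
class NandBlock:
    """Toy model of one NAND erase block (illustrative simplification).

    A programmed page cannot be overwritten in place; the whole block must be
    erased first. Reads only sense the stored value and cause no wear.
    """

    def __init__(self, num_pages: int = 64):
        self.pages = [None] * num_pages
        self.erase_count = 0  # program/erase wear accumulates here

    def read(self, page: int):
        return self.pages[page]  # measuring, not changing: no wear

    def write(self, page: int, data: bytes) -> None:
        if self.pages[page] is not None:
            # Overwrite: save the block's contents, erase it, then reprogram.
            # Real controllers instead write out of place and erase later
            # during garbage collection, but an erase is still eventually due.
            saved = list(self.pages)
            self.erase()
            saved[page] = data
            self.pages = saved
        else:
            self.pages[page] = data

    def erase(self) -> None:
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


if __name__ == "__main__":
    block = NandBlock(num_pages=4)
    block.write(0, b"v1")
    for _ in range(1_000):
        block.read(0)
    print(block.erase_count)  # 0: reads cause no erasures
    block.write(0, b"v2")
    print(block.erase_count)  # 1: the overwrite forced an erase
```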

All storage vendors are using techniques such as deduplication and compression to optimize SSDs for performance and endurance, said Zaffos. “The contention is who has done a better implementation from a data flow perspective, and from an algorithm selection process. Some vendors will do compression first, then deduplication. Some will do compression only.” There are a number of permutations that can be translated into software and data flows, he said.
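The ordering question Zaffos raises can be explored with a small comparison of three pipelines over the same chunks. This is a sketch under simple assumptions (fixed-size chunks, SHA-256 fingerprints, zlib), not any vendor’s data flow; the function names are hypothetical. Because zlib produces identical output for identical input, both deduplicating orders store the same bytes here and differ mainly in how much compression work they do, while compression alone stores every copy.

```python
import hashlib
import zlib


def dedup_then_compress(chunks):
    """Fingerprint raw chunks; compress only the first copy of each."""
    stored, compress_calls = {}, 0
    for chunk in chunks:
        key = hashlib.sha256(chunk).digest()
        if key not in stored:
            stored[key] = zlib.compress(chunk)
            compress_calls += 1
    return sum(len(v) for v in stored.values()), compress_calls


def compress_then_dedup(chunks):
    """Compress every chunk, then fingerprint the compressed output."""
    stored, compress_calls = {}, 0
    for chunk in chunks:
        packed = zlib.compress(chunk)
        compress_calls += 1
        stored.setdefault(hashlib.sha256(packed).digest(), packed)
    return sum(len(v) for v in stored.values()), compress_calls


def compress_only(chunks):
    """No deduplication: every compressed chunk is written."""
    packed = [zlib.compress(chunk) for chunk in chunks]
    return sum(len(p) for p in packed), len(packed)


if __name__ == "__main__":
    chunks = [b"A" * 4096, b"B" * 4096] * 50  # 100 chunks, 2 unique
    for pipeline in (dedup_then_compress, compress_then_dedup, compress_only):
        stored_bytes, calls = pipeline(chunks)
        print(f"{pipeline.__name__}: {stored_bytes} bytes stored, {calls} compress calls")
```

Real implementations add variable-size chunking, different fingerprinting and compression algorithms, and metadata costs, which is where the implementation differences Zaffos describes come into play.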

Reliability and endurance remain challenges for NAND flash. Workarounds for its inherent limitations, such as complex architectures and algorithms, are starting to run out of headroom, particularly as NAND flash scales below 25nm, largely because of design constraints.

While flash adoption continues apace in enterprise storage, Commins said customers are generally looking for 3% to 10% of their data to behave in a flash-like manner for specific applications and use cases, such as online transaction processing and supporting virtual desktop infrastructures.

Wikibon predicts that hybrid storage will continue to disrupt traditional disk arrays. The research firm found that the hybrid approach is superior in high-performance environments, in both cost and performance, particularly when IO rates are greater than 700 IOs per terabyte.
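Wikibon’s threshold is expressed as IO density: IO rate divided by capacity. The sketch below is a crude screen, not Wikibon’s cost model; it assumes the rate is measured in IOs per second, and the function names and the 40 TB example workloads are hypothetical.

```python
HYBRID_THRESHOLD = 700  # IOs per terabyte, the Wikibon figure


def io_density(io_rate: float, capacity_tb: float) -> float:
    """IO rate (assumed here to be IOs per second) divided by capacity in TB."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    return io_rate / capacity_tb


def above_hybrid_threshold(io_rate: float, capacity_tb: float) -> bool:
    """Crude screen: does the workload exceed the 700 IOs/TB mark?"""
    return io_density(io_rate, capacity_tb) > HYBRID_THRESHOLD


if __name__ == "__main__":
    # Hypothetical workloads on a 40 TB array.
    print(io_density(50_000, 40), above_hybrid_threshold(50_000, 40))  # 1250.0 True
    print(io_density(10_000, 40), above_hybrid_threshold(10_000, 40))  # 250.0 False
```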