Tackling data retention in SSDs
SSDs have conquered the market due to their speed, durability, energy efficiency and small form factor. However, write endurance and data retention are concerns in embedded and industrial applications, especially as the technology moves to ever smaller nodes.
This article originally appeared in the July 2024 magazine issue of Electronic Specifier Design.
By: Anil Burra, Director of Technical Business Development at Intelligent Memory
With the scaling of NAND flash technology to higher densities (e.g., triple-level cell, TLC, or quad-level cell, QLC) and smaller process nodes, data retention becomes more challenging due to increased cell-to-cell interference and reduced margins for charge storage. Newer technologies may therefore have shorter specified data retention periods than older, more robust single-level cell (SLC) or multi-level cell (MLC) NAND products.
Data retention is the period over which flash memory can store data without loss or corruption. In NAND flash cells, two key factors affect it: program/erase (P/E) cycling and operating temperature.
Program/Erase cycling
Writing data to and erasing data from a NAND flash cell wears out the cell, reducing its ability to hold charge over time. The more P/E cycles a cell undergoes, the shorter its data retention period becomes. This is because repeated P/E cycles damage the tunnel oxide that insulates the cell's charge-storage layer (the floating gate in conventional planar NAND), leading to charge leakage and potential data corruption.
Figure 1. Relationship between data retention and Program/Erase cycling
Operating temperature
High operating temperatures accelerate the ageing process of NAND flash cells, causing faster charge leakage and shorter data retention periods. For example, the data retention period of a multi-level cell (MLC) NAND device that has reached its specified number of P/E cycles can drop to as little as two days at 85°C. The effects of P/E cycling and high temperature compound each other, leading to even more rapid degradation of data retention.
Figure 2. Data retention bake to evaluate retention capability of flash memory devices over time and at high temperatures
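Retention bakes of this kind are commonly interpreted with an Arrhenius model, in which a short time at a high stress temperature stands in for a longer time at a lower use temperature. The following is a minimal sketch of that calculation, not a qualification procedure. The activation energy of 1.1 eV is an assumed, illustrative value (real values depend on the NAND technology and failure mechanism), and the function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(t_use_c, t_stress_c, activation_energy_ev=1.1):
    """Arrhenius acceleration factor between a use and a stress temperature.

    activation_energy_ev is an assumed, illustrative value; it is not taken
    from the article or from any specific device datasheet.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


def equivalent_use_hours(bake_hours, t_use_c, t_stress_c, activation_energy_ev=1.1):
    """Hours at the use temperature represented by a bake at the stress temperature."""
    return bake_hours * acceleration_factor(t_use_c, t_stress_c, activation_energy_ev)


if __name__ == "__main__":
    af = acceleration_factor(t_use_c=40.0, t_stress_c=85.0)
    days = equivalent_use_hours(48, 40.0, 85.0) / 24
    print(f"Acceleration factor from 40 C to 85 C: {af:.0f}")
    print(f"A 48 h bake at 85 C represents roughly {days:.0f} days at 40 C")
```

Because the result depends exponentially on the activation energy, the output should be read as an order-of-magnitude guide unless the value is taken from the device vendor's own characterisation data.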
How can one address data retention challenges?
To address the data retention challenges posed by advanced NAND flash technologies, most manufacturers employ a two-pronged approach: error correction codes (ECCs) and data management techniques.
Error correction codes
ECCs are used to detect and correct bit errors that occur due to data retention issues. Various types of ECCs, such as Bose-Chaudhuri-Hocquenghem (BCH) codes and low-density parity-check (LDPC) codes, are employed to ensure data integrity. The effectiveness of the ECC implementation depends on its ability to adapt to the specific data retention characteristics of each SSD unit.
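BCH and LDPC codes are too large to show in a short listing, so the sketch below uses a Hamming(7,4) code instead. It is a far simpler code than anything used in an SSD controller, but it illustrates the same principle: redundant parity bits let the decoder locate and flip back a bit that has changed during storage. The function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position).

    error_position is the 1-based position of the corrected bit,
    or 0 if no error was detected. Only single-bit errors are corrected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # syndrome read as a binary number
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate one bit lost to charge leakage (position 5)
    data, position = hamming74_decode(stored)
    print(data, position)
```

A Hamming(7,4) code corrects only one bit error per codeword. The codes used in SSDs correct many errors per codeword, which is what makes them suitable for the higher error rates of worn or long-stored NAND.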
Variations in data retention
Data retention effects can vary significantly across SSD units due to several factors:
- User behaviour: some units may be exposed to higher operating temperatures or undergo more P/E cycles than others, leading to accelerated ageing and reduced data retention
- NAND flash die variations: small variations in the characteristics of NAND flash dies can occur not only between products from different manufacturers but also across production batches from a single manufacturer
- Application workloads: in some applications, data may be continually erased and replaced, while in others, stored data may remain unchanged for extended periods, affecting data retention requirements
These variations highlight the importance of tailoring data retention solutions to the specific characteristics and usage patterns of each SSD unit.
Data management techniques
SSD manufacturers implement techniques to minimise the number of bit errors that need to be detected and corrected. These techniques include monitoring the health of memory cells, retiring cells that can no longer be relied upon, and refreshing data bits in ageing cells to top up the charge level. Additionally, some SSDs adapt their ECC operation to favour either long data retention or high read/write performance, depending on the application requirements.
Monitoring memory cell health and retiring bad blocks:
- SSDs continuously track metrics like program/erase (P/E) cycle counts and bit error rates for each NAND block
- Blocks that exceed a threshold for bit errors or P/E cycles are marked as ‘bad’ and retired from further use to prevent data corruption
- This ensures that only relatively healthy NAND blocks with sufficient remaining data retention capability are utilised
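The monitoring and retirement steps above can be sketched as follows. This is a simplified illustration rather than any vendor's firmware: the class names are hypothetical, and the thresholds (3,000 P/E cycles and a raw bit error rate of 1e-3) are assumed values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class BlockHealth:
    block_id: int
    pe_cycles: int = 0
    bits_read: int = 0
    bit_errors: int = 0
    retired: bool = False

    @property
    def raw_bit_error_rate(self):
        # Fraction of read bits that needed correction; 0.0 before any read.
        return self.bit_errors / self.bits_read if self.bits_read else 0.0


class BlockManager:
    def __init__(self, num_blocks, max_pe_cycles=3000, max_rber=1e-3):
        # Both thresholds are assumed, illustrative values.
        self.max_pe_cycles = max_pe_cycles
        self.max_rber = max_rber
        self.blocks = [BlockHealth(i) for i in range(num_blocks)]

    def record_erase(self, block_id):
        block = self.blocks[block_id]
        block.pe_cycles += 1
        self._retire_if_unhealthy(block)

    def record_read(self, block_id, bits_read, bit_errors):
        block = self.blocks[block_id]
        block.bits_read += bits_read
        block.bit_errors += bit_errors
        self._retire_if_unhealthy(block)

    def _retire_if_unhealthy(self, block):
        # Retire the block once either metric exceeds its threshold.
        if (block.pe_cycles > self.max_pe_cycles
                or block.raw_bit_error_rate > self.max_rber):
            block.retired = True

    def usable_blocks(self):
        return [b.block_id for b in self.blocks if not b.retired]
```

In this sketch a retired block is simply excluded from `usable_blocks()`. A real controller would also need to move any valid data off the block before taking it out of service.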
Data refreshing:
- As NAND flash cells age and undergo P/E cycling, their ability to retain charge (and data) diminishes over time due to charge leakage
- To counter this, SSDs periodically read data from NAND blocks and rewrite it, effectively ‘refreshing’ the charge levels and extending data retention
- The frequency of refreshing can be adjusted based on the age and health of the NAND blocks
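One way to picture such a health-dependent refresh schedule is the sketch below. The policy is an assumption made for illustration: the refresh interval shrinks linearly with P/E wear, from a 90-day base interval down to a one-day floor. The rated cycle count and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    pe_cycles: int
    last_written_day: float


def refresh_interval_days(pe_cycles, rated_pe_cycles=3000,
                          base_interval_days=90.0, min_interval_days=1.0):
    """Shorter interval for more worn blocks (assumed linear policy)."""
    wear = min(pe_cycles / rated_pe_cycles, 1.0)
    return max(min_interval_days, base_interval_days * (1.0 - wear))


def blocks_due_for_refresh(blocks, today):
    """Blocks whose data has been stored longer than their refresh interval."""
    return [
        b for b in blocks
        if today - b.last_written_day >= refresh_interval_days(b.pe_cycles)
    ]


def refresh(block, today):
    # A real controller would read the data, correct it with ECC and
    # program it again; here only the bookkeeping is modelled.
    block.pe_cycles += 1  # rewriting consumes endurance
    block.last_written_day = today


if __name__ == "__main__":
    blocks = [Block(0, pe_cycles=0, last_written_day=0.0),
              Block(1, pe_cycles=2700, last_written_day=0.0)]
    for block in blocks_due_for_refresh(blocks, today=10.0):
        refresh(block, today=10.0)
        print(f"refreshed block {block.block_id}")
```

The sketch also makes the trade-off visible: every refresh costs a P/E cycle, so refreshing too often shortens the life of the very blocks it is meant to protect.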
Adaptive Error Correction Codes (ECC):
- SSDs employ error correction codes like BCH and LDPC to detect and correct bit errors caused by data retention issues
- As NAND flash ages, stronger ECC algorithms like LDPC can be employed to handle higher bit error rates while maintaining data integrity
- Some advanced techniques use machine learning to adapt the ECC scheme for each SSD based on its unique characteristics
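As a rough illustration of adapting ECC strength to the observed error rate, the sketch below picks the weakest mode that still covers the expected number of errors per codeword with a safety margin. The mode names, their correction capabilities, the 1 KiB codeword size and the margin are all assumed values. Treating each mode as having a fixed number of correctable bits is also a simplification, since LDPC decoders in particular do not have a hard limit of this kind.

```python
# Hypothetical ECC modes, ordered from weakest to strongest:
# (name, correctable bit errors per codeword). Values are illustrative only.
ECC_MODES = [("light", 24), ("medium", 40), ("strong", 72)]

CODEWORD_BITS = 8 * 1024  # assumed 1 KiB codeword


def select_ecc_mode(raw_bit_error_rate, safety_margin=4.0):
    """Pick the weakest mode whose capability covers the expected errors.

    Returns None if even the strongest mode is insufficient, in which case
    the block would be a candidate for retirement.
    """
    expected_errors = raw_bit_error_rate * CODEWORD_BITS
    for name, correctable_bits in ECC_MODES:
        if correctable_bits >= safety_margin * expected_errors:
            return name
    return None


if __name__ == "__main__":
    for rber in (1e-4, 1e-3, 2e-3, 5e-3):
        print(rber, select_ecc_mode(rber))
```

Choosing the weakest sufficient mode reflects the trade-off the text describes: stronger correction protects ageing NAND but generally costs parity space and decoding time.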
Data clustering and region management:
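- The NoFTL approach, in which the host manages the flash directly instead of relying on the SSD's flash translation layer (FTL), proposes clustering data with similar properties (like retention requirements) into separate physical ‘regions’ on the SSD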
- Each region can then be managed using the optimal set of data retention techniques tailored for that type of data
- This allows minimising overhead while maximising data retention for each data cluster
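The idea of grouping data by retention requirement and giving each group its own policy can be sketched as below. This only illustrates the concept and is not the NoFTL implementation: the two data classes, the seven-day threshold and the policy values are assumptions, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionPolicy:
    refresh_interval_days: float
    ecc_mode: str


# Assumed policies: short-lived data is overwritten before retention matters,
# so it gets no scheduled refresh and lighter ECC; long-lived data gets both.
POLICIES = {
    "short_lived": RegionPolicy(refresh_interval_days=float("inf"), ecc_mode="light"),
    "long_lived": RegionPolicy(refresh_interval_days=30.0, ecc_mode="strong"),
}


def classify(expected_lifetime_days, threshold_days=7.0):
    """Map an expected data lifetime to a region name (assumed threshold)."""
    return "short_lived" if expected_lifetime_days < threshold_days else "long_lived"


def assign_regions(objects):
    """Group (name, expected_lifetime_days) pairs by region."""
    regions = defaultdict(list)
    for name, lifetime_days in objects:
        regions[classify(lifetime_days)].append(name)
    return dict(regions)


if __name__ == "__main__":
    workload = [("log_buffer", 0.5), ("firmware_image", 3650),
                ("temp_cache", 2), ("config", 365)]
    for region, names in assign_regions(workload).items():
        print(region, names, POLICIES[region])
```

Keeping the policy per region rather than per block is what limits the overhead: retention measures are applied only where the data actually needs them.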
Conclusion
SSDs have earned their place in industrial applications thanks to memory manufacturers like Intelligent Memory, which are committed to providing memory products built for longevity that meet the industrial market's stringent requirements for reliability, data integrity, compliance and operational efficiency. Keeping an eye out for the SSD features outlined above helps in selecting memory products that keep industrial systems running seamlessly and efficiently.