Going beyond P/E cycles – what really defines NAND flash endurance
All too often, NAND flash manufacturers advertise program/erase (P/E) cycles as a true representation of endurance. However, using this metric to estimate the lifetime of NAND flash will always provide an overly optimistic result.
This article originally appeared in the Jan'24 magazine issue of Electronic Specifier Design – see ES's Magazine Archives for more featured publications.
Much as the miles-per-gallon number quoted by a vehicle manufacturer is a fuel efficiency figure that cannot be attained on the road, the many factors affecting NAND flash durability ensure that the manufacturer’s P/E figure can never be reached in practice.
Just as the fuel efficiency of a vehicle is affected by real-world traffic conditions and inefficiencies in its components, the lifetime of a NAND flash device depends on its use and application requirements. So, any engineer relying on the given P/E cycles figure as a true measurement of the ‘mileage’ of NAND flash will most likely end up with a product that fails in the prime of its life cycle, like a broken-down car on the side of the road.
What is limiting NAND endurance?
The P/E cycle figure is calculated using only the blocks of new data to be written to the NAND. However, because of the way NAND memory operates, there is always more data written to the drive than the data supplied. This is due to the data reshuffling that takes place within the NAND architecture, meaning that multiple writes happen for every write operation the user requests. The ratio between the amount of data actually written to the flash and the amount of data the host asked to write is known as the write amplification factor (WAF). Since the number of times the cells within a NAND flash can be written to is finite, the larger the WAF, the shorter the endurance.
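As a minimal sketch of the ratio described above, the WAF can be computed from two byte counters. The function name and the counter values here are illustrative assumptions, not readings from a real drive; how such counters are exposed depends on the controller and its firmware.

```python
def write_amplification_factor(host_bytes_written: int, nand_bytes_written: int) -> float:
    """Return NAND bytes written divided by host bytes written."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


# Illustrative counters (assumed values, in GB).
host_gb = 100   # data the host asked to write
nand_gb = 250   # data the controller actually wrote to flash

waf = write_amplification_factor(host_gb, nand_gb)
print(f"WAF = {waf:.2f}")  # WAF = 2.50
```

With these assumed numbers, every gigabyte the host writes consumes 2.5 GB of the flash's finite write budget.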
Endurance of flash memory is often defined by its P/E cycles.
In an ideal world, a WAF of one would mean that the amount of data the system writes to the drive’s controller is precisely the same as the amount of data the controller writes to the drive’s NAND flash. In the real world, this never occurs. Methods used to increase endurance include:
Overprovisioning
Overprovisioning reserves spare flash blocks for use by the controller. This lets the controller bypass the tedious and time-consuming erase/write process that would otherwise be needed when new data must be written to a block that has already been fully used. That process can involve reading the block’s existing content into cache, adding the new content to the cache, removing unused data from the cache, erasing the addressed block in flash, copying the entire block back from the cache, and emptying the cache. Instead of erasing the used block to accommodate new data, the controller writes to a spare block.
Wear levelling
Wear levelling distributes erases and writes across the flash blocks so that the same memory blocks are not overwritten too often, limiting the risk that the entire drive’s life will be determined by a single, over-utilised block. Ideally, all blocks should receive a similar number of P/E cycles.
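One simple way to picture wear levelling is a controller that always writes to the free block with the lowest erase count. The sketch below assumes a hypothetical per-block erase counter and is not how any particular controller implements it; real firmware also has to relocate rarely changed data.

```python
def pick_block_for_write(erase_counts: dict[int, int], free_blocks: set[int]) -> int:
    """Return the free block with the fewest erases (ties go to the lowest block id)."""
    if not free_blocks:
        raise RuntimeError("no free blocks available")
    return min(free_blocks, key=lambda block: (erase_counts[block], block))


# Assumed erase counts per block id.
erase_counts = {0: 120, 1: 15, 2: 15, 3: 300}
free_blocks = {0, 2, 3}
chosen = pick_block_for_write(erase_counts, free_blocks)
print(chosen)  # 2

# Toy simulation: 1,000 erase/write cycles spread over four blocks.
counts = {block: 0 for block in range(4)}
for _ in range(1000):
    block = pick_block_for_write(counts, set(counts))
    counts[block] += 1
print(counts)  # every block ends up with the same number of cycles
```

Without such a policy, a workload that keeps rewriting one logical address could exhaust a single block while the rest of the drive is barely used.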
Several NAND parameters impact the Write Amplification Factor of a flash memory.
Garbage collection
Garbage collection reclaims blocks that hold obsolete data and frees them up for write operations. It selects ‘victim’ blocks, that is, blocks containing invalid sectors, copies their valid information into other free sectors, and finally erases the victim block. To minimise any impact on drive performance, this is often run in the background. The efficiency is defined as the average number of invalid pages in each victim block to be erased. High garbage collection efficiency, achieved through better victim selection (e.g., FIFO, Greedy, Windowed), lowers the WAF.
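The Greedy policy mentioned above can be sketched as picking the block with the fewest valid pages, since that block costs the least to copy and frees the most space. The block size, the page counts and the simplified WAF estimate are assumptions for illustration only; the estimate assumes every victim looks like this one and ignores all other overheads.

```python
PAGES_PER_BLOCK = 64  # assumed block geometry


def pick_victim_greedy(valid_pages: dict[int, int]) -> int:
    """Greedy selection: the block with the fewest valid pages."""
    return min(valid_pages, key=lambda block: (valid_pages[block], block))


def collect(valid_pages: dict[int, int], pages_per_block: int = PAGES_PER_BLOCK):
    """Return (victim, pages copied elsewhere, pages reclaimed by the erase)."""
    victim = pick_victim_greedy(valid_pages)
    copied = valid_pages[victim]
    reclaimed = pages_per_block - copied
    return victim, copied, reclaimed


# Assumed number of valid pages per candidate block.
valid_pages = {10: 60, 11: 8, 12: 33}
victim, copied, reclaimed = collect(valid_pages)
print(victim, copied, reclaimed)  # 11 8 56

# Simplified model: freeing `reclaimed` pages for host data costs
# `reclaimed` host writes plus `copied` relocation writes.
if reclaimed == 0:
    raise RuntimeError("victim has no invalid pages; nothing to reclaim")
waf_estimate = (reclaimed + copied) / reclaimed
print(f"{waf_estimate:.2f}")  # 1.14
```

Had the controller picked block 10 instead, it would have copied 60 pages to reclaim only 4, which is why victim selection has such a direct effect on the WAF.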
Thermal throttling
Another factor that can reduce endurance is high ambient temperature. The drive may experience some derating as its temperature rises, and overheating will degrade data retention and reduce endurance. Thermal throttling addresses overheating by reducing drive speeds. This allows the chip to cool and mitigates the risk of part failure or device degradation.
Accounting for workload
The endurance of the drive will also vary with the data stream from the host, or the workload. A workload can consist of sequential or random read/write accesses. It has been found that mixed workloads composed mostly of sequential writes (with a small number of random writes) will drive up the WAF substantially and cause more wear than larger sequential writes.
For all these reasons, but especially because of the differences in workload, the endurance of a NAND flash device cannot truly be defined by P/E cycles. Sequential workloads will exhibit a WAF much closer to one than random workloads. The nature of the workload is highly dependent on the end application; algorithms that run in the background to maximise endurance may be necessary or completely unnecessary. It is therefore important to thoroughly understand the requirements of the NAND flash based upon the specific application and work from there. Datasheets that list WAF-based parameters such as dynamic workloads offer much better insight into the drive’s endurance. These are the parameters to look out for when choosing your next NAND flash.
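To illustrate how much the workload matters, the rough estimate below divides the flash's total write budget by the WAF and by the daily host writes. The capacity, rated P/E cycles, daily volume and both WAF values are hypothetical, and the model ignores temperature, data retention and everything else beyond write amplification.

```python
def estimated_lifetime_years(capacity_gb: float, rated_pe_cycles: int,
                             waf: float, host_writes_gb_per_day: float) -> float:
    """Rough lifetime: (capacity x P/E cycles) / WAF / daily host writes."""
    total_nand_writes_gb = capacity_gb * rated_pe_cycles
    host_write_budget_gb = total_nand_writes_gb / waf
    return host_write_budget_gb / host_writes_gb_per_day / 365


# Hypothetical drive and workloads.
CAPACITY_GB = 64
RATED_PE_CYCLES = 3000
DAILY_HOST_WRITES_GB = 20

sequential_years = estimated_lifetime_years(CAPACITY_GB, RATED_PE_CYCLES, 1.1, DAILY_HOST_WRITES_GB)
random_years = estimated_lifetime_years(CAPACITY_GB, RATED_PE_CYCLES, 5.0, DAILY_HOST_WRITES_GB)

print(f"mostly sequential (WAF 1.1): {sequential_years:.1f} years")  # 23.9 years
print(f"mostly random (WAF 5.0): {random_years:.1f} years")          # 5.3 years
```

The same drive with the same rated P/E cycles lasts several times longer under the assumed sequential workload, which is why a P/E figure alone says little about endurance in a given application.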