Bridging the memory gap in AI applications
As artificial intelligence (AI) technologies continue to evolve, the demand for optimised hardware is growing, transforming the memory market to meet AI's specific requirements. KIOXIA's XL-FLASH is an innovative solution designed to bridge the gap between traditional memory options and the emerging needs of AI applications.
By Axel Störmann, VP & Chief Technology Officer, KIOXIA Europe
This article originally appeared in the July 2024 magazine issue of Electronic Specifier Design.
The evolution of AI since its inception has been largely driven by advancements in software. However, the success of AI also relies heavily on the hardware it runs on. As AI applications have grown in complexity and capability, the demand for specialised hardware to develop, train, and deploy these technologies has become increasingly critical. A key challenge is finding the optimal memory solution to support the intensive processing requirements of AI algorithms. Thus, the choice of memory – whether for storing training data, handling operational input data, or managing intermediate results – is a critical decision.
The multi-stage journey of AI memory requirements
A closer look at AI memory requirements shows that deploying a new AI algorithm is a multi-faceted process. It starts with gathering the necessary training data (Figure 1), which can range from video feeds of urban traffic to voice recordings and medical imagery. This diverse data is initially processed through a high-throughput, write-once memory system, marking the first phase in the AI development pipeline. The next phase involves data cleaning and transformation to ensure compatibility with AI frameworks, necessitating variable memory speeds depending on the specific data being processed.
Figure 1. The AI data pipeline can be broken into four stages
Subsequently, the training phase kicks in, where the curated data is fed into hardware, typically a Graphics Processing Unit (GPU) with fast local memory, to refine the algorithm's accuracy in tasks such as identifying medical anomalies in imagery. During this stage, data storage leans heavily on high-speed, read-centric memory, with Dynamic Random-Access Memory (DRAM) the usual choice thanks to its speed and compatibility with existing computing architectures, even though the access pattern in this phase is predominantly read-oriented.
Lastly, the deployment phase presents a different set of memory demands, varying widely based on the application, from generative AI tasks on servers to operational needs on edge devices like smart cameras. This stage requires a mix of primarily read-only memory to house the AI model and some read-write memory for processing intermediate and output data. The challenge across these stages is to balance performance with cost and power consumption (Figure 2).
Figure 2. While performance is the main requirement during AI training, cost and power consumption rise in importance when AI models are deployed in applications
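To summarise the four stages just described, the sketch below encodes each stage's dominant access pattern and optimisation priority. The labels are an illustrative simplification of Figures 1 and 2, not a KIOXIA specification:

```python
from dataclasses import dataclass

@dataclass
class StageProfile:
    """Dominant memory traits of one AI pipeline stage (illustrative)."""
    name: str
    access_pattern: str
    priority: str

# A simplified reading of Figures 1 and 2, not a formal specification.
PIPELINE = [
    StageProfile("ingest",  "high-throughput, write-once",      "capacity"),
    StageProfile("prepare", "mixed read/write, variable speed", "flexibility"),
    StageProfile("train",   "high-speed, read-centric (DRAM)",  "performance"),
    StageProfile("deploy",  "mostly read, some read-write",     "cost and power"),
]

for stage in PIPELINE:
    print(f"{stage.name:8} {stage.access_pattern:34} -> optimise for {stage.priority}")
```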
Weighing the options: DRAM versus SSD
So, what are the options when it comes to memory today? The choice between DRAM and Solid-State Drives (SSDs) presents a tricky dilemma. DRAM, known for its high-speed read-write capability, is indispensable for the fast data access AI algorithms demand. However, its drawbacks – high cost, significant energy consumption, and scalability limits – cannot be dismissed. To make matters worse, the volatility of the DRAM market, with its frequent fluctuations in price and availability, adds another layer of complexity to hardware planning for AI applications.
SSDs, on the other hand, stand out for the read-intensive stages of AI development, offering the advantage of much lower power consumption. KIOXIA's M.2 and PCIe 4.0 NVMe SSDs, for example, deliver sequential read speeds of up to 7 GB/s, broadly in line with the per-core share of DDR5-5600 bandwidth on a 10-core CPU. The inherent latency of flash-based storage, however, at approximately 100 µs, introduces delays that can hinder AI processing efficiency (Figure 3).
Figure 3. While a flash-based non-volatile memory could replace DRAM in some steps, SSDs based on classic flash aren't an option due to their system latency
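As a back-of-the-envelope check on that bandwidth comparison, the sketch below assumes a single 64-bit DDR5-5600 channel shared evenly across ten cores; real platforms have more channels and uneven contention, so treat the figures as illustrative rather than measured:

```python
# One way to read the bandwidth comparison above (assumed configuration):
# a single 64-bit DDR5-5600 channel moves 5600 MT/s x 8 B = 44.8 GB/s;
# shared evenly across 10 cores, each core sees roughly 4.5 GB/s.
ddr5_channel_gbps = 5600e6 * 8 / 1e9    # 44.8 GB/s per channel
per_core_gbps = ddr5_channel_gbps / 10  # ~4.5 GB/s per core
ssd_gbps = 7.0                          # PCIe 4.0 NVMe sequential read

print(f"DDR5-5600 channel : {ddr5_channel_gbps:.1f} GB/s")
print(f"Per-core share    : {per_core_gbps:.1f} GB/s")
print(f"SSD sequential    : {ssd_gbps:.1f} GB/s")
```

On raw sequential throughput, then, a fast SSD can keep pace with a core's share of DRAM bandwidth; it is the roughly thousand-fold latency gap between flash (~100 µs) and DRAM (~100 ns, a typical figure) that limits its use in latency-sensitive stages.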
Redefining AI memory with SCM
The search for the optimal memory solution has led to a new layer in the memory hierarchy: Storage Class Memory (SCM). SCM, exemplified by KIOXIA's XL-FLASH, is designed to resolve the DRAM-versus-SSD dilemma (Figure 4). Achieving read latencies of less than 5 µs – ten times faster than conventional flash – XL-FLASH offers a compelling alternative priced below DRAM.
Figure 4. KIOXIA’s XL-FLASH has been demonstrated as an SCM alternative to DRAM in read-heavy applications
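To place XL-FLASH in the latency hierarchy the article describes, the following sketch compares the figures cited above; the DRAM value (~100 ns) is an assumed typical access latency, not one given in the text:

```python
# Order-of-magnitude latency tiers sketched in this article.
latencies_us = {
    "DRAM":               0.1,    # assumed typical access latency (~100 ns)
    "XL-FLASH (SCM)":     5.0,    # < 5 us read latency per the text
    "Conventional flash": 100.0,  # ~100 us as cited above
}

for tier, lat_us in latencies_us.items():
    print(f"{tier:18} ~{lat_us:6.1f} us ({lat_us / 0.1:6.0f}x DRAM)")
```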
XL-FLASH integrates seamlessly into storage solutions via a PCIe NVMe interface and uses a more efficient 4 KB page size instead of the traditional 16 KB found in standard flash devices. Scalable and built on advanced BiCS FLASH 3D multi-die package technology, it maintains the cell reliability, swift read/program speeds, and cost-effectiveness that users have come to expect.
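A quick way to see why the smaller page matters: flash reads are page-granular, so the page size sets a floor on how much data each small request moves. A minimal sketch of that read amplification, assuming each request is served from whole pages:

```python
# Why a 4 KB page helps small, random AI reads: a read must sense a whole
# page, so a 4 KB request from a 16 KB page moves four times the data
# actually needed.
def read_amplification(request_kib: int, page_kib: int) -> float:
    """Data sensed per byte requested, assuming page-granular reads."""
    pages_touched = -(-request_kib // page_kib)  # ceiling division
    return pages_touched * page_kib / request_kib

for page_kib in (16, 4):
    ra = read_amplification(4, page_kib)
    print(f"{page_kib:2} KiB page, 4 KiB read -> {ra:.1f}x amplification")
```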
To pass as much of the latency improvement and available bandwidth as possible through to the system, XL-FLASH uses Compute Express Link (CXL) at the link layer, providing efficient access to memory and accelerators with the agility and cost-effectiveness required. In addition, with CXL supported on both Intel and Arm processor architectures, AI applications – whether running on servers or edge devices – can take advantage of XL-FLASH as a lower-cost alternative to DRAM and a lower-latency alternative to flash.
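In practice, CXL-attached memory on Linux typically surfaces as a CPU-less NUMA node, which standard memory policies can then target. Below is a minimal, Linux-only sketch that lists NUMA nodes and flags CPU-less ones; the sysfs paths are standard, but whether a given node is actually CXL- or XL-FLASH-backed is platform-specific and assumed here:

```python
from pathlib import Path

# List NUMA nodes and flag CPU-less ones, which is how CXL-attached
# memory expanders commonly surface on Linux. Note: a CPU-less node is
# not necessarily CXL- or XL-FLASH-backed; that mapping is assumed.
for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
    mem_total_kb = next(
        line.split()[-2]
        for line in (node / "meminfo").read_text().splitlines()
        if "MemTotal" in line
    )
    cpus = (node / "cpulist").read_text().strip()
    kind = f"CPUs {cpus}" if cpus else "CPU-less (possible CXL tier)"
    print(f"{node.name}: {mem_total_kb} kB total, {kind}")
```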
With the complexity of AI development and deployment, the role of bespoke memory solutions such as SCM is becoming increasingly critical, providing a path to optimised performance, lower cost, and improved efficiency.