3D prototype of compute element for exascale
The European ExaNoDe project has built a compute node prototype that paves the way to exascale. It combines 3DIC and multi-chip-module integration technologies, heterogeneous compute elements (Arm cores with FPGA acceleration), and the UNIMEM memory system, all powered by a high-performance, high-productivity software stack.
Denis Dutoit, Research Engineer at CEA-Leti and the coordinator of ExaNoDe, said: “Affordability and power consumption are the main hurdles for an exascale-class compute node. In the ExaNoDe project, we have built a complete prototype that integrates multiple core technologies: a 3D active interposer with chiplets, Arm cores with FPGA acceleration, a global address space, high-performance and productive programming environment, which will enable European technology to satisfy the requirements of exascale HPC.”
The ExaNoDe prototype is part of the disruptive change required to provide the compute density and power efficiency needed for an operational exascale machine. Built on an innovative interposer developed by CEA, ExaNoDe combines multiple system-on-chip (SoC) chiplets to form a three-dimensional integrated circuit (3DIC). This delivers multiple advantages, such as:
- Higher chip fabrication yields thanks to the smaller chip size.
- Reduced costs of customisation, as the modular design allows leading-edge technology to be combined with lower-cost, more established technology as required.
- The flexibility to slot different compute elements – such as CPUs and accelerators – into a single chip for different applications, resulting in greater performance at lower design costs.
- Reduced inter-chip communication distances, resulting in improved energy efficiency.
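The yield advantage in the list above can be illustrated with a small calculation. The sketch below assumes a simple Poisson defect model, in which the fraction of defect-free dies falls exponentially with die area. The defect density and die areas are illustrative values, not ExaNoDe figures, and the function names are hypothetical. The model also ignores the cost of the interposer and any losses during assembly.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a Poisson defect model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_unit(total_area_cm2: float, n_dies: int,
                          defects_per_cm2: float) -> float:
    """Silicon area fabricated per working product.

    The product is split into n_dies equal dies. Assumes every die is
    tested before assembly, so only defect-free dies are combined.
    """
    die_area = total_area_cm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_cm2)


if __name__ == "__main__":
    density = 0.2  # defects per cm^2 (assumed, for illustration only)
    total = 4.0    # total logic area in cm^2 (assumed)
    for n in (1, 4):
        y = poisson_yield(total / n, density)
        area = silicon_per_good_unit(total, n, density)
        print(f"{n} die(s): yield per die {y:.2f}, "
              f"silicon per good unit {area:.2f} cm^2")
```

Under these assumptions, splitting the same logic into four smaller dies raises the yield of each die and lowers the silicon fabricated per working unit, which is the effect the first bullet describes.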
The UNIMEM memory system, which was created in the EUROSERVER project and is being brought to scale in the EuroEXA project, allows the creation of shared memory among multiple compute nodes. The UNIMEM shared memory is accessible through a non-coherent global address space, and is made visible to the programmer via a native UNIMEM API, standard MPI-3.0 and GPI-2.
Advances in OmpSs-2@Cluster and OpenStream allow programmers to exploit the ExaNoDe architecture through a multi-node task-based programming model. To increase the resilience and improve the manageability of the compute node, the software stack also includes virtualisation, with support for checkpointing and for virtualising the UNIMEM capabilities.
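The general idea of a task-based programming model can be sketched in a few lines. The example below is not OmpSs-2 or OpenStream code and does not use their syntax. It is a single-process analogy built on Python's standard `concurrent.futures` module, with hypothetical function names. The program is expressed as independent tasks plus a final task that depends on their results, and the executor decides where and when each task runs.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Independent task: reduce one chunk of the input."""
    return sum(chunk)


def combine(partials):
    """Dependent task: needs the result of every partial_sum task."""
    return sum(partials)


def task_parallel_sum(data, n_tasks=4, workers=4):
    """Split the data into chunks, run one task per chunk, then combine."""
    size = max(1, -(-len(data) // n_tasks))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit() creates a task; the pool schedules it on a worker.
        futures = [pool.submit(partial_sum, chunk) for chunk in chunks]
        # Calling result() expresses the dependency of combine() on each task.
        return combine(f.result() for f in futures)


if __name__ == "__main__":
    print(task_parallel_sum(list(range(100))))
```

In a multi-node runtime, the same structure would let the scheduler place tasks on different compute nodes, with the declared dependencies telling it which data must be moved between them.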
Finally, ExaNoDe’s research activities also extend to applications. Several application areas have been selected to ensure broad coverage, including materials science and engineering. So-called ‘mini applications’ – self-contained and based on real-life applications – have been developed and ported to the architecture via the programming models and communication application programming interfaces (APIs).
Initial work has been performed to accelerate the key kernels on the compute node’s FPGA logic, and this expertise will be carried into ongoing and future projects such as EuroEXA. As a demonstration of heterogeneous integration, ETHZ developed the open-source ExaConv convolutional neural network accelerator to accelerate neural network training.