FPGAs achieve 40GFLOPS-per-Watt in cloud data centre acceleration
Microsoft is using Altera's Arria 10 FPGAs to achieve compelling performance-per-Watt in data centre acceleration based on Convolutional Neural Network (CNN) algorithms, which are frequently used for image classification, image recognition and natural language processing.
Microsoft researchers working on advancing cloud technologies are using the Arria 10 Developer Kit and engineering samples of Arria 10 FPGAs, which are demonstrating up to 40GFLOPS-per-Watt, an industry-leading level of data centre performance. Compared with GPGPUs, this FPGA performance offers a performance-to-power advantage of more than three times for CNN platforms. The performance is achieved by coding the Arria 10 FPGA and its IEEE 754 hard floating point DSP blocks in either OpenCL, an open software development language, or VHDL.
“The FPGA has an architectural advantage for neural algorithms with the ability to convolve and do pooling very efficiently with a flexible data path which enables many OpenCL kernels to pass data directly to each other without having to go to external memory,” said Michael Strickland, Director, Compute and Storage Business Unit, Altera. “Arria 10 has an additional architectural advantage of supporting hard floating point for both multiplication and addition – this hard floating point enables more logic and a faster clock speed than traditional FPGA products.”
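The data path Strickland describes, in which a convolution stage hands results straight to a pooling stage, can be loosely illustrated in software. The sketch below is only an analogy in plain Python, not Altera's or Microsoft's implementation and not OpenCL: the function names (conv2d_rows, max_pool_rows, conv_then_pool) are hypothetical, and Python generators stand in for the direct kernel-to-kernel connections, so the full intermediate feature map is never stored between the two stages.

```python
def conv2d_rows(image, kernel):
    """Yield one output row at a time of a 'valid' 2-D convolution
    (cross-correlation form, as commonly used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    for r in range(out_h):
        yield [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]


def max_pool_rows(rows, size=2):
    """Consume rows from an upstream stage and yield max-pooled rows.
    Only `size` rows are buffered at any time."""
    band = []
    for row in rows:
        band.append(row)
        if len(band) == size:
            width = len(row) - len(row) % size  # drop a ragged right edge
            yield [
                max(band[i][c + j] for i in range(size) for j in range(size))
                for c in range(0, width, size)
            ]
            band = []


def conv_then_pool(image, kernel, pool=2):
    """Chain the two stages; the pooling stage pulls rows directly
    from the convolution stage."""
    return list(max_pool_rows(conv2d_rows(image, kernel), pool))


if __name__ == "__main__":
    image = [[r * 5 + c for c in range(5)] for r in range(5)]
    print(conv_then_pool(image, [[1, 1], [1, 1]]))  # [[36, 44], [76, 84]]
```

In this toy version the pooling stage buffers only two rows of convolution output at a time, which is the software counterpart of keeping intermediate data on chip rather than writing it to external memory.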
“We are seeing a significant leap forward in CNN performance and power efficiency with Arria 10 engineering samples and the silicon’s precision hard floating point in the DSP blocks is part of the reason we are seeing compelling results in our research,” added Doug Burger, Director, Client and Cloud Apps, Microsoft Research.