Opening Up To Hardware Independence

14th May 2013

ES Admin


OpenCL makes it possible to accelerate compute-intensive algorithms independently of the underlying hardware. In this ES Design magazine article, Wolfgang Eisenbarth, Vice President of Embedded Computer Technology, MSC Vertriebs GmbH, and Philipp Zieboll, Field Applications Engineer, Embedded Computer Technology, MSC Vertriebs GmbH, explore further.

The average amount of data required for high-definition image capture and processing applications in the health care sector is continually increasing. Furthermore, the algorithms used in image processing are becoming more complex and compute-intensive. High-performance hardware such as multi-core processors, Accelerated Processing Units (APUs), Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs) is typically used to deal with this increased computing load. These devices offer high computing performance, but programming them used to require hardware-specific code or manufacturer-dependent extensions. Today, the solution is the Open Computing Language (OpenCL), whose version 1.0 was released in late 2008. Complex tasks in medical image processing are now frequently carried out by standardised processor modules that support OpenCL, enabling uniform programming across various high-performance hardware architectures.

OpenCL is an open and royalty-free programming standard for general-purpose computing on heterogeneous systems. The standard was initially developed by software specialists at leading technology companies, who then submitted a draft to the Khronos Group for standardisation.

The Khronos Group, founded in January 2000, is a non-profit, member-funded consortium focused on the creation of royalty-free open standards for parallel computing, graphics and dynamic media for a wide variety of platforms and devices. AMD, Intel, NVIDIA, SGI, Google and Oracle are just a few of the over 100 members. Today, OpenCL is maintained and further developed by Khronos. The OpenCL specification is now available in versions 1.1 and 1.2 (www.khronos.org/opencl/).

Standardisation

The goal of OpenCL is to provide a standardised programming interface for efficient and portable programs (Figure 1). Users thus get what they have long been asking for: a vendor-independent, non-proprietary solution for accelerating their applications on their choice of multi-core CPU, APU and GPU cores.

Figure 1: OpenCL is an open, royalty-free standard for programming heterogeneous systems (Source: Khronos Group)

The OpenCL specification consists of the language specification as well as Application Programming Interfaces (APIs) for the platform layer and the runtime. The language specification describes the syntax and the programming interface for writing compute kernels, which can be executed on multi-core CPUs or GPUs. A compute kernel is the basic unit of executable code. The kernel language is based on a subset of ISO C99, a programming language popular among developers, with extensions for parallel programming.

OpenCL’s platform model consists of a host connected to one or more OpenCL devices. Host and device are logically separated from each other, which preserves portability. The platform layer API lets the host program query the number and types of devices present in the system, so that the developer can select and initialise the desired compute devices to execute the tasks. Compute contexts, as well as queues for job submission and data transfer requests, are also created in this layer. The runtime API is used to queue up compute kernels for execution, and it is responsible for managing the computing and memory resources of the OpenCL system.

Compute Kernels

The execution model describes how compute kernels are executed. Since OpenCL targets both multi-core CPUs and GPUs, compute kernels can be written either as data-parallel, which suits the architecture of GPUs, or as task-parallel, which better matches the architecture of CPUs. When the host program submits a kernel for execution on an OpenCL device, an index space is defined, and an instance of the kernel executes for each point in this index space. Each such instance is a work-item; OpenCL allows work-items to be grouped into work-groups for synchronisation and communication purposes.

OpenCL defines a multi-level memory model consisting of four memory spaces: Private Memory, visible only to an individual work-item; Local Memory, shared by the work-items of a work-group; Constant Memory, a region of global memory that remains constant during kernel execution; and Global Memory, which can be read and written by all work-items in all work-groups on the device.

Depending on the device’s actual memory subsystem, some of these memory spaces may be merged together. Figure 2 shows the memory hierarchy defined by OpenCL. The host processor is responsible for allocating and initialising the memory objects that reside in global and constant memory. Like the platform model, the memory model is based on the separation of host and device.

Figure 2: Overview of the memory hierarchy defined by OpenCL (Source: AMD)

Thanks to OpenCL’s hardware independence and easy portability, companies can reuse their significant investment in source code, greatly reducing the development time for today’s complex image processing systems.

COM Support

The design cycle can be optimised further by using standard PC building blocks such as a high-performance processor module. Such a Computer-On-Module (COM) can easily be mounted onto a baseboard via a standardised connector, while the baseboard implements the application-specific functions. Computer-On-Modules are available in a range of versions offering scalable processor performance and a choice of interfaces, so this module-based technology provides a simple upgrade path to higher performance. Because the modules all meet defined standard specifications for form factor and connectivity, they are easily interchangeable with products from different vendors.

The MSC C6C-A7 module family supports OpenCL and is implemented in the well-established COM Express form factor. It uses the newer Type 6 pin-out, which brings two significant improvements over its Type 2 predecessor: support for up to three independent Digital Display Interfaces (DDIs) and for USB 3.0. This embedded platform in the compact 95 x 95 mm form factor is based on AMD’s Embedded R-Series Accelerated Processing Units (APUs) and offers very powerful graphics and excellent parallel computing performance with low power dissipation.

The quad-core versions of the MSC module integrate either the AMD R-460L at 2.0GHz (2.8GHz Turbo) or the AMD R-452L at 1.6GHz (2.4GHz Turbo), with thermal design power (TDP) levels of 25W and 19W respectively. The two dual-core module versions can be populated with the AMD R-260H at 2.1GHz (2.6GHz Turbo) or the AMD R-252F at 1.7GHz (2.3GHz Turbo), each with a 17W TDP. All processors support AMD64 technology and AMD-V virtualisation technology. The AMD Fusion Controller Hub (FCH) A75 chipset is used with all CPU versions. The main memory can be expanded to up to 16Gbyte of dual-channel DDR3-1600 SDRAM via two SO-DIMM sockets.
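Because the APU’s integrated graphics engine shares this system memory with the CPU cores, the memory configuration also bounds OpenCL kernels whose speed is limited by memory traffic. As a back-of-the-envelope figure (theoretical peak only, assuming the standard 64-bit data bus per DDR3 channel; sustained bandwidth in practice is lower), dual-channel DDR3-1600 gives:

```latex
B_{\text{peak}} = \underbrace{2}_{\text{channels}}
  \times \underbrace{1600 \times 10^{6}\,\tfrac{\text{transfers}}{\text{s}}}_{\text{DDR3-1600}}
  \times \underbrace{8\,\tfrac{\text{bytes}}{\text{transfer}}}_{\text{64-bit bus}}
  = 25.6 \times 10^{9}\,\tfrac{\text{bytes}}{\text{s}} \approx 25.6\ \text{Gbyte/s}
```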

Figure 3: The MSC C6C-A7 module family is based on AMD’s Embedded R-Series Accelerated Processing Units (APUs) and supports OpenCL

The Radeon HD7000G-Series graphics engine integrated into the AMD R-Series APU offers excellent graphics capabilities and supports OpenCL 1.1, OpenGL 4.2 and DirectX 11. The modules support up to four independent displays for imaging applications. Support for HDMI, MPEG-2 decoding, H.264 and VCE (video compression engine) is also included.

The MSC C6C-A7 COM Express module family offers six PCI Express x1 lanes and a PCI Express Graphics (PEG) x8 interface. In addition, all modules feature four USB 3.0 and four USB 2.0 ports, LPC, Gbit Ethernet, HD Audio and four SATA interfaces at up to 300Mbyte/s. With three digital display interfaces supporting DisplayPort 1.2 (resolutions up to 4096 x 2160) and HDMI (up to 1920 x 1200), along with LCD and VGA interfaces, the MSC C6C-A7 modules offer comprehensive display support.

The platform can run the Microsoft Windows Embedded Standard 7 operating system as well as Linux. The AMI-based BIOS includes UEFI support. In addition to the Computer-On-Modules, MSC offers Starter Kits and suitable carrier boards, as well as cooling solutions and memory modules.

Thanks to its powerful computing and graphics capabilities, the platform is especially suited to demanding applications that require 3D graphics, high-definition video or the driving of large displays. Such applications are typically found in medical technology, infotainment, digital signage and gaming.