Getting to know Arm architecture
The Arm architecture now dominates the embedded processing and computing space. It has come a long way over the last few decades – starting out in the 1980s as a processor for home computers, then becoming the foundation for cellular handsets in the 1990s. Mark Patrick, Mouser Electronics, explains.
Today, there are few tech market segments where Arm is not a serious contender. In many, it has established itself as the number one choice for 32-bit or 64-bit processing. Thanks to this proliferation, there are now thousands of variants based on the Arm architecture. Understanding how these cores differ from one another is an important part of the selection process.
It was the creation, back in 2004, of the initial Cortex families that saw the Arm architecture diverge into three core product groups, each directed at a different type of application. The first to move into silicon was the Cortex-M, which has gone on to be the mainstay of the Arm-based microcontroller (MCU) ecosystem. Although the Cortex-M family debuted with cores based on a version 7 architecture, later additions aimed at ultra-low-cost devices (namely the M0, M0+ and M1) were based on the earlier version 6 architecture. All of the Cortex-M processors execute just the Thumb instruction set. The two other families were designed to support both the Thumb and full A32 instruction sets.
Since its launch, the Cortex-M3 has seen uptake by many MCU vendors, helping them to define their 32-bit product offerings. Examples now available include relatively simple but highly cost-effective MCUs, such as the Silicon Labs EFM Tiny Gecko, which is aimed at low-power systems, and the PSoC5 system-on-chip from Cypress Semiconductor, which combines conventional MCU peripherals with highly flexible, programmable analogue functions.
Above: Figure 1. Silicon Labs’ EFM Tiny Gecko
As MCU applications started to demand greater performance for digital signal processing (DSP), Arm responded with the Cortex-M4. This provided the option of floating-point support, which many suppliers wholeheartedly embraced. A common configuration is to combine the powerful Cortex-M4F core with the simpler Cortex-M0 or Cortex-M0+, so that work can be divided between the two cores for more effective power management and resource allocation.
In a device such as a Cypress PSoC6 or the NXP LPC5411x, the M0+ core can handle interrupts, leaving the M4 or M4F free to handle DSP tasks without interruption, which maximises throughput. This division of responsibility also enables the more powerful M4 core to sleep for longer intervals of time between bursts of activity. The lower power M0+ can take care of comparatively simple system management tasks during periods of relatively limited operation.
In 2014, Arm pushed Cortex-M performance further with the launch of the M7. This core sports a six-stage superscalar pipeline with support for out-of-order completion and is augmented by a full floating-point unit. The STM32F730x8 manufactured by STMicroelectronics combines the M7 core with a wide variety of peripherals and the company’s proprietary ART accelerator technology (which enables zero-wait-state execution out of Flash).
Cortex-A
In 2005, Arm launched the first member of the Cortex-A family, one that recognised the changing nature of the cellular handset business as it moved toward smartphones and tablets. The Cortex-A was designed to provide a set of features tailored to application processors. It also paved the way for Arm core deployment in servers and other high-end computing systems.
A major difference between Cortex-A processors and those from other families is support for a paged memory management unit (MMU). An MMU is required for Linux and similar operating systems, as it provides the ability to map programs and their data, held in real memory, into separate virtual address spaces. This provides a degree of protection against data owned by one task being corrupted by a neighbour, and also makes it possible to treat physical memory as a large cache for slower backing storage. It also avoids problems caused by memory fragmentation as programs are loaded and unloaded dynamically.
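The address mapping described above can be illustrated with a minimal sketch. It assumes a single-level page table and 4KB pages purely for simplicity; real Arm MMUs walk multi-level tables in hardware, and the names used here (translate, PageFault, task_a, task_b) are hypothetical.

```python
PAGE_SIZE = 4096  # 4KB pages, assumed here for illustration


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the task's page table."""


def translate(page_table, virtual_address):
    """Map a virtual address to a physical one via a single-level page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise PageFault(f"no mapping for virtual page {page_number}")
    return page_table[page_number] * PAGE_SIZE + offset


# Two hypothetical tasks use the same virtual pages but own different
# physical frames, so neither can reach the other's data.
task_a = {0: 5, 1: 9}
task_b = {0: 2, 1: 7}

addr = 0x1234  # virtual page 1, offset 0x234
phys_a = translate(task_a, addr)  # lands in physical frame 9
phys_b = translate(task_b, addr)  # lands in physical frame 7
```

Because each task has its own table, the same virtual address resolves to a different physical location for each one, and an address with no entry raises a fault rather than touching a neighbour's memory.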
A potential drawback with the use of paged virtual addresses is that they interfere with real-time operation, so the MMU is found in Cortex-A processors but not in the families with a stronger embedded systems focus.
A key innovation of the Cortex-A architecture from its inception was TrustZone. This implements a hardware-enforced layer of security that makes it possible for a hypervisor to deny access to parts of the processor and memory to any tasks without the required security credentials. TrustZone puts cryptographic and other sensitive operations into a virtual processor protected by a hardware firewall.
Above: Figure 2. The PSoC6 from Cypress
In terms of cores, the range extends from the comparatively simple Cortex-A5 through to high-performance superscalar processors, such as the Cortex-A72. This combines the ability to issue three instructions simultaneously with out-of-order execution that streamlines the scheduling for maximum efficiency.
The second major innovation of the Cortex-A family, introduced in 2011, was the big.LITTLE framework. This mirrors the coupling of different Cortex-M cores that accompanied the introduction of the M4, but with additional enhancements that support the needs of application processors.
With big.LITTLE, Arm took the approach of combining low-end cores (such as the A5 or A7) with higher performance, often superscalar implementations. The operating system attempts to keep the low-energy processor running alone for as long as possible, activating the higher power core only when the workload passes a certain threshold. In contrast to conventional dual-core architectures, tasks can migrate from one processor to another depending on system conditions. As the demand for performance has increased, a growing number of Cortex-A implementations revolve around the use of four high-end cores in a processor complex. This arrangement saves power by shutting one or more cores down during lulls in demand.
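The threshold-driven migration described above can be sketched roughly as follows. The function name, the two threshold values and the use of hysteresis (a lower threshold for moving back down) are all assumptions made for illustration; real schedulers weigh many more factors.

```python
def choose_cluster(load_samples, up_threshold=0.8, down_threshold=0.3):
    """Return which cluster ('little' or 'big') runs the task for each load sample."""
    cluster = "little"  # start on the low-energy core
    decisions = []
    for load in load_samples:
        if cluster == "little" and load > up_threshold:
            cluster = "big"      # workload passed the threshold: migrate up
        elif cluster == "big" and load < down_threshold:
            cluster = "little"   # demand has dropped: migrate back down
        decisions.append(cluster)
    return decisions


# Hypothetical load trace, expressed as a fraction of the little core's capacity
decisions = choose_cluster([0.1, 0.5, 0.9, 0.6, 0.2, 0.4])
```

Using separate up and down thresholds is one way to stop a task bouncing between cores when its load hovers near a single cut-off point.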
Cortex-R
The third major Arm family, the Cortex-R, provided a route to supporting a new generation of complex automotive and cyber-physical systems by adding features for real-time, highly reliable operation. The need for deterministic performance in target applications means the caches that are often used to speed up processing in other Arm processors are not always advisable. Because a cache dynamically replaces instructions and data values with more recently used entries, there is the possibility that critical information may not be in the cache when needed by an interrupt-service routine or real-time task. The Cortex-R family overcomes this problem with its support for tightly coupled memory (TCM) banks. Critical information can thus be stored in them during operation and, because the TCMs are software-managed, there is no risk of those instructions and data being replaced by a cache management subsystem.
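A small simulation can illustrate why a cache undermines determinism where a TCM does not. It assumes a four-entry cache with a least-recently-used replacement policy, chosen only for simplicity; actual replacement policies vary between cores, and ISR_ADDRESS and the LRUCache class are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Touch an address; return True on a hit, False on a miss."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        self.lines[address] = True
        return False


ISR_ADDRESS = 0x100  # hypothetical location of an interrupt handler

cache = LRUCache(capacity=4)
cache.access(ISR_ADDRESS)              # handler is pulled into the cache
for address in range(0x200, 0x205):    # five unrelated background accesses
    cache.access(address)
isr_hit_in_cache = cache.access(ISR_ADDRESS)  # handler has been evicted

tcm = {ISR_ADDRESS}                    # software placed the handler in TCM
isr_hit_in_tcm = ISR_ADDRESS in tcm    # nothing replaces it behind the scenes
```

In this run the background traffic pushes the handler out of the cache, so the next interrupt pays a miss penalty, whereas the TCM copy stays put until software chooses to move it.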
Since the introduction of the original Cortex-R4, the family has evolved. The Cortex-R5 and R7 cores were provided with low-latency peripheral ports. Most cores are designed to work with an on-chip bus, such as the Advanced High-performance Bus (AHB) or, in more recent cores, the Advanced eXtensible Interface (AXI) infrastructure. The low-latency port connects the core directly to important peripherals, giving access without having to arbitrate for the bus or wait for other bus access activities to be completed.
To support highly reliable operation, the caches, TCMs and system buses on Cortex-R products can use error correction coding to transparently correct single-bit errors and detect double-bit errors. As modular redundancy is a core part of safety-critical systems, Cortex-R family cores are designed to be able to work in lock-step with copies. If an on-chip monitor detects a difference in output, it can warn of a problem so that software can take corrective action. One example of the Cortex-R family in production is the Cypress Traveo S6J33xx series of MCUs. This couples the Cortex-R5F core running at up to 240MHz with peripherals optimised for driving instrument clusters in automotive dashboards.
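The idea of correcting single-bit errors and detecting double-bit errors can be shown with a tiny extended Hamming code. This is a sketch only: it protects four data bits with four check bits, whereas the codes used in real devices cover much wider words, and the encode and decode helpers are hypothetical names.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword, held as a list of bits."""
    code = [0] * 8
    # Data bits sit at positions 3, 5, 6 and 7; parity bits at 1, 2 and 4.
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # overall parity across positions 1 to 7
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double-error'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1       # one bit flipped: the syndrome is its position
        status = "corrected"
    else:
        return None, "double-error"  # parity looks even but syndrome is non-zero
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
single = list(word)
single[5] ^= 1        # inject one bit error
double = list(single)
double[2] ^= 1        # inject a second bit error
```

Decoding `single` recovers the original value and reports a correction, while decoding `double` reports an uncorrectable error so that software can take action.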
Arm v8
A second wave of changes to the Arm core offering came in 2011 with the creation of the version 8 architecture, which added the ability for applications to run in 64-bit mode, greatly expanding the maximum addressable memory space for applications processors. Arm v8 processors with 64-bit capability can run in either 32-bit or 64-bit mode. The former provides backward compatibility with applications written for version 7 processors. Because of their focus on MCU applications, the version 8 processors in the Cortex-M family do not support 64-bit addressing. However, they do add a number of instructions and features that improve performance and enhance secure operation.
One significant advance was a reworked memory protection unit (MPU) that allows more flexible management of regions. Another was full support for execute-only memory, to help prevent reverse engineering and hacking attempts. However, the biggest change in terms of security came in the form of support for a version of the TrustZone mechanism specifically optimised for deeply embedded processors.
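The kind of region check an MPU performs can be modelled in a few lines. This is an abstract sketch, not the actual register layout: the Region class, the address ranges, the permission flags and the rule that an unmapped address is refused are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int      # first address covered by the region
    limit: int     # last address covered by the region (inclusive)
    read: bool
    write: bool
    execute: bool


def is_allowed(regions, address, access):
    """Check an access ('read', 'write' or 'execute') against the region map."""
    for region in regions:
        if region.base <= address <= region.limit:
            return getattr(region, access)
    return False  # no region covers the address, so refuse the access


# Hypothetical memory map: firmware that cannot be modified, and data RAM
# that cannot be executed.
regions = [
    Region(0x0000_0000, 0x0000_FFFF, read=True, write=False, execute=True),
    Region(0x2000_0000, 0x2000_7FFF, read=True, write=True, execute=False),
]

can_run_firmware = is_allowed(regions, 0x0000_0400, "execute")
can_patch_firmware = is_allowed(regions, 0x0000_0400, "write")
can_run_from_ram = is_allowed(regions, 0x2000_0010, "execute")
```

Describing each region by a base and a limit address, rather than by a fixed block size, is what allows the map to follow the actual layout of code and data.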
Above: Figure 3. An example of a SAML11 MCU from Microchip
In the Cortex-M version of TrustZone, there is no requirement for a software hypervisor to manage transitions between secure and non-secure states. Instead, specialised instructions are utilised to pass data from non-secure tasks to secure functions that are allowed to operate in privileged mode. Even high priority interrupts cannot see secure data in registers if they do not have the correct privileges. The security functions allow for the creation of well-protected IoT devices and are found in MCUs based on cores such as the Cortex-M23 and Cortex-M33.
The Microchip SAML11 MCU uses the Cortex-M23 augmented with on-chip crypto controllers to provide hardware security for sensor nodes and similar designs. For its nRF9160, Nordic Semiconductor employs the Cortex-M33 to handle processing for devices that need secure RF communications.
Conclusion
Without question, Arm has been one of the greatest success stories for the global electronics business. The vast portfolio it offers continues to evolve in multiple directions, in order to cater for the needs of many different markets. The subdivision into product families, such as the Cortex-A, Cortex-M or Cortex-R, has proved fundamental to this growth, and will continue to drive Arm core uptake in new areas as they emerge.