
Tensilica - New DSP Engine Combines Outstanding Performance, Compact Size, and Easy Programmability

26th August 2009
ES Admin
Tensilica has introduced the high-performance, small, low-power ConnX D2 16-bit dual-MAC (multiply-accumulate) DSP (Digital Signal Processor) engine for its proven Xtensa LX dataplane processor cores for SOC (System-on-Chip) designs. The ConnX D2 DSP engine delivers uncompromised performance from C code, unlike many other DSPs that require time-consuming assembly coding for maximum performance. This means that virtually any C program, including those written with C intrinsic functions for the TI C6x family or ITU (International Telecommunication Union) reference code, can run unmodified and with excellent performance on the ConnX D2 DSP engine.
With the large ITU software code base available, the ConnX D2 DSP engine is ideal for telecom infrastructure and VoIP (Voice over Internet Protocol) applications. With its small size (fewer than 70,000 gates), the customizable ConnX D2 DSP engine is also well suited to a wide variety of low-power portable consumer applications, including mobile wireless devices, next-generation disk drives and data storage, home entertainment devices, and computer peripherals.
"The ConnX D2 DSP engine is an important step for Tensilica into the broader communications DSP market," stated Steve Roddy, Tensilica's vice president of marketing and business development. "Our customizable dataplane processors have long been used as DSPs in many applications, including our market-leading 24-bit HiFi audio engine DSPs. Now, with the ConnX D2 DSP engine joining the other members of the ConnX DSP family, we have communications DSPs at all major performance points, all with the full benefit of our extensive optimizing compiler technology."

Tensilica's proven Xtensa C/C++ compiler (XCC) produces optimized instruction streams for the ConnX D2 DSP engine directly from C code, whereas many other DSPs require extensive assembly language programming for maximum performance. For example, on a 256-point complex FFT (Fast Fourier Transform), the ConnX D2 DSP engine running compiled C code requires 20 percent fewer cycles than the dual-MAC TI C55x DSP running hand-optimized assembly code (C55x performance data taken from www.ti.com as of December 2008).
Tensilica also tested the AMR-NB (Adaptive Multi-Rate Narrowband) speech encoder and decoder, which required just 28.5 MHz on the ConnX D2 DSP engine when compiled from the original ITU reference code. According to Tensilica, this is about twice the performance of competing licensable DSP cores running the unmodified ITU reference code.
Because C code compiles directly to efficient instructions, without repeated rounds of assembly-level tuning, the ConnX D2 DSP engine gives designers a shorter development cycle, which gets new products to market faster. It also lets designers run the large existing library of proven code on the ConnX D2 DSP engine immediately.

The ConnX D2 DSP engine option adds dual 16-bit MAC units and an 8-entry, 40-bit register file to the base architecture of the Xtensa LX DPU (dataplane processing unit). The ConnX D2 DSP engine uses two-way SIMD (Single Instruction, Multiple Data) instructions to deliver high performance on vectorizable C code.
The ConnX D2 DSP engine also implements an improved form of VLIW (Very Long Instruction Word) instructions that delivers parallel performance without the code-size bloat associated with most VLIW DSPs. This allows code to be parallelized across the two MACs/ALUs when vectorization is not feasible. The compiler makes extensive use of both vectorization and parallelization to achieve fast performance on any algorithm.

The ConnX D2 DSP engine supports a wide range of data types (e.g., 16-, 32-, and 40-bit integer and fixed point; 16-bit complex; 8- and 16-bit vector), seven addressing schemes, and data manipulation instructions including shifting, swapping, and logical operations to provide outstanding performance on DSP algorithms. For specific DSP algorithm acceleration, the ConnX D2 engine instructions include Add-Compare-Exchange (used with Viterbi), Add Modulo, Add Subtract, and Add Bit Reverse Base. Used in conjunction with a bit-reversed addressing scheme, this instruction set delivers extremely efficient FFT implementations.
The ConnX D2 SIMD unit is supported by a comprehensive set of vector load and store instructions that handle multiple data widths and SIMD register loading orders, for both aligned and unaligned addresses.

If designers have specific optimizations in mind that are not included in the ConnX D2 DSP engine and Xtensa LX instruction sets, they can easily add multi-cycle execution units, registers, register files, and more using the automated Tensilica Instruction Extension (TIE) methodology (details available at www.tensilica.com).

Every Xtensa LX DPU, with or without the ConnX D2 DSP engine, is automatically generated with a complete set of software development and modeling tools matched to the exact DPU configuration. Designers use Tensilica's Eclipse-based Xtensa Xplorer GUI (graphical user interface) as the cockpit for the entire design experience. From Xtensa Xplorer, designers can profile their application code and make the processor changes needed to speed up that code. Designers can also choose options for processor interfaces, memories, operating system support, EDA scripts, debug and trace, and more.
Tensilica also provides a comprehensive collection of code generation and analysis tools that speed the software application development process.

When optimized for high-frequency operation, an Xtensa processor with the ConnX D2 DSP engine reaches clock speeds of up to 600 MHz in a 65nm GP process. When optimized for low area in cost-sensitive applications, a fully configured Xtensa LX with the ConnX D2 engine can occupy as little as 0.18 mm² (fully routed) in the same 65nm GP process.
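As a rough upper bound, assuming both 16-bit MACs complete one multiply-accumulate every cycle at the maximum quoted clock (an assumption; sustained throughput depends on the code and the memory system, and the 600 MHz figure applies to the speed-optimized configuration rather than the 0.18 mm² one), the peak MAC rate works out as:

```latex
R_{\text{peak}} = 2\,\frac{\text{MAC}}{\text{cycle}} \times 600 \times 10^{6}\,\frac{\text{cycles}}{\text{s}} = 1.2 \times 10^{9}\ \text{MAC/s}
```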
The ConnX D2 DSP option is very power efficient. Core power consumption will, of course, vary with the SOC designer's choice of process technology and synthesis optimization targets. As one data point, a fully configured Xtensa LX core with the ConnX D2 DSP engine consumes only 52 µW/MHz in a 65nm GP process, measured running an AMR-NB (VAD2) algorithm.
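Combining this figure with the 28.5 MHz AMR-NB load quoted earlier gives a rough estimate of core power for that workload. This assumes both figures apply to the same configuration and codec workload, which the article does not confirm (the 28.5 MHz number covers the encoder and decoder compiled from ITU reference code, while the power figure was measured on AMR-NB VAD2):

```latex
P \approx 52\ \mu\text{W/MHz} \times 28.5\ \text{MHz} = 1482\ \mu\text{W} \approx 1.5\ \text{mW}
```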

The ConnX D2 DSP option for the Xtensa LX processor will be available in October 2009.

© Copyright 2024 Electronic Specifier