Tensilica has extended its Xtensa family of IP cores for compute-intensive dataplane and DSP functions such as imaging, video, networking and baseband communication. The new Xtensa LX4 dataplane processor (DPU) for SoCs supports local data memory bandwidth of up to 1024 bits per cycle, VLIW (very long instruction word) instructions up to 128 bits wide, and a cache prefetch option that improves overall system performance when data resides in external memory.

The Xtensa LX4 DPU can process two 512-bit load/store operations per cycle, for a total of 1024 bits of local data memory bandwidth per cycle. This makes Xtensa LX4 DPUs well suited to baseband processing, video pre- and post-processing, image signal processing, and various network packet processing functions. The enhanced local memory bandwidth complements Tensilica's existing local port and queue interfaces, which Tensilica says provide virtually unlimited bandwidth for point-to-point data and control signals.
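
The bandwidth figure follows directly from the load/store width. A minimal sketch of the arithmetic in C is shown here; the function names and the 500 MHz clock used in the usage note are assumptions chosen for illustration, not figures from Tensilica.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved per cycle by `ops_per_cycle` load/store operations,
 * each `bits_per_op` bits wide. Hypothetical helper for illustration. */
static uint64_t bytes_per_cycle(uint32_t ops_per_cycle, uint32_t bits_per_op)
{
    assert(bits_per_op % 8 == 0);   /* widths are whole bytes */
    return (uint64_t)ops_per_cycle * bits_per_op / 8;
}

/* Peak local-memory bandwidth in bytes per second at clock rate `clock_hz`.
 * The clock rate is an input; the article does not state one. */
static uint64_t peak_bytes_per_second(uint32_t ops_per_cycle,
                                      uint32_t bits_per_op,
                                      uint64_t clock_hz)
{
    return bytes_per_cycle(ops_per_cycle, bits_per_op) * clock_hz;
}
```

With two 512-bit operations per cycle, `bytes_per_cycle(2, 512)` gives 128 bytes per cycle. At an assumed 500 MHz clock, `peak_bytes_per_second(2, 512, 500000000ULL)` works out to 64,000,000,000 bytes per second of peak local bandwidth.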

The Xtensa LX4 supports VLIW instructions up to 128 bits wide to increase the number of independent operations issued per clock cycle. Wide instructions can be freely intermixed with the shorter base Xtensa instructions, so there is no mode-switch penalty. Tensilica's Xtensa C/C++ compiler automatically extracts parallelism from source code and bundles multiple operations into single wide instructions. An Xtensa LX4 DPU that runs parallel operations in wide instructions at a low clock frequency can often match the performance of larger, higher-frequency non-VLIW cores while consuming far less power for the same task.
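
As an illustration of the kind of source code a VLIW-bundling compiler can exploit, consider the plain C loop below. It is only a sketch: the function name `scale_add` is hypothetical, the code uses no Xtensa-specific intrinsics, and how the Xtensa compiler would actually schedule it is not described in the announcement. The point is that each iteration's load, multiply, add and store are independent of the other iterations, which is what gives a compiler room to pack several operations into one wide instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Computes out[i] = k * a[i] + b[i] for i in [0, n).
 * `restrict` promises the compiler that the three arrays do not overlap,
 * so iterations are independent and their operations may be scheduled
 * in parallel. Hypothetical example, not taken from Tensilica material. */
void scale_add(int32_t *restrict out,
               const int32_t *restrict a,
               const int32_t *restrict b,
               int32_t k,
               size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = k * a[i] + b[i];
    }
}
```

Nothing in the source needs to change to target wide instructions under this assumption; the programmer's main job is to avoid hiding the independence, for example through possible pointer aliasing.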

The new data prefetch option reduces cycle counts in long-latency designs by fetching data from system memory before use, so the data is ready and waiting when the application code needs it. This is especially beneficial when streaming data from contiguous memory locations, and it is much simpler than adding a separate DMA engine, which requires additional programming and application code tuning.
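
The benefit for streaming access can be seen in a toy cycle-count model. The sketch below is an assumption-laden simplification, not a description of the LX4 prefetch hardware: it assumes one fetch in flight at a time, a fixed miss latency, a fixed amount of compute per cache line, and that the fetch of the next line starts when work on the current line begins. All function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, no prefetch: every cache line stalls for the full miss
 * latency before its compute can start. */
uint64_t cycles_without_prefetch(uint64_t lines,
                                 uint64_t compute_per_line,
                                 uint64_t miss_latency)
{
    assert(compute_per_line > 0);   /* model assumes some work per line */
    return lines * (miss_latency + compute_per_line);
}

/* Toy model, with prefetch: the first line pays the full latency; each
 * later fetch overlaps with compute on the previous line, so only the
 * part of the latency not covered by compute remains as a stall. */
uint64_t cycles_with_prefetch(uint64_t lines,
                              uint64_t compute_per_line,
                              uint64_t miss_latency)
{
    assert(compute_per_line > 0);
    if (lines == 0) {
        return 0;
    }
    uint64_t stall = miss_latency > compute_per_line
                         ? miss_latency - compute_per_line
                         : 0;
    return miss_latency + compute_per_line
         + (lines - 1) * (compute_per_line + stall);
}
```

For 100 lines with 20 cycles of compute per line and a 50-cycle miss latency, the model gives 7000 cycles without prefetch and 5020 with it. When compute per line exceeds the latency (say 60 cycles), the stall disappears entirely after the first line: 6050 cycles against 11000.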
