Single-Core AI Acceleration Provides Best-in-Class Performance in Terms of FPS, FPS/mm², and FPS/Watt
The Cadence® Tensilica® NNA 110 accelerator incorporates a custom hardware accelerator engine (NNE) coupled with a Tensilica Vision P6 or P1 DSP. The specialized compute block inside the NNA 110 hardware exploits features such as random sparsity and tensor compression/decompression to provide a best-in-class embedded AI accelerator solution.
A single-core NNA 110 accelerator supports 256 to 2K 8x8-bit MACs and offers a range of user-configurable options. The NNA 110 accelerator can run all neural network layers, including but not limited to convolution, fully connected, LSTM, LRN, and pooling operations. The accompanying Tensilica DSP in the NNA 110 can run any operation that is not native to the accelerator, making the NNA 110 a highly flexible, robust, and future-proof offering. The NNA 110 solution deliverables comprise turnkey soft RTL IP, a software compiler toolchain, and an accurate simulator for benchmarking.
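The split between the NNE and the companion DSP can be pictured as a simple partitioning pass over a model's layers. The following is a minimal sketch under stated assumptions: the NNE_NATIVE_OPS set, the Segment type, and the partition() helper are hypothetical and are not part of the XNNC toolchain; the native set just mirrors the layer types listed above, and anything outside it is assumed to fall back to the DSP.

```python
from dataclasses import dataclass

# HYPOTHETICAL: layer types assumed to run natively on the NNE, mirroring the
# examples in the text. The real supported-operator list may differ.
NNE_NATIVE_OPS = {"conv2d", "fully_connected", "lstm", "lrn", "pool"}


@dataclass
class Segment:
    target: str      # "NNE" or "DSP"
    ops: list[str]   # consecutive op types assigned to that target


def partition(ops: list[str]) -> list[Segment]:
    """Group consecutive ops into NNE or DSP segments (hypothetical scheme)."""
    segments: list[Segment] = []
    for op in ops:
        # Non-native ops fall back to the programmable DSP.
        target = "NNE" if op in NNE_NATIVE_OPS else "DSP"
        if segments and segments[-1].target == target:
            segments[-1].ops.append(op)   # extend the current segment
        else:
            segments.append(Segment(target, [op]))  # start a new segment
    return segments


if __name__ == "__main__":
    model = ["conv2d", "pool", "custom_activation", "conv2d", "fully_connected"]
    for seg in partition(model):
        print(seg.target, seg.ops)
```

Grouping consecutive ops keeps hand-offs between the two engines to a minimum; in this example only the unsupported custom_activation layer is routed to the DSP.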
Scalable, Configurable Hardware Turnkey Solution
Flexibility to target use cases ranging from 0.5 to 4 TOPS
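The 0.5 to 4 TOPS range lines up with the 256 to 2K MAC range if each MAC counts as two operations (a multiply and an add) and the engine runs at roughly 1 GHz, i.e. peak TOPS = 2 × MACs × f_clk / 10^12. The 1 GHz clock is an assumption made only for this illustration; the source does not state a frequency, and the achievable clock depends on the process node and implementation.

```python
def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Peak throughput in TOPS, counting each MAC as two operations (multiply + add)."""
    return 2 * num_macs * clock_hz / 1e12


# ASSUMPTION: 1 GHz clock, chosen only for illustration.
ASSUMED_CLOCK_HZ = 1.0e9

# Smallest and largest single-core MAC counts quoted in the text.
for macs in (256, 2048):
    print(f"{macs:>4} MACs -> {peak_tops(macs, ASSUMED_CLOCK_HZ):.3f} TOPS")
```

This prints 0.512 TOPS for 256 MACs and 4.096 TOPS for 2048 MACs. These are peak figures; delivered throughput is the peak multiplied by the MAC utilization a given network actually achieves.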
Turnkey End-to-End Glow-Based Xtensa Neural Network Compiler (XNNC) Toolchain
Works with a variety of model formats, including TensorFlow, ONNX, PyTorch, Caffe2, and TensorFlow Lite
Mixed-Precision Support in Hardware and Software
Supports 8-bit and 16-bit quantized formats with accuracy approaching that of the floating-point model
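To illustrate how a quantized computation can stay close to its floating-point counterpart, here is a minimal sketch of symmetric per-tensor quantization followed by an integer dot product at 8-bit and 16-bit precision. The quantization scheme (symmetric, one scale per tensor, round-to-nearest) is an assumption chosen for illustration; the source does not say which scheme XNNC or the NNE uses, and all function names are hypothetical.

```python
import random


def quantize_symmetric(values: list[float], bits: int = 8) -> tuple[list[int], float]:
    """ASSUMED scheme: symmetric per-tensor quantization to signed integers."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit, 32767 for 16-bit
    max_abs = max(abs(v) for v in values) or 1.0    # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(qa: list[int], qb: list[int]) -> int:
    """Integer multiply-accumulate over two quantized vectors."""
    # Python ints are unbounded; hardware needs a wide (e.g. 32-bit)
    # accumulator so that sums of 8x8-bit products do not overflow.
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


def quantized_dot(a: list[float], b: list[float], bits: int = 8) -> float:
    """Quantize both inputs, multiply-accumulate in integers, then rescale."""
    qa, sa = quantize_symmetric(a, bits)
    qb, sb = quantize_symmetric(b, bits)
    return int_dot(qa, qb) * sa * sb


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1, 1) for _ in range(256)]
    w = [rng.uniform(-1, 1) for _ in range(256)]
    ref = sum(a * b for a, b in zip(x, w))
    for bits in (8, 16):
        approx = quantized_dot(x, w, bits)
        print(f"{bits}-bit: float={ref:.5f} quantized={approx:.5f} abs_err={abs(ref - approx):.2e}")
```

The 16-bit quantization step is 256 times finer than the 8-bit one, so its error is typically far smaller; mixed precision lets accuracy-sensitive layers use the wider format while the rest stay at 8 bits.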
True Sparse Compute Engine and Tensor Compression
Exploits random sparsity in activations and weights, along with lossless compression/decompression logic
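The two ideas can be sketched in a few lines: a dot product that skips any MAC where the activation or the weight is zero, and a lossless zero-value compression that stores a 0/1 mask plus the packed non-zero values. Cadence does not describe its compression format here, so the mask-plus-values scheme is an assumption, and compress(), decompress(), and sparse_dot() are hypothetical names.

```python
def compress(values: list[int]) -> tuple[list[int], list[int]]:
    """ASSUMED zero-value compression: a 0/1 mask marking non-zeros plus the packed non-zero values."""
    mask = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return mask, nonzeros


def decompress(mask: list[int], nonzeros: list[int]) -> list[int]:
    """Lossless inverse of compress(): re-insert zeros where the mask is 0."""
    it = iter(nonzeros)
    return [next(it) if m else 0 for m in mask]


def sparse_dot(acts: list[int], weights: list[int]) -> tuple[int, int]:
    """Dot product that issues a MAC only when both operands are non-zero.

    Returns (result, number of MACs actually performed).
    """
    acc = 0
    macs = 0
    for a, w in zip(acts, weights):
        if a != 0 and w != 0:   # a zero operand guarantees a zero product, so skip it
            acc += a * w
            macs += 1
    return acc, macs


if __name__ == "__main__":
    acts = [0, 3, 0, 0, 5, 1, 0, 2]
    weights = [4, 0, 0, -2, 1, 7, 0, 0]
    print(sparse_dot(acts, weights))   # same result as the dense dot product
    mask, nz = compress(acts)
    print(mask, nz, decompress(mask, nz) == acts)
```

For this input only 2 of the 8 MACs are issued, and the result is 12, matching the dense dot product. Stored as 8-bit values, the activations shrink from 8 bytes to a 1-byte mask plus 4 bytes of non-zeros, which is where the bandwidth saving comes from.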
Achieves Best-in-Class KPIs in Terms of TOPS, TOPS/Watt, and TOPS/mm²
Maximizes MAC utilization to deliver high throughput, low latency, low bandwidth, and low energy consumption
Cadence is committed to keeping design teams highly productive with a range of support offerings and processes designed to keep users focused on reducing time to market and achieving silicon success.