Single-Core AI Acceleration Provides Best-in-Class Performance in Terms of FPS, FPS/mm², and FPS/Watt

The Cadence® Tensilica® NNA 110 accelerator incorporates a custom hardware accelerator engine (NNE) coupled with a Tensilica Vision P6 or P1 DSP. The specialized compute block inside the NNA 110 hardware leverages features such as random sparsity and tensor compression/decompression to provide a best-in-class embedded AI accelerator solution.

A single-core NNA 110 accelerator supports configurations of 256 to 2K 8x8-bit MACs and offers various user-configurable options. The NNA 110 accelerator can run all neural network layers, including but not limited to convolution, fully connected, LSTM, LRN, and pooling operations. The accompanying Tensilica DSP in the NNA 110 can run any operation that is not native to the accelerator, making the NNA 110 a highly flexible, robust, and future-proof offering. NNA 110 solution deliverables comprise turnkey soft RTL IP, a software compiler toolchain, and an accurate simulator for benchmarking.

Key Benefits

Scalable, Configurable Hardware Turnkey Solution

Flexibility in targeting varying use cases ranging from 0.5 to 4 TOPS
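
As a rough check on the 0.5 to 4 TOPS range, peak throughput can be estimated from the MAC count and the clock rate. The sketch below assumes that each MAC counts as two operations (one multiply and one add) and uses the 1GHz maximum clock rate from the feature list. The function name `peak_tops` is illustrative, and sustained throughput depends on MAC utilization for a given network.

```python
# Peak-throughput estimate for each NNE MAC configuration.
# Assumption: one MAC counts as 2 operations (multiply + accumulate).

OPS_PER_MAC = 2


def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Return peak tera-operations per second for a MAC array."""
    return num_macs * OPS_PER_MAC * clock_hz / 1e12


if __name__ == "__main__":
    for macs in (256, 512, 1024, 2048):
        print(f"{macs:4d} MACs @ 1 GHz: {peak_tops(macs, 1e9):.3f} TOPS")
```

Under these assumptions, 256 MACs give about 0.5 TOPS and 2048 MACs give about 4.1 TOPS at 1GHz, which is consistent with the range quoted above.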

Turnkey End-to-End GLOW-Based Xtensa Neural Network Compiler (XNNC) Toolchain

Works with various model formats, including TensorFlow, ONNX, PyTorch, Caffe2, and TensorFlow Lite

Mixed-Precision Support in Hardware and Software

Supports 8-bit and 16-bit quantized formats with accuracy approaching that of the floating-point model
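
To illustrate what a quantized format involves, here is a minimal sketch of generic asymmetric (affine) quantization, in which a scale and a zero point map a floating-point range onto unsigned integers. This is the textbook scheme, not a description of how the NNA 110 or XNNC implements it; the function names and the unsigned integer range are assumptions made for the example.

```python
# Generic asymmetric (affine) quantization sketch; not Cadence's implementation.


def quant_params(x_min: float, x_max: float, num_bits: int = 8):
    """Return (scale, zero_point) mapping [x_min, x_max] onto [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / qmax or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(-x_min / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers, clamping to the representable range."""
    qmax = 2 ** num_bits - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]


if __name__ == "__main__":
    data = [-1.0, 0.0, 0.5, 3.0]
    scale, zp = quant_params(min(data), max(data))
    q = quantize(data, scale, zp)
    print(q, dequantize(q, scale, zp))
```

Passing `num_bits=16` to both `quant_params` and `quantize` gives the 16-bit variant of the same scheme, with a correspondingly smaller rounding error.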

True Sparse Compute Engine and Tensor Compression

Exploits random sparsity in activations and weights, together with lossless compression/decompression logic
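
The two ideas named here can be sketched conceptually. The example below is a software model only, not the NNA 110 hardware algorithm: `sparse_dot` skips any multiply-accumulate with a zero operand, and `compress`/`decompress` store a tensor losslessly as a zero/nonzero bitmask plus its nonzero values. All function names and the bitmask encoding are assumptions chosen for illustration.

```python
# Conceptual model of zero-skipping and lossless zero-mask compression.
# Illustrative only; the actual NNA 110 logic is not described in this document.


def compress(tensor):
    """Losslessly encode a flat integer tensor as (bitmask, nonzero values)."""
    mask = [1 if v != 0 else 0 for v in tensor]
    nonzeros = [v for v in tensor if v != 0]
    return mask, nonzeros


def decompress(mask, nonzeros):
    """Rebuild the original tensor from its bitmask and nonzero values."""
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in mask]


def sparse_dot(weights, activations):
    """Dot product that skips any MAC with a zero operand.

    Returns (result, macs_performed).
    """
    acc, macs = 0, 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # a zero operand contributes nothing, so skip it
        acc += w * a
        macs += 1
    return acc, macs


if __name__ == "__main__":
    weights = [0, 3, 0, -2, 5, 0, 0, 1]
    activations = [4, 0, 7, 1, 2, 0, 9, 0]
    print(sparse_dot(weights, activations))
    print(compress(weights))
```

In this toy input only two of the eight operand pairs are nonzero on both sides, so the sparse version performs two MACs instead of eight while producing the same result as a dense dot product.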

Achieves Best-in-Class KPIs in Terms of TOPS, TOPS/Watt, and TOPS/mm²

Extracts best MAC utilization for high throughput, low latency, low bandwidth, and low energy consumption workloads

Features

  • Supports scalable NNE MAC configurations: 256, 512, 1024, and 2048 8-bit MACs (the number of 16-bit MACs is one quarter of the number of 8-bit MACs)
  • Supports UBUF configurations: 256KB to 2MB
  • Supports various bandwidth configurations: 32/16/8/4 bytes/clock and AXI bus width of 128 or 256 bits
  • Supports clock rates up to 1GHz
  • Run-time sparsity-based cycle speedup
  • 4-bit weight clustering
  • Runtime tensor bandwidth compression/decompression
  • Asymmetric quantization support
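
As an illustration of the 4-bit weight clustering feature listed above, the sketch below replaces each weight with a 4-bit index into a 16-entry codebook. It uses a generic one-dimensional k-means with evenly spaced initial centroids; this is an assumption made for the example, not the clustering algorithm used by the NNA 110 toolchain, and the function names are hypothetical.

```python
# Illustrative 4-bit weight clustering: every weight becomes a 4-bit index
# into a 16-entry codebook. Generic 1-D k-means, not Cadence's algorithm.


def cluster_weights(weights, num_bits=4, iterations=10):
    """Return (codebook, indices) with len(codebook) == 2**num_bits."""
    k = 2 ** num_bits
    lo, hi = min(weights), max(weights)
    # Deterministic initialization: centroids evenly spaced over the weight range.
    step = (hi - lo) / (k - 1) if hi > lo else 0.0
    codebook = [lo + i * step for i in range(k)]
    indices = [0] * len(weights)
    for _ in range(iterations):
        # Assignment step: pick the nearest centroid for every weight.
        indices = [min(range(k), key=lambda c: abs(w - codebook[c])) for w in weights]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [w for w, i in zip(weights, indices) if i == c]
            if members:
                codebook[c] = sum(members) / len(members)
    return codebook, indices


def reconstruct(codebook, indices):
    """Look up the clustered value for every stored index."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    weights = [0.1, 0.1, -0.3, 0.7, -0.3]
    codebook, indices = cluster_weights(weights)
    print(indices, reconstruct(codebook, indices))
```

Storing a 4-bit index instead of an 8-bit weight halves the weight payload, at the cost of a small codebook and whatever approximation error the clustering introduces.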

Support

Cadence is committed to keeping design teams highly productive with a range of support offerings and processes designed to keep users focused on reducing time to market and achieving silicon success.

Free Software Evaluation

Try our Software Development Toolkit (SDK) free for 15 days. We want to show you how easy it is to use our Eclipse-based IDE.

Training

Our hands-on training has been shown to dramatically speed up understanding of Tensilica tools and how best to use the products.

Online Support

Get 24x7 online access to a knowledgebase of the latest articles and technical documentation. (Login Required)

Xtensa Processor Generator (XPG)

The Xtensa Processor Generator (XPG) is the heart of our technology: the patented, cloud-based system that creates your correct-by-construction processor and all associated software, models, etc. (Login Required)
