Overview

High-end multi-core AI accelerator targeted at neural network inference workloads from 4 TOPS to 100s of TOPS

The Cadence® Tensilica® NNA multi-core accelerator platform is a configurable IP cluster comprising multiple (1 to 8) NNA 110 single-core IPs. The specialized hardware compute accelerators inside the NNA multi-core accelerator leverage features such as true random sparsity and tensor compression/decompression to provide a high-end AI accelerator solution. A single NNA multi-core cluster can scale from 256 to 16K MACs for 8x8-bit computations, and multiple clusters can be assembled to reach 100s of TOPS. The NNA 110 multi-core accelerator deliverables comprise turnkey soft RTL IP, a multi-core workload mapper toolchain, and a simulator for benchmarking. All of this IP is included in one bundled package, enabling customers to achieve faster time to market.
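
As a rough illustration of how MAC count maps to compute throughput, the sketch below estimates dense peak TOPS at 2 operations (multiply plus accumulate) per MAC per cycle. The peak_tops helper and the nominal 1 GHz clock are illustrative assumptions, not figures from the product specification; run-time sparsity speedup can push effective throughput above this dense peak.

```python
# Hedged sketch: estimate dense peak throughput of an NNA cluster.
# Assumes 2 ops (multiply + accumulate) per MAC per cycle and a
# nominal 1 GHz clock; both are illustrative assumptions, not
# figures from the product specification.

def peak_tops(num_macs: int, clock_ghz: float = 1.0) -> float:
    """Dense 8x8-bit peak throughput in TOPS (tera-ops per second)."""
    ops_per_cycle = 2 * num_macs          # multiply + accumulate
    return ops_per_cycle * clock_ghz * 1e9 / 1e12

# Single cluster, smallest and largest MAC configurations
print(f"256 MACs : {peak_tops(256):.2f} TOPS")     # ~0.5 TOPS dense
print(f"16K MACs : {peak_tops(16_384):.2f} TOPS")  # ~32.8 TOPS dense

# Stacking clusters scales throughput further, e.g. 4 clusters:
print(f"4 x 16K MACs : {4 * peak_tops(16_384):.1f} TOPS")
```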

[Figure: Hierarchical NNA multi-core architecture]

Key Benefits

Scalable, Configurable Hardware and Software Turnkey Solution

Fast time to market for customers, who can leverage the hardware IP, software toolchain, and simulator environment

True Sparse Compute Engine and Tensor Compression

Exploits random activation/weight sparsity and lossless compression/decompression logic
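
As a concrete picture of what exploiting random sparsity means, the sketch below models a MAC loop that skips any multiply whose activation or weight operand is zero and reports the resulting reduction in issued multiplies. It is a simplified software illustration under assumed sparsity levels (roughly 50% zero activations, 30% zero weights), not a description of the NNA datapath.

```python
# Hedged illustration: how random activation/weight sparsity can
# translate into compute-cycle savings. This is a software model,
# not the NNA hardware datapath; the zero-skipping policy and the
# sparsity levels are assumptions for illustration.
import random

def sparse_dot(acts, wts):
    """Dot product that 'skips' any term where either operand is zero."""
    issued = 0          # multiplies actually performed
    acc = 0
    for a, w in zip(acts, wts):
        if a != 0 and w != 0:
            acc += a * w
            issued += 1
    return acc, issued

random.seed(0)
n = 10_000
# Assume ~50% random zeros in activations and ~30% in weights.
acts = [random.randint(-127, 127) if random.random() > 0.5 else 0 for _ in range(n)]
wts  = [random.randint(-127, 127) if random.random() > 0.3 else 0 for _ in range(n)]

_, issued = sparse_dot(acts, wts)
print(f"dense multiplies : {n}")
print(f"issued multiplies: {issued}  (~{n / issued:.1f}x potential speedup)")
```

The lossless tensor compression/decompression attacks the same zeros on the bandwidth side: zero-rich tensors compress well, so fewer bytes cross the bus.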

Configurable Internal SRAM Lowers Overall AXI Bandwidth Consumption and Boosts Performance

Minimizes data movement across the external system bus, greatly reducing overall power
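
A back-of-the-envelope example of why internal SRAM lowers AXI traffic: if an intermediate activation tensor stays on chip between two layers, it is never written out and read back over the external bus. The tensor shapes and the two-layer pipeline below are illustrative assumptions, not an NNA mapping.

```python
# Hedged back-of-the-envelope: external (AXI) traffic for two back-to-back
# layers when the intermediate tensor stays in on-chip SRAM versus being
# written back to external memory between layers. Shapes are illustrative
# assumptions only.

def tensor_bytes(c, h, w, bytes_per_elem=1):
    return c * h * w * bytes_per_elem

input_t        = tensor_bytes(64, 112, 112)   # layer 1 input
intermediate_t = tensor_bytes(128, 56, 56)    # layer 1 output / layer 2 input
output_t       = tensor_bytes(128, 56, 56)    # layer 2 output

# Without enough on-chip SRAM: the intermediate is written out and read back.
external_only = input_t + 2 * intermediate_t + output_t
# With the intermediate held in internal SRAM: it never touches the AXI bus.
with_sram = input_t + output_t

print(f"external traffic without SRAM reuse: {external_only / 1e6:.1f} MB")
print(f"external traffic with SRAM reuse   : {with_sram / 1e6:.1f} MB")
```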

Best-in-Class Inference Latency and Throughput at Lowest Area and Power Footprint

Achieves linear scaling from single core to multi-core across all KPIs for various workloads

End-to-End Turnkey XNNC Multi-Core Workload Mapper

Optimal workload partitioning across the space and time axes, vectorization schemes, and resource utilization
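
To give a rough idea of what partitioning over the space axis looks like, the sketch below splits one layer's output channels evenly across the cores of a cluster. The real XNNC mapper weighs many more factors (the time axis, vectorization schemes, SRAM residency, bandwidth), so this is only an assumed, simplified scheme.

```python
# Hedged sketch of spatial partitioning: divide one layer's output
# channels across the cores of a cluster so each core computes a slice.
# This shows only the basic idea, not the XNNC mapper's actual algorithm.

def partition_channels(num_channels: int, num_cores: int):
    """Return (start, end) channel ranges, one per core, as evenly as possible."""
    base, rem = divmod(num_channels, num_cores)
    ranges, start = [], 0
    for core in range(num_cores):
        size = base + (1 if core < rem else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# Example: 256 output channels mapped onto a 4-core cluster.
for core, (lo, hi) in enumerate(partition_channels(256, 4)):
    print(f"core {core}: output channels [{lo}, {hi})")
```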

Mixed-Precision Support in Hardware and Software

Supports 8-bit/16-bit quantized formats with accuracy approaching floating-point model fidelity
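
The sketch below shows a generic symmetric 8-bit quantize/dequantize round trip of the kind such mixed-precision flows rely on. The per-tensor scale and the quantize_int8 helper are textbook choices for illustration, not a statement of the NNA toolchain's exact quantization recipe.

```python
# Hedged illustration of 8-bit symmetric quantization: values are mapped
# to int8 with a per-tensor scale, then dequantized. This is a generic
# textbook scheme, not necessarily the NNA toolchain's exact recipe.
import random

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

random.seed(1)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale = {scale:.5f}, max round-trip error = {max_err:.5f}")
# A 16-bit quantized format would use a much finer scale (divisor 32767),
# shrinking the error further toward floating-point fidelity.
```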

Features

  • Single cluster scalable design ranging from 4 to 32 TOPS
  • Stacking multiple clusters to achieve 100s of TOPS
  • Supports various bandwidth configurations, AXI bus widths, and clock rates
  • AXI ports to communicate with an external host processor
  • Internal configurable SRAM ranging from 1 to 16 MB (key parameters are summarized in the configuration sketch after this list)
  • Built-in run-time sparsity-based cycle speedup
  • Built-in run-time tensor bandwidth compression/decompression logic
  • Small controller overhead, using a Tensilica CPU core to execute control code and management-plane software
  • Includes internal synchronization mechanisms to coordinate across cores
  • Turnkey software multi-core mapper to achieve coarse-grained task-level parallelism
  • Supports neural networks trained in various frameworks such as TensorFlow, TensorFlow Lite, PyTorch, Caffe2, and ONNX
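
To summarize the configuration knobs listed above in one place, the sketch below wraps them in a small, hypothetical configuration object with range checks matching the figures in this section. The field names, the validate() helper, and the example values are illustrative assumptions, not the actual NNA configuration interface.

```python
# Hedged sketch of the configuration space the feature list above describes.
# Field names, validate(), and the example values are hypothetical
# illustrations, not the actual NNA configuration interface.
from dataclasses import dataclass

@dataclass
class ClusterConfig:
    num_cores: int        # 1 to 8 NNA 110 cores per cluster
    macs_per_core: int    # cluster total spans 256 to 16K 8x8-bit MACs
    sram_mb: int          # internal SRAM, 1 to 16 MB
    axi_width_bits: int   # AXI bus width to the external host/system
    clock_ghz: float      # target clock rate

    def total_macs(self) -> int:
        return self.num_cores * self.macs_per_core

    def validate(self) -> None:
        assert 1 <= self.num_cores <= 8, "1 to 8 NNA 110 cores per cluster"
        assert 256 <= self.total_macs() <= 16_384, "256 to 16K MACs per cluster"
        assert 1 <= self.sram_mb <= 16, "1 to 16 MB internal SRAM"

# Example: a fully populated cluster (illustrative numbers only).
cfg = ClusterConfig(num_cores=8, macs_per_core=2048, sram_mb=16,
                    axi_width_bits=256, clock_ghz=1.0)
cfg.validate()
print(f"{cfg.num_cores} cores, {cfg.total_macs()} MACs, {cfg.sram_mb} MB SRAM")
```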

Support

Cadence is committed to keeping design teams highly productive with a range of support offerings and processes designed to keep users focused on reducing time to market and achieving silicon success.

Free Software Evaluation

Try our software development kit (SDK) free for 15 days. We want to show you how easy it is to use our Eclipse-based IDE.

Training

Our hands-on training has been shown to dramatically speed up understanding of Tensilica tools and the best use of our products.

Online Support

Get 24x7 online access to a knowledge base of the latest articles and technical documentation. (Login Required)

Xtensa Processor Generator (XPG)

The Xtensa Processor Generator (XPG) is the heart of our technology - the patented cloud-based system that creates your correct-by-construction processor and all associated software, models, etc. (Login Required)
