Overview

High-end multi-core AI accelerator targeting neural network inference workloads from 4 to hundreds of TOPS

The Cadence® Tensilica® NNA multi-core accelerator platform is a configurable IP cluster comprising multiple (1 to 8) NNA 110 single-core IP instances. The specialized hardware compute accelerators inside the NNA multi-core accelerator leverage features such as true random sparsity and tensor compression/decompression to provide a high-end AI accelerator solution. A single NNA multi-core cluster can scale from 256 to 16K MACs for 8x8-bit computations. Multiple such clusters can be assembled to reach hundreds of TOPS. The NNA 110 multi-core accelerator deliverables comprise turnkey soft RTL IP, a multi-core workload mapper toolchain, and a simulator for benchmarking. The NNA multi-core accelerator includes all the IP in one bundled package, which enables customers to achieve faster time to market.
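As a rough illustration of how a MAC count relates to a TOPS figure, the sketch below uses the common convention that one multiply-accumulate counts as two operations per cycle. The 1 GHz clock and the helper name peak_tops are assumptions made for this example only; the source does not state a clock rate, and real throughput depends on the configuration and workload.

```python
def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Peak tera-operations per second, counting each MAC as 2 ops (multiply + add)."""
    return 2 * num_macs * clock_hz / 1e12


# Assumption for illustration only: a 1 GHz clock (not stated in the source).
ASSUMED_CLOCK_HZ = 1.0e9

if __name__ == "__main__":
    # 256 and 16K are the cluster MAC limits quoted above; 2048 is an intermediate point.
    for macs in (256, 2048, 16 * 1024):
        print(f"{macs:>6} MACs -> {peak_tops(macs, ASSUMED_CLOCK_HZ):.3f} peak TOPS")
```

Under these assumptions, 16K MACs works out to roughly 32 peak TOPS, which lines up with the upper end of the single-cluster range quoted on this page.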

 

 

Key Benefits

Scalable, configurable hardware and software turnkey solution

Faster time to market for customers who can leverage the hardware IP, software toolchain, and simulator environment

True sparse compute engine and tensor compression

Exploits random sparsity in activations and weights, with lossless compression/decompression logic

Configurable Internal SRAM Lowers Overall AXI Bandwidth Consumption While Boosting Performance

Minimizes data movement across the external system bus, greatly reducing overall power

Best-in-Class Inference Latency and Throughput at Lowest Area and Power Footprint

Achieves linear scaling from single-core to multi-core across all KPIs for various workloads

End-to-End Turnkey XNNC Multi-Core Workload Mapper

Partitions workloads to optimize across the space and time axes, vectorization schemes, and resource utilization

Mixed-Precision Support in Hardware and Software

Supports 8-bit/16-bit quantized formats with accuracy approaching floating-point model fidelity

Features

  • Single cluster scalable design ranging from 4 to 32 TOPS
  • Stacking multiple clusters to achieve 100s of TOPS
  • Supports various bandwidth configurations, AXI bus widths, and clock rates
  • AXI ports to communicate with an external host processor
  • Configurable internal SRAM ranging from 1 to 16 MB
  • Built-in run-time sparsity-based cycle speedup
  • Built-in run-time tensor bandwidth compression/decompression logic
  • Small controller overhead, using a Tensilica CPU core to execute control code and management plane software
  • Includes internal synchronization mechanisms to coordinate across cores
  • Turnkey software multi-core mapper to achieve coarse-grained task-level parallelism
  • Supports neural networks trained in various frameworks such as TensorFlow, ONNX, PyTorch, Caffe2, TensorFlow Lite, etc.
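
To give a feel for why run-time sparsity can translate into a cycle speedup, the sketch below counts how many multiply-accumulates in a dot product actually contribute to the result when zero weights or zero activations are skipped. It is a simplified software model with hypothetical names (mac_counts, ideal_speedup) and made-up data; it does not describe how the NNA hardware detects or schedules sparse operands.

```python
def mac_counts(weights, activations):
    """Return (dense MACs, MACs where both operands are nonzero)."""
    dense = len(weights)
    effective = sum(1 for w, a in zip(weights, activations) if w != 0 and a != 0)
    return dense, effective


def ideal_speedup(dense, effective):
    """Upper-bound speedup if every zero-operand MAC could be skipped for free."""
    return float("inf") if effective == 0 else dense / effective


# Made-up example vectors with randomly placed zeros.
weights = [0, 3, 0, -1, 2, 0, 0, 5]
activations = [1, 0, 4, 2, 0, 0, 7, 1]

dense, effective = mac_counts(weights, activations)
speedup = ideal_speedup(dense, effective)

if __name__ == "__main__":
    print(f"dense MACs: {dense}, effective MACs: {effective}, ideal speedup: {speedup:.1f}x")
```

The ratio computed here is only an idealized bound; an actual accelerator would see less, since skipping work has its own control and load-balancing costs.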

Support

Cadence is committed to keeping design teams highly productive with a range of support offerings and processes designed to keep users focused on reducing time to market and achieving silicon success.

Free Software Evaluation

Try the SDK software development toolkit free for 15 days and see for yourself how easy the Eclipse-based IDE is to use.

Apply Now

Training

Tensilica hands-on training has been proven to dramatically speed up understanding of the Tensilica tools and effective use of the products.

Browse Catalog

Online Support

Access the knowledge base, with the latest articles and technical documentation, 24 hours a day. (Login required)

Access Now

Xtensa Processor Generator (XPG)

The Xtensa Processor Generator (XPG) is at the core of Xtensa technology. This patented, cloud-based system produces a correct-by-construction processor and automatically creates all of the associated software, models, and more. (Login required)

Launch XPG