by Do Trung Hai on 2017-03-24 12:46:16

Date: 2017. 03. 27(Mon) 04:00 P.M.

Location: EB5. 533

Presenter: Trunghai Do

Title: CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms

Authors: G. Hegde, Siddhartha, N. Ramasamy, N. Kapre

Abstract: Off-the-shelf accelerator-based embedded platforms offer a competitive energy-efficient solution for lightweight deep learning computations over CPU-based systems. Low-complexity classifiers used in power-constrained and performance-limited scenarios are characterized by operations on small image maps with 2-3 deep layers and few class labels. For these use cases, we consider a range of embedded systems with 5-20W power budgets such as the Xilinx ZC706 board (with MXP soft vector processor), NVIDIA Jetson TX1 (GPU), TI Keystone II (DSP) as well as the Adapteva Parallella board (custom multi-core with NoC). Deep Learning computations push the capabilities of these platforms to the limit through compute-intensive evaluations of multiple 2D convolution filters per layer, and high communication requirements arising from the movement of intermediate maps across layers. We present CaffePresso, a Caffe-compatible framework for generating optimized mappings of user-supplied ConvNet specifications to target various accelerators such as FPGAs, DSPs, GPUs, RISC-multicores. We use an automated code generation and auto-tuning approach based on knowledge of the ConvNet requirements, as well as platform-specific constraints such as on-chip memory capacity, bandwidth and ALU potential. While one may expect the Jetson TX1 + cuDNN to deliver high performance for ConvNet configurations, (1) we observe a flipped result with slower GPU processing compared to most other systems for smaller embedded-friendly datasets such as MNIST and CIFAR10, and (2) faster and more energy efficient implementation on the older 28nm TI Keystone II DSP over the newer 20nm NVIDIA TX1 SoC in all cases.

Published in: 2016 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)

Paper link:

Article source: