Mobile

by Jinse Kwon on 2018-06-15 11:55:01

Date : 2018. 06. 20 (Wed) 13:00

Location : EB5. 533

Presenter : Jinse Kwon


Title : S3DNN: Supervised Streaming and Scheduling for GPU-Accelerated Real-Time DNN Workloads

Author : Husheng Zhou, Soroush Bateni, Cong Liu

(The University of Texas at Dallas)


Abstract : Deep Neural Networks (DNNs) are being widely applied in many advanced embedded systems that require autonomous decision making, e.g., autonomous driving and robotics. To handle resource-demanding DNN workloads, graphics processing units (GPUs) have been used as the main acceleration engine. Although much research has been conducted to algorithmically optimize the efficiency of applying DNNs to applications such as object recognition, limited attention has been given to optimizing the execution of GPU-accelerated DNN workloads at the system level. In this paper, we propose S3DNN, a system solution that optimizes the execution of DNN workloads on GPU in a real-time multi-tasking environment, which simultaneously optimizes the two (sometimes) conflicting goals of real-time correctness and throughput. S3DNN contains a governor that selectively gathers system-wide DNN requests to perform smart data fusion, and a novel supervised streaming and scheduling framework that combines a deadline-aware scheduler with the concurrency-enabled CUDA stream technique. To simultaneously maximize concurrency-induced benefits and real-time performance, S3DNN exploits a rather interesting and unique characteristic of DNN workloads, where multiple layers of a DNN instance often exhibit a gradually decreasing GPU resource utilization pattern. We have fully implemented S3DNN in a GPU-accelerated system and have conducted extensive sets of experiments evaluating the efficacy of S3DNN under a wide range of system and workload scenarios. The results show that S3DNN significantly improves upon state-of-the-art GPU-accelerated DNN processing frameworks, e.g., up to 37% and over 40% improvements in real-time performance and throughput, respectively.
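The abstract's two key mechanisms, a governor that fuses system-wide DNN requests and a deadline-aware scheduler, can be roughly illustrated with a small sketch. This is not the paper's implementation; the fusion window, request names, and EDF-style ordering are illustrative assumptions only:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    """A pending DNN inference request with an absolute deadline (illustrative)."""
    deadline: float
    name: str = field(compare=False)

def fuse_and_schedule(requests, fusion_window):
    """Order requests earliest-deadline-first, then greedily fuse (batch)
    requests whose deadlines fall within `fusion_window` of the batch leader,
    mimicking the idea of gathering system-wide requests for data fusion."""
    heap = list(requests)
    heapq.heapify(heap)  # min-heap keyed on deadline (EDF order)
    batches = []
    while heap:
        leader = heapq.heappop(heap)
        batch = [leader.name]
        # Fuse any request whose deadline is close enough to the leader's.
        while heap and heap[0].deadline - leader.deadline <= fusion_window:
            batch.append(heapq.heappop(heap).name)
        batches.append(batch)
    return batches
```

For example, three camera requests with deadlines 1.0, 1.2, and 5.0 and a fusion window of 0.5 would yield two batches: the first two requests fused together, the third dispatched alone.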


Paper : RTAS 2018, Best Paper, Outstanding Paper Awards

Article source: