DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

by Byungkyo Jung on 2018-11-27 20:17:14

Date : 2018. 12. 4 (Tue) 10:00 Location : EB5. 533 Presenter : Byeongkyo Cheong Title : DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices Author : Nicholas D. Lane, Sourav Bhattacharya, Petko Georgiev (Bell Labs, University of Cambridge, University of Bologna) Abstract : Breakthroughs from the field of deep learning are radically changing how sensor data are interpreted to extract the high-level information needed by mobile apps. It is critical that the gains in inference accuracy that deep models afford become embedded in future generations of mobile apps. In this work, we present the design and implementation of DeepX, a software accelerator for deep learning execution. DeepX significantly lowers the device resources (viz. memory, computation, energy) required by deep learning that currently act as a severe bottleneck to mobile adoption. The foundation of DeepX is a pair of resource control algorithms, designed ... Continue reading →

25 Views

DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications

by Jinse Kwon on 2018-11-23 18:13:31

Date : 2018. 11. 23 (Tue) 10:00 Location : EB5. 533 Presenter : Jinse Kwon Title : DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications Author : Loc N. Huynh, Youngki Lee, Rajesh Krishna Balan (Singapore Management University) Abstract : The rapid emergence of head-mounted devices such as the Microsoft Holo-lens enables a wide variety of continuous vision applications. Such applications often adopt deep-learning algorithms such as CNN and RNN to extract rich contextual information from the first-person-view video streams. Despite the high accuracy, use of deep learning algorithms in mobile devices raises critical challenges, i.e., high processing latency and power consumption. In this paper, we propose DeepMon, a mobile deep learning inference system to run a variety of deep learning inferences purely on a mobile device in a fast and energy-efficient manner. For this, we designed a suite of ... Continue reading →

30 Views

Analysis of Scheduling Performance According to Task Configuration in Real-Time Operating Systems

by Hyeoksoo Jang on 2018-10-25 17:56:38

Date : 2018.11.06 (Tue) 10:00 Location : EB5. 533 Presenter : Hyeoksoo Jang Title : Analysis of Scheduling Performance According to Task Configuration in Real-Time Operating Systems Abstract :  Continue reading →

39 Views

Accelerating Grayscale Image Colorization Using OpenCL on Embedded Devices

by Donghee Ha on 2018-10-25 17:53:49

Date : 2018.10.23 Location : EB5. 533 Presenter : Donghee Ha Title : Accelerating Grayscale Image Colorization Using OpenCL on Embedded Devices Abstract :  Continue reading →

38 Views

The design and implementation of microdrivers

by Sihyeong Park on 2018-08-27 10:30:57

Date : 2018. 09. 13 (Thu) 14:00 Location : EB5. 533 Presenter : Sihyeong Park Title : The design and implementation of microdrivers (ASPLOS XIII: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems) Author : Vinod Ganapathy (Rutgers University, Piscataway, NJ), Matthew J. Renzelmann (University of Wisconsin-Madison, Madison, WI), Arini Balakrishnan (Sun Microsystems, Santa Clara, CA), Michael M. Swift (University of Wisconsin-Madison, Madison, WI), Somesh Jha (University of Wisconsin-Madison, Madison, WI) Abstract : Device drivers commonly execute in the kernel to achieve high performance and easy access to kernel services. However, this comes at the price of decreased reliability and increased programming difficulty. Driver programmers are unable to use user-mode development tools and must instead use cumbersome kernel tools. Faults in kernel drivers can cause the entire ... Continue reading →

202 Views

Situating Wearables: Smartwatch Use in Context

by Jinyoung Choi on 2018-08-06 11:00:58

Date : 2018. 08. 08 (Wed) 13:00 Location : EB5. 533 Presenter : Jinyoung Choi Title : Situating Wearables: Smartwatch Use in Context Author : Donald McMillan, Barry Brown, Airi Lampinen, Moira McGregor, Eve Hoggan, Stefania Pizza (The University of Stockholm at Kista, Sweden) Abstract : Drawing on 168 hours of video recordings of smartwatch use, this paper studies how context influences smartwatch use. We explore the effects of the presence of others, activity, location and time of day on 1,009 instances of use. Watch interaction is significantly shorter when in conversation than when alone. Activity also influences watch use with significantly longer use while eating than when socialising or performing domestic tasks. One surprising finding is that length of use is similar at home and work. We note that usage peaks around lunchtime, with an average of 5.3 watch uses per hour throughout a day. We supplement these findings with ... Continue reading →

264 Views

CLBlast: A Tuned OpenCL BLAS Library

by Byungkyo Jung on 2018-07-20 17:28:45

Date : 2018.7.25 Location : EB5. 533 Presenter : Byeongkyo Cheong Author : Cedric Nugteren Title : CLBlast: A Tuned OpenCL BLAS Library Abstract : This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, astrophysics, computational fluid dynamics, quantum chemistry). CLBlast has five main advantages over other OpenCL BLAS libraries: 1) it is optimized for and tested on a large variety of OpenCL devices including less commonly used devices such as embedded and low-power GPUs, 2) it can be explicitly tuned for specific problem-sizes on specific hardware platforms, 3) it can perform operations in half-precision floating-point FP16 saving bandwidth, time and energy, ... Continue reading →

226 Views

System Call-Level Core Affinity for Improving Network Performance

by Daeyoung Song on 2018-07-16 10:27:16

Date : 2018.7.18 (Wed) 13:00 Location : EB5. 533 Presenter : Daeyoung Song Title : System Call-Level Core Affinity for Improving Network Performance Author : 엄준용, 조중연, 진현욱 Abstract : Existing operating systems experience scalability issues as the number of cores increases. Network I/O performance on manycore systems is limited mainly by cache consistency costs and locking overheads. Existing approaches to this issue include new microkernel-like operating systems and modifications of existing kernels; however, these solutions are not fully application transparent. In this study, we propose a library that improves network performance by separating the system call context from the user context and by applying core affinity, without any kernel or application modifications. Experimental results show that our implementation can improve the network throughput of Apache by up to 30%. Continue reading →

111 Views

Design and Implementation of a Linux-Based Message Processor That Minimizes Response-Time Delay of Non-Real-Time Messages in Multicore Environments

by Hyeoksoo Jang on 2018-06-27 15:08:03

Date : 2018. 07. 11 (Wed) 13:00 Location : EB5. 533 Presenter : Hyeoksoo Jang Title : Design and Implementation of a Linux-Based Message Processor That Minimizes Response-Time Delay of Non-Real-Time Messages in Multicore Environments Author : 왕상호, 박영훈, 박성용, 김승춘, 김철회, 김상준, 진철 Link : http://www.dbpia.co.kr/Journal/PDFViewNew?d=NODE07111226&prevPathCode= Continue reading →

168 Views

S3DNN: Supervised Streaming and Scheduling for GPU-Accelerated Real-Time DNN Workloads

by Jinse Kwon on 2018-06-15 11:55:01

Date : 2018. 06. 20 (Wed) 13:00 Location : EB5. 533 Presenter : Jinse Kwon Title : S3DNN: Supervised Streaming and Scheduling for GPU-Accelerated Real-Time DNN Workloads Author : Husheng Zhou, Soroush Bateni, Cong Liu (The University of Texas at Dallas) Abstract : Deep Neural Networks (DNNs) are being widely applied in many advanced embedded systems that require autonomous decision making, e.g., autonomous driving and robotics. To handle resource-demanding DNN workloads, graphics processing units (GPUs) have been used as the main acceleration engine. Although much research has been conducted to algorithmically optimize the efficiency of applying DNN to applications such as object recognition, limited attention has been given to optimizing the execution of GPU accelerated DNN workloads at the system level. In this paper, we propose S3DNN, a system solution that optimizes the execution of DNN workloads on GPU in a ... Continue reading →

142 Views

Rehearsal: Reducing Convolutional Neural Network for Object Detection on Embedded Devices

by Do Trung Hai on 2018-05-29 15:09:39

Date : 2018. 05. 30 (Wed) 13:00 Location : EB5. 533 Presenter : Trunghai Do Abstract : Nowadays, convolutional neural networks (CNNs) have become central to many computer vision solutions for a variety of tasks. However, heavy memory and computation demands are two of the most important characteristics of deep neural networks. These characteristics make neural networks difficult to deploy effectively on limited hardware resources such as embedded systems. Furthermore, to deploy models to devices and update them regularly, the model size needs to be small. In this thesis, we propose DroidDet, a small fully convolutional neural network. DroidDet adapts the You Only Look Once (YOLO) object detection algorithm for the ARM Mali-T628 MP6 GPU of the ODROID-XU4. In order to build DroidDet, we not only replace all the fully connected layers that act as detection layers in YOLO with convolutional layers but also rearrange some of the convolutional layers to reduce the model size and ... Continue reading →

148 Views

Analysis of Problems in Linux Kernel Shared Memory Management in Manycore Environments

by Byungkyo Jung on 2018-05-17 20:49:05

Date : 2018.5.23 (Wed) 13:00 Location : EB5. 533 Presenter : Byeongkyo Cheong Title : Analysis of Problems in Linux Kernel Shared Memory Management in Manycore Environments Author : 서동주, 경주현, 임성수 Continue reading →

120 Views

HeartChat: Heart Rate Augmented Mobile Messaging to Support Empathy and Awareness

by Jinyoung Choi on 2018-05-11 16:24:57

Date : 2018. 5. 16 (Wed) 13:00 Location : EB5. 533 Presenter : Jinyoung Choi Title : HeartChat: Heart Rate Augmented Mobile Messaging to Support Empathy and Awareness (Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, May 06-11, 2017, ISBN : 978-1-4503-4655-9, doi : 10.1145/3025453.3025758) Author : Mariam Hassib, Daniel Buschek, Paweł W. Woźniak, Florian Alt (LMU Munich - Ubiquitous Interactive Systems Group; University of Stuttgart - VIS, Stuttgart, Germany) Abstract : Textual communication via mobile phones suffers from a lack of context and emotional awareness. We present a mobile chat application, HeartChat, which integrates heart rate as a cue to increase awareness and empathy. Through a literature review and a focus group, we identified design dimensions important for heart rate augmented chats. We created three concepts showing heart rate per message, in real-time, or sending it explicitly. ... Continue reading →

153 Views

Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference ...

by Jinse Kwon on 2018-03-30 16:04:38

Date : 2018. 04. 04 (Wed) 13:00 Location : EB5. 533 Presenter : Jinse Kwon Title : Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables Author : Sourav Bhattacharya, Nicholas D. Lane (Nokia Bell Labs and University College London) Abstract : Deep learning has revolutionized the way sensor data are analyzed and interpreted. The accuracy gains these approaches offer make them attractive for the next generation of mobile, wearable and embedded sensory applications. However, state-of-the-art deep learning algorithms typically require a significant amount of device and processor resources, even just for the inference stages that are used to discriminate high-level classes from low-level data. The limited availability of memory, computation, and energy on mobile and embedded platforms thus poses a significant challenge to the adoption of these powerful learning techniques. In this paper, we ... Continue reading →

241 Views

A Framework-Assisted Selective Page Protection Mechanism for Improving User Interactivity on Linux-Based Mobile Devices

by Byungkyo Jung on 2018-03-14 15:48:47

Date : 2018.3.21 (Wed) 13:00 Location : EB5. 533 Presenter : Byeongkyo Cheong Author : 김승준, 김정호, 홍성수 Abstract : While Linux-based mobile devices such as smartphones are increasingly used, they often exhibit poor response time. One of the factors that influence the user-perceived interactivity is the high page fault rate of interactive tasks. Pages owned by interactive tasks can be removed from the main memory due to the memory contention between interactive and background tasks. Since this increases the page fault rate of the interactive tasks, their executions tend to suffer from increased delays. This paper proposes a framework-assisted selective page protection mechanism for improving interactivity of Linux-based mobile devices. The framework-assisted selective page protection enables the run-time system to identify interactive tasks at the framework level and to deliver their IDs to the kernel. As a result, the kernel can maintain the pages owned by the identified ... Continue reading →

183 Views