Split-CNN: Splitting Window-based Operations in CNN for Memory System Optimization

by Donghee Ha on 2020-01-23 14:43:44

Date: 2020. 01. 20 (Mon) 15:00
Location: EB5.533
Presenter: Donghee Ha
Title: Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization
Author: Tian Jin, Seokin Hong
Abstract: We present an interdisciplinary study to tackle the memory bottleneck of training deep convolutional neural networks (CNN). Firstly, we introduce Split Convolutional Neural Network (Split-CNN) that is derived from the automatic transformation of the state-of-the-art CNN models. The main distinction between Split-CNN and regular CNN is that Split-CNN splits the input images into small patches and operates on these patches independently before entering later stages of the CNN model. Secondly, we propose a novel heterogeneous memory management system (HMMS) to utilize the memory-friendly properties of Split-CNN. Through experiments, we demonstrate that Split-CNN achieves significantly higher training scalability by dramatically reducing the memory ...
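
The patch-splitting idea described above can be sketched in a few lines; `split_into_patches`, `early_stage`, and the average-pooling stand-in are all hypothetical, minimal illustrations rather than the paper's actual transformation:

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an HxWxC image into non-overlapping square patches
    (hypothetical helper illustrating the Split-CNN patch-splitting step)."""
    h, w, c = image.shape
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size, :])
    return patches

def early_stage(patch):
    """Stand-in for the early convolutional stages; here just a 2x2 average pool."""
    h, w, c = patch.shape
    return patch.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

# Each patch is processed independently, so only one patch's activations
# need to be resident in memory at a time during the early stages.
image = np.random.rand(8, 8, 3)
features = [early_stage(p) for p in split_into_patches(image, 4)]
```

Because the per-patch computations share nothing, their intermediate activations can be materialized one at a time, which is the memory-friendly property HMMS exploits.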

Hips Do Lie! A Position-Aware Mobile Fall Detection System

by Jinyoung Choi on 2020-01-19 18:22:58

Date: 2020. 01. 20 (Mon) 15:00
Location: EB5.533
Presenter: Jinyoung Choi
Title: Hips Do Lie! A Position-Aware Mobile Fall Detection System
2018 IEEE International Conference on Pervasive Computing and Communications (PerCom)
Author: Christian Krupitzer, Timo Sztyler, Janick Edinger, Martin Breitbach, Heiner Stuckenschmidt, Christian Becker
Abstract: Ambient Assisted Living using mobile device sensors is an active area of research in pervasive computing. Multiple approaches have shown that wearable sensors perform very well and distinguish falls reliably from Activities of Daily Living. However, these systems are tested in a controlled environment and are optimized for a given set of sensor types, sensor positions, and subjects. In this work, we propose a self-adaptive pervasive fall detection approach that is robust to the heterogeneity of real-life situations. Therefore, we combine sensor data of four publicly available datasets, covering about 100 subjects, ...
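
For intuition, a naive accelerometer-threshold detector is sketched below; the function name, the g-multiple threshold, and the sample format are assumptions, and the paper's approach instead trains position-aware classifiers over heterogeneous datasets:

```python
import math

def fall_suspected(samples, g=9.81, impact_threshold=2.5):
    """Flag a possible fall when acceleration magnitude exceeds a multiple of g.
    Deliberately simplistic baseline -- real detectors (including the paper's)
    use learned, position-aware models rather than a fixed threshold."""
    for ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > impact_threshold * g:
            return True
    return False

# Invented sample traces: quiet walking vs. a sharp impact spike.
walking = [(0.5, 9.8, 0.3), (0.7, 9.9, 0.2)]
impact = [(0.5, 9.8, 0.3), (15.0, 22.0, 8.0)]
```

Such fixed thresholds are exactly what breaks when sensor position changes (hip vs. wrist), which motivates the paper's position-aware design.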

Cache Locking Content Selection Algorithms for ARINC-653 Compliant RTOS

by Sihyeong Park on 2020-01-06 17:02:55

Date: 2020. 01. 13 (Mon) 15:00
Location: EB5.533
Presenter: Sihyeong Park
Title: Cache Locking Content Selection Algorithms for ARINC-653 Compliant RTOS
ACM Transactions on Embedded Computing Systems (TECS) 18.5s (2019): 76.
Author: Alexy Torres Aurora Dugo, Jean Baptiste Lefoul, Felipe Göhring De Magalhães, Dahman Assal, Gabriela Nicolescu
Abstract: Avionic software is subject to stringent real-time, determinism, and safety constraints. Software designers face several challenges, one of them being the interference that appears in common situations, such as resource sharing. The interference introduces non-determinism and delays in execution time. One of the main interference-prone resources is the cache memory. In single-core processors, caches comprise multiple private levels. This breaks the isolation principle imposed by avionic standards, such as ARINC-653. This standard defines partitioned architectures where one partition should ...
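
A minimal sketch of one family of content-selection algorithms, assuming a simple hotness metric: lock the most frequently accessed blocks, up to the number of lockable cache ways (names and access counts are invented; the paper evaluates selection algorithms tailored to ARINC-653 partitioned schedules):

```python
def select_lock_contents(block_accesses, lockable_ways):
    """Greedy cache-locking content selection: lock the hottest blocks first.
    block_accesses maps a block id to its profiled access count (assumed input);
    lockable_ways bounds how many blocks the hardware can pin."""
    ranked = sorted(block_accesses.items(), key=lambda kv: kv[1], reverse=True)
    return [block for block, _ in ranked[:lockable_ways]]

# Hypothetical profile: block C is by far the hottest, then A.
accesses = {"A": 120, "B": 15, "C": 300, "D": 42}
locked = select_lock_contents(accesses, lockable_ways=2)
```

Locking pinned blocks removes their misses from the worst-case path, which is what restores determinism across partition switches.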

MOSAIC: Heterogeneity-, Communication-, and Constraint-Aware Model Slicing and Execution...

by Jinse Kwon on 2020-01-02 18:04:34

Date: 2020. 01. 06 (Mon) 15:00
Location: EB5.533
Presenter: Jinse Kwon
Title: MOSAIC: Heterogeneity-, Communication-, and Constraint-Aware Model Slicing and Execution for Accurate and Efficient Inference
Author: Myeonggyun Han, Jihoon Hyun, Seongbeom Park, Jinsu Park, Woongki Baek (UNIST, Republic of Korea)
Abstract: Heterogeneous embedded systems have surfaced as a promising solution for accurate and efficient deep-learning inference on mobile devices. Despite extensive prior works, it still remains unexplored to investigate the system-software support that efficiently executes inference workloads by judiciously considering their performance and energy heterogeneity, communication overheads, and constraints. To bridge this gap, we propose MOSAIC, heterogeneity-, communication-, and constraint-aware model slicing and execution for accurate and efficient inference on heterogeneous embedded systems. MOSAIC generates the ...
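
A toy version of constraint-aware placement, under the assumption that each device advertises a latency estimate, an energy cost, and free memory (all names and numbers invented; MOSAIC's actual slicing and execution planning is considerably more elaborate):

```python
def assign_slice(devices, slice_mem_mb, latency_budget_ms):
    """Pick the most energy-efficient device whose latency estimate meets the
    budget and whose free memory fits the slice. Illustrative stand-in for
    constraint-aware model slicing; returns None when no device qualifies."""
    feasible = [d for d in devices
                if d["latency_ms"] <= latency_budget_ms
                and d["free_mem_mb"] >= slice_mem_mb]
    if not feasible:
        return None
    return min(feasible, key=lambda d: d["energy_mj"])["name"]

# Hypothetical heterogeneous devices on one SoC.
devices = [
    {"name": "big_core", "latency_ms": 10, "energy_mj": 80, "free_mem_mb": 512},
    {"name": "little_core", "latency_ms": 40, "energy_mj": 20, "free_mem_mb": 256},
    {"name": "gpu", "latency_ms": 5, "energy_mj": 50, "free_mem_mb": 1024},
]
```

Tightening the latency budget prunes the energy-cheap but slow device, which is the heterogeneity/constraint trade-off the abstract describes.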

Achieving Lossless Accuracy with Lossy Programming for Efficient Neural-Network Training

by Donghee Ha on 2019-11-06 17:14:20

Date: 2019. 11. 06 (Wed) 18:00
Location: EB5.533
Presenter: Donghee Ha
Title: Achieving Lossless Accuracy with Lossy Programming for Efficient Neural-Network Training on NVM-Based Systems
Author: Wei-Chen Wang, Yuan-Hao Chang, Tei-Wei Kuo, Chien-Chung Ho, Yu-Ming Chang, and Hung-Sheng Chang
Abstract: Neural networks over conventional computing platforms are heavily restricted by data volume and performance concerns. While non-volatile memory offers potential solutions to data volume issues, challenges must be faced over performance issues, especially with asymmetric read and write performance. Besides that, critical concerns over endurance must also be resolved before non-volatile memory can be used in practice for neural networks. This work addresses the performance and endurance concerns together by proposing a data-aware programming scheme. We propose to consider neural network training jointly from the data-flow and data-content points of view. ...
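
The write-reduction intuition behind data-aware programming can be shown with a toy bit-diff; `changed_bits` is a hypothetical helper, and real NVM controllers program whole words or pages:

```python
def changed_bits(old, new):
    """Bit positions that differ between the stored and the incoming value.
    A data-aware programmer updates only these bits, cutting down expensive,
    endurance-limited NVM writes (illustrative only; the paper's scheme also
    exploits data-flow and data-content information from training)."""
    return bin(old ^ new).count("1")

# Updating 0b1010 to 0b1000 touches one bit instead of rewriting the byte.
saved = 8 - changed_bits(0b1010, 0b1000)
```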

Conference Report Seminar: ESWEEK 2019

by Jinse Kwon on 2019-10-24 13:43:19

Date: 2019. 10. 24 (Thu) 19:30
Location: EB5.507
Presenter: Hyungshin Kim, Jinse Kwon
Title: Message for ESWEEK 2019
Web: ESWEEK 2019

Lab Seminar: Oct. 24, 2019, 7:30 PM, ESWEEK 2019 Review

by Hyungshin Kim on 2019-10-23 21:27:45

Hyungshin Kim and Jinse Kwon will review this year's ESWEEK. CASES, CODES+ISSS, and EMSOFT will be covered.

StreamBox-TZ: Secure Stream Analytics at the Edge with TrustZone

by Sihyeong Park on 2019-10-08 16:13:29

Date: 2019. 10. 10 (Thu) 19:30
Location: EB5.507
Presenter: Sihyeong Park
Title: StreamBox-TZ: Secure Stream Analytics at the Edge with TrustZone
Author: Heejin Park and Shuang Zhai, Purdue ECE; Long Lu, Northeastern University; Felix Xiaozhu Lin, Purdue ECE
Abstract: While it is compelling to process large streams of IoT data on the cloud edge, doing so exposes the data to a sophisticated, vulnerable software stack on the edge and hence security threats. To this end, we advocate isolating the data and its computations in a trusted execution environment (TEE) on the edge, shielding them from the remaining edge software stack, which we deem untrusted. This approach faces two major challenges: (1) executing high-throughput, low-delay stream analytics in a single TEE, which is constrained by a low trusted computing base (TCB) and limited physical memory; (2) verifying execution of stream analytics, as the execution involves untrusted software components on the edge. In ...

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

by Jinse Kwon on 2019-09-19 16:06:25

Date: 2019. 09. 04 (Wed) 13:30
Location: EB5.533
Presenter: Jinse Kwon
Title: Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Author: Tal Ben-Nun, Torsten Hoefler (ETH Zurich, Zurich, Switzerland)
Abstract: Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications on parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system ...

Software fault injection testing of the embedded software of a satellite launch vehicle

by Hyeoksoo Jang on 2019-09-19 15:41:10

Date: 2019. 08. 07 (Wed) 13:00
Location: EB5.533
Presenter: Hyeoksoo Jang
Title: Software fault injection testing of the embedded software of a satellite launch vehicle
Author: Anil Abraham Samuel, Jayalal N., Valsa B., Ignatious C.A., and John P. Zachariah
Abstract: The software performing navigation, guidance, control, and mission-sequencing functionalities embedded in the flight computer system (FCS) of a satellite launch vehicle must be highly dependable. The presence of faults in the embedded flight software affects its dependability and may even jeopardize the entire mission, resulting in a huge loss to the space agency concerned. There are many techniques available to achieve high dependability, and they can be classified under fault avoidance, fault removal, and fault tolerance. In the FCS of the Indian Space Research Organization’s (ISRO’s) satellite launch vehicles, all of the above means to achieve dependability are adopted. Fault avoidance and removal ...
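
A minimal single-bit-flip injector, one of the classic software fault injection primitives, is sketched below; the helper name and byte-array target are assumptions, not the flight-software testbed described in the paper:

```python
import random

def inject_bit_flip(memory, rng):
    """Return a copy of `memory` with exactly one randomly chosen bit flipped.
    This mimics a transient single-event upset; campaigns then check whether
    detection and recovery mechanisms catch the corruption."""
    corrupted = bytearray(memory)
    byte_index = rng.randrange(len(corrupted))
    bit_index = rng.randrange(8)
    corrupted[byte_index] ^= 1 << bit_index
    return corrupted

rng = random.Random(0)  # seeded so an injection campaign is reproducible
original = bytearray(b"\x00\x00\x00\x00")
faulty = inject_bit_flip(original, rng)
```

Running many such injections against the software under test and classifying the outcomes (detected, masked, crash, silent corruption) is the essence of a fault injection campaign.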

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

by Donghee Ha on 2019-07-22 16:09:47

Date: 2019. 07. 24 (Wed) 13:00
Location: EB5.533
Presenter: Donghee Ha
Title: XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Author: Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi
Abstract: We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32x memory saving. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58x faster convolutional operations and 32x memory savings. XNOR-Nets offer the possibility of running state-of-the-art networks on CPUs (rather than GPUs) in real-time. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification ...
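
The Binary-Weight-Network approximation can be sketched directly: each real-valued filter W is replaced by alpha * sign(W), where alpha = mean(|W|) is the scaling factor the paper derives as optimal under an L2 criterion (this shows only the weight-binarization step, not the full XNOR convolution):

```python
import numpy as np

def binarize_weights(w):
    """Binary-Weight-Network approximation: W ~ alpha * B, with B = sign(W)
    and alpha = mean(|W|). Storing B as 1 bit per weight instead of a 32-bit
    float is the source of the ~32x memory saving."""
    alpha = np.abs(w).mean()
    b = np.sign(w)
    return alpha, b

w = np.array([0.5, -0.25, 1.0, -0.25])
alpha, b = binarize_weights(w)
approx = alpha * b
```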

Application Memory Isolation on Ultra-Low-Power MCUs

by Sihyeong Park on 2019-07-08 09:57:20

Date: 2019. 07. 10 (Wed) 13:00
Location: EB5.533
Presenter: Sihyeong Park
Title: Application Memory Isolation on Ultra-Low-Power MCUs
Author: Taylor Hardin, Dartmouth College; Ryan Scott, Clemson University; Patrick Proctor, Dartmouth College; Josiah Hester, Northwestern University; Jacob Sorber, Clemson University; David Kotz, Dartmouth College
Abstract: The proliferation of applications that handle sensitive user data on wearable platforms generates a critical need for embedded systems that offer strong security without sacrificing flexibility and long battery life. To secure sensitive information, such as health data, ultra-low-power wearables must isolate applications from each other and protect the underlying system from errant or malicious application code. These platforms typically use microcontrollers that lack sophisticated Memory Management Units (MMU). Some include a Memory Protection Unit (MPU), but current MPUs are inadequate to the task, leading platform ...
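
A software model of MPU-style isolation, assuming each application owns one (start, size) region (names and addresses invented; a real MPU enforces this check in hardware on every access, with no per-access software cost):

```python
def access_allowed(regions, app, address):
    """Return True when `address` lies inside `app`'s assigned memory region.
    Toy model of MPU region checking: one contiguous region per application."""
    start, size = regions[app]
    return start <= address < start + size

# Two hypothetical apps with adjacent 1 KiB regions in SRAM.
regions = {
    "health_app": (0x2000_0000, 0x400),
    "ui_app": (0x2000_0400, 0x400),
}
```

The paper's point is that the handful of fixed-size regions a typical MCU MPU offers is too coarse for realistic multi-application isolation, motivating a richer scheme.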

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

by Jinse Kwon on 2019-06-12 11:18:17

Date: 2019. 06. 26 (Wed) 13:00
Location: EB5.533
Presenter: Jinse Kwon
Title: PipeDream: Fast and Efficient Pipeline Parallel DNN Training
Author: Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons (Microsoft Research, Carnegie Mellon University, Stanford University)
Abstract: PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines. Its pipeline-parallel computing model avoids the slowdowns faced by data-parallel training when large models and/or limited network bandwidth induce high communication-to-computation ratios. PipeDream reduces communication by up to 95% for large DNNs relative to data-parallel training, and allows perfect overlap of communication and computation. PipeDream keeps all available GPUs productive by systematically partitioning DNN ...
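
A simplified stand-in for the partitioning step: greedily split a chain of layers into contiguous pipeline stages with roughly balanced compute (layer costs are invented; PipeDream's real partitioner works from profiled layer times and also accounts for communication and replication):

```python
def partition_layers(layer_costs, num_stages):
    """Greedy contiguous partitioning of a layer chain into pipeline stages.
    Closes a stage once its accumulated cost reaches the per-stage target,
    so each machine in the pipeline gets a similar amount of work."""
    total = sum(layer_costs)
    target = total / num_stages
    stages, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        if acc >= target and len(stages) < num_stages - 1:
            stages.append(current)
            current, acc = [], 0.0
    stages.append(current)
    return stages

# Hypothetical per-layer compute costs for a 6-layer model on 2 machines.
costs = [2.0, 2.0, 1.0, 1.0, 1.0, 1.0]
stages = partition_layers(costs, num_stages=2)
```

With balanced stages, minibatches can be injected back-to-back so every GPU stays busy, which is the property the abstract claims for PipeDream's scheduler.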

How Can We Do Good Research?

by Daeyoung Song on 2019-06-11 00:00:00

Date: 2019. 06. 13 (Thu)
Location: EB5.607
Presenter: Hyungshin Kim
Title: How ...

[Xenomai] An Overview of the Real-Time Framework for Linux

by Hyeoksoo Jang on 2019-06-04 10:46:08

Date: 2019. 06. 05 (Wed)
Location: EB5.533
Presenter: Hyeoksoo Jang
Title: An Overview of the Real-Time Framework for Linux
