
by Sujin Kim on 2020-09-10 17:59:44

Date: 2020. 09. 14 (Mon) 14:00-16:00

Location: EB5 533

Presenter: Sujin Kim

Title: Performance, Design, and Autotuning of Batched GEMM for GPUs

Author: Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra
Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville, USA
Oak Ridge National Laboratory, Oak Ridge, USA
University of Manchester, UK

Abstract: The general matrix-matrix multiplication (GEMM) is the most important numerical kernel in dense linear algebra. It is the key component for obtaining high performance in most LAPACK routines. As batched computations on relatively small problems continue to gain interest in many scientific applications, there becomes a need to have a high performance GEMM kernel for a batch of small matrices. Such a kernel should be well designed and tuned to handle small sizes, and to maintain high performance for realistic test cases found in the higher level LAPACK routines, and scientific computing applications in general. This paper presents a high performance batched GEMM kernel on Graphics Processing Units (GPUs). We address batched problems with both fixed and variable sizes, and show that specialized GEMM designs and a comprehensive autotuning process are needed to handle problems of small sizes. For most performance tests reported in this paper, the proposed kernels outperform state-of-the-art approaches using a K40c GPU.

Article source: //