GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs

Wei Niu; Jiexiong Guan; Xipeng Shen; Yanzhi Wang; Gagan Agrawal; Bin Ren

doi:10.1109/MICRO56248.2022.00044

Computer & Cyber Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

More specialized chips are exploiting the available high transistor density to expose large-scale parallelism through more intricate instruction sets. This paper reports on GCD², a compilation system developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and a VLIW pipeline with the notion of soft dependencies. GCD² makes the following contributions: 1) matrix layout formats that support the use of different novel SIMD instructions, 2) the formulation and solution of a global optimization problem: choosing the best instruction (and associated layout) to implement each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions that takes soft dependencies into account. These solutions are incorporated into a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD² outperforms two product-level, state-of-the-art end-to-end DNN execution frameworks that support mobile DSPs (TFLite and Qualcomm SNPE) with speedups of up to 6.0×, and outperforms three established compilers (Halide, TVM, and RAKE) with speedups of up to 4.5×, 3.4×, and 4.0×, respectively. GCD² is also unique in supporting real-time execution of certain DNNs, and its implementation enables two major DNNs to execute on a mobile DSP for the first time.
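The abstract describes choosing an instruction and layout for every operator as one global problem rather than a set of independent local choices, but it does not say how GCD² solves it. The sketch below is not the paper's algorithm. It assumes a simplified model: the network is a linear chain of operators (real DNNs are DAGs), each candidate implementation reads and writes a single layout, and converting between layouts has a fixed cost. All implementation names (`conv_A`, `dw_B`, ...), layouts, and costs are hypothetical. Under those assumptions, dynamic programming over the chain finds the cheapest plan and shows why picking each operator's locally fastest implementation can lose to a global choice.

```python
from typing import Dict, List, Tuple

# Hypothetical cost model (illustrative, not taken from the paper):
# each candidate implementation of an operator maps to
# (data layout it reads and writes, estimated execution cost).
Candidates = Dict[str, Tuple[str, float]]
TransformCosts = Dict[Tuple[str, str], float]


def switch_cost(src: str, dst: str, transform_cost: TransformCosts) -> float:
    """Cost of converting a tensor from layout `src` to layout `dst`."""
    return 0.0 if src == dst else transform_cost[(src, dst)]


def select_global(ops: List[Candidates], transform_cost: TransformCosts) -> Tuple[float, List[str]]:
    """Choose one implementation per operator in a linear chain so that total
    compute cost plus layout-conversion cost is minimal (dynamic programming)."""
    # best[impl] = (cheapest cost of any plan whose latest operator uses impl, that plan)
    best = {impl: (cost, [impl]) for impl, (_layout, cost) in ops[0].items()}
    prev = ops[0]
    for cands in ops[1:]:
        nxt = {}
        for impl, (layout, cost) in cands.items():
            # Extend every surviving plan; keep the cheapest one ending in impl
            # (equal costs fall back to comparing the plans lexicographically).
            nxt[impl] = min(
                (pcost + switch_cost(prev[pimpl][0], layout, transform_cost) + cost,
                 plan + [impl])
                for pimpl, (pcost, plan) in best.items()
            )
        best, prev = nxt, cands
    return min(best.values())


def select_greedy(ops: List[Candidates]) -> List[str]:
    """Baseline: pick each operator's locally cheapest implementation."""
    return [min(cands, key=lambda impl: cands[impl][1]) for cands in ops]


def plan_cost(ops: List[Candidates], plan: List[str], transform_cost: TransformCosts) -> float:
    """Total cost of a given plan, including layout conversions between operators."""
    total = ops[0][plan[0]][1]
    for i in range(1, len(ops)):
        prev_layout = ops[i - 1][plan[i - 1]][0]
        layout, cost = ops[i][plan[i]]
        total += switch_cost(prev_layout, layout, transform_cost) + cost
    return total


if __name__ == "__main__":
    ops = [
        {"conv_A": ("A", 5.0), "conv_B": ("B", 7.0)},
        {"dw_A": ("A", 6.0), "dw_B": ("B", 3.0)},
        {"conv_A": ("A", 5.0), "conv_B": ("B", 7.0)},
    ]
    costs = {("A", "B"): 2.0, ("B", "A"): 2.0}
    print(select_global(ops, costs))  # (16.0, ['conv_A', 'dw_A', 'conv_A'])
    greedy = select_greedy(ops)
    print(greedy, plan_cost(ops, greedy, costs))  # ['conv_A', 'dw_B', 'conv_A'] 17.0
```

In this toy example the greedy plan takes the fast `dw_B` implementation but pays two layout conversions, so it costs 17, while keeping every operator in layout `A` costs 16. The chain-only DP is chosen here purely for brevity; a whole-network formulation would need to handle branching and merging tensors.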

Original language	English (US)
Title of host publication	Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Publisher	IEEE Computer Society
Pages	512-529
Number of pages	18
ISBN (Electronic)	9781665462723
DOIs	https://doi.org/10.1109/MICRO56248.2022.00044
State	Published - 2022
Event	55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 - Chicago, United States. Duration: Oct 1, 2022 → Oct 5, 2022

Publication series

Name	Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume	2022-October
ISSN (Print)	1072-4451

Conference

Conference	55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Country/Territory	United States
City	Chicago
Period	10/1/22 → 10/5/22

Keywords

compiler optimization
deep neural network
mobile devices
VLIW instruction packing

ASJC Scopus subject areas

Hardware and Architecture

Cite this

Niu, W., Guan, J., Shen, X., Wang, Y., Agrawal, G., & Ren, B. (2022). GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs. In Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 (pp. 512-529). (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. 2022-October). IEEE Computer Society. https://doi.org/10.1109/MICRO56248.2022.00044

GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs. / Niu, Wei; Guan, Jiexiong; Shen, Xipeng et al.
Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022. IEEE Computer Society, 2022. p. 512-529 (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. 2022-October).

Niu, W, Guan, J, Shen, X, Wang, Y, Agrawal, G & Ren, B 2022, GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs. in Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022. Proceedings of the Annual International Symposium on Microarchitecture, MICRO, vol. 2022-October, IEEE Computer Society, pp. 512-529, 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022, Chicago, United States, 10/1/22. https://doi.org/10.1109/MICRO56248.2022.00044

Niu W, Guan J, Shen X, Wang Y, Agrawal G, Ren B. GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs. In Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022. IEEE Computer Society. 2022. p. 512-529. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO). doi: 10.1109/MICRO56248.2022.00044

@inproceedings{f42110e83ef04b0e885b8cefdb2d2b0d,

title = "GCD2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs",

abstract = "More specialized chips are exploiting available high transistor density to expose parallelism at a large scale with more intricate instruction sets. This paper reports on a compilation system GCD2, developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and VLIW pipeline with the notion of soft dependencies. GCD2 comprises the following contributions: 1) development of matrix layout formats that support the use of different novel SIMD instructions, 2) formulation and solution of a global optimization problem related to choosing the best instruction (and associated layout) for implementation of each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions with consideration for soft dependencies. These solutions are incorporated in a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD2 outperforms two product-level state-of-the-art end-to-end DNN execution frameworks (TFLite and Qualcomm SNPE) that support mobile DSPs by up to 6.0 × speedup, and outperforms three established compilers (Halide, TVM, and RAKE) by up to 4.5 ×, 3.4 × and 4.0 × speedup, respectively. GCD2 is also unique in supporting real-time execution of certain DNNs, while its implementation enables two major DNNs to execute on a mobile DSP for the first time.",

keywords = "compiler optimization, deep neural network, mobile devices, VLIW instruction packing",

author = "Wei Niu and Jiexiong Guan and Xipeng Shen and Yanzhi Wang and Gagan Agrawal and Bin Ren",

note = "Funding Information: The authors would like to thank the anonymous reviewers for their constructive comments and helpful suggestions. This work was supported in part by National Science Foundation (NSF) under the awards of CCF-2047516 (CAREER), CCF-2146873, CCF-2232813, CCF-2146852, CCF-2131509, CCF-2034850, and CCF-2007793, and Army Research Office/Army Research Laboratory via grant W911-NF-20-1-0167 to Northeastern University. Any errors and opinions are not those of the NSF, Army Research Office, or Department of Defense, and are attributable solely to the author(s). Publisher Copyright: {\textcopyright} 2022 IEEE.; 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 ; Conference date: 01-10-2022 Through 05-10-2022",

year = "2022",

doi = "10.1109/MICRO56248.2022.00044",

language = "English (US)",

series = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",

publisher = "IEEE Computer Society",

pages = "512--529",

booktitle = "Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022",

}

TY - GEN

T1 - GCD2

T2 - 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022

AU - Niu, Wei

AU - Guan, Jiexiong

AU - Shen, Xipeng

AU - Wang, Yanzhi

AU - Agrawal, Gagan

AU - Ren, Bin

N1 - Funding Information: The authors would like to thank the anonymous reviewers for their constructive comments and helpful suggestions. This work was supported in part by National Science Foundation (NSF) under the awards of CCF-2047516 (CAREER), CCF-2146873, CCF-2232813, CCF-2146852, CCF-2131509, CCF-2034850, and CCF-2007793, and Army Research Office/Army Research Laboratory via grant W911-NF-20-1-0167 to Northeastern University. Any errors and opinions are not those of the NSF, Army Research Office, or Department of Defense, and are attributable solely to the author(s). Publisher Copyright: © 2022 IEEE.

PY - 2022

Y1 - 2022

AB - More specialized chips are exploiting available high transistor density to expose parallelism at a large scale with more intricate instruction sets. This paper reports on a compilation system GCD2, developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and VLIW pipeline with the notion of soft dependencies. GCD2 comprises the following contributions: 1) development of matrix layout formats that support the use of different novel SIMD instructions, 2) formulation and solution of a global optimization problem related to choosing the best instruction (and associated layout) for implementation of each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions with consideration for soft dependencies. These solutions are incorporated in a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD2 outperforms two product-level state-of-the-art end-to-end DNN execution frameworks (TFLite and Qualcomm SNPE) that support mobile DSPs by up to 6.0 × speedup, and outperforms three established compilers (Halide, TVM, and RAKE) by up to 4.5 ×, 3.4 × and 4.0 × speedup, respectively. GCD2 is also unique in supporting real-time execution of certain DNNs, while its implementation enables two major DNNs to execute on a mobile DSP for the first time.

KW - compiler optimization

KW - deep neural network

KW - mobile devices

KW - VLIW instruction packing

UR - http://www.scopus.com/inward/record.url?scp=85141665744&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85141665744&partnerID=8YFLogxK

U2 - 10.1109/MICRO56248.2022.00044

DO - 10.1109/MICRO56248.2022.00044

M3 - Conference contribution

AN - SCOPUS:85141665744

T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

SP - 512

EP - 529

BT - Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022

PB - IEEE Computer Society

Y2 - 1 October 2022 through 5 October 2022

ER -