TY - GEN
T1 - DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
T2 - 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021
AU - Niu, Wei
AU - Guan, Jiexiong
AU - Wang, Yanzhi
AU - Agrawal, Gagan
AU - Ren, Bin
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their valuable and thorough comments. The authors are especially grateful to the shepherd Tatiana Shpeisman for her innumerable helpful suggestions and comments that helped improve this paper substantially. This work is supported in part by Jeffress Trust Awards in Interdisciplinary Research, and National Science Foundation (NSF) under CCF-1629392, CCF-2007793, CCF-1919117, and OAC-2034850. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Thomas F. and Kate Miller Jeffress Memorial Trust, or NSF.
Publisher Copyright:
© 2021 ACM.
PY - 2021/6/18
Y1 - 2021/6/18
AB - Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, that aim to improve the efficiency of DNN inference. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to operate at an operator view of DNNs, but expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate light-weight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8× more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with 9.3× speedup. The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
KW - Compiler Optimization
KW - Deep Neural Network
KW - Mobile Devices
KW - Operator Fusion
UR - http://www.scopus.com/inward/record.url?scp=85108902126&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108902126&partnerID=8YFLogxK
U2 - 10.1145/3453483.3454083
DO - 10.1145/3453483.3454083
M3 - Conference contribution
AN - SCOPUS:85108902126
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 883
EP - 898
BT - PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
A2 - Freund, Stephen N.
A2 - Yahav, Eran
PB - Association for Computing Machinery
Y2 - 20 June 2021 through 25 June 2021
ER -