A linear speedup analysis of distributed deep learning with sparse and quantized communication

Peng Jiang, Gagan Agrawal

Research output: Contribution to journal › Conference article › peer-review

28 Scopus citations

Abstract

The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous works have demonstrated the potential of using gradient sparsification and quantization to reduce the communication cost. However, there is still a lack of understanding about how sparse and quantized communication affects the convergence rate of the training algorithm. In this paper, we study the convergence rate of distributed SGD for non-convex optimization with two communication-reducing strategies: sparse parameter averaging and gradient quantization. We show that an O(1/√(MK)) convergence rate can be achieved if the sparsification and quantization hyperparameters are configured properly. We also propose a strategy called periodic quantized averaging (PQASGD) that further reduces the communication cost while preserving the O(1/√(MK)) convergence rate. Our evaluation validates our theoretical results and shows that our PQASGD can converge as fast as full-communication SGD with only 3%-5% communication data size.
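The abstract names gradient quantization but does not spell out the quantizer. As a rough illustration only, the sketch below assumes an unbiased stochastic uniform quantizer, a common choice in this literature that is not confirmed here as the paper's exact scheme. The names `quantize`, `average_quantized`, and `levels` are hypothetical and do not come from the paper.

```python
import random


def quantize(grad, levels=4, rng=random):
    """Stochastically round each entry onto a uniform grid (assumed scheme).

    Magnitudes are mapped onto `levels` equal steps between 0 and max|g|.
    Each entry is rounded up with probability equal to its fractional
    position, so the quantized value equals the input in expectation.
    """
    scale = max(abs(g) for g in grad)
    if scale == 0.0:
        return list(grad)  # nothing to quantize in an all-zero vector
    out = []
    for g in grad:
        x = abs(g) / scale * levels  # grid position in [0, levels]
        low = int(x)                 # floor, since x is non-negative
        q = low + (1 if rng.random() < x - low else 0)
        sign = 1.0 if g >= 0 else -1.0
        out.append(sign * q * scale / levels)
    return out


def average_quantized(worker_grads, levels=4, rng=random):
    """Average the workers' vectors after quantizing each one (hypothetical helper)."""
    quantized = [quantize(g, levels, rng) for g in worker_grads]
    m = len(quantized)
    return [sum(col) / m for col in zip(*quantized)]
```

With few levels, each worker needs to transmit only a small integer and a sign per entry plus one scale value, which is where the communication saving would come from under these assumptions.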

Original language: English (US)
Pages (from-to): 2525-2536
Number of pages: 12
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - 2018
Externally published: Yes
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 to Dec 8 2018

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
