Publications by Demjan Grubic

2017

Systems Group Master's Thesis, no. 163; Department of Computer Science, April 2017
Supervised by: Prof. Ce Zhang
Training deep learning models has received tremendous research interest recently. In particular, there has been intensive research on reducing the communication cost of the training process when using multiple computational devices, through reducing the precision of the underlying data representation. Naturally, such methods induce system trade-offs: lowering communication precision could decrease communication overheads and improve scalability, but it can also reduce the accuracy of training. In this paper, we study this trade-off space, and ask: Can low-precision communication consistently improve the end-to-end performance of training modern neural networks, with no accuracy loss? From the performance point of view, the answer to this question may appear deceptively easy: compressing communication through low precision should help when the ratio between communication and computation is high. However, this answer is less straightforward when we try to generalize this principle across various neural network architectures (e.g., AlexNet vs. ResNet), numbers of GPUs (e.g., 2 GPUs vs. 8 GPUs), machine configurations (e.g., EC2 instances vs. NVIDIA DGX-1), programming models (e.g., MPI vs. NCCL), and even different GPU architectures (e.g., Kepler vs. Pascal). Currently, it is not clear how a realistic realization of all these factors maps to the speedup provided by low-precision communication. In this paper, we conduct an empirical study to answer this question and report the insights we obtain.
@mastersthesis{abc,
	abstract = {Training deep learning models has received tremendous research interest recently. In
particular, there has been intensive research on reducing the communication cost of
the training process when using multiple computational devices, through reducing
the precision of the underlying data representation. Naturally, such methods induce
system trade-offs{\textemdash}lowering communication precision could decrease communication
overheads and improve scalability; but, on the other hand, it can also reduce the
accuracy of training. In this paper, we study this trade-off space, and ask: Can low-precision
communication consistently improve the end-to-end performance of training modern
neural networks, with no accuracy loss?
From the performance point of view, the answer to this question may appear deceptively
easy: compressing communication through low precision should help when the
ratio between communication and computation is high. However, this answer is less
straightforward when we try to generalize this principle across various neural network
architectures (e.g., AlexNet vs. ResNet), number of GPUs (e.g., 2 GPUs vs. 8 GPUs),
machine configurations (e.g., EC2 instances vs. NVIDIA DGX-1), programming models
(e.g., MPI vs. NCCL), and even different GPU architectures (e.g., Kepler vs. Pascal).
Currently, it is not clear how a realistic realization of all these factors maps to the
speedup provided by low-precision communication. In this paper, we conduct an empirical
study to answer this question and report the insights we obtain.},
	author = {Demjan Grubic},
	school = {Department of Computer Science},
	type = {Systems Group Master's Thesis, no. 163},
	title = {Communication-Scalable Machine Learning},
	year = {2017}
}