PhD Defense: Communication-Efficient Hybrid Parallel Algorithms for Neural Network Training
IRB-5237
Deep learning has driven significant advances across many fields, powered by increasingly large neural networks and massive datasets. These improvements, however, come at the cost of enormous computational demands: training such models requires thousands of GPUs operating in parallel. At this scale, inter-GPU communication overhead becomes a major bottleneck, severely limiting efficient hardware utilization.
This thesis addresses the challenge of communication in large-scale parallel deep learning. First, it introduces a novel four-dimensional hybrid parallel algorithm designed to minimize communication overhead while maintaining ease of use for practitioners. Second, it presents a topology-aware communication model that identifies optimal configurations of this algorithm for a given hardware architecture, improving efficiency and scalability. Finally, the thesis develops highly scalable implementations of collective communication primitives commonly used in distributed deep learning, further enhancing performance.
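For readers unfamiliar with the collective operations mentioned above, the sketch below illustrates the simplest case: averaging gradients across data-parallel ranks with an all-reduce, using PyTorch's torch.distributed API. This is a minimal illustration of the kind of primitive that dominates communication in distributed training, not the algorithm or implementations developed in the thesis; the function names and model used here are purely hypothetical.

```python
# Illustrative sketch only: gradient averaging with an all-reduce, the kind of
# collective communication primitive used in data-parallel training.
# Not the thesis's algorithm or implementation.
import os
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    """Sum each gradient across all ranks, then divide by the world size."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

if __name__ == "__main__":
    # Launched with torchrun, which sets RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    model = torch.nn.Linear(1024, 1024).cuda()
    data = torch.randn(32, 1024, device="cuda")
    loss = model(data).sum()
    loss.backward()

    # One all-reduce per parameter; at thousands of GPUs this communication
    # is exactly the kind of overhead the thesis targets.
    average_gradients(model)
    dist.destroy_process_group()
```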
By tackling these critical communication challenges, this work enables more efficient deep learning training at scale, reducing time to convergence and improving resource utilization across large GPU clusters.