PhD Proposal: Understanding and Automating Optimization Methods for Machine Learning

Talk
Zheng Xu
Time: 11.29.2017, 11:00 to 12:00
Location: AVW 4172

Data-driven machine learning methods have become successful in both industrial applications and academic research. Machine learning methods usually have two stages: training a model from large-scale samples, and inference on new samples after the model is deployed. The training of modern models relies on solving difficult optimization problems that involve nonconvex, nondifferentiable objective functions and constraints; it is sometimes slow and often requires expertise to tune hyperparameters. While inference is much faster than training, it is often not fast enough for real-time applications. This thesis seeks to make machine learning methods widely accessible to users in the real world by 1) building fully automated optimization solvers that enable fast and easy training for non-expert users, and 2) accelerating inference in different computational environments. We study both alternating methods for classical convex problems and stochastic methods for neural network models.

First, we focus on alternating optimization methods, which minimize complex objective functions by solving simple minimization sub-problems. Popular applications include sparse and low-rank models, regularized linear models, total variation image processing, distributed computing, and generative adversarial networks. Among alternating optimization methods, the alternating direction method of multipliers (ADMM) is a versatile tool for solving a wide range of problems that can be formulated as a two-term objective with a linear constraint. We proposed adaptive ADMM (AADMM), a fully automated solver that tunes the only free parameter in ADMM. We release fast implementations for more than ten applications, such as sparse linear regression, support vector machine classification, semidefinite programming, and image denoising. By exploiting the proposed adaptive scheme, we then automate variants of ADMM: relaxed ADMM, multi-block ADMM, and consensus ADMM. We prove convergence rate guarantees that apply broadly to variants of ADMM with changing parameters. We are working on a comprehensive review of automated ADMM methods for different applications. As future work, we propose to develop a fully automated solver based on stochastic alternating methods.

Second, we study stochastic gradient methods for neural networks, with the goal of understanding why they work and how to further improve the speed and accuracy of training. Despite complicated network architectures and the nonconvexity of the training objective, the relatively simple stochastic gradient descent (SGD) method often performs well in practice. Building on our analysis of SGD for quantized neural networks and our recent empirical studies of SGD, we seek to make powerful statements about the performance of SGD for general deep neural networks.

Finally, we introduce methods for accelerating inference, with the goal of achieving real-time performance. Recent efforts to improve the performance of deep neural networks have resulted in a move towards wide and deep networks, which makes inference slow. We study acceleration methods in two categories: knowledge distillation and network quantization. We proposed a GAN-based knowledge distillation method that trains small neural networks with acceptable accuracy loss for acceleration, and we empirically study the trade-off between acceleration and accuracy.
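For readers unfamiliar with knowledge distillation, the sketch below shows a standard soft-target distillation loss in PyTorch. It is a generic illustration only, not the GAN-based method described in the proposal, and the temperature T and mixing weight alpha are assumed hyperparameters rather than values from this work.

```python
# Generic knowledge-distillation loss (soft teacher targets + hard labels).
# Illustrative sketch; the proposal's GAN-based method is not reproduced here.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a KL term matching the teacher's softened outputs with the usual
    hard-label cross-entropy term."""
    # Soften both distributions with temperature T; scale by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

The small student network is trained on this loss while the large teacher is frozen; at inference time only the student is deployed, which is where the acceleration comes from.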
We also study network quantization methods, which represent parameters with a few bits, for compressing and accelerating neural networks. We have shown in previous work that training quantized neural networks is difficult. In our proposed work, we will develop novel algorithms that make quantized neural networks easier to train and achieve better acceleration by reducing the bit-depth of neural computations.
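As a point of reference for what "representing parameters with a few bits" means in practice, the following is a minimal sketch of k-bit uniform weight quantization with a straight-through estimator (STE). It is a generic illustration under assumed choices (uniform levels, per-tensor range, bit-width k), not the algorithm proposed in the thesis.

```python
# Minimal k-bit uniform weight quantization with a straight-through estimator.
# Generic sketch; not the training algorithm proposed in the thesis.
import torch

class QuantizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, k=2):
        # Map weights to [0, 1], round onto 2^k - 1 levels, and map back.
        levels = 2 ** k - 1
        w_min, w_max = w.min(), w.max()
        scale = (w_max - w_min).clamp(min=1e-8)
        w_norm = (w - w_min) / scale
        w_q = torch.round(w_norm * levels) / levels
        return w_q * scale + w_min

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass gradients straight through the non-differentiable rounding.
        return grad_output, None

# Usage inside a layer's forward pass: full-precision weights are kept for the
# optimizer, while the quantized copy is used for the computation.
w = torch.randn(64, 128, requires_grad=True)
w_q = QuantizeSTE.apply(w, 2)
```

The difficulty alluded to in the abstract stems from exactly this kind of construction: the rounding step has zero gradient almost everywhere, so training must rely on surrogate gradients such as the STE.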

Examining Committee:

Chair: Dr. Tom Goldstein
Dean's rep: Dr. David Jacobs
Members: Dr. Furong Huang