Neural network training relies on our ability to find “good” minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood.

Stabilizing GANs with Prediction

in optimization, machine learning

2017-12-11

Why are GANs hard to train?

Adversarial neural networks solve many important problems in data science, but are notoriously difficult to train. These difficulties come from the fact that optimal weights for adversarial nets correspond to saddle points, and not minimizers, of the loss function. The alternating stochastic gradient methods typically used for such problems do not reliably converge to saddle points, and when convergence does happen it is often highly sensitive to learning rates.

PhasePack

in optimization

2017-11-20

PhasePack

A phase retrieval library

Phase retrieval is the recovery of a signal from only the magnitudes, and not the phases, of complex-valued linear measurements. Phase retrieval problems arise in many different applications, particularly in crystallography and microscopy. Mathematically, phase retrieval recovers a complex valued signal \(x\in \mathbb{C}^n\) from \(m\) measurements of the form

Training Quantized Nets: A Deeper Understanding

in machine learning, optimization

2017-11-10

Why quantized nets?

Deep neural networks are an integral part of state-of-the-art computer vision and natural language processing systems. Because of their high memory requirements and computational complexity, networks are usually trained using powerful hardware. There is an increasing interest in training and deploying neural networks directly on battery-powered devices, such as cell phones or other platforms. Such low-power embedded systems are memory and power limited, and in some cases lack basic support for floating-point arithmetic.

PhaseMax

in optimization

2016-12-27

PhaseMax

convex relaxation without lifting

-

Phase retrieval deals with the recovery of an \(n\)-dimensional signal \(x^0\in\mathbb{C}^n\), from \(m\geq n\) magnitude measurements of the form \begin{align} \label{eq:original} b_i = |\langle a_i, x^0\rangle|, \quad i = 1,2,\ldots,m, \end{align} where \(a_i\in\mathbb{C}^n\), and \(i=1,2,\ldots,m\) are measurement vectors. While the recovery of \(x^0\) from these non-linear measurements is non-convex, it can be convexified via “lifting” methods that convert the phase retrieval problem to a semidefinite program (SDP). But convexity comes at a high cost: lifting methods square the dimensionality of the problem.

FASTA

in optimization

2016-10-23

fast adaptive
shrinkage/thresholding
algorithm

FASTA (Fast Adaptive Shrinkage/Thresholding Algorithm) is an efficient, easy-to-use implementation of the Forward-Backward Splitting (FBS) method (also known as the proximal gradient method) for regularized optimization problems. Many variations on FBS are available in FASTA, including the popular accelerated variant FISTA (Beck and Teboulle ’09), the adaptive stepsize rule SpaRSA (Wright, Nowak, Figueiredo ’09), and other variants described in the review A Field Guide to Forward-Backward Splitting with a FASTA Implementation. Whether the problem you are solving is simple or complex, FASTA makes things easy by handling issues like stepsize selection, acceleration, and stopping conditions for you.

Primal-dual hybrid gradient method

in optimization

2016-04-25

The Primal-Dual Hybrid Gradient (PDHG) method, also known as the Chambolle-Pock method, is a powerful splitting method that can solve a wide range of constrained and non-differentiable optimization problems. Unlike the popular ADMM method, the PDHG approach usually does not require expensive minimization sub-steps. We provide adaptive stepsize selection rules that automate the solver, and increase its speed and robustness. The test problems and adaptive stepsize strategies presented here were proposed in our papers Adaptive Primal-Dual Hybrid Gradient Methods for Saddle-Point Problems and Adaptive Primal-Dual Splitting Methods for Statistical Learning and Image Processing.

Split Bregman

in optimization

2014-04-23

The Split Bregman method is a technique for solving a variety of L1-regularized optimization problems, and is particularly effective for problems involving total-variation regularization. Split Bregman is one of the fastest solvers for Total-Variation denoising, image reconstruction from Fourier coefficients, convex image segmentation, and many other problems.

Optimization

Why are GANs hard to train?

PhasePack

A phase retrieval library

Why quantized nets?

PhaseMax

convex relaxation without lifting

fast adaptive shrinkage/thresholding algorithm

fast adaptive
shrinkage/thresholding
algorithm