PhD Student, University of Maryland
Make both training and testing of machine learning models faster and more robust
Data-driven machine learning methods have achieved remarkable performance in many applications, including computer vision, natural language processing, and speech recognition. These methods usually have two stages: training a model on a large number of samples, and applying the model to make predictions on new samples. Training is often slow because of the size of the training set, and the hyperparameters are difficult to tune. Inference must run in real time for many applications. My research focuses on automating training to make it fast and user-friendly, and on accelerating inference to make models widely applicable. In a nutshell, the goal is to make machine learning methods more accessible to non-experts.
Abstract: Non-differentiable and constrained optimization play a key role in machine learning, signal and image processing, communications, and beyond. For high-dimensional minimization problems involving large datasets or many unknowns, the forward-backward splitting method provides a simple, practical solver. Despite its apparent simplicity, the performance of forward-backward splitting is highly sensitive to implementation details. This article is an introductory review of forward-backward splitting with a special emphasis on practical implementation concerns. Issues like stepsize selection, acceleration, stopping conditions, and initialization are considered. Numerical experiments are used to compare the effectiveness of different approaches. Many variations of forward-backward splitting are implemented in the solver FASTA (short for Fast Adaptive Shrinkage/Thresholding Algorithm). FASTA provides a simple interface for applying forward-backward splitting to a broad range of problems.
Pub.: 15 Feb '16, Pinned: 26 Oct '17
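To make the forward-backward splitting entry above concrete, here is a minimal sketch of the basic iteration on one example problem, the lasso: minimize 0.5 * ||Ax - b||^2 + lam * ||x||_1. It is not FASTA and does not use FASTA's interface. The function names, the choice of problem, and the fixed stepsize 1/||A||^2 are assumptions made for illustration, and none of the adaptive stepsize, acceleration, or stopping rules the abstract mentions are included.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (hypothetical helper name)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def forward_backward_lasso(A, b, lam, n_iter=500):
    """Illustrative solver for 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    # Assumption: fixed stepsize 1/L, where L = ||A||_2^2 is the
    # Lipschitz constant of the gradient of the smooth term.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)  # forward step: gradient of the smooth term
        x = soft_threshold(x - step * grad, step * lam)  # backward step: prox
    return x
```

Each pass takes a gradient step on the differentiable term and then applies the proximal operator of the non-differentiable term. The fixed stepsize is the simplest safe choice for this problem; the abstract's point is that choices like this one strongly affect practical performance.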
Abstract: Convolutional Neural Networks (CNNs) are extensively used in image and video recognition, natural language processing and other machine learning applications. The success of CNNs in these areas corresponds with a significant increase in the number of parameters and computation costs. Recent approaches towards reducing these overheads involve pruning and compressing the weights of various layers without hurting the overall CNN performance. However, using model compression to generate sparse CNNs mostly reduces parameters from the fully connected layers and may not significantly reduce the final computation costs. In this paper, we present a compression technique for CNNs, where we prune the filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole planes in the network, together with their connecting convolution kernels, the computational costs are reduced significantly. In contrast to other techniques proposed for pruning networks, this approach does not result in sparse connectivity patterns. Hence, our techniques do not need the support of sparse convolution libraries and can work with the most efficient BLAS operations for matrix multiplications. In our results, we show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% while regaining close to the original accuracy by retraining the networks.
Pub.: 30 Aug '16, Pinned: 26 Oct '17
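The filter-pruning idea in the entry above can be sketched on raw weight arrays. The abstract does not say how filters with a small effect on accuracy are identified, so the ranking below, the sum of absolute kernel weights per filter, is an assumption chosen for illustration. The function name and the array layout (out_channels, in_channels, height, width) are hypothetical as well.

```python
import numpy as np


def prune_filters(weight, next_weight, n_prune):
    """Drop the n_prune lowest-scoring filters of one convolution layer.

    weight:      (out_channels, in_channels, kh, kw) kernels of the layer
    next_weight: (next_out, out_channels, kh, kw) kernels of the next layer
    """
    # Assumed importance score: sum of absolute weights in each filter.
    scores = np.abs(weight).sum(axis=(1, 2, 3))
    # Keep the highest-scoring filters, preserving their original order.
    keep = np.sort(np.argsort(scores)[n_prune:])
    # Removing a filter removes its output plane, so the matching input
    # channel of the next layer's kernels is removed too.
    return weight[keep], next_weight[:, keep], keep
```

Both returned arrays are still dense, only smaller, which is why this style of pruning needs no sparse convolution support. The retraining step the abstract describes is not shown.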
Abstract: Many modern computer vision and machine learning applications rely on solving difficult optimization problems that involve non-differentiable objective functions and constraints. The alternating direction method of multipliers (ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a generalization of ADMM that often achieves better performance, but its efficiency depends strongly on algorithm parameters that must be chosen by an expert user. We propose an adaptive method that automatically tunes the key algorithm parameters to achieve optimal performance without user oversight. Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM (ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A detailed convergence analysis of ARADMM is provided, and numerical results on several applications demonstrate fast practical convergence.
Pub.: 10 Apr '17, Pinned: 26 Oct '17
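For context on the relaxed ADMM entry above, the sketch below runs plain relaxed ADMM with hand-picked, fixed parameters on the lasso problem, minimize 0.5 * ||Ax - b||^2 + lam * ||z||_1 subject to x = z. It is not ARADMM: the penalty rho and relaxation parameter alpha are set by the user and never adapted. The function name, the test problem, and the default values are assumptions for illustration.

```python
import numpy as np


def relaxed_admm_lasso(A, b, lam, rho=1.0, alpha=1.5, n_iter=300):
    """Relaxed ADMM (scaled dual form) with fixed rho and alpha."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    M = A.T @ A + rho * np.eye(n)  # system matrix for the x-update
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: minimize the quadratic term plus the penalty.
        x = np.linalg.solve(M, Atb + rho * (z - u))
        # Relaxation: mix the new x with the previous z.
        x_hat = alpha * x + (1.0 - alpha) * z
        # z-update: soft-thresholding, the prox of (lam / rho) * ||.||_1.
        v = x_hat + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # Dual update on the relaxed residual.
        u = u + x_hat - z
    return z
```

Setting alpha = 1 recovers standard ADMM. Convergence speed in this sketch depends on the hand-chosen rho and alpha, which is the kind of expert tuning the abstract says the adaptive method removes.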