A pinboard by
Zheng Xu

PhD Student, University of Maryland


Make both training and testing of machine learning models faster and more robust

Data-driven machine learning methods have achieved remarkable performance in many applications, including computer vision, natural language processing, and speech recognition. These methods usually involve two stages: training a model from a large number of samples, and applying the model to predict on new samples. Training is often slow because of the size of the training set, and tuning the hyperparameters is difficult. Inference must be real-time for many applications. My research focuses on automating training to make it fast and user-friendly, and on accelerating inference to make models widely applicable. In a nutshell, the goal is to make machine learning methods more accessible for non-experts to use.


Pruning Filters for Efficient ConvNets

Abstract: Convolutional Neural Networks (CNNs) are extensively used in image and video recognition, natural language processing and other machine learning applications. The success of CNNs in these areas corresponds with a significant increase in the number of parameters and computation costs. Recent approaches towards reducing these overheads involve pruning and compressing the weights of various layers without hurting the overall CNN performance. However, using model compression to generate sparse CNNs mostly reduces parameters from the fully connected layers and may not significantly reduce the final computation costs. In this paper, we present a compression technique for CNNs, where we prune the filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole planes in the network, together with their connecting convolution kernels, the computational costs are reduced significantly. In contrast to other techniques proposed for pruning networks, this approach does not result in sparse connectivity patterns. Hence, our techniques do not need the support of sparse convolution libraries and can work with the most efficient BLAS operations for matrix multiplications. In our results, we show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% while regaining close to the original accuracy by retraining the networks.

Pub.: 30 Aug '16, Pinned: 26 Oct '17
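The core idea of the pinned paper, ranking convolutional filters by the L1 norm of their kernel weights and removing the smallest ones along with their output feature maps, can be sketched in a few lines of NumPy. This is a toy illustration under assumed conventions, not the authors' code: the smallest-L1-norm criterion follows the abstract, but the `prune_filters` helper and the 25% pruning ratio are illustrative choices.

```python
import numpy as np

def prune_filters(weights, prune_ratio):
    """Drop the filters with the smallest L1 norms from one conv layer.

    weights: array of shape (out_channels, in_channels, k, k)
    Returns the pruned weight tensor and the indices of the kept filters.
    """
    n_filters = weights.shape[0]
    n_prune = int(n_filters * prune_ratio)
    # L1 norm of each filter: sum of absolute kernel weights
    norms = np.abs(weights).sum(axis=(1, 2, 3))
    # Keep the filters with the largest norms, preserving their order
    keep = np.sort(np.argsort(norms)[n_prune:])
    # The input channels of the NEXT conv layer that read the removed
    # feature maps must also be dropped: next_weights = next_weights[:, keep]
    return weights[keep], keep

# Toy example: a layer with 8 filters, 3 input channels, 3x3 kernels
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))
pruned, kept = prune_filters(w, prune_ratio=0.25)
```

Because whole filters (and the matching input channels of the following layer) are removed, the resulting layer is simply a smaller dense convolution, which is why, as the abstract notes, no sparse convolution library is needed. A short retraining pass would then recover accuracy.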