PhD candidate, University of Waterloo
Our goal is to find the points in a database that are closest to a given query point.
Many current large-scale datasets consist of high-dimensional vectors. For example, in multimedia applications, each image can be described by features of various aspects of its visual content, such as color, shape and objects. In addition to the huge volume of current datasets, their velocity of arrival (e.g. streams of video) and their heterogeneity (e.g. images, text, audio) are unprecedented. In retrieval algorithms for massive datasets, the most computationally demanding part is often the search for the most similar matches to one or more query vectors. This task, referred to as Nearest Neighbour (NN) matching, has a wide range of applications in fields such as data mining, information retrieval, machine learning and computer vision. The simplest exact solution is to compare each query vector with every item in the dataset, but this does not scale: applying traditional exact search techniques to today's large-scale datasets incurs intolerable space and computational costs. Tackling these problems has set the goals of our research. We aim to solve the approximate nearest neighbour problem with efficient time and space complexity, so that even a single machine can perform large-scale nearest neighbour search. This comes at the cost of allowing answers to be approximate rather than exact, which has been shown to be sufficient for many applications.
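To make the scalability issue concrete, here is a minimal sketch of the brute-force exact search described above (the data and function name are illustrative, not taken from any of the pinned papers): every query is compared against every database vector, so the cost per query grows linearly with both the number of points N and the dimension D, and the entire dataset must be kept in memory.

```python
import numpy as np

def exact_nearest_neighbours(database, queries, k=1):
    """Return indices of the k closest database rows for each query row (linear scan)."""
    # Squared Euclidean distances between every query and every database point.
    # database: (N, D), queries: (Q, D) -> dists: (Q, N)
    dists = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances per query.
    return np.argsort(dists, axis=1)[:, :k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.standard_normal((10_000, 128))   # toy database: N = 10k, D = 128
    q = rng.standard_normal((5, 128))         # 5 query vectors
    print(exact_nearest_neighbours(db, q, k=3))
```

For a billion 128-dimensional float vectors this already means roughly half a terabyte of raw data and billions of distance computations per query, which is what motivates the compressed-code and hashing approaches pinned below.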
Abstract: For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and novel architectures ushered in by the deep learning revolution. We hence propose herein a novel method to make deep convolutional neural networks produce supervised, compact, structured binary codes for visual search. Our method makes use of a novel block-softmax non-linearity and of batch-based entropy losses that together induce structure in the learned encodings. We show that our method outperforms state-of-the-art compact representations based on deep hashing or structured quantization in single and cross-domain category retrieval, instance retrieval and classification. We make our code and models publicly available online.
Pub.: 09 Aug '17, Pinned: 24 Aug '17
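As a rough illustration of the block-softmax idea in the abstract above, the sketch below (my own reading of the abstract, not the authors' released code; the function names are hypothetical) splits a feature vector into equal-sized blocks, applies a softmax independently within each block, and binarizes each block to a one-hot pattern, yielding a compact, structured binary code.

```python
import numpy as np

def block_softmax(x, num_blocks):
    """Apply a softmax independently within each of num_blocks equal-sized blocks."""
    blocks = x.reshape(num_blocks, -1)
    blocks = blocks - blocks.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(blocks)
    return (e / e.sum(axis=1, keepdims=True)).reshape(-1)

def binarize_blocks(x, num_blocks):
    """One-hot encode the argmax of each block, giving a structured binary code."""
    blocks = x.reshape(num_blocks, -1)
    code = np.zeros_like(blocks)
    code[np.arange(num_blocks), blocks.argmax(axis=1)] = 1
    return code.reshape(-1).astype(np.uint8)

feat = np.random.randn(64)                  # e.g. a 64-d deep feature
soft = block_softmax(feat, num_blocks=8)    # training-time activation
code = binarize_blocks(soft, num_blocks=8)  # search-time code: 8 one-hot blocks
```

The batch-based entropy losses mentioned in the abstract, which push these block activations toward informative one-hot patterns during training, are omitted here.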
Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a widely used hashing method, which provides an unbiased estimate of pairwise angular similarity, yet may suffer from large estimation variance. We propose in this work batch-orthogonal locality-sensitive hashing (BOLSH), as a significant improvement of SRP-LSH. Instead of independent random projections, BOLSH makes use of batch-orthogonalized random projections, i.e., we divide random projection vectors into several batches and orthogonalize the vectors in each batch respectively. These batch-orthogonalized random projections partition the data space into regular regions, and thus provide a more accurate estimator. We prove theoretically that BOLSH still provides an unbiased estimate of pairwise angular similarity, with a smaller variance for any angle in (0, π), compared with SRP-LSH. Furthermore, we give a lower bound on the reduction of variance. Extensive experiments on real data validate that with the same length of binary code, BOLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, BOLSH shows superior performance in extensive approximate nearest neighbor (ANN) retrieval experiments.
Pub.: 10 Sep '15, Pinned: 24 Aug '17
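The sketch below is a minimal rendering of the batch-orthogonalization step as described in the abstract (my own simplification, with hypothetical function names): projection directions are drawn randomly as in SRP-LSH, but grouped into batches and orthogonalized (here via QR) before the sign hashing, and the normalized Hamming distance between codes is used as the angle estimator.

```python
import numpy as np

def bolsh_projections(dim, num_bits, batch_size, rng):
    """Generate num_bits projection vectors, orthogonalized within each batch."""
    assert batch_size <= dim, "cannot orthogonalize more vectors than dimensions per batch"
    batches = []
    for start in range(0, num_bits, batch_size):
        b = min(batch_size, num_bits - start)
        g = rng.standard_normal((dim, b))
        q, _ = np.linalg.qr(g)            # columns are orthonormal
        batches.append(q[:, :b])
    return np.hstack(batches)             # (dim, num_bits)

def hash_codes(X, W):
    """Sign hashing: one boolean code per data point."""
    return X @ W > 0

def estimated_angle(code_a, code_b):
    """As in SRP-LSH, the normalized Hamming distance estimates angle / pi."""
    return np.pi * np.mean(code_a != code_b)

rng = np.random.default_rng(0)
W = bolsh_projections(dim=128, num_bits=64, batch_size=64, rng=rng)
X = rng.standard_normal((2, 128))
codes = hash_codes(X, W)
print(estimated_angle(codes[0], codes[1]))
```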
Abstract: Many binary code embedding schemes have been actively studied recently, since they can provide efficient similarity search, and compact data representations suitable for handling large scale image databases. Existing binary code embedding techniques encode high-dimensional data by using hyperplane-based hashing functions. In this paper we propose a novel hypersphere-based hashing function, spherical hashing, to map more spatially coherent data points into a binary code compared to hyperplane-based hashing functions. We also propose a new binary code distance function, spherical Hamming distance, tailored for our hypersphere-based binary coding scheme, and design an efficient iterative optimization process to achieve both balanced partitioning for each hash function and independence between hashing functions. Furthermore, we generalize spherical hashing to support various similarity measures defined by kernel functions. Our extensive experiments show that our spherical hashing technique significantly outperforms state-of-the-art techniques based on hyperplanes across various benchmarks with sizes ranging from one to 75 million GIST, BoW and VLAD descriptors. The performance gains are consistent and large, up to 100 percent improvement over the second best tested method. These results confirm the unique merits of using hyperspheres to encode proximity regions in high-dimensional spaces. Finally, our method is intuitive and easy to implement.
Pub.: 07 Oct '15, Pinned: 24 Aug '17
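Below is a hedged sketch of the hypersphere-based encoding and the spherical Hamming distance as I read them from the abstract (not the authors' implementation; pivot and radius names are illustrative): bit i is set when a point falls inside the i-th hypersphere, and the usual Hamming distance is normalized by the number of spheres that contain both points.

```python
import numpy as np

def spherical_encode(X, pivots, radii):
    """bit i = 1 iff the point lies inside hypersphere i (center pivots[i], radius radii[i])."""
    # X: (N, D), pivots: (bits, D), radii: (bits,) -> codes: (N, bits) boolean
    dists = np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=-1)
    return dists <= radii[None, :]

def spherical_hamming_distance(a, b):
    """Hamming distance divided by the number of common 'inside' bits."""
    common = np.logical_and(a, b).sum()
    diff = np.logical_xor(a, b).sum()
    return diff / common if common > 0 else np.inf
```

The iterative optimization of the pivots and radii, which the abstract describes as balancing each partition and decorrelating the hash functions, is omitted from this sketch.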
Abstract: There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as doing so was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.
Pub.: 01 Jun '14, Pinned: 24 Aug '17
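The sketch below is my simplification of the substring-indexing idea described in the abstract (not the paper's full algorithm): each binary code is split into m disjoint substrings and indexed in m hash tables. By the pigeonhole principle, any code within Hamming radius r < m of a query matches the query exactly in at least one substring, so the union of the m substring buckets contains all such neighbours, and only those candidates need a full Hamming-distance check. The complete method also probes substring buckets within a small Hamming radius, which this sketch omits.

```python
from collections import defaultdict

def split(code_bits, m):
    """Split a tuple of bits into m roughly equal substrings."""
    step = -(-len(code_bits) // m)  # ceiling division
    return [tuple(code_bits[i:i + step]) for i in range(0, len(code_bits), step)]

def build_tables(codes, m):
    """One hash table per substring position, mapping substring -> list of code ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, sub in zip(tables, split(code, m)):
            table[sub].append(idx)
    return tables

def query(q, codes, tables, m, radius):
    """Collect candidates from exact substring matches, then verify full Hamming distance."""
    candidates = set()
    for table, sub in zip(tables, split(q, m)):
        candidates.update(table.get(sub, []))
    hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
    return sorted(i for i in candidates if hamming(q, codes[i]) <= radius)
```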