Indexed on: 10 Jun '11
Published on: 10 Jun '11
Published in: Frontiers of Electrical and Electronic Engineering
The automatic recognition of scene contents is an important problem in computer vision. Although considerable progress has been made, the complexity of scenes remains a major challenge for computer vision research. Most previous scene recognition models are based on the so-called “bag of visual words” method, which uses a clustering algorithm to quantize the numerous local region descriptors into a codebook. Both the size of the codebook and the choice of the initial cluster centers strongly influence performance, and a large codebook incurs high computational cost and memory consumption. To overcome these drawbacks, we present an unsupervised natural scene recognition approach that does not rely on the “bag of visual words” method. The approach creates images at multiple resolutions and partitions each of them into sub-regions at different scales. The descriptors of all sub-regions at the same resolution are directly concatenated and fed to support vector machine (SVM) classifiers. To represent images more effectively, we also present a new visual descriptor: weighted histograms of gradient orientation (WHGO). We evaluate our approach on three data sets: the 8 scene categories of Oliva et al., the 13 scene categories of Fei-Fei et al., and the 15 scene categories of Lazebnik et al. Experiments show that the WHGO descriptor outperforms the classical scale invariant feature transform (SIFT) descriptor in natural scene recognition, and that our approach performs well compared with state-of-the-art methods.
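The feature pipeline described above (images at several resolutions, sub-regions at several scales, one concatenated vector per resolution) can be sketched in Python with NumPy. This is a hypothetical illustration, not the authors' implementation: the abstract does not say how WHGO weights its histograms, so the sketch assumes a HOG-style histogram of unsigned gradient orientations weighted by gradient magnitude, with 8 bins, L2 normalization per sub-region, grid levels of 1×1, 2×2 and 4×4, and 2×2 block averaging between resolutions. All function names and parameter values are illustrative choices.

```python
import numpy as np


def orientation_histogram(patch, n_bins=8):
    """Assumed WHGO-like cell descriptor: magnitude-weighted histogram of
    unsigned gradient orientations (the paper's exact weighting is not given)."""
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist  # L2-normalize; flat cells stay zero


def grid_descriptor(image, levels=(1, 2, 4), n_bins=8):
    """Partition one image into g x g sub-regions for each level g and
    concatenate all sub-region histograms into a single vector."""
    h, w = image.shape
    feats = []
    for g in levels:
        ys = np.linspace(0, h, g + 1).astype(int)
        xs = np.linspace(0, w, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                cell = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                feats.append(orientation_histogram(cell, n_bins))
    return np.concatenate(feats)


def downsample(image):
    """Halve the resolution by averaging 2x2 blocks (an assumed choice)."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    im = image[:h, :w]
    return 0.25 * (im[0::2, 0::2] + im[1::2, 0::2] + im[0::2, 1::2] + im[1::2, 1::2])


def multiresolution_features(image, n_resolutions=3, levels=(1, 2, 4), n_bins=8):
    """Return one concatenated descriptor vector per resolution."""
    im = image.astype(float)
    feats = []
    for _ in range(n_resolutions):
        feats.append(grid_descriptor(im, levels, n_bins))
        im = downsample(im)
    return feats
```

For a 64×64 grayscale image with the defaults, each resolution yields (1 + 4 + 16) × 8 = 168 values. Each vector corresponds to one resolution; in a setup like the one described, each could train its own SVM (for example scikit-learn's `sklearn.svm.SVC`), although how the authors combine the per-resolution classifiers is not stated in the abstract. Note that no clustering or codebook is involved, which is the point of departure from "bag of visual words".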