Indexed on: 06 Feb '20 · Published on: 06 Feb '20 · Published in: IEEE Transactions on Cybernetics
Discriminative correlation filter (DCF)-based trackers have been widely applied to visual tracking because they achieve high precision while running at high frame rates. However, most recent DCF-based methods concentrate solely on learning the correlation filter from spatial information and thus lack the descriptive power to discriminate the target from the background in complex circumstances such as full occlusion (OCC) and rapid target variation. In this article, we introduce a novel tracking framework that exploits the relationship between the target and its spatiotemporal context to improve tracking accuracy and robustness. Specifically, we build our spatiotemporal context model hierarchically, where each layer of the context pyramid is a spatial correlation filter learned from a different temporal instance. To obtain an accurate spatiotemporal model, we propose an optimization fusion approach that adaptively and efficiently learns the contribution of each hierarchical layer and exploits these multiple temporal levels of correlation filters for visual tracking. Moreover, an adaptive model update strategy for the correlation filters is introduced into the framework to dynamically select the proper hierarchical layers, which boosts the temporal diversity of the target appearance while radically reducing the number of model parameters and guaranteeing the real-time performance of the tracking method. The experimental results show that, with conventional handcrafted features, our tracker achieves the best success rates among available state-of-the-art trackers using handcrafted features, and provides performance comparable to that of deep-learning-based trackers on the OTB-2013, OTB-2015, VOT-2016, and UAV-20L benchmarks while running significantly faster than deep trackers.
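To make the core idea concrete, the following is a minimal, illustrative sketch of a classic Fourier-domain correlation filter (a MOSSE-style closed-form solution), together with a simple weighted fusion of per-layer responses. This is not the authors' implementation: the filter formulation is the standard single-channel DCF, and the fixed fusion weights are a hypothetical stand-in for the paper's adaptive optimization fusion over hierarchical temporal layers.

```python
import numpy as np

def learn_filter(patch, target_response, lam=1e-2):
    """Closed-form single-channel DCF (MOSSE-style) in the Fourier domain.

    Solves H = (G . conj(F)) / (F . conj(F) + lam), where F and G are the
    2-D FFTs of the training patch and the desired Gaussian response, and
    lam is a small regularizer. In the paper's framework, one such filter
    would be learned per temporal layer of the context pyramid.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target_response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H, patch):
    """Correlation response map of a new patch under filter H."""
    F = np.fft.fft2(patch)
    return np.real(np.fft.ifft2(H * F))

def fuse_responses(responses, weights):
    """Weighted sum of per-layer response maps.

    The fixed `weights` here are hypothetical; the paper instead learns
    each layer's contribution adaptively via its optimization fusion.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * r for wi, r in zip(w, responses))
```

As a sanity check, learning a filter on a patch and correlating that same patch should produce a response whose peak sits at the center of the desired Gaussian; the predicted target location is then the argmax of the (fused) response map.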