PhD Candidate, University of Western Australia
Training a classifier to differentiate the target from the background for visual tracking
Visual tracking has attracted much attention of computer vision researchers for many years and visual tracking techniques have been used in many applications such as human behaviour analysis, medical imaging and surveillance video. However, the performance of a tracking algorithm may be degraded due to many challenging issues in the scenes, such as illumination change, deformation and background clutter. So far no algorithms can handle all these challenging issues. My research is to develop visual tracking algorithms to deal with two to three of these specific challenging issues.
Abstract: Object tracking is challenging as target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms relying on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, as these algorithms do not maintain long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance in the camera view. In this paper, we propose to learn multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate for locating target objects precisely. We take into account the appropriate size of surrounding context and the feature representations. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain long-term memory of target appearance. We use the output responses of this long-term filter to determine if tracking failure occurs. In the case of tracking failures, we apply an incrementally learned detector to recover the target position in a sliding window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods in terms of efficiency, accuracy, and robustness.
Pub.: 07 Jul '17, Pinned: 27 Jul '17
Abstract: The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies-any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite running at hundreds of frames-per-second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.
Pub.: 10 Sep '15, Pinned: 27 Jul '17
Abstract: Visual tracking is challenging as target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter and occlusion. In this paper, we propose to exploit the rich hierarchical features of deep convolutional neural networks to improve the accuracy and robustness of visual tracking. Deep neural networks trained on object recognition datasets consist of multiple convolutional layers. These layers encode target appearance with different levels of abstraction. For example, the outputs of the last convolutional layers encode the semantic information of targets and such representations are invariant to significant appearance variations. However, their spatial resolutions are too coarse to precisely localize the target. In contrast, features from earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchical features of convolutional layers as a nonlinear counterpart of an image pyramid representation and explicitly exploit these multiple levels of abstraction to represent target objects. Specifically, we learn adaptive correlation filters on the outputs from each convolutional layer to encode the target appearance. We infer the maximum response of each layer to locate targets in a coarse-to-fine manner. To further handle the issues with scale estimation and target re-detection from tracking failures caused by heavy occlusion or moving out of the view, we conservatively learn another correlation filter that maintains a long-term memory of target appearance as a discriminative classifier. Extensive experimental results on large-scale benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art tracking methods.
Pub.: 12 Jul '17, Pinned: 27 Jul '17