Computationally efficient change point detection for high-dimensional regression

Research paper by Florencia Leonardi, Peter Bühlmann

Indexed on: 14 Jan '16
Published on: 14 Jan '16
Published in: Statistics - Methodology


Large-scale sequential data is often exposed to some degree of inhomogeneity in the form of sudden changes in the parameters of the data-generating process. We consider the problem of detecting such structural changes in a high-dimensional regression setting. We propose a joint estimator of the number and the locations of the change points and of the parameters in the corresponding segments. The estimator can be computed using dynamic programming or, as we emphasize here, it can be approximated using a binary search algorithm with $O(n \log(n) \mathrm{Lasso}(n))$ computational operations while still enjoying essentially the same theoretical properties; here $\mathrm{Lasso}(n)$ denotes the computational cost of computing the Lasso for sample size $n$. We establish oracle inequalities for the estimator as well as for its binary search approximation, covering also the case with a large (asymptotically growing) number of change points. We evaluate the performance of the proposed estimation algorithms on simulated data and apply the methodology to real data.
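To make the binary search idea concrete, here is a minimal sketch of greedy binary segmentation with a Lasso fit in each candidate segment. It is not the authors' estimator. The segment cost (residual sum of squares plus an l1 term, scaled by the sample size n), the split penalty `gamma`, the minimum segment length `min_size` and all function names are hypothetical choices made for illustration. A small pure-Python coordinate-descent Lasso stands in for a library solver so that the example is self-contained. Each segment is scanned over every admissible split point, so a segment of length L costs about L Lasso fits. When the recursion is roughly balanced, this gives about n fits per depth level and log(n) levels, which is the intuition behind the $O(n \log(n) \mathrm{Lasso}(n))$ count.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, max_iter=200, tol=1e-8):
    """Coordinate descent for (1/(2m)) * ||y - X b||^2 + lam * ||b||_1, no intercept.

    Returns the coefficients b and the residual vector y - X b.
    """
    m, p = len(y), len(X[0])
    b = [0.0] * p
    r = list(y)  # residuals for the starting point b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(m)) / m for j in range(p)]
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j added back in).
            rho = sum(X[i][j] * r[i] for i in range(m)) / m + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(m):
                    r[i] -= X[i][j] * step  # keep residuals in sync with b
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b, r


def segment_cost(X, y, a, b, lam, n):
    """Hypothetical cost of observations a..b-1: (RSS + 2*m*lam*||beta||_1) / n."""
    beta, r = lasso_cd(X[a:b], y[a:b], lam)
    m = b - a
    return (sum(ri * ri for ri in r) + 2 * m * lam * sum(abs(v) for v in beta)) / n


def binary_segmentation(X, y, lam, gamma, min_size=5):
    """Split a segment at its best point if that lowers the total cost by more
    than gamma, then recurse on both halves. Returns sorted change points."""
    n = len(y)
    change_points = []
    stack = [(0, n)]
    while stack:
        a, b = stack.pop()
        if b - a < 2 * min_size:
            continue  # too short to hold two segments of length >= min_size
        best_t = None
        best = segment_cost(X, y, a, b, lam, n) - gamma  # no-split cost minus penalty
        for t in range(a + min_size, b - min_size + 1):
            c = segment_cost(X, y, a, t, lam, n) + segment_cost(X, y, t, b, lam, n)
            if c < best:
                best_t, best = t, c
        if best_t is not None:
            change_points.append(best_t)
            stack.append((a, best_t))
            stack.append((best_t, b))
    return sorted(change_points)


# Simulated example: the regression coefficients flip sign at observation 20.
rng = random.Random(0)
n, p = 40, 2
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
beta1, beta2 = [2.0, 0.0], [-2.0, 0.0]
y = [
    sum(x * c for x, c in zip(X[i], beta1 if i < 20 else beta2)) + rng.gauss(0, 0.1)
    for i in range(n)
]
print(binary_segmentation(X, y, lam=0.01, gamma=0.5))
```

In this toy setting the single change point should be recovered at or very near observation 20. The greedy split-and-recurse scheme only approximates a global minimization over all partitions. That global minimization is what dynamic programming would solve exactly, at a higher computational cost.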