Indexed on: 12 Feb '16
Published on: 12 Feb '16
Published in: Statistics - Machine Learning
The Lasso is one of the most popular methods in high-dimensional statistical learning. Most existing theoretical results for the Lasso, however, require the samples to be i.i.d. Recent work has provided guarantees for the Lasso under the assumption that the time series is generated by a sparse Vector Autoregressive (VAR) model with Gaussian innovations. Proofs of these results rely critically on the fact that the true data generating mechanism (DGM) is a finite-order Gaussian VAR. This assumption is quite brittle: linear transformations, including selecting a subset of the variables, can violate it. To break free from such assumptions, we derive nonasymptotic inequalities for the estimation error and prediction error of the Lasso estimate of the best linear predictor without assuming any special parametric form of the DGM. Instead, we rely only on (strict) stationarity and mixing conditions to establish consistency of the Lasso in the following two scenarios: (a) alpha-mixing Gaussian processes, and (b) beta-mixing sub-Gaussian random vectors. Our work provides an alternative proof of the consistency of the Lasso for sparse Gaussian VAR models. However, the applicability of our results extends to non-Gaussian and non-linear time series models, as the examples we provide demonstrate. To prove our results, we derive a novel Hanson-Wright type concentration inequality for beta-mixing sub-Gaussian random vectors that may be of independent interest.
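To make the setting concrete, the following minimal sketch simulates a sparse Gaussian VAR(1), keeps only a subset of its coordinates (so the observed process is, in general, no longer a finite-order VAR), and fits a Lasso estimate of the one-step linear predictor on the observed subset. Everything here is an illustrative assumption rather than the authors' code: the helper names `simulate_var1`, `soft_threshold`, and `lasso_ista`, the transition matrix `A`, the penalty `lam = 0.1`, and the use of a plain proximal-gradient (ISTA) solver in place of whatever solver the paper's experiments may use.

```python
import numpy as np


def simulate_var1(A, T, rng, burn_in=200):
    """Simulate X_t = A X_{t-1} + e_t with standard Gaussian innovations e_t."""
    p = A.shape[0]
    x = np.zeros(p)
    out = np.empty((T, p))
    for t in range(T + burn_in):
        x = A @ x + rng.standard_normal(p)
        if t >= burn_in:  # discard the burn-in so the sample is close to stationary
            out[t - burn_in] = x
    return out


def soft_threshold(z, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def lasso_ista(X, Y, lam, n_iter=500):
    """Minimise (1/(2n)) * ||Y - X B||_F^2 + lam * ||B||_1 by proximal gradient."""
    n = X.shape[0]
    step_inv = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = soft_threshold(B - grad / step_inv, lam / step_inv)
    return B


rng = np.random.default_rng(0)
p, T = 10, 400

# Hypothetical sparse, stable transition matrix: diagonal plus one cross term.
A = 0.5 * np.eye(p)
A[0, 1] = 0.3

series = simulate_var1(A, T, rng)

# Observe only a subset of coordinates; variable 1, which drives variable 0, is dropped.
keep = [0, 2, 3, 4, 5]
sub = series[:, keep]

# Regress each observed coordinate at time t on all observed coordinates at time t-1.
X, Y = sub[:-1], sub[1:]
B_hat = lasso_ista(X, Y, lam=0.1)

print("nonzero coefficients:", int(np.count_nonzero(B_hat)), "of", B_hat.size)
```

In this sketch `B_hat` estimates the coefficients of the best one-lag linear predictor of the observed subvector, which is the target the abstract describes; it is not an estimate of the corresponding block of `A`, because the omitted variable changes the dependence structure of what is observed.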