Indexed on: 06 Feb '16 | Published on: 06 Feb '16 | Published in: Mathematics - Statistics
The multivariate Gaussian is often used as a first approximation to the distribution of high-dimensional data. Determining the parameters of this distribution under various constraints is a widely studied problem in statistics, and it often serves as a prototype for testing new algorithms or theoretical frameworks. In this paper, we develop a nonasymptotic approach to the problem of estimating the parameters of a multivariate Gaussian distribution when the data are corrupted by outliers. We propose an estimator, efficiently computable by solving a convex program, that robustly estimates the population mean and the population covariance matrix even when the sample contains a significant proportion of outliers. Our estimator of the corruption matrix is provably rate optimal simultaneously for the entry-wise $\ell_1$-norm, the Frobenius norm and the mixed $\ell_2/\ell_1$ norm. Furthermore, this optimality is achieved by a penalized square-root-of-least-squares method with a universal tuning parameter (calibrating the strength of the penalization). These results are partly extended to the case where $p$ is potentially larger than $n$, under the additional condition that the inverse covariance matrix is sparse.
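To make the three norms named in the abstract concrete, the following minimal sketch computes each of them for a small matrix. It is an illustration only, not the paper's estimator. Two things here are assumptions that do not come from the abstract: the mixed $\ell_2/\ell_1$ norm is taken as the sum of the Euclidean norms of the rows (the abstract does not state a row or column convention), and the example matrix `theta`, with only a few nonzero rows standing in for outlying observations, is hypothetical.

```python
import math


def entrywise_l1(M):
    """Entry-wise l1 norm: sum of the absolute values of all entries."""
    return sum(abs(x) for row in M for x in row)


def frobenius(M):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in M for x in row))


def mixed_l2_l1(M):
    """Mixed l2/l1 norm: sum over rows of each row's Euclidean norm.

    The row-wise convention is an assumption made for this sketch.
    """
    return sum(math.sqrt(sum(x * x for x in row)) for row in M)


# Hypothetical 4 x 2 matrix: rows 0 and 3 are nonzero ("outliers"),
# the remaining rows are zero ("clean" observations).
theta = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [6.0, 8.0],
]

print(entrywise_l1(theta))  # sum of |entries|
print(frobenius(theta))     # sqrt of sum of squares
print(mixed_l2_l1(theta))   # sum of row norms
```

For any matrix these quantities are ordered as Frobenius $\le$ mixed $\ell_2/\ell_1$ $\le$ entry-wise $\ell_1$, so a guarantee stated simultaneously in all three norms controls the estimation error at three different levels of granularity.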