Regularized Covariance Matrix Estimation with High Dimensional Data for Supervised Anomaly Detection Problems

We address the problem of estimating high-dimensional covariance matrices (CMs) for the explicit purpose of supervised anomaly detection, in the case where the number n of data points is smaller than their dimensionality p. This setting is increasingly common with the emergence of the Internet of Things, which makes it possible to collect data from many sensors simultaneously, resulting in very high-dimensional data points. When anomaly detection for such data is performed by modeling the normal behavior of the system with a multivariate Gaussian distribution and n < p, the sample CM is singular and cannot be used directly without some form of regularization. In contrast to existing CM regularization methods, which aim to fit the training data accurately, we propose a regularization algorithm for CM estimation that directly aims to maximize the area under the receiver operating characteristic curve (AUROC) for the ultimate decision problem to be solved: anomaly detection. Experiments on test problems demonstrate that the proposed algorithm finds CM estimates that are significantly better at anomaly detection than those of existing estimation methods, which are unaware of the decision task in which their CMs will be used.
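To make the setting concrete, the following Python sketch illustrates the general idea of choosing a regularization strength by AUROC rather than by fit to the training data. It is not the algorithm proposed here: it assumes a simple linear shrinkage of the sample CM toward a scaled identity, a squared Mahalanobis distance as the anomaly score, and a grid search over a hypothetical shrinkage parameter `alpha` on a small labeled validation set. All function names, the synthetic data, and the grid of `alpha` values are illustrative assumptions.

```python
import numpy as np

def shrunk_covariance(X, alpha):
    """Assumed regularizer: (1 - alpha) * S + alpha * (tr(S) / p) * I."""
    S = np.cov(X, rowvar=False)          # sample CM, singular when n < p
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - alpha) * S + alpha * target

def mahalanobis_scores(X, mean, cov):
    """Squared Mahalanobis distance of each row of X; larger = more anomalous."""
    diff = X - mean
    sol = np.linalg.solve(cov, diff.T)   # shape (p, m)
    return np.einsum("ij,ji->i", diff, sol)

def auroc(scores, labels):
    """AUROC as the fraction of (anomaly, normal) pairs ranked correctly."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def select_alpha(X_train, X_val, y_val, alphas):
    """Pick the shrinkage strength that maximizes validation AUROC."""
    mean = X_train.mean(axis=0)

    def validation_auroc(alpha):
        cov = shrunk_covariance(X_train, alpha)
        return auroc(mahalanobis_scores(X_val, mean, cov), y_val)

    return max(alphas, key=validation_auroc)

# Synthetic illustration with n < p (all values below are made up).
rng = np.random.default_rng(0)
n, p = 20, 50
mixing = rng.standard_normal((p, p)) / np.sqrt(p)
X_train = rng.standard_normal((n, p)) @ mixing           # normal behavior only
X_normal = rng.standard_normal((30, p)) @ mixing
X_anomalous = rng.standard_normal((30, p)) @ mixing + 0.5
X_val = np.vstack([X_normal, X_anomalous])
y_val = np.concatenate([np.zeros(30, dtype=int), np.ones(30, dtype=int)])

alphas = [0.01, 0.1, 0.3, 0.5, 0.9]                      # alpha > 0 keeps the CM invertible
best_alpha = select_alpha(X_train, X_val, y_val, alphas)
print("selected alpha:", best_alpha)
```

In this sketch, any `alpha` greater than zero makes the estimate positive definite, so the Mahalanobis score is well defined even though the sample CM itself has rank at most n - 1. A single scalar parameter is used only to keep the example short.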