4 research outputs found

    Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform

    Many problems in statistical pattern recognition and analysis require the classification and analysis of high dimensional data vectors. However, covariance estimation for high dimensional vectors is a classically difficult problem because the number of coefficients in the covariance grows as the dimension squared [1, 2, 3]. This problem, sometimes referred to as the curse of dimensionality [4], presents a classic dilemma in statistical pattern analysis and machine learning. In a typical application, one measures M versions of an N dimensional vector. If M < N, then the sample covariance matrix will be singular with N - M eigenvalues equal to zero. Over the years, a variety of techniques, such as regularized and shrinkage covariance estimators [5, 6, 7, 8, 9, 10], have been proposed for computing a nonsingular estimate of the covariance. In this paper, we propose a new approach to covariance estimation, which is based on constrained maximum likelihood (ML) estimation of the covariance. In particular, the covariance is constrained to have an eigen decomposition which can be represented as a sparse matrix transform (SMT) [11]. The SMT is formed by a product of pairwise coordinate rotations known as Givens rotations [12]. Using this framework, the covariance can be efficiently estimated through greedy maximization of the log likelihood function, and the number of Givens rotations can be efficiently selected by a cross-validation procedure. The estimator obtained with this method is always positive definite and well-conditioned, even with limited sample size. To validate our model, we perform experiments using a standard set of hyperspectral data [13]. Our experiments show that SMT covariance estimation results in consistently better estimates of the covariance for a variety of different classes and sample sizes. We also show that the SMT method has a particular advantage over traditional methods when estimating small eigenvalues and their associated eigenvectors.
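
    A minimal NumPy sketch of the greedy SMT construction described above is given below. It follows only the abstract's description: at each step it picks the coordinate pair with the largest squared sample correlation and applies the Givens rotation that decorrelates that pair. The function name smt_covariance, the fixed number of rotations (the paper selects this by cross-validation), and the small floor placed on the eigenvalue estimates are illustrative assumptions, not the authors' implementation.

        import numpy as np

        def smt_covariance(X, num_rotations):
            """Sketch of SMT covariance estimation from M samples of an
            N-dimensional vector (X has shape (M, N)). Not the authors' code."""
            M, N = X.shape
            Xc = X - X.mean(axis=0)
            S = Xc.T @ Xc / M                  # sample covariance (singular if M < N)
            E = np.eye(N)                      # accumulated product of Givens rotations

            for _ in range(num_rotations):
                # Greedy step: the pair (i, j) with the largest squared correlation
                # gives the largest reduction in the negative log likelihood.
                d = np.diag(S)
                denom = np.outer(d, d)
                C = np.divide(S**2, denom, out=np.zeros_like(S), where=denom > 0)
                np.fill_diagonal(C, -1.0)      # never select a diagonal entry
                i, j = np.unravel_index(np.argmax(C), C.shape)

                # Givens rotation angle that zeroes the off-diagonal entry S[i, j].
                theta = 0.5 * np.arctan2(2.0 * S[i, j], S[i, i] - S[j, j])
                c, s = np.cos(theta), np.sin(theta)
                G = np.eye(N)
                G[i, i] = G[j, j] = c
                G[i, j], G[j, i] = -s, s

                S = G.T @ S @ G                # decorrelate the chosen pair
                E = E @ G                      # grow the sparse matrix transform

            lam = np.clip(np.diag(S), 1e-12, None)   # eigenvalue estimates (floored)
            return E @ np.diag(lam) @ E.T            # covariance estimate

    Because the returned matrix is an orthogonal factor times a strictly positive diagonal, it is positive definite even when M < N, which matches the property claimed in the abstract.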

    Mining of Ship Operation Data for Energy Conservation

    On the Testing and Estimation of High-Dimensional Covariance Matrices

    Many applications of modern science involve a large number of parameters. In many cases, the number of parameters, p, exceeds the number of observations, N. Classical multivariate statistics are based on the assumption that the number of parameters is fixed and the number of observations is large, and many of the classical techniques perform poorly, or are degenerate, in high-dimensional situations. In this work, we discuss and develop statistical methods for inference from data in which the number of parameters exceeds the number of observations. Specifically, we look at the problems of hypothesis testing about, and estimation of, the covariance matrix. A new test statistic is developed for testing the hypothesis that the covariance matrix is proportional to the identity. Simulations show this newly defined test is asymptotically comparable to those in the literature, and it appears to perform better than existing tests under certain alternative hypotheses. A new set of Stein-type shrinkage estimators is introduced for estimating the covariance matrix in high dimensions. Simulations show that, under the assumption of normality of the data, the new estimators are comparable to those in the literature; they also indicate that the new estimators perform better than existing ones in cases of extremely high dimension. An analysis of DNA microarray data also appears to confirm the improved performance in the case of extreme high-dimensionality.
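
    The abstract does not give the exact form of its Stein-type shrinkage estimators, but the general idea, pulling the sample covariance toward a scaled identity so that the estimate remains nonsingular when p exceeds N, can be sketched as follows. The data-driven choice of the shrinkage intensity below follows the familiar Ledoit-Wolf recipe and is an assumption for illustration; the estimators developed in this work may differ.

        import numpy as np

        def shrinkage_covariance(X, rho=None):
            """Stein-type shrinkage of the sample covariance toward a scaled
            identity. X has shape (N, p): N observations of p variables."""
            N, p = X.shape
            Xc = X - X.mean(axis=0)
            S = Xc.T @ Xc / N                     # sample covariance (singular if p > N)
            mu = np.trace(S) / p
            T = mu * np.eye(p)                    # shrinkage target

            if rho is None:
                # Ledoit-Wolf-style intensity (an assumed, not thesis-specific, choice).
                d2 = np.linalg.norm(S - T, 'fro') ** 2
                b2 = sum(np.linalg.norm(np.outer(x, x) - S, 'fro') ** 2 for x in Xc) / N**2
                rho = min(b2 / d2, 1.0) if d2 > 0 else 1.0

            return (1.0 - rho) * S + rho * T      # positive definite for rho in (0, 1]

    For example, with N = 20 observations of a p = 100 dimensional vector the sample covariance is singular, while the shrunk estimate remains invertible and can be used wherever a nonsingular covariance is required.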