
    Asymptotic expansion of the minimum covariance determinant estimators

    In arXiv:0907.0079, Cator and Lopuhaä establish an asymptotic expansion for the MCD estimators in a very general framework. This expansion requires the existence and non-singularity of the derivative in a first-order Taylor expansion. In this paper, we prove the existence of this derivative for multivariate distributions that have a density and provide an explicit expression. Moreover, under suitable symmetry conditions on the density, we show that this derivative is non-singular. These symmetry conditions include the elliptically contoured multivariate location-scatter model, in which case we show that the minimum covariance determinant (MCD) estimators of multivariate location and covariance are asymptotically equivalent to a sum of independent, identically distributed vector- and matrix-valued random elements, respectively. This provides a proof of asymptotic normality and a precise description of the limiting covariance structure of the MCD estimators.
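    The MCD estimator referred to above is the mean and covariance of the h-point subset whose sample covariance has the smallest determinant. As a rough illustration (not the paper's construction), the concentration-step idea behind Rousseeuw and Van Driessen's FAST-MCD algorithm can be sketched in numpy; the function name and the full-subset random-restart scheme here are illustrative simplifications:

```python
import numpy as np

def mcd_c_steps(X, h, n_starts=20, n_steps=10, seed=0):
    """Approximate the MCD via random restarts of concentration steps.

    Returns the location and scatter of the best h-subset found,
    i.e. the one with the smallest covariance determinant.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_det, best = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=h, replace=False)
        for _ in range(n_steps):
            mu = X[idx].mean(axis=0)
            S = np.cov(X[idx], rowvar=False)
            # Mahalanobis distances of all points to the current fit
            diff = X - mu
            d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)
            idx = np.argsort(d2)[:h]  # concentrate on the h closest points
        det = np.linalg.det(np.cov(X[idx], rowvar=False))
        if det < best_det:
            best_det = det
            best = (X[idx].mean(axis=0), np.cov(X[idx], rowvar=False))
    return best

# 200 clean points plus 40 shifted outliers: the MCD location stays near
# the origin while the ordinary sample mean is pulled toward the outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), rng.normal(size=(40, 2)) + 8.0])
mu_mcd, S_mcd = mcd_c_steps(X, h=180)
```

    Each concentration step provably does not increase the determinant, which is why a handful of steps per random start suffices in practice.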

    Student Sliced Inverse Regression

    Sliced Inverse Regression (SIR) has been extensively used to reduce the dimension of the predictor space before performing regression. SIR is originally a model-free method, but it has been shown to correspond to the maximum likelihood of an inverse regression model with Gaussian errors. This intrinsic Gaussianity of standard SIR may explain its high sensitivity to outliers, as observed in a number of studies. To improve robustness, the inverse regression formulation of SIR is therefore extended to non-Gaussian errors with heavy-tailed distributions. Considering Student-distributed errors, it is shown that the inverse regression remains tractable via an Expectation-Maximization (EM) algorithm. The algorithm is outlined and tested in the presence of outliers, on both simulated and real data, showing improved results in comparison to a number of other existing approaches.
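    For context, classical SIR estimates the effective directions from the slice means of the predictors given the response; the paper's Student variant replaces the implicit Gaussian inverse-regression model, but the slicing skeleton is the same. A minimal numpy sketch of standard (non-robust) SIR, with illustrative function names:

```python
import numpy as np

def inv_sqrt(S):
    # symmetric inverse square root via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def sir(X, y, n_slices=10, n_dirs=1):
    """Classical sliced inverse regression: top eigen-directions of the
    between-slice covariance of X, in the metric of cov(X)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    M = np.zeros((p, p))
    for s in np.array_split(np.argsort(y), n_slices):
        m = Xc[s].mean(axis=0)             # slice mean of centered X
        M += (len(s) / n) * np.outer(m, m)
    W = inv_sqrt(Sigma)
    _, vecs = np.linalg.eigh(W @ M @ W)    # eigenvalues in ascending order
    return W @ vecs[:, -n_dirs:]           # back-transform top directions

# Single-index model: SIR recovers the direction b up to sign and scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
b = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ b + 0.2 * rng.normal(size=600)
v = sir(X, y)[:, 0]
cosine = abs(v @ b) / (np.linalg.norm(v) * np.linalg.norm(b))
```

    A single gross outlier in X can badly distort both the slice means and cov(X) in this sketch, which is the sensitivity the Student-error extension addresses.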

    Some statistical methods for dimension reduction

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The aim of the work in this thesis is to carry out dimension reduction (DR) for high-dimensional (HD) data by using statistical methods for variable selection, feature extraction and a combination of the two. In Chapter 2, the DR is carried out through robust feature extraction, and robust canonical correlation (RCCA) methods are proposed. In the correlation matrix of canonical correlation analysis (CCA), we suggest that the Pearson correlation be substituted by robust correlation measures in order to obtain robust correlation matrices, which are then employed to produce RCCA. Moreover, the classical covariance matrix is substituted by robust estimators of multivariate location and dispersion, again yielding RCCA. In Chapters 3 and 4, the DR is carried out by combining the ideas of variable selection using regularisation methods with feature extraction, through the minimum average variance estimator (MAVE) and single-index quantile regression (SIQ) methods, respectively. In particular, we extend the sparse MAVE (SMAVE) of Wang and Yin (2008) by combining the MAVE loss function with different regularisation penalties in Chapter 3. An extension of the SIQ method of Wu et al. (2010), considering different regularisation penalties, is proposed in Chapter 4. In Chapter 5, the DR is done through variable selection in a Bayesian framework. A flexible Bayesian framework for regularisation in the quantile regression (QR) model is proposed. This work differs from Bayesian Lasso quantile regression (BLQR), which employs the asymmetric Laplace distribution (ALD) for the errors; here, the error distribution is assumed to be an infinite mixture of Gaussian (IMG) densities.
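    The substitution described for Chapter 2 can be illustrated by computing canonical correlations from a Spearman rather than a Pearson correlation matrix. This is only a sketch of the idea, using Spearman as one example of a robust correlation measure; the function names are illustrative, not the thesis's:

```python
import numpy as np

def spearman_corr(Z):
    # Spearman correlation = Pearson correlation of the column-wise ranks
    ranks = Z.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def inv_sqrt(S):
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def robust_cca(X, Y):
    """Canonical correlations from a robust (Spearman) correlation matrix."""
    p = X.shape[1]
    R = spearman_corr(np.hstack([X, Y]))
    Rxx, Ryy, Rxy = R[:p, :p], R[p:, p:], R[:p, p:]
    # canonical correlations = singular values of the whitened cross block
    return np.linalg.svd(inv_sqrt(Rxx) @ Rxy @ inv_sqrt(Ryy),
                         compute_uv=False)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
Y = X[:, :2] + 0.1 * rng.normal(size=(300, 2))  # strongly related to X
rho = robust_cca(X, Y)
```

    Because ranks are bounded, a few wild observations move the Spearman matrix far less than the Pearson one, which is the rationale for the substitution.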

    Dimension reduction and efficient recommender system for large-scale complex data

    Large-scale complex data have drawn great attention in recent years and play an important role in information technology and biomedical research. In this thesis, we address three challenging issues: sufficient dimension reduction for longitudinal data, nonignorable missing data with refreshment samples, and large-scale recommender systems. In the first part of this thesis, we incorporate correlation structure in sufficient dimension reduction for longitudinal data. Existing sufficient dimension reduction approaches assuming independence may lead to substantial loss of efficiency. We apply the quadratic inference function to incorporate the correlation information and apply the transformation method to recover the central subspace. The proposed estimators are shown to be consistent and more efficient than the ones assuming independence. In addition, the estimated central subspace is also efficient when the correlation information is taken into account. We compare the proposed method with other dimension reduction approaches through simulation studies, and apply this new approach to an environmental health study. In the second part of this thesis, we address nonignorable missing data, which occur frequently in longitudinal studies and can cause biased estimation. Refreshment samples, which recruit new subjects in subsequent waves from the original population, can mitigate the bias. In this thesis, we introduce a mixed-effects estimating equation approach which enables one to incorporate refreshment samples and recover missing information. We show that the proposed method achieves consistency and asymptotic normality for fixed-effect estimation under shared-parameter models, and we extend it to a more general nonignorable-missing framework. Our finite sample simulation studies show the effectiveness and robustness of the proposed method under different missing mechanisms.
In addition, we apply our method to election poll longitudinal survey data with refreshment samples from the 2007-2008 Associated Press–Yahoo! News. In the third part of this thesis, we develop a novel recommender system which tracks users' preferences and recommends items of interest effectively. In this thesis, we propose a group-specific method to utilize dependency information from users and items which share similar characteristics under the singular value decomposition framework. The new approach is effective for the "cold-start" problem, where information on new users and new items is not available in the existing data collection. One advantage of the proposed model is that we are able to incorporate information from the missing mechanism and group-specific features through clustering based on variables associated with missing patterns. In addition, we propose a new algorithm that embeds a back-fitting algorithm into alternating least squares, which avoids large matrix operations and heavy memory storage, and therefore makes scalable computing feasible. Our simulation studies and MovieLens data analysis both indicate that the proposed group-specific method improves prediction accuracy significantly compared to existing competitive recommender system approaches.
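    The alternating least squares backbone that the thesis's back-fitting step is embedded into can be sketched as plain ALS on the observed entries only (without the group-specific effects); all names here are illustrative:

```python
import numpy as np

def als(R, mask, k=2, lam=0.1, n_iter=25, seed=0):
    """Alternating least squares for R ≈ U @ V.T on observed entries.

    mask is a boolean array marking the observed entries of R; every
    update is a small k-dimensional ridge solve, so no large dense
    matrix is ever formed or inverted.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(m, k))
    for _ in range(n_iter):
        for i in range(n):              # update user factors row by row
            J = mask[i]
            U[i] = np.linalg.solve(V[J].T @ V[J] + lam * np.eye(k),
                                   V[J].T @ R[i, J])
        for j in range(m):              # update item factors
            I = mask[:, j]
            V[j] = np.linalg.solve(U[I].T @ U[I] + lam * np.eye(k),
                                   U[I].T @ R[I, j])
    return U, V

# Recover a rank-2 ratings matrix from roughly 70% of its entries.
rng = np.random.default_rng(1)
R_true = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
mask = rng.random((20, 15)) < 0.7
U, V = als(R_true, mask)
rmse = np.sqrt(np.mean((R_true - U @ V.T)[mask] ** 2))
```

    Each user's and each item's update depends only on its own observed entries, which is what makes the scheme amenable to the scalable computing discussed above.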

    Robust dimension reduction based on canonical correlation

    The canonical correlation (CANCOR) method for dimension reduction in a regression setting is based on the classical estimates of the first and second moments of the data, and is therefore sensitive to outliers. In this paper, we study a weighted canonical correlation (WCANCOR) method, which captures a subspace of the central dimension reduction subspace, as well as its asymptotic properties. In the proposed WCANCOR method, each observation is weighted based on its Mahalanobis distance to the location of the predictor distribution. Robust estimates of location and scatter, such as the minimum covariance determinant (MCD) estimator of Rousseeuw [P.J. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications B (1985) 283-297], can be used to compute the Mahalanobis distance. To determine the number of significant dimensions in WCANCOR, a weighted permutation test is considered. A comparison of SIR, CANCOR and WCANCOR is also made through simulation studies to show the robustness of WCANCOR to outlying observations. As an example, the Boston housing data is analyzed using the proposed WCANCOR method.
    MSC: 62H12. Keywords: canonical correlation; dimension reduction; MCD estimator; permutation test; robustness.
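    The permutation test mentioned above follows the usual permutation logic: break the link between predictors and response by permuting the response, and see how large the leading canonical correlation gets by chance. A minimal unweighted sketch (the paper's test is weighted; names are illustrative):

```python
import numpy as np

def inv_sqrt(S):
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def first_cancor(X, Y):
    # largest canonical correlation via the whitened cross-covariance block
    p = X.shape[1]
    C = np.cov(np.hstack([X, Y]), rowvar=False)
    Cxx, Cyy, Cxy = C[:p, :p], C[p:, p:], C[:p, p:]
    return np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy),
                         compute_uv=False)[0]

def perm_pvalue(X, Y, n_perm=199, seed=0):
    """P-value for H0: no linear association, by permuting rows of Y."""
    rng = np.random.default_rng(seed)
    obs = first_cancor(X, Y)
    null = [first_cancor(X, Y[rng.permutation(len(Y))])
            for _ in range(n_perm)]
    return (1 + sum(v >= obs for v in null)) / (1 + n_perm)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = (X[:, 0] + 0.5 * rng.normal(size=200)).reshape(-1, 1)
p_val = perm_pvalue(X, Y)
```

    To test whether a second direction is significant, the same idea is applied after projecting out the first estimated direction.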