349 research outputs found

    The fused Kolmogorov filter: A nonparametric model-free screening method

    Full text link
    A new model-free screening method called the fused Kolmogorov filter is proposed for high-dimensional data analysis. This new method is fully nonparametric and can work with many types of covariates and response variables, including continuous, discrete and categorical variables. We apply the fused Kolmogorov filter to deal with variable screening problems emerging from a wide range of applications, such as multiclass classification, nonparametric regression and Poisson regression, among others. It is shown that the fused Kolmogorov filter enjoys the sure screening property under weak regularity conditions that are much milder than those required for many existing nonparametric screening methods. In particular, the fused Kolmogorov filter can still be powerful when covariates are strongly dependent on each other. We further demonstrate the superior performance of the fused Kolmogorov filter over existing screening methods by simulations and real data examples.Comment: Published at http://dx.doi.org/10.1214/14-AOS1303 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model

    Full text link
    The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regression and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible for such model. We devise a group lasso penalized EM algorithm and study its statistical properties. Existing theoretical results of regularized EM algorithms often rely on dividing the sample into many independent batches and employing a fresh batch of sample in each iteration of the algorithm. Our algorithm and theoretical analysis do not require sample-splitting, and can be extended to multivariate response cases. The proposed methods also have encouraging performances in numerical studies

    Envelopes and principal component regression

    Full text link
    Envelope methods offer targeted dimension reduction for various models. The overarching goal is to improve efficiency in multivariate parameter estimation by projecting the data onto a lower-dimensional subspace known as the envelope. Envelope approaches have advantages in analyzing data with highly correlated variables, but their iterative Grassmannian optimization algorithms do not scale very well with ultra high-dimensional data. While the connections between envelopes and partial least squares in multivariate linear regression have promoted recent progress in high-dimensional studies of envelopes, we propose a more straightforward way of envelope modeling from a novel principal components regression perspective. The proposed procedure, Non-Iterative Envelope Component Estimation (NIECE), has excellent computational advantages over the iterative Grassmannian optimization alternatives in high dimensions. We develop a unified NIECE theory that bridges the gap between envelope methods and principal components in regression. The new theoretical insights also shed light on the envelope subspace estimation error as a function of eigenvalue gaps of two symmetric positive definite matrices used in envelope modeling. We apply the new theory and algorithm to several envelope models, including response and predictor reduction in multivariate linear models, logistic regression, and Cox proportional hazard model. Simulations and illustrative data analysis show the potential for NIECE to improve standard methods in linear and generalized linear models significantly

    Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction

    Full text link
    Sliced inverse regression (SIR, Li 1991) is a pioneering work and the most recognized method in sufficient dimension reduction. While promising progress has been made in theory and methods of high-dimensional SIR, two remaining challenges are still nagging high-dimensional multivariate applications. First, choosing the number of slices in SIR is a difficult problem, and it depends on the sample size, the distribution of variables, and other practical considerations. Second, the extension of SIR from univariate response to multivariate is not trivial. Targeting at the same dimension reduction subspace as SIR, we propose a new slicing-free method that provides a unified solution to sufficient dimension reduction with high-dimensional covariates and univariate or multivariate response. We achieve this by adopting the recently developed martingale difference divergence matrix (MDDM, Lee & Shao 2018) and penalized eigen-decomposition algorithms. To establish the consistency of our method with a high-dimensional predictor and a multivariate response, we develop a new concentration inequality for sample MDDM around its population counterpart using theories for U-statistics, which may be of independent interest. Simulations and real data analysis demonstrate the favorable finite sample performance of the proposed method

    Microenvironment Signals and Mechanisms in the Regulation of Osteosarcoma

    Get PDF
    Osteosarcoma (OS) is the most common malignant primary bone tumor in children and adolescents and features rapid development, strong metastatic ability, and poor prognosis. It has been well established that diverse genetic aberrations and metabolic alterations confer the tumorigenesis and development of OS. The intricate metabolism and vascularization that contributes to the nutrient and structural support for tumor progression should be thoroughly clarified to help us gain novel insights into OS and its clinical diagnoses and treatments. With regard to the complex bone extracellular matrix (ECM) and local cell populations, we intend to illustrate the interrelationship between various microenvironmental signals and the different stages of OS evolution. Solid evidence has noted two crucial factors of the OS microenvironment in the acquisition of stem cell phenotypes - transforming growth factor-β1 (TGF-β1) signaling and hypoxia. Different cell subtypes in the local environment might also serve as unique contributors that interact with each other and communicate with distant cells, thus participating in local invasion and metastasis. Proper models have been established and improved to reveal the evolutionary footsteps of how normal cells transform into a neoplastic state and progress toward malignancy
    corecore