13 research outputs found

    Biological assessment of robust noise models in microarray data analysis

    Get PDF
    Motivation: Although several recently proposed analysis packages for microarray data can cope with heavy-tailed noise, many applications rely on Gaussian assumptions. Gaussian noise models foster computational efficiency. This comes, however, at the expense of increased sensitivity to outlying observations. Assessing potential insufficiencies of Gaussian noise in microarray data analysis is thus important and of general interest

    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    Full text link
    Volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and an interactive use of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide an unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility to apply volcano plots to other fields beyond microarray.Comment: 8 figure

    Regularized estimation of linear functionals of precision matrices for high-dimensional time series

    Full text link
    This paper studies a Dantzig-selector type regularized estimator for linear functionals of high-dimensional linear processes. Explicit rates of convergence of the proposed estimator are obtained and they cover the broad regime from i.i.d. samples to long-range dependent time series and from sub-Gaussian innovations to those with mild polynomial moments. It is shown that the convergence rates depend on the degree of temporal dependence and the moment conditions of the underlying linear processes. The Dantzig-selector estimator is applied to the sparse Markowitz portfolio allocation and the optimal linear prediction for time series, in which the ratio consistency when compared with an oracle estimator is established. The effect of dependence and innovation moment conditions is further illustrated in the simulation study. Finally, the regularized estimator is applied to classify the cognitive states on a real fMRI dataset and to portfolio optimization on a financial dataset.Comment: 44 pages, 4 figure

    PERT: A Method for Expression Deconvolution of Human Blood Samples from Varied Microenvironmental and Developmental Conditions

    Get PDF
    The cellular composition of heterogeneous samples can be predicted using an expression deconvolution algorithm to decompose their gene expression profiles based on pre-defined, reference gene expression profiles of the constituent populations in these samples. However, the expression profiles of the actual constituent populations are often perturbed from those of the reference profiles due to gene expression changes in cells associated with microenvironmental or developmental effects. Existing deconvolution algorithms do not account for these changes and give incorrect results when benchmarked against those measured by well-established flow cytometry, even after batch correction was applied. We introduce PERT, a new probabilistic expression deconvolution method that detects and accounts for a shared, multiplicative perturbation in the reference profiles when performing expression deconvolution. We applied PERT and three other state-of-the-art expression deconvolution methods to predict cell frequencies within heterogeneous human blood samples that were collected under several conditions (uncultured mono-nucleated and lineage-depleted cells, and culture-derived lineage-depleted cells). Only PERT's predicted proportions of the constituent populations matched those assigned by flow cytometry. Genes associated with cell cycle processes were highly enriched among those with the largest predicted expression changes between the cultured and uncultured conditions. We anticipate that PERT will be widely applicable to expression deconvolution strategies that use profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular phenotypic identity

    ECA: High Dimensional Elliptical Component Analysis in non-Gaussian Distributions

    Full text link
    We present a robust alternative to principal component analysis (PCA) --- called elliptical component analysis (ECA) --- for analyzing high dimensional, elliptically distributed data. ECA estimates the eigenspace of the covariance matrix of the elliptical data. To cope with heavy-tailed elliptical distributions, a multivariate rank statistic is exploited. At the model-level, we consider two settings: either that the leading eigenvectors of the covariance matrix are non-sparse or that they are sparse. Methodologically, we propose ECA procedures for both non-sparse and sparse settings. Theoretically, we provide both non-asymptotic and asymptotic analyses quantifying the theoretical performances of ECA. In the non-sparse setting, we show that ECA's performance is highly related to the effective rank of the covariance matrix. In the sparse setting, the results are twofold: (i) We show that the sparse ECA estimator based on a combinatoric program attains the optimal rate of convergence; (ii) Based on some recent developments in estimating sparse leading eigenvectors, we show that a computationally efficient sparse ECA estimator attains the optimal rate of convergence under a suboptimal scaling.Comment: to appear in JASA (T&M
    corecore