13 research outputs found
Biological assessment of robust noise models in microarray data analysis
Motivation: Although several recently proposed analysis packages for microarray data can cope with heavy-tailed noise, many applications rely on Gaussian assumptions. Gaussian noise models foster computational efficiency. This comes, however, at the expense of increased sensitivity to outlying observations. Assessing potential insufficiencies of Gaussian noise in microarray data analysis is thus important and of general interest.
Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays
A volcano plot displays unstandardized signal (e.g. log-fold-change) against
noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from
the t test). We review basic and interactive uses of the volcano plot,
and its crucial role in understanding the regularized t-statistic. The joint
filtering gene selection criterion based on regularized statistics has a curved
discriminant line in the volcano plot, as compared to the two perpendicular
lines for the "double filtering" criterion. This review attempts to provide a
unifying framework for discussions on alternative measures of differential
expression, improved methods for estimating variance, and visual display of a
microarray analysis result. We also discuss the possibility of applying volcano
plots to other fields beyond microarrays. Comment: 8 figures
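The two selection criteria contrasted in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the regularized statistic is taken to have the form difference / (standard error + s0), and the function names, the constant `s0`, and the cutoffs are all hypothetical rather than taken from the paper. Under that assumed form, requiring |difference| / (se + s0) >= c is equivalent, in volcano-plot coordinates (difference, t = difference / se), to |difference| * (1 - c / |t|) >= c * s0, which traces a curve rather than two perpendicular lines.

```python
import math
from statistics import mean, variance


def gene_stats(group_a, group_b, s0=0.0):
    """Return (log_fold_change, t_like_statistic) for one gene.

    Inputs are assumed to be log-scale expression values for two groups.
    With s0 == 0 this is the ordinary pooled-variance t-statistic; with
    s0 > 0 it is a regularized statistic of the assumed form
    diff / (se + s0).
    """
    na, nb = len(group_a), len(group_b)
    diff = mean(group_b) - mean(group_a)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return diff, diff / (se + s0)


def double_filter(lfc, t_stat, lfc_cut, t_cut):
    """Two perpendicular cutoffs: one on fold change, one on the t-statistic."""
    return abs(lfc) >= lfc_cut and abs(t_stat) >= t_cut


def joint_filter(t_regularized, t_cut):
    """Single cutoff on the regularized statistic (curved in the volcano plot)."""
    return abs(t_regularized) >= t_cut
```

A gene is selected by `double_filter` only when it clears both cutoffs, whereas `joint_filter` trades fold change against noise through the single regularized statistic.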
Regularized estimation of linear functionals of precision matrices for high-dimensional time series
This paper studies a Dantzig-selector type regularized estimator for linear
functionals of high-dimensional linear processes. Explicit rates of convergence
of the proposed estimator are obtained and they cover the broad regime from
i.i.d. samples to long-range dependent time series and from sub-Gaussian
innovations to those with mild polynomial moments. It is shown that the
convergence rates depend on the degree of temporal dependence and the moment
conditions of the underlying linear processes. The Dantzig-selector estimator
is applied to the sparse Markowitz portfolio allocation and the optimal linear
prediction for time series, in which the ratio consistency when compared with
an oracle estimator is established. The effect of dependence and innovation
moment conditions is further illustrated in the simulation study. Finally, the
regularized estimator is applied to classify the cognitive states on a real
fMRI dataset and to portfolio optimization on a financial dataset. Comment: 44 pages, 4 figures
PERT: A Method for Expression Deconvolution of Human Blood Samples from Varied Microenvironmental and Developmental Conditions
The cellular composition of heterogeneous samples can be predicted using an expression deconvolution algorithm to decompose their gene expression profiles based on pre-defined, reference gene expression profiles of the constituent populations in these samples. However, the expression profiles of the actual constituent populations are often perturbed from those of the reference profiles due to gene expression changes in cells associated with microenvironmental or developmental effects. Existing deconvolution algorithms do not account for these changes and give incorrect results when benchmarked against those measured by well-established flow cytometry, even after batch correction is applied. We introduce PERT, a new probabilistic expression deconvolution method that detects and accounts for a shared, multiplicative perturbation in the reference profiles when performing expression deconvolution. We applied PERT and three other state-of-the-art expression deconvolution methods to predict cell frequencies within heterogeneous human blood samples that were collected under several conditions (uncultured mono-nucleated and lineage-depleted cells, and culture-derived lineage-depleted cells). Only PERT's predicted proportions of the constituent populations matched those assigned by flow cytometry. Genes associated with cell cycle processes were highly enriched among those with the largest predicted expression changes between the cultured and uncultured conditions. We anticipate that PERT will be widely applicable to expression deconvolution strategies that use profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular phenotypic identity.
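To make the deconvolution idea concrete, here is a minimal sketch of the simplest case: a mixture of two populations whose proportion is recovered by least squares against reference profiles. This is not PERT, which is a probabilistic model; the function name and the toy numbers used with it are hypothetical and serve only to illustrate why a shared multiplicative perturbation of the references matters.

```python
def estimate_proportion(mixed, ref_a, ref_b):
    """Least-squares estimate of p in mixed ~ p * ref_a + (1 - p) * ref_b.

    All arguments are equal-length lists of per-gene expression values.
    The estimate is clipped to [0, 1] so it can be read as a proportion.
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(mixed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0.0:
        raise ValueError("reference profiles are identical")
    return min(1.0, max(0.0, num / den))
```

With toy references `[10, 2, 5]` and `[1, 8, 5]` mixed at 30% and 70%, the estimator recovers a proportion of 0.3. If the first gene is doubled in both constituent populations (a shared multiplicative perturbation) while the unperturbed references are still used, the estimate drifts to roughly 0.58; applying the same factor to the references restores 0.3.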
ECA: High Dimensional Elliptical Component Analysis in non-Gaussian Distributions
We present a robust alternative to principal component analysis (PCA),
called elliptical component analysis (ECA), for analyzing high dimensional,
elliptically distributed data. ECA estimates the eigenspace of the covariance
matrix of the elliptical data. To cope with heavy-tailed elliptical
distributions, a multivariate rank statistic is exploited. At the model-level,
we consider two settings: either that the leading eigenvectors of the
covariance matrix are non-sparse or that they are sparse. Methodologically, we
propose ECA procedures for both non-sparse and sparse settings. Theoretically,
we provide both non-asymptotic and asymptotic analyses quantifying the
theoretical performances of ECA. In the non-sparse setting, we show that ECA's
performance is highly related to the effective rank of the covariance matrix.
In the sparse setting, the results are twofold: (i) We show that the sparse ECA
estimator based on a combinatoric program attains the optimal rate of
convergence; (ii) Based on some recent developments in estimating sparse
leading eigenvectors, we show that a computationally efficient sparse ECA
estimator attains the optimal rate of convergence under a suboptimal scaling. Comment: to appear in JASA (T&M)
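The non-sparse idea can be illustrated with a small sketch. The abstract does not name the multivariate rank statistic; the code below assumes it is the multivariate Kendall's tau matrix, the average over sample pairs of the normalized outer product of their difference, and extracts its leading eigenvector by plain power iteration. The function names, the handling of identical points, and the iteration count are illustrative choices, not the paper's procedure.

```python
import math


def multivariate_kendall_tau(samples):
    """Average of (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||^2 over pairs i < j.

    Each pair contributes only a direction, so a far-away observation
    cannot dominate the estimate the way it does in a sample covariance.
    """
    n, d = len(samples), len(samples[0])
    total = [[0.0] * d for _ in range(d)]
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = [a - b for a, b in zip(samples[i], samples[j])]
            sq_norm = sum(v * v for v in diff)
            if sq_norm == 0.0:
                continue  # identical points define no direction (assumed: skip)
            pairs += 1
            for r in range(d):
                for c in range(d):
                    total[r][c] += diff[r] * diff[c] / sq_norm
    if pairs == 0:
        raise ValueError("need at least two distinct samples")
    return [[v / pairs for v in row] for row in total]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0 / (k + 1) for k in range(d)]  # fixed, non-symmetric start vector
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(d)) for r in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Because every pair is normalized to unit length, the resulting matrix always has trace 1, and its leading eigenvector serves as the estimate of the first principal direction in this sketch.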