Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups
BACKGROUND: High-throughput technologies enable the testing of tens of thousands of measurements simultaneously. Identification of genes that are differentially expressed or associated with clinical outcomes invokes the multiple testing problem. False Discovery Rate (FDR) control is a statistical method used to correct for multiple comparisons for independent or weakly dependent test statistics. Although FDR control is frequently applied to microarray data analysis, gene expression is usually correlated, which might lead to inaccurate estimates. In this paper, we evaluate the accuracy of FDR estimation. METHODS: Using two real data sets, we resampled subgroups of patients and recalculated statistics of interest to illustrate the imprecision of FDR estimation. Next, we generated many simulated data sets with block correlation structures and realistic noise parameters, using the Ultimate Microarray Prediction, Inference, and Reality Engine (UMPIRE) R package. We estimated FDR using a beta-uniform mixture (BUM) model and examined the variation in FDR estimation. RESULTS: The three major sources of variation in FDR estimation are the sample size, correlations among genes, and the true proportion of differentially expressed genes (DEGs). The sample size and proportion of DEGs affect both the magnitude and the precision of FDR estimation, while the correlation structure mainly affects the variation of the estimated parameters. CONCLUSIONS: We have decomposed the various factors that affect FDR estimation and illustrated the direction and extent of their impact. We found that the proportion of DEGs has a significant impact on FDR; this factor may have been overlooked in previous studies and deserves more attention when controlling FDR.
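To make the BUM step concrete, here is a minimal Python sketch (not the paper's R/UMPIRE pipeline) that fits the Pounds-Morris beta-uniform mixture to a set of p-values by maximum likelihood and derives a model-based FDR estimate at a cutoff; the synthetic p-values and the 90/10 null/signal split are illustrative assumptions.

```python
# Minimal sketch: fit a beta-uniform mixture (BUM) to p-values and
# estimate FDR at a cutoff tau, using the Pounds & Morris (2003)
# parameterization f(p) = lam + (1 - lam) * a * p**(a - 1).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Synthetic p-values: 90% null (uniform), 10% signal (Beta(0.1, 1)).
pvals = np.concatenate([rng.uniform(size=9000),
                        rng.beta(0.1, 1.0, size=1000)])

def neg_log_lik(params):
    lam, a = params
    return -np.sum(np.log(lam + (1 - lam) * a * pvals ** (a - 1)))

fit = minimize(neg_log_lik, x0=[0.5, 0.5],
               bounds=[(1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6)])
lam, a = fit.x

def fdr_at(tau):
    # Upper bound on the null proportion, then model-based FDR at tau:
    # pi_ub * tau over the fitted CDF F(tau) = lam*tau + (1-lam)*tau**a.
    pi_ub = lam + (1 - lam) * a
    return pi_ub * tau / (lam * tau + (1 - lam) * tau ** a)

print(f"lambda={lam:.3f}, a={a:.3f}, FDR(0.01)={fdr_at(0.01):.3f}")
```

Rerunning this sketch with different signal proportions and correlated (rather than independent) p-values is exactly the kind of experiment that exposes the variation the abstract describes.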
Disruption to control network function correlates with altered dynamic connectivity in the wider autism spectrum.
Autism is a common developmental condition with a wide, variable range of co-occurring neuropsychiatric symptoms. In contrast to most extant studies, we explored whole-brain functional organization at multiple levels simultaneously in a large subject group reflecting autism's clinical diversity, and we present the first network-based analysis of transient brain states, or dynamic connectivity, in autism. Disruption to inter-network and inter-system connectivity, rather than within individual networks, predominated. We identified coupling disruption in the anterior-posterior default mode axis and among specific control networks specialized for task start cues and the maintenance of domain-independent task-positive status, specifically between the right fronto-parietal and cingulo-opercular networks and default mode network subsystems. These disruptions appear to propagate downstream in autism, with significantly dampened subject oscillations between brain states and differences in dynamic connectivity configuration. Our account proposes specific motifs that may provide candidates for neuroimaging biomarkers within heterogeneous clinical populations in this diverse condition.
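The "brain states" construct referenced here is commonly operationalized as sliding-window correlations clustered into recurring connectivity patterns. The sketch below shows that generic recipe on synthetic time series standing in for regional BOLD signals; it is an assumed, textbook-style implementation, not the authors' actual pipeline.

```python
# Generic dynamic-connectivity sketch: sliding-window correlations
# among regional time series, clustered into recurring "brain states"
# with k-means. Synthetic data; illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_regions, n_timepoints, win, step = 10, 300, 30, 5
ts = rng.standard_normal((n_timepoints, n_regions))  # stand-in for BOLD

# Correlation matrix in each window, flattened to its upper triangle.
iu = np.triu_indices(n_regions, k=1)
windows = [np.corrcoef(ts[t:t + win].T)[iu]
           for t in range(0, n_timepoints - win + 1, step)]
X = np.vstack(windows)

# Cluster windows into k recurring connectivity states; transitions in
# the state sequence correspond to "oscillations between brain states".
states = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("state sequence:", states)
```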
Computational Models for Transplant Biomarker Discovery.
Translational medicine holds rich promise for improved diagnostics and drug discovery in the field of transplantation, where unmet diagnostic and therapeutic needs persist. The recent advent of genomics and proteomics profiling, collectively called "omics", provides new resources for developing novel biomarkers for clinical routine. Establishing such a marker system depends heavily on the appropriate application of computational algorithms and software, which are grounded in mathematical theories and models. Understanding these theories helps in applying the right algorithms and ensuring that biomarker systems succeed. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker development. The advantages and limitations inherent in these models are discussed. The principles of key computational approaches for efficiently selecting the best subset of biomarkers from high-dimensional omics data are highlighted. Prediction models are introduced, and the integration of data from multiple microarray studies is discussed. Appreciating these key advances would help to accelerate the development of clinically reliable biomarker systems.
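One standard instance of the biomarker-selection approaches this review surveys is sparse (L1-penalized) logistic regression, which picks a small feature panel from high-dimensional omics data. Below is a hedged sketch on synthetic data; the shapes, penalty strength, and feature indices are assumptions for illustration, not a method the review prescribes.

```python
# Sketch of one common biomarker-selection approach: L1-penalized
# logistic regression selects a sparse feature subset from
# high-dimensional omics data (p >> n). Synthetic data; illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_samples, n_features = 80, 2000          # typical omics shape: p >> n
X = rng.standard_normal((n_samples, n_features))
true_idx = [0, 1, 2]                      # three informative "biomarkers"
y = (X[:, true_idx].sum(axis=1)
     + rng.standard_normal(n_samples) > 0).astype(int)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y)
selected = np.flatnonzero(model.coef_[0])
print("selected features:", selected)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(2))
```

The L1 penalty drives most coefficients exactly to zero, which is what makes the surviving features a candidate biomarker panel rather than a dense predictive model.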
Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays
A volcano plot displays an unstandardized signal (e.g. log fold-change) against a noise-adjusted/standardized signal (e.g. the t-statistic or -log10(p-value) from the t-test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics traces a curved discriminant line in the volcano plot, as compared to the two perpendicular lines of the "double filtering" criterion. This review attempts to provide a unifying framework for discussions of alternative measures of differential expression, improved methods for estimating variance, and the visual display of microarray analysis results. We also discuss the possibility of applying volcano plots to fields beyond microarrays.
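As a concrete illustration of the plot described above, here is a minimal matplotlib sketch: log2 fold-change on the x-axis, -log10(p-value) from per-gene t-tests on the y-axis, with the two perpendicular "double filtering" cutoff lines. The data and thresholds are synthetic assumptions, not from the review.

```python
# Volcano plot sketch: log2 fold-change vs. -log10(p-value) from
# per-gene t-tests, with the two perpendicular "double filtering" cutoffs.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n_genes, n_per_group = 2000, 10
a = rng.standard_normal((n_genes, n_per_group))
b = rng.standard_normal((n_genes, n_per_group))
b[:100] += 1.5                            # 100 genes truly shifted

log_fc = b.mean(axis=1) - a.mean(axis=1)  # log-scale data: difference = log FC
pvals = ttest_ind(a, b, axis=1).pvalue

plt.scatter(log_fc, -np.log10(pvals), s=4, alpha=0.4)
plt.axvline(1.0, ls="--"); plt.axvline(-1.0, ls="--")   # fold-change cutoff
plt.axhline(-np.log10(0.01), ls="--")                   # p-value cutoff
plt.xlabel("log2 fold-change"); plt.ylabel("-log10(p)")
plt.title("Volcano plot with double-filtering cutoffs")
plt.show()
```

A regularized statistic would replace the rectangular region carved out by these two lines with a curved discriminant boundary, which is the contrast the review draws.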
Good, great, or lucky? Screening for firms with sustained superior performance using heavy-tailed priors
This paper examines historical patterns of ROA (return on assets) for a cohort of 53,038 publicly traded firms across 93 countries, measured over the past 45 years. Our goal is to screen for firms whose ROA trajectories suggest that they have systematically outperformed their peer groups over time. Such a project faces at least three statistical difficulties: adjustment for relevant covariates, massive multiplicity, and longitudinal dependence. We conclude that, once these difficulties are taken into account, demonstrably superior performance appears to be quite rare. We compare our findings with other recent management studies on the same subject, and with the popular literature on corporate success. Our methodological contribution is to propose a new class of priors for use in large-scale simultaneous testing. These priors are based on the hypergeometric inverted-beta family and have two main attractive features: heavy tails and computational tractability. The family is a four-parameter generalization of the normal/inverted-beta prior, and is the natural conjugate prior for shrinkage coefficients in a hierarchical normal model. Our results emphasize the usefulness of these heavy-tailed priors in large multiple-testing problems, as they have a mild rate of tail decay in the marginal likelihood, a property long recognized to be important in testing. Published at http://dx.doi.org/10.1214/11-AOAS512 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
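To see why heavy tails matter for the shrinkage behavior the authors emphasize, the sketch below numerically compares posterior means in a simple normal-means model under a light-tailed normal prior versus a heavy-tailed Cauchy prior. The Cauchy is a deliberate stand-in chosen for illustration; it is not the paper's hypergeometric inverted-beta family.

```python
# Why heavy tails matter: posterior mean E[theta | y] for y ~ N(theta, 1)
# under a N(0, 1) prior vs. a heavy-tailed Cauchy prior. The Cauchy is a
# stand-in for the paper's hypergeometric inverted-beta family; it shows
# the qualitative point, not the exact prior.
import numpy as np
from scipy.stats import norm, cauchy
from scipy.integrate import trapezoid

theta = np.linspace(-50, 50, 20001)

def posterior_mean(y, prior_pdf):
    w = norm.pdf(y - theta) * prior_pdf(theta)  # unnormalized posterior
    return trapezoid(theta * w, theta) / trapezoid(w, theta)

for y in [1.0, 3.0, 10.0]:
    print(f"y={y:5.1f}  normal prior -> {posterior_mean(y, norm.pdf):6.2f}  "
          f"Cauchy prior -> {posterior_mean(y, cauchy.pdf):6.2f}")
# Large observations are barely shrunk under the heavy-tailed prior,
# while the normal prior shrinks them by half; that mild tail decay is
# what lets a screen flag rare, genuinely extreme effects among many tests.
```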