Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology
High-throughput biological assays such as microarrays let us ask very
detailed questions about how diseases operate, and promise to let us
personalize therapy. Data processing, however, is often not described well
enough to allow for exact reproduction of the results, leading to exercises in
"forensic bioinformatics" where aspects of raw data and reported results are
used to infer what methods must have been employed. Unfortunately, poor
documentation can shift from an inconvenience to an active danger when it
obscures not just methods but errors. In this report we examine several related
papers purporting to use microarray-based signatures of drug sensitivity
derived from cell lines to predict patient response. Patients in clinical
trials are currently being allocated to treatment arms on the basis of these
results. However, we show in five case studies that the results incorporate
several simple errors that may be putting patients at risk. One theme that
emerges is that the most common errors are simple (e.g., row or column
offsets); conversely, it is our experience that the most simple errors are
common. We then discuss steps we are taking to avoid such errors in our own
investigations.

Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/09-AOAS291
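The row- and column-offset errors described above are easy to picture with a tiny, hypothetical example (the gene names and values here are invented for illustration, not taken from the papers under study): dropping or adding a single header row silently pairs every measurement with the wrong label, with no error message.

```python
# Hypothetical illustration of a one-row offset error: every gene's
# measurement is silently reassigned to the wrong label.
genes = ["TP53", "BRCA1", "EGFR", "MYC"]
values = [2.1, 0.4, 3.7, 1.2]

# Correct pairing of labels to measurements.
correct = dict(zip(genes, values))

# An off-by-one error, e.g. forgetting to drop a header row before
# aligning labels with data: zip truncates without complaint.
shifted = dict(zip(genes, values[1:]))

print(correct["TP53"])   # 2.1 -- the true value
print(shifted["TP53"])   # 0.4 -- actually BRCA1's value, silently mislabeled
```

Note that the shifted table also drops the last gene entirely, again without any warning, which is why such errors survive into published results unless the processing pipeline is documented well enough to be re-run.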
High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient
A high-dimensional data-enriched model describes groups of
observations by shared and per-group individual parameters, each with its own
structure, such as sparsity or group sparsity. In this paper, we consider the
general form of data enrichment where data comes in a fixed but arbitrary
number of groups G. Any convex function, e.g., a norm, can characterize the
structure of both shared and individual parameters. We propose an estimator for
the high-dimensional data-enriched model and provide conditions under which it
consistently estimates both shared and individual parameters. We also delineate
the sample complexity of the estimator and present a high-probability
non-asymptotic bound on the estimation error of all parameters. Interestingly,
the sample complexity of our estimator translates to conditions on both
per-group sample sizes and the total number of samples. We propose an iterative
estimation algorithm with a linear convergence rate and supplement our
theoretical analysis with synthetic and real experimental results. In
particular, we show the predictive power of the data-enriched model along with
its interpretable results in anticancer drug sensitivity analysis.
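The shared/individual decomposition at the heart of this model can be sketched in a deliberately simplified low-dimensional setting. The following is not the paper's estimator (which uses convex structure-inducing penalties and an iterative algorithm in high dimensions); it is a toy illustration, under the assumption that per-group deviations are centered, in which per-group ordinary least squares recovers each group's total coefficient beta_shared + beta_g, and averaging across groups separates the shared part from the individual deviations:

```python
import numpy as np

rng = np.random.default_rng(0)
G, n, p = 3, 200, 5                      # groups, samples per group, features

# Ground truth: one shared parameter vector plus small per-group deviations.
beta_shared = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
betas_ind = [rng.normal(scale=0.3, size=p) for _ in range(G)]

# Each group observes y_g = X_g (beta_shared + beta_g) + noise.
Xs = [rng.normal(size=(n, p)) for _ in range(G)]
ys = [X @ (beta_shared + b) + 0.1 * rng.normal(size=n)
      for X, b in zip(Xs, betas_ind)]

# Per-group OLS estimates the *total* coefficient beta_shared + beta_g;
# the cross-group average estimates the shared part, and the residual
# from that average estimates each group's individual deviation.
totals = np.array([np.linalg.lstsq(X, y, rcond=None)[0]
                   for X, y in zip(Xs, ys)])
beta_shared_hat = totals.mean(axis=0)
betas_ind_hat = totals - beta_shared_hat
```

In the high-dimensional regime the abstract addresses (p larger than n, structured parameters), per-group OLS is no longer available, which is exactly why a joint penalized estimator and its sample-complexity analysis are needed.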
Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups
BACKGROUND: High-throughput technologies enable the testing of tens of thousands of measurements simultaneously. Identification of genes that are differentially expressed or associated with clinical outcomes invokes the multiple testing problem. False Discovery Rate (FDR) control is a statistical method used to correct for multiple comparisons for independent or weakly dependent test statistics. Although FDR control is frequently applied to microarray data analysis, gene expression is usually correlated, which might lead to inaccurate estimates. In this paper, we evaluate the accuracy of FDR estimation. METHODS: Using two real data sets, we resampled subgroups of patients and recalculated statistics of interest to illustrate the imprecision of FDR estimation. Next, we generated many simulated data sets with block correlation structures and realistic noise parameters, using the Ultimate Microarray Prediction, Inference, and Reality Engine (UMPIRE) R package. We estimated FDR using a beta-uniform mixture (BUM) model, and examined the variation in FDR estimation. RESULTS: The three major sources of variation in FDR estimation are the sample size, correlations among genes, and the true proportion of differentially expressed genes (DEGs). The sample size and proportion of DEGs affect both the magnitude and precision of FDR estimation, while the correlation structure mainly affects the variation of the estimated parameters. CONCLUSIONS: We have decomposed various factors that affect FDR estimation, and illustrated the direction and extent of their impact. We found that the proportion of DEGs has a significant impact on FDR; this factor might have been overlooked in previous studies and deserves more thought when controlling FDR.
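The multiple-testing correction this abstract refers to can be illustrated with a minimal sketch of the classical Benjamini-Hochberg step-up procedure. This is background only, not the BUM-based estimator the paper actually studies, and the p-values below are invented for illustration:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of discoveries at FDR level q (BH step-up)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds: q * i / m for the i-th smallest p-value.
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    # k = largest rank whose sorted p-value falls under its threshold.
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

pvals = np.array([0.0001, 0.001, 0.002, 0.009, 0.04,
                  0.2, 0.4, 0.6, 0.8, 0.9])
print(benjamini_hochberg(pvals, q=0.05).sum())  # 4 tests declared significant
```

With correlated tests or few samples, the realized discovery set (and hence the estimated FDR) fluctuates from resample to resample, which is the imprecision the paper quantifies.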
The arithmetic of zero cycles on surfaces with geometric genus and irregularity zero
Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/46238/1/208_2005_Article_BF01445218.pd
Motifs, L-functions, and the K-cohomology of rational surfaces over finite fields
Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/46231/1/208_2005_Article_BF01450740.pd
Obtaining reliable information from minute amounts of RNA using cDNA microarrays
BACKGROUND: High density cDNA microarray technology provides a powerful tool to survey the activity of thousands of genes in normal and diseased cells, which helps us both to understand the molecular basis of the disease and to identify potential targets for therapeutic intervention. The promise of this technology has been hampered by the large amount of biological material required for the experiments (more than 50 μg of total RNA per array). We have modified an amplification procedure that requires only 1 μg of total RNA. Analyses of the results showed that most genes that were detected as expressed or differentially expressed using the regular protocol were also detected using the amplification protocol. In addition, many genes that were undetected or weakly detected using the regular protocol were clearly detected using the amplification protocol. We have carried out a series of confirmation studies by northern blotting, western blotting, and immunohistochemistry assays. RESULTS: Our results showed that most of the new information revealed by the amplification protocol represents real gene activity in the cells. CONCLUSION: We have confirmed a powerful and consistent cDNA microarray procedure that can be used to study minute amounts of biological tissue.
Polychrome: Creating and Assessing Qualitative Palettes with Many Colors
Although R includes numerous tools for creating color palettes to display continuous data, facilities for displaying categorical data primarily use the RColorBrewer package, which is, by default, limited to 12 colors. The colorspace package can produce more colors, but it is not immediately clear how to use it to produce colors that can be reliably distinguished in different kinds of plots. However, applications to genomics would be enhanced by the ability to display at least the 24 human chromosomes in distinct colors, as is common in technologies like spectral karyotyping. In this article, we describe the Polychrome package, which can be used to construct palettes with at least 24 colors that can be distinguished by most people with normal color vision. Polychrome includes a variety of visualization methods allowing users to evaluate the proposed palettes. In addition, we review the history of attempts to construct qualitative color palettes with many colors.