222 research outputs found

    Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology

    Full text link
    High-throughput biological assays such as microarrays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for exact reproduction of the results, leading to exercises in "forensic bioinformatics" where aspects of raw data and reported results are used to infer what methods must have been employed. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors. In this report we examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the simplest errors are common. We then discuss steps we are taking to avoid such errors in our own investigations. Comment: Published at http://dx.doi.org/10.1214/09-AOAS291 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
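    The row-offset error the abstract names can be reproduced in a few lines. The gene names and values below are invented for illustration; the point is that an off-by-one shift produces a structurally valid but silently wrong dataset:

```python
# Hypothetical illustration (not the paper's actual data): a one-row
# offset between a gene-label column and an expression column silently
# reassigns every measurement to the wrong gene.
genes = ["TP53", "BRCA1", "EGFR", "MYC"]
expression = [2.1, 0.3, 5.7, 1.2]  # intended to align with `genes`

correct = dict(zip(genes, expression))

# An off-by-one error (e.g., a header row dropped from only one file)
# rotates every value one position relative to its label.
shifted = dict(zip(genes, expression[1:] + expression[:1]))

print(correct["TP53"])   # 2.1 -- the true value
print(shifted["TP53"])   # 0.3 -- wrong, yet nothing fails loudly
```

    Nothing in the shifted table looks malformed, which is exactly why such errors survive into published signatures and can only be caught by re-deriving results from raw data.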

    High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient

    Full text link
    The high-dimensional data-enriched model describes groups of observations by shared and per-group individual parameters, each with its own structure, such as sparsity or group sparsity. In this paper, we consider the general form of data enrichment where data comes in a fixed but arbitrary number of groups G. Any convex function, e.g., a norm, can characterize the structure of both the shared and individual parameters. We propose an estimator for the high-dimensional data-enriched model and provide conditions under which it consistently estimates both shared and individual parameters. We also delineate the sample complexity of the estimator and present a high-probability non-asymptotic bound on the estimation error of all parameters. Interestingly, the sample complexity of our estimator translates to conditions on both the per-group sample sizes and the total number of samples. We propose an iterative estimation algorithm with a linear convergence rate and supplement our theoretical analysis with synthetic and real experimental results. In particular, we show the predictive power of the data-enriched model, along with its interpretable results, in anticancer drug sensitivity analysis.
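    A minimal sketch of the data-enriched linear model described above: each group g observes y_g = X_g (beta_shared + beta_g) + noise. The sizes and coefficients are invented for illustration, and plain least squares stands in for the paper's estimator; note the comment on identifiability, which is where the paper's convex structural penalties come in:

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group, p, G = 50, 3, 2

beta_shared = np.array([1.0, -2.0, 0.5])
beta_indiv = [np.array([0.5, 0.0, 0.0]), np.array([0.0, 0.3, 0.0])]

Xs = [rng.normal(size=(n_per_group, p)) for _ in range(G)]
ys = [X @ (beta_shared + b) + 0.01 * rng.normal(size=n_per_group)
      for X, b in zip(Xs, beta_indiv)]

# Stack into one design: columns for the shared parameter, then one
# block of columns per group for that group's individual parameter.
top = np.vstack(Xs)
blocks = np.zeros((G * n_per_group, G * p))
for g, X in enumerate(Xs):
    blocks[g * n_per_group:(g + 1) * n_per_group, g * p:(g + 1) * p] = X
Z = np.hstack([top, blocks])
y = np.concatenate(ys)

# The shared columns are sums of the block columns, so Z is rank
# deficient: without structural penalties the shared/individual split
# is not identifiable, and lstsq returns the minimum-norm split. The
# per-group totals beta_shared + beta_g are still recovered well.
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
est_shared, est_indiv = coef[:p], coef[p:].reshape(G, p)
print(np.round(est_shared + est_indiv[0], 2))  # close to beta_shared + beta_indiv[0]
```

    This is why the abstract's conditions involve both per-group sample sizes (to pin down each total) and structure on the parameters (to make the shared/individual decomposition unique).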

    Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups

    Get PDF
    BACKGROUND: High-throughput technologies enable the testing of tens of thousands of measurements simultaneously. Identification of genes that are differentially expressed or associated with clinical outcomes invokes the multiple testing problem. False Discovery Rate (FDR) control is a statistical method used to correct for multiple comparisons for independent or weakly dependent test statistics. Although FDR control is frequently applied to microarray data analysis, gene expression is usually correlated, which might lead to inaccurate estimates. In this paper, we evaluate the accuracy of FDR estimation. METHODS: Using two real data sets, we resampled subgroups of patients and recalculated statistics of interest to illustrate the imprecision of FDR estimation. Next, we generated many simulated data sets with block correlation structures and realistic noise parameters, using the Ultimate Microarray Prediction, Inference, and Reality Engine (UMPIRE) R package. We estimated FDR using a beta-uniform mixture (BUM) model and examined the variation in FDR estimation. RESULTS: The three major sources of variation in FDR estimation are the sample size, the correlations among genes, and the true proportion of differentially expressed genes (DEGs). The sample size and proportion of DEGs affect both the magnitude and precision of FDR estimation, while the correlation structure mainly affects the variation of the estimated parameters. CONCLUSIONS: We have decomposed various factors that affect FDR estimation and illustrated the direction and extent of their impact. We found that the proportion of DEGs has a significant impact on FDR; this factor might have been overlooked in previous studies and deserves more attention when controlling FDR.
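    The effect of the proportion of DEGs can be seen with a small simulation under the beta-uniform mixture the abstract assumes: null p-values are Uniform(0,1) and differentially expressed genes follow Beta(a, 1) with a < 1. Benjamini-Hochberg is used here as a simple stand-in for the paper's BUM-based estimator, and the gene counts are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_pvalues(n_genes, prop_de, a=0.2):
    """Beta-uniform mixture: nulls uniform, DEGs concentrated near 0."""
    n_de = int(n_genes * prop_de)
    null_p = rng.uniform(size=n_genes - n_de)
    de_p = rng.beta(a, 1.0, size=n_de)
    return np.concatenate([null_p, de_p])

def bh_discoveries(p, q=0.05):
    """Number of rejections at FDR level q (Benjamini-Hochberg)."""
    m = len(p)
    order = np.sort(p)
    below = order <= q * np.arange(1, m + 1) / m
    return int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0

# The true proportion of DEGs drives both the count of discoveries and
# its run-to-run spread, echoing the abstract's conclusion.
for prop in (0.01, 0.10, 0.30):
    counts = [bh_discoveries(simulate_pvalues(5000, prop)) for _ in range(20)]
    print(prop, np.mean(counts), np.std(counts))
```

    Repeating the loop with block-correlated test statistics (rather than independent draws) would reproduce the paper's other finding: correlation mainly inflates the variance of the estimates rather than shifting their mean.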

    Zero cycles on del Pezzo surfaces over local fields

    Get PDF

    The arithmetic of zero cycles on surfaces with geometric genus and irregularity zero

    Full text link
    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/46238/1/208_2005_Article_BF01445218.pd

    Motifs, L -functions, and the K -cohomology of rational surfaces over finite fields

    Full text link
    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/46231/1/208_2005_Article_BF01450740.pd

    Patients' Mental Well-being by PHA Questionnaires and Their Relation to Cancer Progression

    Get PDF

    Obtaining reliable information from minute amounts of RNA using cDNA microarrays

    Get PDF
    BACKGROUND: High-density cDNA microarray technology provides a powerful tool to survey the activity of thousands of genes in normal and diseased cells, which helps us both to understand the molecular basis of disease and to identify potential targets for therapeutic intervention. The promise of this technology has been hampered by the large amount of biological material required for the experiments (more than 50 μg of total RNA per array). We have modified an amplification procedure so that it requires only 1 μg of total RNA. Analyses of the results showed that most genes that were detected as expressed or differentially expressed using the regular protocol were also detected using the amplification protocol. In addition, many genes that were undetected or weakly detected using the regular protocol were clearly detected using the amplification protocol. We have carried out a series of confirmation studies by northern blotting, western blotting, and immunohistochemistry assays. RESULTS: Our results showed that most of the new information revealed by the amplification protocol represents real gene activity in the cells. CONCLUSION: We have confirmed a powerful and consistent cDNA microarray procedure that can be used to study minute amounts of biological tissue.

    Polychrome: Creating and Assessing Qualitative Palettes with Many Colors

    Get PDF
    Although R includes numerous tools for creating color palettes to display continuous data, facilities for displaying categorical data primarily rely on the RColorBrewer package, which is, by default, limited to 12 colors. The colorspace package can produce more colors, but it is not immediately clear how to use it to produce colors that can be reliably distinguished in different kinds of plots. However, applications to genomics would be enhanced by the ability to display at least the 24 human chromosomes in distinct colors, as is common in technologies like spectral karyotyping. In this article, we describe the Polychrome package, which can be used to construct palettes with at least 24 colors that can be distinguished by most people with normal color vision. Polychrome includes a variety of visualization methods allowing users to evaluate the proposed palettes. In addition, we review the history of attempts to construct qualitative color palettes with many colors.