304 research outputs found

    RLE Plots: Visualising Unwanted Variation in High Dimensional Data

    Unwanted variation can be highly problematic and so its detection is often crucial. Relative log expression (RLE) plots are a powerful tool for visualising such variation in high dimensional data. We provide a detailed examination of these plots, with the aid of examples and simulation, explaining what they are and what they can reveal. RLE plots are particularly useful for assessing whether a procedure aimed at removing unwanted variation, i.e. a normalisation procedure, has been successful. These plots, while originally devised for gene expression data from microarrays, can also be used to reveal unwanted variation in many other kinds of high dimensional data, where such variation can be problematic. Comment: 9 pages, 3 figures
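
    As a rough illustration of what an RLE plot summarises (this sketch is not taken from the paper): the usual construction subtracts each gene's median across samples from its log expression, then shows the resulting deviations for each sample, typically as one boxplot per sample. The Python below assumes NumPy and a genes-by-samples matrix already on the log scale; the function names are hypothetical.

```python
import numpy as np

def relative_log_expression(log_expr):
    """Return RLE values for a genes x samples matrix of log expression.

    Assumption: rows are genes, columns are samples, values are already logged.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    # Median of each gene across all samples, kept as a column for broadcasting.
    gene_medians = np.median(log_expr, axis=1, keepdims=True)
    return log_expr - gene_medians

def rle_summary(log_expr):
    """Per-sample median and interquartile range of the RLE values.

    These are the two features usually read off an RLE boxplot.
    """
    rle = relative_log_expression(log_expr)
    q1, med, q3 = np.percentile(rle, [25, 50, 75], axis=0)
    return med, q3 - q1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: 1000 genes, 6 samples, with sample 3 shifted upwards
    # to mimic unwanted variation.
    data = rng.normal(loc=8.0, scale=1.0, size=(1000, 1)) + rng.normal(scale=0.2, size=(1000, 6))
    data[:, 3] += 1.0
    medians, iqrs = rle_summary(data)
    print("per-sample RLE medians:", np.round(medians, 2))
    print("per-sample RLE IQRs:   ", np.round(iqrs, 2))
```

    In well-normalised data the per-sample medians would be expected to sit near zero with similar spreads, so a sample whose median stands apart, like the shifted one in the simulation, is a candidate for unwanted variation.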

    A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6

    Motivation: High-resolution copy-number (CN) analysis has in recent years gained much attention, not only for the purpose of identifying CN aberrations associated with a certain phenotype, but also for identifying CN polymorphisms. In order for such studies to be successful and cost effective, the statistical methods have to be optimized. We propose a single-array preprocessing method for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is only needed at the last step when calculating relative CNs. Results: As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. It also corrects for probe sequence effects and for the co-hybridization of fragments digested by multiple enzymes that takes place on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well they differentiate between various CN states at full resolution and with various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than alternative methods that use data from all arrays for their preprocessing. This shows that it is possible to do online analysis in large-scale projects where additional arrays are introduced over time. Availability: A bounded-memory implementation that can process any number of arrays is available in the open source R package aroma.affymetrix. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    EXPLORATION, NORMALIZATION, AND GENOTYPE CALLS OF HIGH DENSITY OLIGONUCLEOTIDE SNP ARRAY DATA

    In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve the accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular we describe methodology useful for preprocessing Affymetrix SNP chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves on existing approaches using data from three relatively large studies, including one in which a large number of independent calls are available. Software implementing these ideas is available from the Bioconductor oligo package.

    Transcription factor binding site prediction with multivariate gene expression data

    Multi-sample microarray experiments have become a standard experimental method for studying biological systems. A frequent goal in such studies is to unravel the regulatory relationships between genes. During the last few years, regression models have been proposed for the de novo discovery of cis-acting regulatory sequences using gene expression data. However, when applied to multi-sample experiments, existing regression based methods model each individual sample separately. To better capture the dynamic relationships in multi-sample microarray experiments, we propose a flexible method for the joint modeling of promoter sequence and multivariate expression data. In higher order eukaryotic genomes, expression regulation usually involves combinatorial interaction between several transcription factors. Experiments have shown that spacing between transcription factor binding sites can significantly affect their strength in activating gene expression. We propose an adaptive model building procedure to capture such spacing dependent cis-acting regulatory modules. We apply our methods to the analysis of microarray time-course experiments in yeast and in Arabidopsis. These experiments exhibit very different dynamic temporal relationships. For both data sets, we have found all of the well-known cis-acting regulatory elements in the related context, as well as being able to predict novel elements. Comment: Published at http://dx.doi.org/10.1214/10.1214/07-AOAS142 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    GENE SET ENRICHMENT ANALYSIS MADE SIMPLE

    Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed in two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge regarding how genes work together. Recently, a series of methods that do incorporate biological knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis (GSEA), is based on a statistical test known for its lack of sensitivity. In this paper we compare the performance of a simple alternative to GSEA. We find that this simple solution clearly outperforms GSEA. We demonstrate this with eight different microarray datasets.
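
    The gene-by-gene approach described in this abstract (a p-value per gene, adjustment for multiple comparisons, then a cut-off) can be sketched as follows. The abstract does not say which adjustment is meant, so the Benjamini-Hochberg procedure here is an assumption chosen for illustration, and the function names and the 0.05 cut-off are hypothetical. This is not the paper's simple alternative to GSEA.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed adjustment method)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def candidate_genes(gene_names, pvalues, cutoff=0.05):
    """List genes whose adjusted p-value falls at or below the cut-off."""
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, q in zip(gene_names, adjusted) if q <= cutoff]

if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    pvals = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(pvals))
    print(candidate_genes(genes, pvals, cutoff=0.03))
```

    Each gene is treated on its own here, which is the limitation the abstract raises: nothing in the list-building step uses knowledge of how genes work together.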

    Global analyses of mRNA translational control during early Drosophila embryogenesis

    The polysomal profiles of over 15,000 transcripts during the first ten hours after egg laying have been determined.

    Proximal genomic localization of STAT1 binding and regulated transcriptional activity

    BACKGROUND: Signal transducer and activator of transcription (STAT) proteins are key regulators of gene expression in response to the interferon (IFN) family of anti-viral and anti-microbial cytokines. We have examined the genomic relationship between STAT1 binding and regulated transcription using multiple tiling microarray and chromatin immunoprecipitation microarray (ChIP-chip) experiments from public repositories. RESULTS: In response to IFN-γ, STAT1 bound proximally to regions of the genome that exhibit regulated transcriptional activity. This finding was consistent between different tiling microarray platforms, and between different measures of transcriptional activity, including differential binding of RNA polymerase II, and differential mRNA transcription. Re-analysis of tiling microarray data from a recent study of IFN-γ-induced STAT1 ChIP-chip and mRNA expression revealed that STAT1 binding is tightly associated with localized mRNA transcription in response to IFN-γ. Close relationships were also apparent between STAT1 binding, STAT2 binding, and mRNA transcription in response to IFN-α. Furthermore, we found that sites of STAT1 binding within the Encyclopedia of DNA Elements (ENCODE) region are precisely correlated with sites of either enhanced or diminished binding by the RNA polymerase II complex. CONCLUSION: Together, our results indicate that STAT1 binds proximally to regions of the genome that exhibit regulated transcriptional activity. This finding establishes a generalized basis for the positioning of STAT1 binding sites within the genome, and supports a role for STAT1 in the direct recruitment of the RNA polymerase II complex to the promoters of IFN-γ-responsive genes