
    Comparison of sequence-dependent tiling array normalization approaches

    Background: The detection of enriched DNA or RNA fragments by tiling microarrays has become increasingly popular. These microarrays contain a large number of short probes covering genomic loci. However, to achieve high coverage, the probe sequences cannot be selected for their hybridization properties, and the affinity of the probes towards their targets varies in a sequence-dependent manner. A number of approaches have been developed to remove this bias and have been shown to increase the detection of enriched DNA or RNA fragments. However, these approaches also employ a peak detection algorithm different from the one used previously, so the improved detection may be due to the peak detection algorithm rather than to the sequence-dependent normalization.
    Results: We compared three sequence-dependent probe-level normalization procedures to a naïve sequence-independent normalization technique. To maximize comparability, we used the normalized intensity values as input to a single peak detection algorithm, with a "spike-in" data set serving as the performance benchmark. We show that the sequence-dependent normalization procedures do not perform better than the naïve approach, suggesting that the benefit of these normalization approaches is limited. Furthermore, we show that the naïve approach performs well because it effectively removes the sequence-dependent component of the measured intensities with the help of the control hybridization experiment.
    Conclusion: Sequence-dependent normalization of microarray data hardly improves the detection of enriched DNA or RNA fragments. The "success" of the sequence-independent naïve approach is possible only because of the control experiment, and it requires proper scaling of the measured intensities.
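
    A minimal Python sketch of why a naïve, sequence-independent approach can work, assuming it amounts to scaling each channel and taking a per-probe log2 ratio against the control hybridization. Median scaling is our choice here, not necessarily the one used in the paper, and the probe values are hypothetical. A sequence-dependent affinity that multiplies both the ChIP and the control intensity of the same probe cancels in the ratio.

```python
import math
import statistics


def naive_log_ratios(chip, control):
    """Median-scale each channel, then return per-probe log2(ChIP / control).

    Any multiplicative, sequence-dependent probe affinity that affects the
    ChIP and control hybridizations equally cancels in the ratio.
    """
    chip_median = statistics.median(chip)
    control_median = statistics.median(control)
    return [
        math.log2((c / chip_median) / (k / control_median))
        for c, k in zip(chip, control)
    ]


# Hypothetical affinities; probe 3 carries a true 4-fold enrichment.
affinity = [0.5, 2.0, 1.0, 3.0, 0.8]
enrichment = [1.0, 1.0, 1.0, 4.0, 1.0]
chip = [100 * a * e for a, e in zip(affinity, enrichment)]
control = [50 * a for a in affinity]
print([round(r, 3) for r in naive_log_ratios(chip, control)])  # prints [0.0, 0.0, 0.0, 2.0, 0.0]
```

    Without the median scaling, every probe in this toy example would carry a constant offset of log2(2) = 1, because the two channels differ by a global factor of 2; this is the kind of scaling issue the conclusion points to.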

    Doubly stochastic continuous-time hidden Markov approach for analyzing genome tiling arrays

    Microarrays have been developed that tile the entire nonrepetitive genomes of many different organisms, allowing unbiased mapping of active transcription regions or protein binding sites across the whole genome. These tiling array experiments produce massive, correlated data sets with many experimental artifacts, which call for innovative analysis methods and efficient computational algorithms. This paper presents a doubly stochastic latent variable analysis method for transcript discovery and protein binding region localization using tiling array data. The model is unique in that it considers the actual genomic distance between probes. Additionally, it is designed to be robust to cross-hybridized and nonresponsive probes, which often lead to false-positive results in microarray experiments. We apply our model to a transcript-finding data set to illustrate the consistency of our method. We also apply it to a spike-in experiment that can serve as a benchmark data set for researchers developing and comparing future tiling array methods. The results indicate that our method is powerful and accurate and can be used on a single sample without control experiments, thus reducing some of the overhead cost of conducting tiling array experiments. Comment: Published at http://dx.doi.org/10.1214/09-AOAS248 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
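
    How genomic distance between probes can enter a latent-state model is easiest to see with a two-state continuous-time Markov chain, whose transition probabilities between adjacent probes depend on the gap separating them. This is a hedged sketch of that single ingredient, not the paper's doubly stochastic model; the state labels and the per-base-pair rates are hypothetical. The function is the closed form of the matrix exponential of a 2x2 generator.

```python
import math


def transition_matrix(distance_bp, rate_on, rate_off):
    """Transition probabilities of a two-state continuous-time Markov chain
    over a genomic gap of `distance_bp` base pairs.

    State 0 = background, state 1 = enriched (hypothetical labels).
    rate_on is the 0 -> 1 rate and rate_off the 1 -> 0 rate, per base pair.
    This is the closed form of expm(Q * d) for the 2x2 generator Q.
    """
    total = rate_on + rate_off
    decay = math.exp(-total * distance_bp)
    p00 = rate_off / total + (rate_on / total) * decay  # stay in background
    p11 = rate_on / total + (rate_off / total) * decay  # stay enriched
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]


# Irregularly spaced probes get gap-specific transition matrices.
for gap in (35, 500, 5000):
    matrix = transition_matrix(gap, rate_on=0.001, rate_off=0.01)
    print(gap, [[round(p, 3) for p in row] for row in matrix])
```

    Closely spaced probes are very likely to share a state, while widely separated probes approach the stationary distribution; a fixed-step hidden Markov model cannot express this when probe spacing is irregular.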

    Model-based analysis of two-color arrays (MA2C)

    A normalization method based on probe GC content for two-color tiling arrays and an algorithm for detecting peak regions are presented. Both are available in a stand-alone Java program.
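
    A minimal sketch of GC-content-based normalization for two-color data, assuming the log2 ratios are standardized within groups of probes that share the same GC count. MA2C's actual estimators may differ, and the function names here are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def gc_count(sequence):
    """Number of G or C bases in a probe sequence."""
    return sum(1 for base in sequence.upper() if base in "GC")


def gc_bin_normalize(probes, red, green):
    """Return z-scores of log2(red / green), standardized within GC-count bins."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    bins = defaultdict(list)
    for index, sequence in enumerate(probes):
        bins[gc_count(sequence)].append(index)
    z_scores = [0.0] * len(probes)
    for indices in bins.values():
        values = [log_ratios[i] for i in indices]
        centre = statistics.mean(values)
        # Fall back to 1.0 for singleton or constant bins to avoid dividing by 0.
        spread = statistics.stdev(values) if len(values) > 1 else 1.0
        spread = spread or 1.0
        for i in indices:
            z_scores[i] = (log_ratios[i] - centre) / spread
    return z_scores


probes = ["GGCC", "GCGC", "ATAT", "TATA"]
red = [4.0, 1.0, 2.0, 8.0]
green = [1.0, 1.0, 1.0, 2.0]
print([round(z, 3) for z in gc_bin_normalize(probes, red, green)])  # prints [0.707, -0.707, -0.707, 0.707]
```

    Standardizing per GC bin removes the tendency of GC-rich probes to give systematically different ratios, because each probe is compared only with probes of the same base composition.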

    Custom Design and Analysis of High-Density Oligonucleotide Bacterial Tiling Microarrays

    Only recently have custom-made high-density oligonucleotide microarrays become available at an affordable price. The aim of this thesis was to design microarrays and analysis algorithms for DNA repair and DNA damage detection, and to apply the methods in real experiments. Thomassen et al. have used their custom-designed whole-genome tiling microarrays to detect transcriptional changes in Escherichia coli after exposure to DNA-damaging reagents. The transcriptional changes in E. coli treated with UV light or the methylating reagent MNNG were shown to be larger and to include far more genes than previously reported. To optimize the data analysis for the custom-made arrays, Thomassen and coworkers designed their own normalization and analysis algorithms and showed them to be more suitable than the established methods currently applied to custom tiling arrays. Among other findings, several novel stress-induced transcripts were detected, one of which is predicted to encode a UV-induced short transmembrane protein. Additionally, no upregulation of the previously described UV-inducible aidB was observed. In the MNNG study, several genes were shown to be downregulated in response to DNA damage despite having upstream regulatory sequences similar to the established LexA box A and B. This indicates that the LexA regulon might also control gene repression and that the box A and B sequences cannot alone account for LexA-controlled gene regulation. Thomassen et al. have also custom-designed a microarray for oncogenic fusion gene detection. Cancer-specific fusion genes are often used to subgroup cancers and to define the optimal treatment, but the current laboratory detection procedure is laborious. In a blinded study on six cancer cell lines, proof of principle was shown by the detection of six out of six positive controls. The design and analysis methods for this microarray are now being refined to make a diagnostic fusion gene detection tool.

    Starr: Simple Tiling ARRay analysis of Affymetrix ChIP-chip data

    Background: Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an assay used to investigate DNA-protein binding or post-translational chromatin/histone modifications. As with all high-throughput technologies, it requires thorough bioinformatic processing of the data, for which there is no standard yet. The primary goal is to reliably identify and localize genomic regions that bind a specific protein. Further investigation compares binding profiles of functionally related proteins, or binding profiles of the same proteins in different genetic backgrounds or experimental conditions. Ultimately, the goal is to gain a mechanistic understanding of the effects of DNA binding events on gene expression.
    Results: We present a free, open-source R/Bioconductor package, Starr, that facilitates comparative analysis of ChIP-chip data across experiments and across different microarray platforms. The package provides functions for data import, quality assessment, data visualization and exploration. Starr includes high-level analysis tools such as the alignment of ChIP signals along annotated features, correlation analysis of ChIP signals with complementary genomic data, peak-finding and comparative display of multiple clusters of binding profiles. It uses standard Bioconductor classes for maximum compatibility with other software. Moreover, Starr automatically updates microarray probe annotation files by a highly efficient remapping of microarray probe sequences to an arbitrary genome.
    Conclusion: Starr is an R package that covers the complete ChIP-chip workflow from data processing to binding pattern detection. It focuses on high-level data analysis; for example, it provides methods for the integration and combined statistical analysis of binding profiles and complementary functional genomics data. Starr enables systematic assessment of binding behaviour for groups of genes that are aligned along arbitrary genomic features.
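
    Aligning ChIP signals along annotated features typically means averaging the signal at fixed offsets from each feature start, mirroring the offsets for minus-strand features so that upstream is always on the same side. The Python sketch below illustrates only that idea; it is not Starr's R interface, and it assumes the signal is available at exact genomic positions.

```python
import statistics


def mean_profile(signal, features, offsets):
    """Average ChIP signal at fixed offsets from a set of feature starts.

    signal: dict mapping genomic position -> ChIP value (e.g. a probe log ratio).
    features: list of (start, strand) tuples, strand "+" or "-".
    offsets: positions relative to the feature start; for minus-strand
    features the offset is mirrored so negative offsets are always upstream.
    Positions without a measurement are skipped; an offset with no data
    yields NaN.
    """
    profile = []
    for offset in offsets:
        values = []
        for start, strand in features:
            position = start + offset if strand == "+" else start - offset
            if position in signal:
                values.append(signal[position])
        profile.append(statistics.mean(values) if values else float("nan"))
    return profile


signal = {90: 1.0, 100: 2.0, 110: 3.0, 190: 3.0, 200: 2.0, 210: 1.0}
features = [(100, "+"), (200, "-")]
print(mean_profile(signal, features, offsets=[-10, 0, 10]))  # prints [1.0, 2.0, 3.0]
```

    On real tiling arrays, probes rarely fall exactly at each offset, so a practical version would bin or interpolate the probe signal around each feature.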

    Improved ChIP-chip analysis by a mixture model approach

    Background: Microarray analysis of immunoprecipitated chromatin (ChIP-chip) has evolved from a novel technique to a standard approach for the systematic study of protein-DNA interactions. In ChIP-chip, sites of protein-DNA interaction are identified by signals from the hybridization of selected DNA to tiled oligomers and are graphically represented as peaks. Most existing methods were designed to identify relatively sparse peaks in the presence of replicates.
    Results: We propose a data normalization method and a statistical method for peak identification from ChIP-chip data based on a mixture model approach. In contrast to many existing methods, including others that also employ mixture models, our method is more flexible: it imposes less restrictive assumptions and allows a relatively large proportion of peak regions. In addition, our method does not require experimental replicates and is computationally efficient. We compared the performance of our method with several representative existing methods on three datasets, including a spike-in dataset. These comparisons demonstrate that our approach is more robust and has comparable or higher power than the other methods, especially when peak regions are abundant.
    Conclusion: Our data normalization and peak detection methods improve the detection of peak regions in ChIP-chip data.
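
    To make the mixture-model idea concrete, here is a generic two-component Gaussian mixture fitted to probe log ratios by expectation-maximization, where the posterior probability of the upper component can be read as the probability that a probe lies in a peak. This is a hedged sketch with hypothetical data; the paper's model is described as less restrictive and is likely different from a plain Gaussian mixture.

```python
import math
import statistics


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(x, iterations=200):
    """Fit a 1-D two-component Gaussian mixture to log ratios by EM.

    Component 0 models background probes, component 1 enriched probes.
    Returns (weights, means, sds, posterior probability of enrichment per value).
    """
    n = len(x)
    xs = sorted(x)
    # Start the components at the means of the lower and upper halves.
    mu = [statistics.mean(xs[: n // 2]), statistics.mean(xs[n // 2 :])]
    sd = [statistics.pstdev(x) or 1.0] * 2
    w = [0.5, 0.5]
    post = [0.5] * n
    for _ in range(iterations):
        # E-step: posterior probability that each value is enriched.
        for i, v in enumerate(x):
            p0 = w[0] * normal_pdf(v, mu[0], sd[0])
            p1 = w[1] * normal_pdf(v, mu[1], sd[1])
            total = p0 + p1
            post[i] = p1 / total if total > 0 else 0.5
        # M-step: re-estimate weights, means and SDs from the posteriors.
        r1 = max(sum(post), 1e-12)
        r0 = max(n - r1, 1e-12)
        w = [r0 / n, r1 / n]
        mu = [sum((1 - p) * v for p, v in zip(post, x)) / r0,
              sum(p * v for p, v in zip(post, x)) / r1]
        sd = [max(math.sqrt(sum((1 - p) * (v - mu[0]) ** 2 for p, v in zip(post, x)) / r0), 1e-6),
              max(math.sqrt(sum(p * (v - mu[1]) ** 2 for p, v in zip(post, x)) / r1), 1e-6)]
    return w, mu, sd, post


log_ratios = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, -0.25, 0.15, 0.05,
              2.8, 3.0, 3.2, 2.9, 3.1]
weights, means, sds, posterior = fit_two_gaussians(log_ratios)
print([round(m, 2) for m in means], [round(p, 3) for p in posterior])
```

    Because the mixture weight of the enriched component is estimated from the data rather than fixed to a small value, this kind of model can in principle accommodate a large fraction of peak probes, which is the situation the abstract emphasizes.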

    Definition of the σW regulon of Bacillus subtilis in the absence of stress

    Bacteria employ extracytoplasmic function (ECF) sigma factors for their responses to environmental stresses. Despite intensive research, the molecular dissection of ECF sigma factor regulons has remained a major challenge due to overlaps in the ECF sigma factor-regulated genes and in the stimuli that activate the different ECF sigma factors. Here we have employed tiling arrays to single out the ECF σW regulon of the Gram-positive bacterium Bacillus subtilis from the overlapping ECF σX, σY, and σM regulons. For this purpose, we profiled the transcriptome of a B. subtilis sigW mutant under non-stress conditions, in which σW exhibits a basal level of activity, to select candidate genes that are strictly σW-regulated. Subsequently, we verified the σW-dependency of candidate genes by comparing their transcript profiles to transcriptome data obtained with the parental B. subtilis strain 168 grown under 104 different conditions, including relevant stress conditions such as salt shock. In addition, we investigated the transcriptomes of rasP or prsW mutant strains that lack the proteases involved in the degradation of the σW anti-sigma factor RsiW and the subsequent activation of the σW regulon. Taken together, our studies identify 89 genes as being strictly σW-regulated, including several genes for non-coding RNAs. The effects of rasP or prsW mutations on the expression of σW-dependent genes were relatively mild, which implies that σW-dependent transcription under non-stress conditions is not strictly related to RasP and PrsW. Lastly, we show that the pleiotropic phenotype of rasP mutant cells, which have defects in competence development, protein secretion and membrane protein production, is not mirrored in the transcript profile of these cells. This implies that RasP is not only important for transcriptional regulation via σW, but that this membrane protease also exerts other important post-transcriptional regulatory functions.

    The Dawning Era of Comprehensive Transcriptome Analysis in Cellular Microbiology

    Bacteria rapidly change their transcriptional patterns during infection in order to adapt to the host environment. To investigate host–bacteria interactions, various strategies including animal infection models, in vitro assay systems and microscopic observations have been used. However, these studies primarily focused on a few specific genes and molecules in bacteria. High-density tiling arrays and massively parallel sequencing analyses are rapidly improving our understanding of complex host–bacterial interactions through the identification and characterization of bacterial transcriptomes. These high-throughput techniques will continue to provide novel information on the complexity, plasticity, and regulation of bacterial transcriptomes, as well as on their adaptive responses related to pathogenicity. Here we summarize recent studies using these new technologies and discuss the utility of transcriptome analysis.

    Error-pooling-based statistical methods for identifying novel temporal replication profiles of human chromosomes observed by DNA tiling arrays

    Statistical analysis of tiling array data is extremely challenging due to the astronomically large number of sequence probes, the high noise levels of individual probes and the limited number of replicates in these data. To overcome these difficulties, we developed statistical error estimation and weighted ANOVA modeling approaches for high-density tiling array data; the error estimation is based on an advanced error-pooling method that accurately estimates the heterogeneous technical error of small-sample tiling array data. Using these approaches, we analyzed high-density tiling array data on the temporal replication patterns of human chromosomes 21 and 22 during cell-cycle S phase in synchronized HeLa cells. We found many novel temporal replication patterns, identifying about 26% of over 1 million tiling array sequence probes as significantly differentially replicated during the four 2-h time periods of S phase. Among these differentially replicated probes, 126 941 sequence probes were matched to 417 known genes. The majority of these genes were found to be replicated within one or two consecutive time periods, while the others were replicated at two non-consecutive time periods. Coding regions were also found to be more differentially replicated in particular time periods than noncoding regions in the gene-poor chromosome 21 (25% differentially replicated among genic probes versus 18.6% among intergenic probes), whereas this phenomenon was less prominent in the gene-rich chromosome 22. Rigorous statistical testing for local proximity of differentially replicated genic and intergenic probes was performed to identify significant stretches of differentially replicated sequence regions. From this analysis, we found that adjacent genes were frequently replicated at different time periods, possibly implying a high density of replication origins.
    By evaluating the conditional-probability significance of the identified gene ontology terms on chromosomes 21 and 22, we detected some over-represented molecular functions and biological processes among these differentially replicated genes, such as those relevant to hydrolase, transferase and receptor-binding activities. Some of these results were confirmed by cDNA microarray data generated independently in parallel with the tiling arrays, with >70% consistency. Thus, our improved analysis approaches, designed specifically for high-density tiling array data, enabled us to reliably and sensitively identify many novel temporal replication patterns on human chromosomes.
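
    A common way to pool error in small-sample array data is to group probes with similar mean intensity and share one variance estimate within each group, instead of trusting each probe's own variance from two or three replicates. The sketch below shows that general idea under stated assumptions (equal-sized intensity bins, simple averaging of variances); it does not reproduce the paper's error-pooling method or its weighted ANOVA, and the function name is hypothetical.

```python
import math
import statistics


def pooled_sd_by_intensity(replicates, n_bins):
    """Pool replicate variances across probes with similar mean intensity.

    replicates: one list of replicate log intensities per probe (>= 2 each).
    Probes are sorted by mean intensity and split into consecutive groups of
    at most ceil(n_probes / n_bins) probes; every probe gets the square root
    of its group's mean variance.
    """
    means = [statistics.mean(r) for r in replicates]
    variances = [statistics.variance(r) for r in replicates]
    order = sorted(range(len(replicates)), key=lambda i: means[i])
    bin_size = math.ceil(len(order) / n_bins)
    pooled = [0.0] * len(replicates)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        group_sd = math.sqrt(statistics.mean([variances[i] for i in members]))
        for i in members:
            pooled[i] = group_sd
    return pooled


reps = [[1, 3], [2, 4], [10, 14], [11, 13]]
print(pooled_sd_by_intensity(reps, n_bins=2))
```

    A test statistic for a probe could then divide its observed difference by the pooled SD rather than by its own replicate SD, which is very unstable with so few replicates.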

    TileProbe: modeling tiling array probe effects using publicly available data

    Motivation: Individual probes on an Affymetrix tiling array usually behave differently. Modeling and removing these probe effects is critical for detecting signals in the array data. Current data processing techniques either require control samples or use probe sequences to model probe-specific variability, as MAT does. Although the MAT approach can be applied without control samples, residual probe effects continue to distort the true biological signals.
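
    One way to use publicly available data for probe effects is to summarize how each probe behaves across many existing arrays and standardize a new sample against that summary, so that no matched control is needed. TileProbe's actual model is not reproduced here; this sketch assumes probe effects are captured by each probe's mean and standard deviation of log2 intensity across a hypothetical compendium of public arrays.

```python
import statistics


def probe_effects(compendium):
    """Per-probe mean and SD of log2 intensity across a compendium of arrays.

    compendium: list of arrays, each a list of log2 intensities in the same
    probe order (hypothetically, arrays collected from public repositories).
    """
    per_probe = list(zip(*compendium))
    means = [statistics.mean(values) for values in per_probe]
    sds = [statistics.stdev(values) for values in per_probe]
    return means, sds


def remove_probe_effects(sample, means, sds, min_sd=1e-3):
    """Standardize a new sample probe by probe against the compendium."""
    return [(x - m) / max(s, min_sd) for x, m, s in zip(sample, means, sds)]


public_arrays = [[1.0, 5.0], [3.0, 7.0], [2.0, 6.0]]
means, sds = probe_effects(public_arrays)
print(remove_probe_effects([4.0, 6.0], means, sds))  # prints [2.0, 0.0]
```

    The floor on the SD keeps probes that barely vary across the compendium from producing huge standardized values.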