1,437 research outputs found

    Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

    Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of region classification.

    Custom Design and Analysis of High-Density Oligonucleotide Bacterial Tiling Microarrays

    Not until recently have custom-made high-density oligonucleotide microarrays been available at an affordable price. The aim of this thesis was to design microarrays and analysis algorithms for DNA repair and DNA damage detection, and to apply the methods in real experiments. Thomassen et al. have used their custom-designed whole-genome tiling microarrays to detect transcriptional changes in Escherichia coli after exposure to DNA-damaging reagents. The transcriptional changes in E. coli treated with UV light or the methylating reagent MNNG were shown to be larger and to include far more genes than previously reported. To optimize the data analysis for the custom-made arrays, Thomassen and coworkers designed their own normalization and analysis algorithms, and showed these to be more suitable than the established methods currently applied to custom tiling arrays. Among other findings, several novel stress-induced transcripts were detected, one of which is predicted to be a UV-induced short transmembrane protein. Additionally, no upregulation of the previously described UV-inducible aidB is shown. In the MNNG study, several genes are shown to be downregulated in response to DNA damage despite having upstream regulatory sequences similar to the established LexA box A and B. This indicates that the LexA regulon might also control gene repression and that the box A and B sequences cannot alone account for LexA-controlled gene regulation. Thomassen et al. have also custom-designed a microarray for oncogenic fusion gene detection. Cancer-specific fusion genes are often used to subgroup cancers and to define the optimal treatment, but the current laboratory detection procedure is both laborious and tedious. In a blinded study on six cancer cell lines, proof of principle was shown by detection of six out of six positive controls. The design and analysis methods for this microarray are now being refined to make a diagnostic fusion gene detection tool.

    Doubly stochastic continuous-time hidden Markov approach for analyzing genome tiling arrays

    Microarrays have been developed that tile the entire nonrepetitive genomes of many different organisms, allowing for the unbiased mapping of active transcription regions or protein binding sites across the entire genome. These tiling array experiments produce massive correlated data sets that have many experimental artifacts, presenting many challenges to researchers that require innovative analysis methods and efficient computational algorithms. This paper presents a doubly stochastic latent variable analysis method for transcript discovery and protein binding region localization using tiling array data. This model is unique in that it considers actual genomic distance between probes. Additionally, the model is designed to be robust to cross-hybridized and nonresponsive probes, which can often lead to false-positive results in microarray experiments. We apply our model to a transcript finding data set to illustrate the consistency of our method. Additionally, we apply our method to a spike-in experiment that can be used as a benchmark data set for researchers interested in developing and comparing future tiling array methods. The results indicate that our method is very powerful, accurate and can be used on a single sample and without control experiments, thus defraying some of the overhead cost of conducting experiments on tiling arrays. Comment: Published at http://dx.doi.org/10.1214/09-AOAS248 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    STATISTICAL METHODS FOR AFFYMETRIX TILING ARRAY DATA

    Tiling arrays are a microarray technology currently being used for a variety of genomic and epigenomic applications, such as the mapping of transcription, DNA methylation, and histone modifications. Tiling arrays provide high-density coverage of a genome, or a genomic region, through the systematic and sequential placement of probes without regard to genome annotation. In this paper we compare the Affymetrix tiling array to the Affymetrix GeneChip® 3’ expression array and propose methods that address the statistical and bioinformatic issues accompanying gene expression data generated from Affymetrix tiling arrays. Real data from the model organism Arabidopsis thaliana motivate this work and application.

    GermOnline 4.0 is a genomics gateway for germline development, meiosis and the mitotic cell cycle

    GermOnline 4.0 is a cross-species database portal focusing on high-throughput expression data relevant for germline development, the meiotic cell cycle and mitosis in healthy versus malignant cells. It is thus a source of information for life scientists as well as clinicians who are interested in gene expression and regulatory networks. The GermOnline gateway provides unlimited access to information produced with high-density oligonucleotide microarrays (3′-UTR GeneChips), genome-wide protein–DNA binding assays and protein–protein interaction studies in the context of Ensembl genome annotation. Samples used to produce high-throughput expression data and to carry out genome-wide in vivo DNA binding assays are annotated via the MIAME-compliant Multiomics Information Management and Annotation System (MIMAS 3.0). Furthermore, the Saccharomyces Genomics Viewer (SGV) was developed and integrated into the gateway. SGV is a visualization tool that outputs genome annotation and DNA-strand specific expression data produced with high-density oligonucleotide tiling microarrays (Sc_tlg GeneChips), which cover the complete budding yeast genome on both DNA strands. It facilitates the interpretation of expression levels and transcript structures determined for various cell types cultured under different growth and differentiation conditions.

    Discovering Regulatory Overlapping RNA Transcripts

    STEREO is a novel algorithm that discovers cis-regulatory RNA interactions by assembling complete and potentially overlapping same-strand RNA transcripts from tiling expression data. STEREO first identifies coherent segments of transcription and then discovers individual transcripts that are consistent with the observed segments given intensity and shape constraints. We used STEREO to identify 1446 regions of overlapping transcription in two strains of yeast, including transcripts that comprise a new form of molecular toggle switch that controls gene variegation.

    Experimental annotation of the human pathogen Histoplasma capsulatum transcribed regions using high-resolution tiling arrays

    BACKGROUND The fungal pathogen Histoplasma capsulatum is thought to be the most common cause of fungal respiratory infections in immunocompetent humans, yet little is known about its biology. Here we provide the first genome-wide studies to experimentally validate its genome annotation. A functional interrogation of the Histoplasma genome provides critical support for continued investigation into the biology and pathogenesis of H. capsulatum and related fungi. RESULTS We employed a three-pronged approach to provide a functional annotation for the H. capsulatum G217B strain. First, we probed high-density tiling arrays with labeled cDNAs from cells grown under diverse conditions. These data defined 6,172 transcriptionally active regions (TARs), providing validation of 6,008 gene predictions. Interestingly, 22% of these predictions showed evidence of anti-sense transcription. Additionally, we detected transcription of 264 novel genes not present in the original gene predictions. To further enrich our analysis, we incorporated expression data from whole-genome oligonucleotide microarrays. These expression data included profiling under growth conditions that were not represented in the tiling experiment, and validated an additional 2,249 gene predictions. Finally, we compared the G217B gene predictions to other available fungal genomes, and observed that an additional 254 gene predictions had an ortholog in a different fungal species, suggesting that they represent genuine coding sequences. CONCLUSIONS These analyses yielded a high confidence set of validated gene predictions for H. capsulatum. The transcript sets resulting from this study are a valuable resource for further experimental characterization of this ubiquitous fungal pathogen. The data is available for interactive exploration at http://histo.ucsf.edu.

    Exploring the transcriptional landscape of plant circadian rhythms using genome tiling arrays

    BACKGROUND Organisms are able to anticipate changes in the daily environment with an internal oscillator known as the circadian clock. Transcription is an important mechanism in maintaining these oscillations. Here we explore, using whole genome tiling arrays, the extent of rhythmic expression patterns genome-wide, with an unbiased analysis of coding and noncoding regions of the Arabidopsis genome. RESULTS As in previous studies, we detected a circadian rhythm for approximately 25% of the protein coding genes in the genome. With an unbiased interrogation of the genome, extensive rhythmic introns were detected predominantly in phase with adjacent rhythmic exons, creating a transcript that, if translated, would be expected to produce a truncated protein. In some cases, such as the MYB transcription factor AT2G20400, an intron was found to exhibit a circadian rhythm while the remainder of the transcript was otherwise arrhythmic. In addition to several known noncoding transcripts, including microRNA, trans-acting short interfering RNA, and small nucleolar RNA, more than one thousand intergenic regions were detected as circadian clock regulated, many of which have no predicted function, either coding or noncoding. Nearly 7% of the protein coding genes produced rhythmic antisense transcripts, often for genes whose sense strand was not similarly rhythmic. CONCLUSIONS This study revealed widespread circadian clock regulation of the Arabidopsis genome extending well beyond the protein coding transcripts measured to date. This suggests a greater level of structural and temporal dynamics than previously known.

    Tilescope: online analysis pipeline for high-density tiling microarray data

    Tilescope is a new, fully integrated and automated data-processing pipeline for analyzing high-density tiling-array data.