
414 research outputs found

Doubly stochastic continuous-time hidden Markov approach for analyzing genome tiling arrays

Author: Johnson W. Evan
Liu Jun S.
Liu X. Shirley
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2008

Microarrays have been developed that tile the entire nonrepetitive genomes of many different organisms, allowing for the unbiased mapping of active transcription regions or protein binding sites across the entire genome. These tiling array experiments produce massive correlated data sets that have many experimental artifacts, presenting many challenges to researchers that require innovative analysis methods and efficient computational algorithms. This paper presents a doubly stochastic latent variable analysis method for transcript discovery and protein binding region localization using tiling array data. This model is unique in that it considers actual genomic distance between probes. Additionally, the model is designed to be robust to cross-hybridized and nonresponsive probes, which can often lead to false-positive results in microarray experiments. We apply our model to a transcript finding data set to illustrate the consistency of our method. Additionally, we apply our method to a spike-in experiment that can be used as a benchmark data set for researchers interested in developing and comparing future tiling array methods. The results indicate that our method is very powerful, accurate and can be used on a single sample and without control experiments, thus defraying some of the overhead cost of conducting experiments on tiling arrays. Comment: Published at http://dx.doi.org/10.1214/09-AOAS248 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

Author: Aubourg Sébastien
Brunaud Véronique
Bérard Caroline
Martin-Magniette Marie-Laure
Robin Stéphane
Publication date: 01/01/2011

Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density, with up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of the region classification.

arXiv.org e-Print Archive

HAL Evry

HAL Descartes

Discovering Regulatory Overlapping RNA Transcripts

Author: A.E. Urban
B. Wilhelm
C. Hongay
F. Miura
F. Picard
I. Martianov
J. Camblong
J. Marioni
J.A. Martens
L. David
S.L. Bumgarner
T. Royce
T.A. Hughes
U. Nagalakshmi
W. Huber
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

STEREO is a novel algorithm that discovers cis-regulatory RNA interactions by assembling complete and potentially overlapping same-strand RNA transcripts from tiling expression data. STEREO first identifies coherent segments of transcription and then discovers individual transcripts that are consistent with the observed segments given intensity and shape constraints. We used STEREO to identify 1446 regions of overlapping transcription in two strains of yeast, including transcripts that comprise a new form of molecular toggle switch that controls gene variegation

DSpace@MIT

Crossref

Transcriptional landscape estimation from tiling array data using a model of signal shift and drift

Author: Bessières Philippe
Jarmer Hanne
Leduc Aurélie
Nicolas Pierre
Rasmussen Simon
Robin Stéphane
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: High-density oligonucleotide tiling array technology holds the promise of a better description of the complexity and the dynamics of transcriptional landscapes. In organisms such as bacteria and yeasts, transcription can be measured on a genome-wide scale with a resolution >25 bp. The statistical models currently used to handle these data remain, however, very simple, the most popular being the piecewise constant Gaussian model with a fixed number of breakpoints.

PubMed Central

HAL Descartes

Online Research Database In Technology

Hal-Diderot

WAVELET BASED FUNCTIONAL MODELS FOR TRANSCRIPTOME ANALYSIS WITH TILING ARRAYS

Author: Clement Lieven
Crainiceanu Ciprian
DeBeuf Kristof
Irizarry Rafael
Thas Olivier
Vuylsteke Marnik
Publication venue: Collection of Biostatistics Research Archive
Publication date: 03/02/2010

For a better understanding of the biology of an organism, a complete description is needed of all regions of the genome that are actively transcribed. Tiling arrays can be used for this purpose. Such arrays allow the discovery of novel transcripts and the assessment of differential expression between two or more experimental conditions such as genotype, treatment, tissue, etc. Most of the initial methodological efforts were designed for transcript discovery, while more recent developments also focus on differential expression. To our knowledge, no methods for tiling arrays are described in the literature that can assess transcript discovery and identify differentially expressed transcripts simultaneously. The wavelet-based functional model developed in this paper is designed to fill this methodological void. As opposed to existing methods, our statistical framework also permits a natural integration of preprocessing into the standard statistical analysis flow of tiling array data. We use Johnson transformations, which are based on cumulants, for computing false discovery rates (FDRs) and Bayesian credibility intervals for the estimates of the effect functions within the data space. A case study illustrates that our model is well suited for a simultaneous assessment of transcript discovery and differential expression, while remaining competitive with methods that perform only one of these tasks.

Collection Of Biostatistics Research Archive

Ratio-Based Analysis of Differential mRNA Processing and Expression of a Polyadenylation Factor Mutant pcfs4 Using Arabidopsis Tiling Microarray

Author: A Matsui
AG Hunt
B Thomas
B Tian
BC Meyers
BM Bolstad
C Martinez
C Mayr
CS Lutz
D Xing
DD Licatalosi
Denghui Xing
Diana M. Kroll
ET Wang
G Zeller
Guoli Ji
H Ji
HJ Kim
HR Chung
J Kurepa
JF Box
Jianti Zheng
JT Judy
K Ryan
L David
L Li
M Ares Jr
MJ Moore
MW Jones-Rhoades
N Naouar
NJ Proudfoot
P Mas
Q Zheng
Qingshun Quinn Li
R Macknight
RA Irizarry
Raya Khanin
S Ghosh
S Kaneko
S Laubinger
S West
S Zheng
SG Kim
T Imaizumi
T Toyoda
TC Mockler
TE Royce
V Stolc
VG Tusher
W Huber
X Zhang
Xiaohui Wu
Y Kurihara
Y Shen
Y Shi
Yingjia Shen
Z Ji
Z Zhang
ZD Zhang
Publication venue: Public Library of Science
Publication date: 01/02/2011

US National Institutes of Health [1R15GM07719201A1]; US National Science Foundation [IOS-0817818]; Ohio Plant Biotech Consortium; National Natural Science Foundation of China [60774033]; Specialized Research Fund for the Doctoral Program of Higher Educati

Background: Alternative polyadenylation as a mechanism in gene expression regulation has been widely recognized in recent years. Arabidopsis polyadenylation factor PCFS4 was shown to function in leaf development and in flowering time control. The function of PCFS4 in controlling flowering time was correlated with the alternative polyadenylation of FCA, a flowering time regulator. However, genetic evidence suggested additional targets of PCFS4 that may mediate its function in both flowering time and leaf development. Methodology/Principal Findings: To identify further targets, we investigated the whole transcriptome of a PCFS4 mutant using Affymetrix Arabidopsis genomic tiling 1.0R array and developed a data analysis pipeline, termed RADPRE (Ratio-based Analysis of Differential mRNA Processing and Expression). In RADPRE, ratios of normalized probe intensities between wild type Columbia and a pcfs4 mutant were first generated. By doing so, one of the major problems of tiling array data, variation caused by differential probe affinity, was significantly alleviated. With the probe ratios as inputs, a hierarchy of statistical tests was carried out to identify differentially processed genes (DPG) and differentially expressed genes (DEG). The false discovery rate (FDR) of this analysis was estimated by using the balanced random combinations of Col/pcfs4 and pcfs4/Col ratios as inputs. Gene Ontology (GO) analysis of the DPGs and DEGs revealed potential new roles of PCFS4 in stress responses besides flowering time regulation. Conclusion/Significance: We identified 68 DPGs and 114 DEGs with FDR at 1% and 2%, respectively. Most of the 68 DPGs were subjected to alternative polyadenylation, splicing or transcription initiation. Quantitative PCR analysis of a set of DPGs confirmed that most of these genes were truly differentially processed in pcfs4 mutant plants. The enriched GO term "regulation of flower development" among PCFS4 targets further indicated the efficacy of the RADPRE pipeline. This simple but effective program is available upon request.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Xiamen University Institutional Repository

At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in

Publication venue: BioMed Central

Springer - Publisher Connector

Rank-statistics based enrichment-site prediction algorithm developed for chromatin immunoprecipitation on chip experiments

Author: Ghosh S.
Gingeras T. R.
Hirsch H. A.
Sekinger E.
Struhl K.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2006

Background: High density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased, and thus ensures consistent sampling across coding and non-coding regions of the genome. Tiling arrays are increasingly used in chromatin immunoprecipitation (IP) experiments (ChIP on chip). ChIP on chip facilitates the generation of genome-wide maps of in-vivo interactions between DNA-associated proteins including transcription factors and DNA. Analysis of the hybridization of an immunoprecipitated sample to a tiling array facilitates the identification of ChIP-enriched segments of the genome. These enriched segments are putative targets of antibody assayable regulatory elements. The enrichment response is not ubiquitous across the genome. Typically 5 to 10% of tiled probes manifest some significant enrichment. Depending upon the factor being studied, this response can drop to less than 1%. The detection and assessment of significance for interactions that emanate from non-canonical and/or un-annotated regions of the genome is especially challenging. This is the motivation behind the proposed algorithm. Results: We have proposed a novel rank and replicate statistics-based methodology for identifying and ascribing statistical confidence to regions of ChIP-enrichment. The algorithm is optimized for identification of sites that manifest low levels of enrichment but are true positives, as validated by alternative biochemical experiments. Although the method is described here in the context of ChIP on chip experiments, it can be generalized to any treatment-control experimental design. The results of the algorithm show a high degree of concordance with independent biochemical validation methods. The sensitivity and specificity of the algorithm have been characterized via quantitative PCR and independent computational approaches. 
Conclusion: The algorithm ranks all enrichment sites based on their intra-replicate ranks and inter-replicate rank consistency. Following the ranking, the method allows segmentation of sites based on a meta p-value, a composite array signal enrichment criterion, or a composite of these two measures. The sensitivities obtained subsequent to the segmentation of data using a meta p-value of 10^-5, an array signal enrichment of 0.2 and a composite of these two values are 88%, 87% and 95%, respectively.

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Novel Low Abundance and Transient RNAs in Yeast Revealed by Tiling Microarrays and Ultra High–Throughput Sequencing Are Not Conserved Across Closely Related Yeast Species

Author: Bullard James
Dudoit Sandrine
Hansen Kasper Daniel
Lee Albert
Sherlock Gavin
Publication venue: Public Library of Science
Publication date: 01/12/2008

A complete description of the transcriptome of an organism is crucial for a comprehensive understanding of how it functions and how its transcriptional networks are controlled, and may provide insights into the organism's evolution. Despite the status of Saccharomyces cerevisiae as arguably the most well-studied model eukaryote, we still do not have a full catalog or understanding of all its genes. In order to interrogate the transcriptome of S. cerevisiae for low abundance or rapidly turned over transcripts, we deleted elements of the RNA degradation machinery with the goal of preferentially increasing the relative abundance of such transcripts. We then used high-resolution tiling microarrays and ultra high–throughput sequencing (UHTS) to identify, map, and validate unannotated transcripts that are more abundant in the RNA degradation mutants relative to wild-type cells. We identified 365 currently unannotated transcripts, the majority presumably representing low abundance or short-lived RNAs, of which 185 are previously unknown and unique to this study. It is likely that many of these are cryptic unstable transcripts (CUTs), which are rapidly degraded and whose function(s) within the cell are still unclear, while others may be novel functional transcripts. Of the 185 transcripts we identified as novel to our study, greater than 80 percent come from regions of the genome that have lower conservation scores amongst closely related yeast species than 85 percent of the verified ORFs in S. cerevisiae. Such regions of the genome have typically been less well-studied, and by definition transcripts from these regions will distinguish S. cerevisiae from these closely related species

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Functional characterization and annotation of trait-associated genomic regions by transcriptome analysis

Author: Du Yang (gnd: 1062825780)
Publication venue: Universität Rostock Rostock
Publication date: 01/01/2014

In this work, two novel implementations are presented that could assist in the design and data analysis of high-throughput genomic experiments. The first is an efficient and flexible tiling probe selection pipeline based on a penalized uniqueness score, which could be employed in the design of genome tiling tasks of various types and scales. The second is a novel hidden semi-Markov model (HSMM) implementation, made available within the Bioconductor project, which provides a unified interface for segmenting genomic data in a wide range of research subjects.

Rostocker Dokumentenserver