Search CORE

41 research outputs found

Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework

Author: A Beyer
A Gelman
EH Hurowitz
HH Thygesen
Hong Qin
J Colinge
JS Morris
K Dolinski
KA Baggerly
L Cai
L David
L Zhang
M Harbers
MD Stern
Michael A Gilchrist
Russell Zaretzki
RZN Vencio
RZN Vencio
S Audic
SL Madden
T Beissbarth
VA Kuznetsov
VE Velculescu
VE Velculescu
VR Akmaev
Wolfram Research Inc
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Abstract Background Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from the experimentally generated sequence based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome. Results Using the yeast <it>Saccharomyces cerevisiae </it>as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA. Conclusion With a mechanistic, model based approach for SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.</p

University of Tennessee, Knoxville: Trace

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An analysis of single amino acid repeats as use case for application specific background models

Author: C Notredame
David P Kreil
DP Depledge
DP Kreil
E Birney
E Delot
EL Sonnhammer
EM Marcotte
G Gouridis
G Nuel
G Reinert
H Gerber
H Nielsen
H Nielsen
IB Kuznetsov
J Thompson
J Wootton
J Xie
JD Bendtsen
JM Hancock
JW Fondon
L Brown
L Zhang
M Hoebeke
M Mar Alba
M Thomas-Chollier
M Tipping
M Tipping
MA Huntley
O Weiss
OB Ptitsyn
P Siwach
P Siwach
Paweł P Łabaj
Peter Sykacek
PP Łabaj
R Lopez
R Lyne
RI Sadreyev
RS Hegde
S Caburet
S Hands
S Henikoff
S Karlin
S Karlin
SF Altschul
SF Altschul
SF Altschul
T Koestler
VJ Promponas
VR Chechetkin
VS Pande
WR Pearson
Y Kashi
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Background Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. Results Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis. Conclusions Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publikationsserver der Universitätsbibliothek Bodenkultur Wien

Warwick Research Archives Portal Repository