EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences.
The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm whose novel feature is that it accounts for the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
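As a rough illustration of the local alignment idea in the abstract above, the following Python sketch runs a plain Smith-Waterman dynamic program on chromatin state sequences after collapsing runs of identical states. It is not the EpiAlign scoring scheme: the abstract says EpiAlign accounts for state lengths and frequencies, whereas this sketch discards run lengths and uses fixed scores. The function names, the `match`, `mismatch` and `gap` values, and the example tracks are all assumptions made for illustration.

```python
from itertools import groupby


def compress(states):
    """Collapse consecutive repeats, e.g. 'AAABBC' -> [('A', 3), ('B', 2), ('C', 1)]."""
    return [(state, len(list(run))) for state, run in groupby(states)]


def local_align_score(a, b, match=2.0, mismatch=-1.0, gap=-1.0):
    """Best Smith-Waterman local alignment score between two state sequences.

    The scores are illustrative constants, not EpiAlign's length- and
    frequency-aware scores.
    """
    prev = [0.0] * (len(b) + 1)  # previous row of the DP table
    best = 0.0
    for x in a:
        curr = [0.0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0.0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical chromatin state tracks, one letter per genomic bin.
track_1 = "AAAABBBCCDDDD"
track_2 = "EEABBBBBCDEE"
states_1 = [state for state, _ in compress(track_1)]
states_2 = [state for state, _ in compress(track_2)]
print(local_align_score(states_1, states_2))
```

Keeping only two rows of the table is enough when only the best score is needed; recovering the aligned region itself would require storing the full table or traceback pointers.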
Gene ranking and biomarker discovery under correlation
Biomarker discovery and gene ranking are standard tasks in genomic
high-throughput analysis. Typically, the ordering of markers is based on a
stabilized variant of the t-score, such as the moderated t or the SAM
statistic. However, these procedures ignore gene-gene correlations, which may
have a profound impact on the gene orderings and on the power of the subsequent
tests.
We propose a simple procedure that adjusts gene-wise t-statistics to take
account of correlations among genes. The resulting correlation-adjusted
t-scores ("cat" scores) are derived from a predictive perspective, i.e. as a
score for variable selection to discriminate group membership in two-class
linear discriminant analysis. In the absence of correlation the cat score
reduces to the standard t-score. Moreover, using the cat score it is
straightforward to evaluate groups of features (i.e. gene sets). For
computation of the cat score from small sample data we propose a shrinkage
procedure. In a comparative study comprising six different synthetic and
empirical correlation structures we show that the cat score improves estimation
of gene orderings and leads to higher power for fixed true discovery rate, and
vice versa. Finally, we also illustrate the cat score by analyzing metabolomic
data.
The shrinkage cat score is implemented in the R package "st", available from
http://cran.r-project.org/web/packages/st/
Comment: 18 pages, 5 figures, 1 table
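To make the correlation adjustment concrete, here is a minimal Python sketch for two genes. It assumes the cat score is obtained by multiplying the vector of t-scores by the inverse square root of the gene-gene correlation matrix, which is consistent with the abstract's statement that the cat score reduces to the t-score when there is no correlation. The function name and input values are hypothetical, and the shrinkage estimation of the correlation proposed in the abstract is not shown.

```python
import math


def cat_scores_two_genes(t1, t2, r):
    """Decorrelate two t-scores, assuming cat = P^(-1/2) t with P = [[1, r], [r, 1]]."""
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    # P has eigenvalues 1 + r and 1 - r with eigenvectors (1, 1) and (1, -1),
    # so P^(-1/2) = [[a, b], [b, a]] with the entries below.
    a = 0.5 * (1.0 / math.sqrt(1.0 + r) + 1.0 / math.sqrt(1.0 - r))
    b = 0.5 * (1.0 / math.sqrt(1.0 + r) - 1.0 / math.sqrt(1.0 - r))
    return a * t1 + b * t2, b * t1 + a * t2


# Hypothetical t-scores for two genes, first uncorrelated and then correlated.
print(cat_scores_two_genes(3.0, 1.0, 0.0))  # equals the input t-scores
print(cat_scores_two_genes(3.0, 1.0, 0.5))
```

With more than two genes the same idea needs a general matrix inverse square root, for example via an eigendecomposition of the estimated correlation matrix.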
An Algorithm for Cellular Reprogramming
The day we understand the time evolution of subcellular elements at a level
of detail comparable to physical systems governed by Newton's laws of motion
seems far away. Even so, quantitative approaches to cellular dynamics add to
our understanding of cell biology, providing data-guided frameworks that allow
us to develop better predictions about and methods for control over specific
biological processes and system-wide cell behavior. In this paper we describe
an approach to optimizing the use of transcription factors in the context of
cellular reprogramming. We construct an approximate model for the natural
evolution of a synchronized population of fibroblasts, based on data obtained
by sampling the expression of some 22,083 genes at several times along the cell
cycle. (These data are based on a colony of cells that have been cell-cycle
synchronized.) In order to arrive at a model of moderate complexity, we cluster
gene expression based on the division of the genome into topologically
associating domains (TADs) and then model the dynamics of the expression levels
of the TADs. Based on this dynamical model and known bioinformatics, we develop
a methodology for identifying the transcription factors that are the most
likely to be effective toward a specific cellular reprogramming task. The
approach used is based on a device commonly used in optimal control. From this
data-guided methodology, we identify a number of validated transcription
factors used in reprogramming and/or natural differentiation. Our findings
highlight the immense potential of dynamical models, mathematics, and
data-guided methodologies for improving methods for control over biological
processes.
Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies
Motivated by examples from genetic association studies, this paper considers
the model selection problem in a general complex linear model system and in a
Bayesian framework. We discuss formulating model selection problems and
incorporating context-dependent a priori information through different
levels of prior specifications. We also derive analytic Bayes factors and their
approximations to facilitate model selection and discuss their theoretical and
computational properties. We demonstrate our Bayesian approach based on an
implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real
data application of mapping tissue-specific eQTLs. Our novel results on Bayes
factors provide a general framework to perform efficient model comparisons in
complex linear model systems.
A graph-based representation of Gene Expression profiles in DNA microarrays
This paper proposes a new and very flexible data model, called gene expression graph (GEG), for gene expression analysis and classification. Three features differentiate GEGs from other available microarray data representation structures: (i) the memory occupation of a GEG is independent of the number of samples used to build it; (ii) a GEG more clearly expresses relationships among expressed and non-expressed genes in both healthy and diseased tissue experiments; (iii) GEGs make it easy to implement very efficient classifiers. The paper also presents a simple classifier for sample-based classification to show the flexibility and user-friendliness of the proposed data structure.
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable to a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimension, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future.
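As one example of how a kernel can process non-vectorial data such as sequences, the Python sketch below computes a k-mer spectrum kernel: the inner product of the k-mer count vectors of two strings. The abstract above does not name this kernel; it is chosen here only as a simple, well-known illustration, and the function name and example sequences are hypothetical.

```python
from collections import Counter


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_x = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    counts_y = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * counts_y[kmer] for kmer, n in counts_x.items())


# Hypothetical DNA fragments.
print(spectrum_kernel("ACGT", "ACGA", k=2))
```

Because the function is a valid inner product in k-mer count space, its values can be assembled into a Gram matrix and passed to any kernel method that accepts precomputed kernels.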
Applications of Biological Cell Models in Robotics
In this paper I present some of the most representative biological models
applied to robotics. In particular, this work represents a survey of some
models inspired by, or making use of concepts from, gene regulatory networks
(GRNs): these networks describe the complex interactions that affect gene
expression and, consequently, cell behaviour.