Gene set bagging for estimating replicability of gene set analyses
Background: Significance analysis plays a major role in identifying and
ranking genes, transcription factor binding sites, DNA methylation regions, and
other high-throughput features for association with disease. We propose a new
approach, called gene set bagging, for measuring the stability of ranking
procedures using predefined gene sets. Gene set bagging involves resampling the
original high-throughput data, performing gene-set analysis on the resampled
data, and confirming that biological categories replicate. This procedure can
be thought of as bootstrapping gene-set analysis and can be used to determine
which are the most reproducible gene sets. Results: Here we apply this approach
to two common genomics applications: gene expression and DNA methylation. Even
with state-of-the-art statistical ranking procedures, significant categories in
a gene set enrichment analysis may be unstable when subjected to resampling.
Conclusions: We demonstrate that gene lists are not necessarily stable, and
therefore additional steps like gene set bagging can improve biological
inference of gene set analysis. Comment: 3 figures
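The resampling procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The ranking statistic (absolute difference in group means), the hypergeometric enrichment test, the resampling within each group, and all function and parameter names (`gene_set_bagging`, `top_k`, `alpha`, `n_boot`) are assumptions chosen for illustration. The sketch only shows the general shape of the idea: resample the samples, rerun the gene set analysis, and record how often each set is called significant.

```python
import math
import random
from statistics import mean


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric draw without replacement."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def top_genes(expr, labels, top_k):
    """Rank genes by absolute difference in group means (an assumed statistic)."""
    case = [j for j, g in enumerate(labels) if g == 1]
    ctrl = [j for j, g in enumerate(labels) if g == 0]
    score = {
        gene: abs(mean(vals[j] for j in case) - mean(vals[j] for j in ctrl))
        for gene, vals in expr.items()
    }
    return set(sorted(score, key=score.get, reverse=True)[:top_k])


def enriched_sets(expr, labels, gene_sets, top_k, alpha):
    """Return names of gene sets over-represented among the top-ranked genes."""
    hits = top_genes(expr, labels, top_k)
    significant = set()
    for name, members in gene_sets.items():
        members = members & set(expr)  # ignore genes that were not measured
        overlap = len(hits & members)
        if hypergeom_sf(overlap, len(expr), len(members), top_k) < alpha:
            significant.add(name)
    return significant


def gene_set_bagging(expr, labels, gene_sets, n_boot=100, top_k=10,
                     alpha=0.05, seed=0):
    """Estimate, per gene set, the fraction of resamples in which it is significant.

    expr: dict mapping gene name -> list of values, one per sample.
    labels: list of 0/1 group labels, one per sample.
    gene_sets: dict mapping set name -> set of gene names.
    """
    rng = random.Random(seed)
    case = [j for j, g in enumerate(labels) if g == 1]
    ctrl = [j for j, g in enumerate(labels) if g == 0]
    counts = {name: 0 for name in gene_sets}
    for _ in range(n_boot):
        # Assumption: resample with replacement within each group,
        # so that both groups keep their original sizes.
        idx = rng.choices(case, k=len(case)) + rng.choices(ctrl, k=len(ctrl))
        boot_expr = {gene: [vals[j] for j in idx] for gene, vals in expr.items()}
        boot_labels = [labels[j] for j in idx]
        for name in enriched_sets(boot_expr, boot_labels, gene_sets, top_k, alpha):
            counts[name] += 1
    return {name: c / n_boot for name, c in counts.items()}
```

The returned proportions play the role of a replication estimate: a set that is significant in nearly every resample is stable, while one that appears only occasionally may not replicate.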
Inferring evolutionary histories of pathway regulation from transcriptional profiling data
One of the outstanding challenges in comparative genomics is to interpret the
evolutionary importance of regulatory variation between species. Rigorous
molecular evolution-based methods to infer evidence for natural selection from
expression data are at a premium in the field, and to date, phylogenetic
approaches have not been well-suited to address the question in the small sets
of taxa profiled in standard surveys of gene expression. We have developed a
strategy to infer evolutionary histories from expression profiles by analyzing
suites of genes of common function. In a manner conceptually similar to
molecular evolution models in which the evolutionary rates of DNA sequence at
multiple loci follow a gamma distribution, we modeled expression of the genes
of an a priori-defined pathway with rates drawn from an inverse gamma
distribution. We then developed a fitting strategy to infer the parameters of
this distribution from expression measurements, and to identify gene groups
whose expression patterns were consistent with evolutionary constraint or rapid
evolution in particular species. Simulations confirmed the power and accuracy
of our inference method. As an experimental testbed for our approach, we
generated and analyzed transcriptional profiles of four Saccharomyces
yeasts. The results revealed pathways with signatures of constrained and
accelerated regulatory evolution in individual yeasts and across the phylogeny,
highlighting the prevalence of pathway-level expression change during the
divergence of yeast species. We anticipate that our pathway-based phylogenetic
approach will be of broad utility in the search to understand the evolutionary
relevance of regulatory change. Comment: 30 pages, 12 figures, 2 tables, contact authors for supplementary
table
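As a rough illustration of the modeling idea in this abstract, the sketch below draws per-gene rates for one pathway from an inverse gamma distribution and simulates expression change along a single branch. The Brownian-motion form of the change (normal with variance equal to rate times branch length), the parameter names `alpha` and `beta`, and both function names are assumptions made for this example. The abstract does not specify the authors' model at this level of detail, and the fitting step is not shown.

```python
import math
import random


def sample_rates(alpha, beta, n_genes, rng):
    """Draw per-gene rates from InvGamma(shape=alpha, scale=beta)."""
    # If X ~ Gamma(shape=alpha, scale=1/beta), then 1/X ~ InvGamma(alpha, beta).
    return [1.0 / rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_genes)]


def simulate_changes(rates, branch_length, rng):
    """Assumed Brownian-motion model: change ~ Normal(0, rate * branch_length)."""
    return [rng.gauss(0.0, math.sqrt(r * branch_length)) for r in rates]
```

For `alpha > 1` the mean rate is `beta / (alpha - 1)`. Under this assumed model, a pathway with a small mean rate would show little expression divergence (consistent with constraint), and a large mean rate would show rapid divergence.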
Breaking the paradigm: Dr Insight empowers signature-free, enhanced drug repurposing
Motivation: Transcriptome-based computational drug repurposing has attracted considerable interest by bringing about faster and more cost-effective drug discovery. Nevertheless, key limitations of the current drug connectivity-mapping paradigm have long been overlooked, including the lack of effective means to determine optimal query gene signatures. Results: The novel approach Dr Insight implements a frame-breaking statistical model for the "hand-shake" between disease and drug data. The genome-wide screening of concordantly expressed genes (CEGs) eliminates the need for subjective selection of query signatures and yields a better proxy for potential disease-specific drug targets. Extensive comparisons on simulated and real cancer datasets have validated the superior performance of Dr Insight over several popular drug-repurposing methods in detecting known cancer drugs and drug-target interactions. A proof-of-concept trial using the TCGA breast cancer dataset demonstrates the application of Dr Insight for a comprehensive analysis, from redirection of drug therapies to a systematic construction of disease-specific drug-target networks.
TinkerCell: Modular CAD Tool for Synthetic Biology
Synthetic biology brings together concepts and techniques from engineering
and biology. In this field, computer-aided design (CAD) is necessary in order
to bridge the gap between computational modeling and biological data. An
application named TinkerCell has been created in order to serve as a CAD tool
for synthetic biology. TinkerCell is a visual modeling tool that supports a
hierarchy of biological parts. Each part in this hierarchy consists of a set of
attributes that define the part, such as sequence or rate constants. Models
that are constructed using these parts can be analyzed using various C and
Python programs that are hosted by TinkerCell via an extensive C and Python
API. TinkerCell supports the notion of modules, which are networks with
interfaces. Such modules can be connected to each other, forming larger modular
networks. Because TinkerCell associates parameters and equations in a model
with their respective part, parts can be loaded from databases along with their
parameters and rate equations. The modular network design can be used to
exchange modules as well as test the concept of modularity in biological
systems. The flexible modeling framework along with the C and Python API allows
TinkerCell to serve as a host to numerous third-party algorithms. TinkerCell is
a free and open-source project under the Berkeley Software Distribution
license. Downloads, documentation, and tutorials are available at
www.tinkercell.com. Comment: 23 pages, 20 figures
BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models
Background: Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the
biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in
the use of models as well as the development of improved software systems and the availability of better, cheaper
computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model
repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in
these repositories should be extensively tested and encoded in community-supported and standardised formats. In
addition, the models and their components should be cross-referenced with other resources in order to allow their
unambiguous identification.
Description: BioModels Database (http://www.ebi.ac.uk/biomodels/) is aimed at addressing exactly these needs. It is a
freely-accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative
models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by
BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled
vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various
formats. Reaction network diagrams generated from the models are also available in several formats. BioModels
Database also provides features such as online simulation and the extraction of components from large scale models
into smaller submodels. Finally, the system provides a range of web services that external software systems can use to
access up-to-date data from the database.
Conclusions: BioModels Database has become a recognised reference resource for systems biology. It is being used by
the community in a variety of ways; for example, it is used to benchmark different simulation systems, and to study the
clustering of models based upon their annotations. Model deposition to the database today is advised by several
publishers of scientific journals. The models in BioModels Database are freely distributed and reusable; the underlying
software infrastructure is also available from SourceForge (https://sourceforge.net/projects/biomodels/) under the GNU
General Public License.