Feature selection environment for genomic applications
Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies
Motivated by examples from genetic association studies, this paper considers
the model selection problem in a general complex linear model system and in a
Bayesian framework. We discuss formulating model selection problems and
incorporating context-dependent a priori information through different
levels of prior specifications. We also derive analytic Bayes factors and their
approximations to facilitate model selection and discuss their theoretical and
computational properties. We demonstrate our Bayesian approach based on an
implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real
data application of mapping tissue-specific eQTLs. Our novel results on Bayes
factors provide a general framework to perform efficient model comparisons in
complex linear model systems.
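As a rough illustration of what comparing two linear models by a Bayes factor looks like, the sketch below uses the standard BIC (Schwarz) approximation to the log Bayes factor. This is a swapped-in textbook approximation, not the analytic Bayes factors derived in the paper. The function names and the toy genotype/expression values are hypothetical.

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares for the null model y = b0 + noise."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)

def rss_simple_regression(x, y):
    """Residual sum of squares for the least-squares fit y = b0 + b1 * x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))

def bic(rss, n, k):
    """BIC of a Gaussian linear model with k regression coefficients."""
    return n * math.log(rss / n) + k * math.log(n)

def approx_log_bayes_factor(x, y):
    """BIC approximation to log BF (alternative vs. null): (BIC_0 - BIC_1) / 2."""
    n = len(y)
    bic_null = bic(rss_intercept_only(y), n, 1)
    bic_alt = bic(rss_simple_regression(x, y), n, 2)
    return 0.5 * (bic_null - bic_alt)

# Hypothetical toy data: genotype coded 0/1/2 and an expression level.
genotypes = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
expression = [1.0, 1.2, 2.1, 1.9, 3.2, 2.8, 0.9, 2.0, 3.1, 2.2]
log_bf = approx_log_bayes_factor(genotypes, expression)
```

A positive `log_bf` favours the model that includes the genotype effect; a negative value favours the intercept-only model.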
Visualizing dimensionality reduction of systems biology data
One of the challenges in analyzing high-dimensional expression data is the
detection of important biological signals. A common approach is to apply a
dimension reduction method, such as principal component analysis. Typically,
after application of such a method, the data are projected and visualized in the
new coordinate system, using scatter plots or profile plots. These methods
provide good results if the data have certain properties which become visible
in the new coordinate system and which were hard to detect in the original
coordinate system. Often, however, the application of only one method does not
suffice to capture all important signals. Therefore, several methods addressing
different aspects of the data need to be applied. We have developed a framework
for linear and non-linear dimension reduction methods within our visual
analytics pipeline SpRay. This includes measures that assist the interpretation
of the factorization result. Different visualizations of these measures can be
combined with functional annotations that support the interpretation of the
results. We show an application to high-resolution time series microarray data
in the antibiotic-producing organism Streptomyces coelicolor as well as to
microarray data measuring expression of cells with normal karyotype and cells
with trisomies of human chromosomes 13 and 21.
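The projection step described in this abstract (apply principal component analysis, then view the data in the new coordinate system) can be sketched as follows. This is a minimal standard-library illustration that recovers only the first principal component by power iteration. It is not the SpRay implementation, and the function names and toy expression matrix are hypothetical.

```python
import math
import random

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iters=500, seed=0):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, v):
    """Coordinate of each sample along the direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in centered]

# Hypothetical toy matrix: 5 samples x 3 features; feature 2 is roughly 2x feature 1.
samples = [[2.0, 4.1, 0.5], [1.0, 2.0, 0.4], [3.0, 6.2, 0.6],
           [4.0, 7.9, 0.5], [5.0, 10.1, 0.4]]
centered = center(samples)
pc1 = leading_component(covariance(centered))
scores = project(centered, pc1)  # values one would place on a scatter-plot axis
```

The `scores` list holds each sample's coordinate on the first principal axis. Further components would require deflation or a full eigendecomposition, which this sketch omits.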
gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution
One of the important questions in biological evolution is whether certain
changes along protein coding genes have contributed to the adaptation of
species. This problem is known to be biologically complex and computationally
very expensive. It, therefore, requires efficient Grid or cluster solutions to
overcome the computational challenge. We have developed a Grid-enabled tool
(gcodeml) that relies on the PAML (codeml) package to help analyse large
phylogenetic datasets on both Grids and computational clusters. Although we
report on results for gcodeml, our approach is applicable and customisable to
related problems in biology or other scientific domains.
Comment: 10 pages, 4 figures. To appear in the HealthGrid 2012 con
MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification
Continuous improvements in next-generation sequencing technologies have led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists and whose analysis requires huge computational effort. The classification of species has emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods