MRFalign: Protein Homology Detection through Alignment of Markov Random Fields
Sequence-based protein homology detection has been extensively studied and so
far the most sensitive method is based upon comparison of protein sequence
profiles, which are derived from multiple sequence alignment (MSA) of sequence
homologs in a protein family. A sequence profile is usually represented as a
position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model) and
accordingly, PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This
paper presents a new homology detection method, MRFalign, consisting of three
key components: 1) a Markov Random Fields (MRF) representation of a protein
family; 2) a scoring function measuring similarity of two MRFs; and 3) an
efficient ADMM (Alternating Direction Method of Multipliers) algorithm aligning
two MRFs. Compared to an HMM, which can model only very short-range residue
correlation, MRFs can model long-range residue interaction patterns and thus
encode information about the global 3D structure of a protein family.
Consequently, MRF-MRF comparison should be much more sensitive for remote
homology detection than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that
MRFalign outperforms several popular HMM or PSSM-based methods in terms of both
alignment accuracy and remote homology detection and that MRFalign works
particularly well for mainly beta proteins. For example, tested on the
benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM
succeed on 48% and 52% of proteins, respectively, at superfamily level, and on
15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign
succeeds on 57.3% and 42.5% of proteins at superfamily and fold level,
respectively. This study implies that long-range residue interaction patterns
are very helpful for sequence-based homology detection. The software is
available for download at http://raptorx.uchicago.edu/download/.
Comment: Accepted by both RECOMB 2014 and PLOS Computational Biology
Gene-network inference by message passing
The inference of gene-regulatory processes from gene-expression data is among
the major challenges of computational systems biology. Here we address the
problem from a statistical-physics perspective and develop a message-passing
algorithm which is able to infer sparse, directed and combinatorial regulatory
mechanisms. Using the replica technique, the algorithmic performance can be
characterized analytically for artificially generated data. The algorithm is
applied to genome-wide expression data of baker's yeast under various
environmental conditions. We find clear cases of combinatorial control, and
enrichment in common functional annotations of regulated genes and their
regulators.
Comment: Proc. of International Workshop on Statistical-Mechanical Informatics
2007, Kyoto
Sequential Monte Carlo Methods for Protein Folding
We describe a class of growth algorithms for finding low energy states of
heteropolymers. These polymers form toy models for proteins, and the hope is
that similar methods will ultimately be useful for finding native states of
real proteins from heuristic or a priori determined force fields. Like standard
Markov chain Monte Carlo methods, these algorithms generate Gibbs-Boltzmann
distributions, but they do not obtain this distribution as the stationary state
of a suitably constructed Markov chain. Rather, they are based on growing the polymer by
successively adding individual particles, guiding the growth towards
configurations with lower energies, and using "population control" to eliminate
bad configurations and increase the number of "good ones". This is not done via
a breadth-first implementation as in genetic algorithms, but depth-first via
recursive backtracking. As seen from various benchmark tests, the resulting
algorithms are extremely efficient for lattice models, and are still
competitive with other methods for simple off-lattice models.
Comment: 10 pages; published in NIC Symposium 2004, eds. D. Wolf et al. (NIC,
Juelich, 2004)
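The growth strategy described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 2D square-lattice HP model (each H-H contact between non-bonded neighbours lowers the energy by 1), and the function names, the thresholds `c_lo` and `c_hi`, and the use of the running mean weight as the reference for pruning and enrichment are all illustrative choices. Monomers are added one at a time with a Boltzmann bias towards low energy, low-weight chains are pruned with probability 1/2, high-weight chains are duplicated, and the copies are explored depth-first by recursion.

```python
import math
import random

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def new_contacts(seq, occupied, site, i):
    """H-H contacts monomer i would gain at `site` with non-bonded neighbours."""
    if seq[i] != "H":
        return 0
    count = 0
    for dx, dy in MOVES:
        j = occupied.get((site[0] + dx, site[1] + dy))
        if j is not None and j != i - 1 and seq[j] == "H":
            count += 1
    return count


def perm_min_energy(seq, beta=1.0, tours=200, c_lo=0.2, c_hi=5.0, seed=0):
    """Lowest energy seen while growing chains with prune/enrich control.

    Assumes a non-empty sequence over the letters "H" and "P".
    """
    rng = random.Random(seed)
    n = len(seq)
    w_sum = [0.0] * n  # running sum of weights at each chain length
    w_cnt = [0] * n    # number of chains that reached each length
    best = [0]         # best (lowest) energy found so far

    def grow(chain, occupied, weight, energy):
        i = len(chain)
        if i == n:
            best[0] = min(best[0], energy)
            return
        x, y = chain[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return  # dead end: the walk is trapped
        gains = [new_contacts(seq, occupied, s, i) for s in free]
        boltz = [math.exp(beta * g) for g in gains]
        k = rng.choices(range(len(free)), weights=boltz)[0]
        weight *= sum(boltz)  # Rosenbluth-style weight correcting the bias
        energy -= gains[k]
        w_sum[i] += weight
        w_cnt[i] += 1
        mean = w_sum[i] / w_cnt[i]
        copies = 1
        if weight > c_hi * mean:        # enrich: two copies, half weight each
            copies, weight = 2, weight / 2
        elif weight < c_lo * mean:      # prune: kill half, double survivors
            if rng.random() < 0.5:
                return
            weight *= 2
        chain.append(free[k])
        occupied[free[k]] = i
        for _ in range(copies):         # depth-first exploration of copies
            grow(chain, occupied, weight, energy)
        chain.pop()                     # backtrack
        del occupied[free[k]]

    for _ in range(tours):
        grow([(0, 0)], {(0, 0): 0}, 1.0, 0)
    return best[0]
```

Calling `perm_min_energy("HPPH")` grows four-monomer chains; the only possible contact is between the two end monomers, so the returned energy is either -1 or 0.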
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method
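As a small illustration of the Mahalanobis framework mentioned in this abstract, the sketch below evaluates the distance d_M(x, y) = sqrt((x - y)^T M (x - y)) for a given matrix M. It assumes M is symmetric positive semidefinite and supplied by the caller; learning M from data, which is the actual subject of the survey, is not shown. The function name and the example matrices are illustrative.

```python
import math


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a symmetric PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    q = sum(d[i] * Md[i] for i in range(len(d)))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(q, 0.0))


identity = [[1.0, 0.0], [0.0, 1.0]]   # recovers the Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]  # weights the first feature more heavily
```

With `identity`, the distance between `[0, 0]` and `[3, 4]` is the Euclidean value 5; with `stretched`, differences along the first feature count twice as much, so `[0, 0]` and `[1, 0]` are at distance 2.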
Inference algorithms for gene networks: a statistical mechanics analysis
The inference of gene regulatory networks from high throughput gene
expression data is one of the major challenges in systems biology. This paper
aims at analysing and comparing two different algorithmic approaches. The first
approach uses pairwise correlations between regulated and regulating genes; the
second one uses message-passing techniques for inferring activating and
inhibiting regulatory interactions. The performance of these two algorithms can
be analysed theoretically on well-defined test sets, using tools from the
statistical physics of disordered systems, such as the replica method. We find that
the second algorithm outperforms the first one since it takes into account
collective effects of multiple regulators.
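The first approach in this abstract, pairwise correlation between regulated and regulating genes, can be sketched as follows. This is an illustrative assumption about how such a ranking might look, not the procedure analysed in the paper: the function names, the use of Pearson correlation, and the toy expression values are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(va * vb)


def rank_regulators(target, candidates):
    """Sort candidate regulators by |correlation| with the target gene."""
    scores = {name: pearson(expr, target) for name, expr in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data (hypothetical): four conditions, three candidate regulators.
target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "geneA": [2.0, 4.0, 6.0, 8.0],   # rises with the target
    "geneB": [4.0, 3.0, 2.0, 1.0],   # falls as the target rises
    "geneC": [1.0, -1.0, 1.0, -1.0], # weakly related
}
ranked = rank_regulators(target, candidates)
```

Because each candidate is scored in isolation, a ranking of this kind cannot capture a target that responds only to a combination of regulators, which is the collective effect the abstract credits the message-passing approach with handling.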