MRFalign: Protein Homology Detection through Alignment of Markov Random Fields

Author: Ma Jianzhu
Wang Sheng
Wang Zhiyong
Xu Jinbo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

Sequence-based protein homology detection has been extensively studied, and so far the most sensitive methods compare protein sequence profiles, which are derived from a multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or a hidden Markov model (HMM), and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method, MRFalign, consisting of three key components: 1) a Markov Random Field (MRF) representation of a protein family; 2) a scoring function measuring the similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm for aligning two MRFs. Unlike an HMM, which can model only very short-range residue correlations, an MRF can model long-range residue interaction patterns and thus encode information about the global 3D structure of a protein family. Consequently, MRF-MRF comparison should be much more sensitive for remote homology detection than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM- or PSSM-based methods in both alignment accuracy and remote homology detection, and that MRFalign works particularly well for mainly beta proteins. For example, on the SCOP40 benchmark (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at the superfamily level, and on 15% and 27% of proteins, respectively, at the fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at the superfamily and fold levels, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/.

Comment: Accepted by both RECOMB 2014 and PLOS Computational Biolog

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Gene-network inference by message passing

Author: A Braunstein
A Pagnani
Alberts B
Braunstein A
Butte A J
Gasch A P
Kabashima Y
M Weigt
Murphy K
R Zecchina
Segal E
Publication venue: 'IOP Publishing'
Publication date: 01/01/2008

The inference of gene-regulatory processes from gene-expression data is among the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm that can infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.

Comment: Proc. of International Workshop on Statistical-Mechanical Informatics 2007, Kyot

arXiv.org e-Print Archive

CiteSeerX

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Sequential Monte Carlo Methods for Protein Folding

Author: Grassberger Peter
Publication date: 01/01/2004

We describe a class of growth algorithms for finding low-energy states of heteropolymers. These polymers form toy models for proteins, and the hope is that similar methods will ultimately be useful for finding native states of real proteins from heuristic or a priori determined force fields. Like standard Markov chain Monte Carlo methods, these algorithms generate Gibbs-Boltzmann distributions, but they do not obtain this distribution as the stationary state of a suitably constructed Markov chain. Rather, they grow the polymer by successively adding individual particles, guiding the growth towards configurations with lower energies, and using "population control" to eliminate bad configurations and increase the number of good ones. This is not done via a breadth-first implementation as in genetic algorithms, but depth-first via recursive backtracking. As seen from various benchmark tests, the resulting algorithms are extremely efficient for lattice models, and are still competitive with other methods for simple off-lattice models.

Comment: 10 pages; published in NIC Symposium 2004, eds. D. Wolf et al. (NIC, Juelich, 2004

arXiv.org e-Print Archive

CiteSeerX

Juelich Shared Electronic Resources
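
The growth strategy described in the abstract above (add one monomer at a time, bias towards lower energy, prune or duplicate configurations, recurse depth-first) might be sketched as follows. This is only an illustrative sketch, not the author's implementation: it assumes a 2D square-lattice HP toy model in which each non-bonded H-H contact contributes energy -1, and the names `grow`, `new_contacts`, `beta`, `lower` and `upper` are hypothetical.

```python
import math
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 2D square lattice (assumption)

def new_contacts(pos, occupied, seq, i):
    """H-H contacts gained by placing residue i at pos (bonded neighbour excluded)."""
    if seq[i] != "H":
        return 0
    x, y = pos
    gained = 0
    for dx, dy in MOVES:
        j = occupied.get((x + dx, y + dy))
        if j is not None and j != i - 1 and seq[j] == "H":
            gained += 1
    return gained

def grow(seq, beta=2.0, tours=200, lower=0.5, upper=2.0, seed=0):
    """Return (energy, chain) of the lowest-energy conformation found."""
    rng = random.Random(seed)
    n = len(seq)
    w_sum, w_cnt = [0.0] * n, [0] * n  # running average weight per chain length
    best = [0, None]                   # [energy, chain]

    def step(chain, occupied, energy, weight):
        i = len(chain)
        if i == n:  # chain complete: remember it if it is the best so far
            if best[1] is None or energy < best[0]:
                best[0], best[1] = energy, list(chain)
            return
        x, y = chain[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return  # dead end: backtrack
        # Guide the growth: pick a free site with Boltzmann-weighted probability.
        gains = [new_contacts(p, occupied, seq, i) for p in free]
        boltz = [math.exp(beta * g) for g in gains]
        k = rng.choices(range(len(free)), weights=boltz)[0]
        weight *= sum(boltz)  # weight correcting for the biased choice
        w_sum[i] += weight
        w_cnt[i] += 1
        avg = w_sum[i] / w_cnt[i]
        # Population control: duplicate heavy configurations, prune light ones.
        copies = 1
        if weight > upper * avg:
            copies, weight = 2, weight / 2
        elif weight < lower * avg:
            if rng.random() < 0.5:
                return
            weight *= 2
        pos = free[k]
        for _ in range(copies):  # depth-first: each copy grows independently
            chain.append(pos)
            occupied[pos] = i
            step(chain, occupied, energy - gains[k], weight)
            chain.pop()
            del occupied[pos]

    for _ in range(tours):
        step([(0, 0)], {(0, 0): 0}, 0, 1.0)
    return best[0], best[1]

energy, chain = grow("HPHPPHHPHH")
print(energy, chain)
```

The thresholds `lower` and `upper` are compared against a running average of the weights seen at each chain length. That is one simple choice for illustration, and other threshold rules are possible.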

A Survey on Metric Learning for Feature Vectors and Structured Data

Author: Bellet Aurélien
Habrard Amaury
Sebban Marc
Publication date: 01/01/2013

The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper presents a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.

Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new method

arXiv.org e-Print Archive

HAL-UJM
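
For readers unfamiliar with the Mahalanobis distance mentioned in the survey abstract above, the quantity is d_M(x, y) = sqrt((x - y)^T M (x - y)) for a symmetric positive semi-definite matrix M; metric learning methods in this family choose M from data. The snippet below is a minimal sketch that only evaluates the distance for a given M. The function name `mahalanobis` and the example matrices are assumptions for illustration, not taken from the survey.

```python
import math

def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)); M is a symmetric PSD matrix given as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    q = sum(d_i * md_i for d_i, md_i in zip(d, Md))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding errors

identity = [[1.0, 0.0], [0.0, 1.0]]    # reduces to the Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]   # differences along the first axis count double

print(mahalanobis([0.0, 0.0], [3.0, 4.0], identity))   # 5.0
print(mahalanobis([0.0, 0.0], [1.0, 0.0], stretched))  # 2.0
```

With M set to the identity the distance is the ordinary Euclidean one, so a learned M can be read as a reweighting and rotation of the feature space.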

The inference of gene regulatory networks from high-throughput gene expression data is one of the major challenges in systems biology. This paper aims at analysing and comparing two different algorithmic approaches. The first approach uses pairwise correlations between regulated and regulating genes; the second one uses message-passing techniques for inferring activating and inhibiting regulatory interactions. The performance of these two algorithms can be analysed theoretically on well-defined test sets, using tools from the statistical physics of disordered systems like the replica method. We find that the second algorithm outperforms the first one since it takes into account collective effects of multiple regulators.

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino
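
The first approach named in the abstract above, ranking candidate regulators by pairwise correlation with a regulated gene, could look roughly like the sketch below. It assumes the Pearson correlation as the pairwise measure and uses made-up expression values; the function names `pearson` and `rank_regulators` are hypothetical and not taken from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_a * var_b)

def rank_regulators(target, candidates):
    """Sort candidate regulators by absolute correlation with the target gene."""
    scores = {name: pearson(expr, target) for name, expr in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy expression data over four conditions (illustrative values only).
target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "geneA": [2.0, 4.0, 6.0, 8.0],    # rises with the target
    "geneB": [4.0, 3.0, 2.0, 1.0],    # falls as the target rises
    "geneC": [1.0, -1.0, 1.0, -1.0],  # weakly related
}
for name, r in rank_regulators(target, candidates):
    print(name, round(r, 3))
```

Each candidate is scored in isolation here, which is the limitation the abstract points to: a pairwise score cannot capture the collective effect of several regulators acting together.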

Spiral search: a hydrophobic-core directed local search for simplified PSP on 3D FCC lattice

Author: AA Tantar
Abdul Sattar
Adam Smith
AL Patton
B Berger
C Blum
C Levinthal
C Rohl
C Thachuk
CB Anfinsen
CM Dobso
Duc Nghia Pham
F Glover
GW Klau
HJ Böckenhauer
I Dotu
J Lee
K Yue
KA Dill
KF Lau
M Cebrián
MA Hakim Newton
MA Rashid
Mahmood A Rashid
Md Tamjidul Hoque
MT Hoque
N Lesh
R Bonneau
R Unger
S Shatabda
Swakkhar Shatabda
T Hales
The Science Editorial
V Cutello
Y Xia
Publication venue: 'Springer Science and Business Media LLC'

Crossref