Search CORE

80,244 research outputs found

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Höhl
M Mohri
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
W Fletcher
Wayne Delport
William Fletcher
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 23/11/2010
Field of study

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.Comment: to appear in PLoS ON

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

Kinetic distance and kinetic maps from molecular dynamics simulation

Author: Bittracher A.
Bowman G. R.
Cecilia Clementi
Frank Noé
Nadler B.
Trendelkamp-Schroer B.
Publication venue
Publication date: 01/01/2015
Field of study

Characterizing macromolecular kinetics from molecular dynamics (MD) simulations requires a distance metric that can distinguish slowly-interconverting states. Here we build upon diffusion map theory and define a kinetic distance for irreducible Markov processes that quantifies how slowly molecular conformations interconvert. The kinetic distance can be computed given a model that approximates the eigenvalues and eigenvectors (reaction coordinates) of the MD Markov operator. Here we employ the time-lagged independent component analysis (TICA). The TICA components can be scaled to provide a kinetic map in which the Euclidean distance corresponds to the kinetic distance. As a result, the question of how many TICA dimensions should be kept in a dimensionality reduction approach becomes obsolete, and one parameter less needs to be specified in the kinetic model construction. We demonstrate the approach using TICA and Markov state model (MSM) analyses for illustrative models, protein conformation dynamics in bovine pancreatic trypsin inhibitor and protein-inhibitor association in trypsin and benzamidine

arXiv.org e-Print Archive

Crossref

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

A methodology for determining amino-acid substitution matrices from set covers

Author: A. Bahr
A.D. McLachlan
D.F. Feng
G. Vogt
G.H. Gonnet
J. Setubal
J.D. Blake
J.K.M. Rao
M. Gribskov
M.F. Sagot
R.B. Russell
R.E. Green
R.F. Smith
S. Henikoff
S.A. Benner
T. Müller
T.P. Li
W.S.J. Valdar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/04/2005
Field of study

We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration

arXiv.org e-Print Archive

Crossref

Robust computation of linear models by convex relaxation

Author: Lerman Gilad
McCoy Michael
Tropp Joel A.
Zhang Teng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/08/2014
Field of study

Consider a dataset of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called REAPER, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors, and it uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the REAPER problem, and it documents numerical experiments which confirm that REAPER can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when REAPER can approximate this subspace.Comment: Formerly titled "Robust computation of linear models, or How to find a needle in a haystack

arXiv.org e-Print Archive

Crossref

Caltech Authors

Regularized Wasserstein Means for Aligning Distributional Data

Author: Mi Liang
Wang Yalin
Zhang Wen
Publication venue
Publication date: 20/02/2020
Field of study

We propose to align distributional data from the perspective of Wasserstein means. We raise the problem of regularizing Wasserstein means and propose several terms tailored to tackle different problems. Our formulation is based on the variational transportation to distribute a sparse discrete measure into the target domain. The resulting sparse representation well captures the desired property of the domain while reducing the mapping cost. We demonstrate the scalability and robustness of our method with examples in domain adaptation, point set registration, and skeleton layout

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications