Pairwise alignment incorporating dipeptide covariation
Motivation: Standard algorithms for pairwise protein sequence alignment make
the simplifying assumption that amino acid substitutions at neighboring sites
are uncorrelated. This assumption allows implementation of fast algorithms for
pairwise sequence alignment, but it ignores information that could conceivably
increase the power of remote homolog detection. We examine the validity of this
assumption by constructing extended substitution matrices that encapsulate the
observed correlations between neighboring sites, by developing an efficient and
rigorous algorithm for pairwise protein sequence alignment that incorporates
these local substitution correlations, and by assessing the ability of this
algorithm to detect remote homologies. Results: Our analysis indicates that
local correlations between substitutions are not strong on average.
Furthermore, incorporating local substitution correlations into pairwise
alignment did not lead to a statistically significant improvement in remote
homology detection. Therefore, the standard assumption that individual residues
within protein sequences evolve independently of neighboring positions appears
to be an efficient and appropriate approximation.
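The site-independence assumption discussed in this abstract is what lets an alignment score decompose into a sum over columns, so that a dynamic program can build the optimum one cell at a time. A minimal sketch of that standard setting follows, assuming a toy match/mismatch score and a linear gap penalty (the function name and all parameter values are illustrative and are not taken from the paper, which uses extended substitution matrices):

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Each column is scored independently of its neighbours, which is
    the simplifying assumption the abstract examines. The scoring
    values are hypothetical placeholders.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
        prev = curr
    return prev[-1]
```

Incorporating correlations between neighboring sites would make the score of a column depend on the preceding column, so the recurrence would need to carry extra state; the sketch above does not attempt that.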
Laboratory Bounds on Electron Lorentz Violation
Violations of Lorentz boost symmetry in the electron and photon sectors can
be constrained by studying several different high-energy phenomena. Although
they may not lead to the strongest bounds numerically, measurements made in
terrestrial laboratories produce the most reliable results. Laboratory bounds
can be based on observations of synchrotron radiation, as well as the observed
absence of vacuum Cerenkov radiation. Using measurements of synchrotron energy
losses at LEP and the survival of TeV photons, we place new bounds on the three
electron Lorentz violation coefficients c_(TJ), at the 3 x 10^(-13) to 6 x
10^(-15) levels.
Comment: 18 pages
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations of two
proteins, that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences which have the best scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, which is confirmed by biologically significant
examples.
Comment: The 9th International Workshop in Algorithms in Bioinformatics
(WABI), Philadelphia, United States of America (2009)
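To see why the abstract calls for memory-efficient graph representations of "the complete set of putative DNA sequences of each protein", it helps to count how many coding sequences back-translate to a given protein. The sketch below assumes the standard genetic code and ignores stop codons; the names `CODON_COUNT` and `count_back_translations` are illustrative and do not come from the paper.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code
# (61 sense codons in total; stop codons are not included).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_back_translations(protein):
    """Return how many distinct DNA sequences encode `protein`.

    The count is the product of the per-residue codon degeneracies,
    so it grows exponentially with the protein length.
    """
    return prod(CODON_COUNT[residue] for residue in protein)
```

Because the count is a product over residues, explicit enumeration is infeasible for realistic protein lengths, which is consistent with the abstract's choice of a compact graph representation over which the dynamic program runs.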
Clustering with shallow trees
We propose a new method for hierarchical clustering based on the optimisation
of a cost function over trees of limited depth, and we derive a
message-passing method that solves it efficiently. The method and
algorithm can be interpreted as a natural interpolation between two well-known
approaches, namely single linkage and the recently presented Affinity
Propagation. Using this general scheme, we analyze three structured
biological/medical datasets (human population based on genetic information,
proteins based on sequences, and verbal autopsies) and show that the
interpolation technique provides new insight.
Comment: 11 pages, 7 figures
Genetic Correlations in Mutation Processes
We study the role of phylogenetic trees in correlations in mutation
processes. Generally, correlations decay exponentially with the generation
number. We find that two distinct regimes of behavior exist. For mutation rates
smaller than a critical rate, the underlying tree morphology is almost
irrelevant, while mutation rates higher than this critical rate lead to strong
tree-dependent correlations. We show analytically that identical critical
behavior underlies all multiple point correlations. This behavior generally
characterizes branching processes undergoing mutation.
Comment: revtex, 8 pages, 2 figures
Non-local on-shell field redefinition for the SME
This work instigates a study of non-local field mappings within the Lorentz-
and CPT-violating Standard-Model Extension (SME). An example of such a mapping
is constructed explicitly, and the conditions for the existence of its inverse
are investigated. It is demonstrated that the associated field redefinition can
remove b-type Lorentz violation from free SME fermions in certain situations.
These results are employed to obtain explicit expressions for the corresponding
Lorentz-breaking momentum-space eigenspinors and their orthogonality relations.
Comment: 12 pages, REVTeX
Simplified amino acid alphabets based on deviation of conditional probability from random background
The primitive data for deducing the Miyazawa-Jernigan contact energy or
BLOSUM score matrix consists of pair frequency counts. Each amino acid
corresponds to a conditional probability distribution. Based on the deviation
of such conditional probability from random background, a scheme for reduction
of the amino acid alphabet is proposed. A clear discrepancy is observed
between the reduced alphabets obtained from the raw Miyazawa-Jernigan and
BLOSUM residue pair counts. Taking the homologous
sequence database SCOP40 as a test set, we detect homology with the obtained
coarse-grained substitution matrices. It is verified that the reduced alphabets
obtained preserve well the information contained in the original 20-letter
alphabet.
Comment: 9 pages, 3 figures
Bethe Ansatz in the Bernoulli Matching Model of Random Sequence Alignment
For the Bernoulli Matching model of the sequence alignment problem we apply the
Bethe ansatz technique via an exact mapping to the 5-vertex model on a square
lattice. Considering the terrace-like representation of the sequence alignment
problem, we reproduce by the Bethe ansatz the results for the averaged length
of the Longest Common Subsequence in the Bernoulli approximation. In addition, we
compute the average number of nucleation centers of the terraces.
Comment: 14 pages, 5 figures (some points are clarified)
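In the Bernoulli Matching model, the match indicator for each pair of positions is treated as an independent Bernoulli variable rather than being derived from two actual sequences. The following is a minimal numerical sketch of the Longest Common Subsequence recursion under that assumption; the function name and parameters are illustrative, and this is a direct simulation, not the Bethe ansatz calculation of the paper.

```python
import random


def bernoulli_matching_lcs(n, m, p, rng):
    """LCS-like length on an n-by-m grid of independent match variables.

    Each cell (i, j) is a match with probability p, drawn independently,
    which is the defining approximation of the Bernoulli Matching model.
    `rng` is a random.Random instance, passed in for reproducibility.
    """
    prev = [0] * (m + 1)
    for _ in range(n):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            eta = 1 if rng.random() < p else 0  # independent match indicator
            # Standard LCS recursion with eta in place of a letter comparison
            curr[j] = max(prev[j], curr[j - 1], prev[j - 1] + eta)
        prev = curr
    return prev[m]
```

Averaging the returned value over many draws for large `n = m` gives a numerical estimate of the averaged length that the abstract refers to.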
Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies
Existing sequence alignment algorithms use heuristic scoring schemes which
cannot be used as objective distance metrics. Therefore one relies on measures
like the p- or log-det distances, or makes explicit, and often simplistic,
assumptions about sequence evolution. Information theory provides an
alternative in the form of mutual information (MI), which is, in principle, an
objective and model-independent similarity measure. MI can be estimated by
concatenating and zipping sequences, thereby yielding the "normalized
compression distance". So far this has produced promising results, but with
uncontrolled errors. We describe a simple approach to get robust estimates of
MI from global pairwise alignments. Using standard alignment algorithms, this
gives estimates for animal mitochondrial DNA that are strikingly close to
those obtained from the alignment-free methods mentioned above. Our main
result uses algorithmic (Kolmogorov) information theory, but we show that
similar results can also be obtained from Shannon theory. Because it is not
additive, the normalized compression distance is not an optimal metric
for phylogenetics, but we propose a simple modification that overcomes this
issue. We test several versions of our MI-based distance measures
on a large number of randomly chosen quartets and demonstrate that they all
perform better than traditional measures like the Kimura or log-det (resp.
paralinear) distances. Even a simplified version based on single-letter Shannon
entropies, which can be easily incorporated in existing software packages, gave
superior results throughout the entire animal kingdom. But we see the main
virtue of our approach in a more general way. For example, it can also help to
judge the relative merits of different alignment algorithms, by estimating the
significance of specific alignments.
Comment: 19 pages + 16 pages of supplementary material
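The "concatenating and zipping" estimate mentioned in this abstract can be illustrated with the usual definition of the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The sketch below assumes zlib as the compressor, which is a convenient stand-in and not necessarily what the authors used; it also does not implement the alignment-based MI estimate or the additivity fix that the paper proposes.

```python
import random
import zlib


def compressed_size(data):
    """Length in bytes of `data` after zlib compression at maximum level."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length, rng):
    """Uniform random DNA-like byte string (toy data for illustration)."""
    return bytes(rng.choice(b"ACGT") for _ in range(length))


def mutate_every(seq, step):
    """Return a copy of `seq` with a substitution at every `step`-th site."""
    out = bytearray(seq)
    for i in range(0, len(out), step):
        out[i] = ord("C") if out[i] == ord("A") else ord("A")
    return bytes(out)
```

On toy data, a sequence and a lightly mutated copy of it should come out much closer than two unrelated random sequences, since the compressor can reuse the first copy when encoding the second.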
