Geometric and Statistical Properties of the Mean-Field HP Model, the LS Model and Real Protein Sequences
Lattice models, for their coarse-grained nature, are best suited for the
study of the ``designability problem'', the phenomenon in which most of the
about 16,000 proteins of known structure have their native conformations
concentrated in a relatively small number of about 500 topological classes of
conformations. Here it is shown that on a lattice the most highly designable
simulated protein structures are those that have the largest number of
surface-core switchbacks. A combination of physical, mathematical and
biological reasons that causes the phenomenon is given. By comparing the most
foldable model peptides with protein sequences in the Protein Data Bank, it is
shown that whereas different models may yield similar designabilities,
predicted foldable peptides will simulate natural proteins only when the model
incorporates the correct physics and biology, in this case if the main folding
force arises from the differing hydrophobicity of the residues, but does not
originate, say, from the steric hindrance effect caused by the differing sizes
of the residues. Comment: 12 pages, 10 figures
Conditionals and modularity in general logics
In this work in progress, we discuss independence and interpolation and
related topics for classical, modal, and non-monotonic logics
A perceptual hash function to store and retrieve large scale DNA sequences
This paper proposes a novel approach for storing and retrieving massive DNA
sequences. The method is based on a perceptual hash function, commonly used to
determine the similarity between digital images, that we adapted for DNA
sequences. The perceptual hash function presented here is based on a Discrete
Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a fixed gray
level intensity pixel and the hash is calculated from its significant frequency
characteristics. This results in a drastic data reduction between the sequence
and the perceptual hash. Unlike cryptographic hash functions, perceptual hashes
are not affected by the "avalanche effect" and thus can be compared. The similarity
distance between two hashes is estimated with the Hamming Distance, which is
used to retrieve DNA sequences. Experiments that we conducted show that our
approach is relevant for storing massive DNA sequences, and retrieving them
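
To illustrate the idea of a sign-only DCT hash compared by Hamming distance, here is a minimal sketch. It is not the authors' implementation: the gray-level values in `GRAY`, the one-dimensional transform, and the choice of keeping the first 16 non-DC coefficients are all assumptions made for this example.

```python
import math

# Hypothetical gray-level encoding; the abstract does not state the actual values.
GRAY = {"A": 0, "C": 85, "G": 170, "T": 255}

def dct2(signal):
    """Unnormalised one-dimensional DCT-II of a sequence of numbers."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]

def sign_hash(seq, bits=16):
    """Keep only the signs of the first `bits` non-DC DCT coefficients."""
    coeffs = dct2([GRAY[base] for base in seq.upper()])
    return [1 if c >= 0 else 0 for c in coeffs[1:bits + 1]]

def hamming(h1, h2):
    """Number of positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))

if __name__ == "__main__":
    a = "ACGTACGTTAGCCGATAGCTAGCT"
    b = "ACGTACGTTAGCCGATAGCTAGCA"  # differs from `a` in the last base only
    print(hamming(sign_hash(a), sign_hash(b)))
```

Because only coefficient signs are stored, the hash length is fixed by `bits` regardless of sequence length, which is where the data reduction comes from in this simplified setting.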
Efficient Algorithms for the Closest Pair Problem and Applications
The closest pair problem (CPP) is one of the well studied and fundamental
problems in computing. Given a set of points in a metric space, the problem is
to identify the pair of closest points. Another closely related problem is the
fixed radius nearest neighbors problem (FRNNP). Given a set of points and a
radius $R$, the problem is, for every input point $p$, to identify all the
other input points that are within a distance of $R$ from $p$. A naive
deterministic algorithm can solve these problems in quadratic time. CPP as well
as FRNNP play a vital role in computational biology, computational finance,
share market analysis, weather prediction, entomology, electrocardiography,
N-body simulations, molecular simulations, etc. As a result, any improvements
made in solving CPP and FRNNP will have immediate implications for the solution
of numerous problems in these domains. We live in an era of big data and
processing these data takes large amounts of time. Speeding up data processing
algorithms is thus much more essential now than ever before. In this paper we
present algorithms for CPP and FRNNP that improve (in theory and/or practice)
the best-known algorithms reported in the literature for CPP and FRNNP. These
algorithms also improve the best-known algorithms for related applications
including time series motif mining and the two locus problem in Genome Wide
Association Studies (GWAS)
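
For reference, the naive quadratic-time baselines mentioned in the abstract can be sketched as follows. This shows only the brute-force approach, not the improved algorithms of the paper; the function names and the use of Euclidean distance are assumptions chosen for the example.

```python
import math
from itertools import combinations

def closest_pair(points):
    """Brute-force CPP: examine every pair and return the closest one."""
    return min(combinations(points, 2), key=lambda pair: math.dist(*pair))

def fixed_radius_neighbors(points, radius):
    """Brute-force FRNNP: for each point index, list the indices of all
    other points lying within `radius` of it."""
    return {
        i: [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius]
        for i, p in enumerate(points)
    }

if __name__ == "__main__":
    pts = [(0, 0), (5, 5), (1, 0), (9, 9)]
    print(closest_pair(pts))
    print(fixed_radius_neighbors(pts, 1.5))
```

Both functions compare all pairs of points, so their running time grows quadratically with the number of points.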
Analyzing and Visualizing State Sequences in R with TraMineR
This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR's outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.
Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity
Consider two parties who want to compare their strings, e.g., genomes, but do
not want to reveal them to each other. We present a system for
privacy-preserving matching of strings, which differs from existing systems by
providing a deterministic approximation instead of an exact distance. It is
efficient (linear complexity), non-interactive and does not involve a third
party which makes it particularly suitable for cloud computing. We extend our
protocol, such that it mitigates iterated differential attacks proposed by
Goodrich. Further, an implementation of the system is evaluated and compared
against current privacy-preserving string matching algorithms. Comment: 6 pages, 4 figures
Dynamical correlations in the escape strategy of Influenza A virus
The evolutionary dynamics of human Influenza A virus presents a challenging
theoretical problem. An extremely high mutation rate allows the virus to
escape, at each epidemic season, the host immune protection elicited by
previous infections. At the same time, at each given epidemic season a single
quasi-species, that is a set of closely related strains, is observed. A
non-trivial relation between the genetic (i.e., at the sequence level) and the
antigenic (i.e., related to the host immune response) distances can shed light
on this puzzle. In this paper we introduce a model in which, in accordance
with experimental observations, a simple interaction rule based on spatial
correlations among point mutations dynamically defines an immunity space in the
space of sequences. We investigate the static and dynamic structure of this
space and we discuss how it affects the dynamics of the virus-host interaction.
Interestingly, we observe a staggered time structure in the virus evolution, as
in the real Influenza evolutionary dynamics. Comment: 14 pages, 5 figures; main paper for the supplementary info in
arXiv:1303.595
Inference of Ancestral Recombination Graphs through Topological Data Analysis
The recent explosion of genomic data has underscored the need for
interpretable and comprehensive analyses that can capture complex phylogenetic
relationships within and across species. Recombination, reassortment and
horizontal gene transfer constitute examples of pervasive biological phenomena
that cannot be captured by tree-like representations. Starting from hundreds of
genomes, we are interested in the reconstruction of potential evolutionary
histories leading to the observed data. Ancestral recombination graphs
represent potential histories that explicitly accommodate recombination and
mutation events across orthologous genomes. However, they are computationally
costly to reconstruct, usually being infeasible for more than a few tens of
genomes. Recently, Topological Data Analysis (TDA) methods have been proposed
as robust and scalable methods that can capture the genetic scale and frequency
of recombination. We build upon previous TDA developments for detecting and
quantifying recombination, and present a novel framework that can be applied to
hundreds of genomes and can be interpreted in terms of minimal histories of
mutation and recombination events, quantifying the scales and identifying the
genomic locations of recombinations. We implement this framework in a software
package, called TARGet, and apply it to several examples, including small
migration between different populations, human recombination, and horizontal
evolution in finches inhabiting the Galápagos Islands. Comment: 33 pages, 12 figures. The accompanying software, instructions and
example files used in the manuscript can be obtained from
https://github.com/RabadanLab/TARGe