5,443 research outputs found

    Geometric and Statistical Properties of the Mean-Field HP Model, the LS Model and Real Protein Sequences

    Lattice models, owing to their coarse-grained nature, are best suited for the study of the "designability problem", the phenomenon in which most of the roughly 16,000 proteins of known structure have their native conformations concentrated in a relatively small number (about 500) of topological classes of conformations. Here it is shown that on a lattice the most highly designable simulated protein structures are those that have the largest number of surface-core switchbacks. A combination of physical, mathematical and biological reasons that causes the phenomenon is given. By comparing the most foldable model peptides with protein sequences in the Protein Data Bank, it is shown that whereas different models may yield similar designabilities, predicted foldable peptides will simulate natural proteins only when the model incorporates the correct physics and biology: in this case, when the main folding force arises from the differing hydrophobicity of the residues and not, say, from the steric hindrance effect caused by the differing sizes of the residues. Comment: 12 pages, 10 figures

    Conditionals and modularity in general logics

    In this work in progress, we discuss independence, interpolation, and related topics for classical, modal, and non-monotonic logics.

    A perceptual hash function to store and retrieve large scale DNA sequences

    This paper proposes a novel approach for storing and retrieving massive DNA sequences. The method is based on a perceptual hash function, commonly used to determine the similarity between digital images, that we adapted for DNA sequences. The perceptual hash function presented here is based on a Discrete Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a fixed gray-level intensity pixel and the hash is calculated from its significant frequency characteristics. This results in a drastic data reduction between the sequence and the perceptual hash. Unlike cryptographic hash functions, perceptual hashes are not affected by the "avalanche effect" and thus can be compared. The similarity distance between two hashes is estimated with the Hamming distance, which is used to retrieve DNA sequences. Experiments that we conducted show that our approach is relevant for storing massive DNA sequences and retrieving them.
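The pipeline this abstract describes (gray-level encoding, DCT, sign-only hash, Hamming comparison) can be illustrated with a minimal sketch. It is not the authors' implementation: the gray-level values in `GRAY_LEVELS`, the use of a 1-D DCT-II rather than an image transform, the choice of the first `n_bits` non-DC coefficients as the "significant" ones, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical gray levels per nucleotide; the abstract does not state the real values.
GRAY_LEVELS = {"A": 0, "C": 85, "G": 170, "T": 255}


def dct_ii(signal, n_coeffs):
    """First n_coeffs coefficients of an unnormalized 1-D DCT-II, from its definition."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n_coeffs)
    ]


def sign_only_hash(sequence, n_bits=16):
    """Hash a DNA string to n_bits bits: the signs of its low-frequency DCT coefficients."""
    if len(sequence) <= n_bits:
        raise ValueError("sequence must be longer than n_bits")
    signal = [GRAY_LEVELS[base] for base in sequence.upper()]
    coeffs = dct_ii(signal, n_bits + 1)
    # Skip coefficient 0 (the DC term): it is never negative for non-negative input.
    return [1 if c >= 0 else 0 for c in coeffs[1:]]


def hamming_distance(hash_a, hash_b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


reference = "ACGTTGCAACGTACGTTAGCCGATAGCTAGCTAACG"
query = "ACGTTGCAACGTACGATAGCCGATAGCTAGCTAACG"  # one substitution
print(hamming_distance(sign_only_hash(reference), sign_only_hash(query)))
```

Because only signs are kept, a hash of `n_bits` bits stands in for a sequence of arbitrary length, which is the data reduction the abstract refers to; retrieval then amounts to ranking stored hashes by Hamming distance to the query hash.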

    Efficient Algorithms for the Closest Pair Problem and Applications

    The closest pair problem (CPP) is one of the well-studied and fundamental problems in computing. Given a set of points in a metric space, the problem is to identify the pair of closest points. Another closely related problem is the fixed radius nearest neighbors problem (FRNNP). Given a set of points and a radius $R$, the problem is, for every input point $p$, to identify all the other input points that are within a distance of $R$ from $p$. A naive deterministic algorithm can solve these problems in quadratic time. CPP as well as FRNNP play a vital role in computational biology, computational finance, share market analysis, weather prediction, entomology, electrocardiography, N-body simulations, molecular simulations, etc. As a result, any improvements made in solving CPP and FRNNP will have immediate implications for the solution of numerous problems in these domains. We live in an era of big data, and processing these data takes large amounts of time. Speeding up data processing algorithms is thus much more essential now than ever before. In this paper we present algorithms for CPP and FRNNP that improve (in theory and/or practice) the best-known algorithms reported in the literature for CPP and FRNNP. These algorithms also improve the best-known algorithms for related applications including time series motif mining and the two locus problem in Genome Wide Association Studies (GWAS).
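For reference, the naive quadratic-time baseline mentioned in this abstract can be sketched as follows. This is only the brute-force approach, not the improved algorithms the paper presents; Euclidean distance and the function names are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def closest_pair_naive(points):
    """Return the pair of points with the smallest Euclidean distance, in O(n^2) time."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    return min(combinations(points, 2), key=lambda pair: math.dist(pair[0], pair[1]))


def fixed_radius_neighbors_naive(points, radius):
    """Map each point index to the indices of all other points within `radius`, in O(n^2) time."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return neighbors


points = [(0, 0), (5, 5), (1, 0), (10, 10)]
print(closest_pair_naive(points))                 # ((0, 0), (1, 0))
print(fixed_radius_neighbors_naive(points, 1.5))  # {0: [2], 1: [], 2: [0], 3: []}
```

Both functions examine every pair of points once, which is the quadratic cost that faster CPP and FRNNP algorithms aim to avoid.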

    Analyzing and Visualizing State Sequences in R with TraMineR

    This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR's outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.

    Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity

    Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving matching of strings, which differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive, and does not involve a third party, which makes it particularly suitable for cloud computing. We extend our protocol such that it mitigates the iterated differential attacks proposed by Goodrich. Further, an implementation of the system is evaluated and compared against current privacy-preserving string matching algorithms. Comment: 6 pages, 4 figures

    Dynamical correlations in the escape strategy of Influenza A virus

    The evolutionary dynamics of human Influenza A virus presents a challenging theoretical problem. An extremely high mutation rate allows the virus to escape, at each epidemic season, the host immune protection elicited by previous infections. At the same time, at each given epidemic season a single quasi-species, that is, a set of closely related strains, is observed. A non-trivial relation between the genetic (i.e., at the sequence level) and the antigenic (i.e., related to the host immune response) distances can shed light on this puzzle. In this paper we introduce a model in which, in accordance with experimental observations, a simple interaction rule based on spatial correlations among point mutations dynamically defines an immunity space in the space of sequences. We investigate the static and dynamic structure of this space and we discuss how it affects the dynamics of the virus-host interaction. Interestingly, we observe a staggered time structure in the virus evolution, as in the real Influenza evolutionary dynamics. Comment: 14 pages, 5 figures; main paper for the supplementary info in arXiv:1303.595

    Inference of Ancestral Recombination Graphs through Topological Data Analysis

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. Comment: 33 pages, 12 figures. The accompanying software, instructions and example files used in the manuscript can be obtained from https://github.com/RabadanLab/TARGe