15,563 research outputs found

    Progress in the development and application of computational methods for probabilistic protein design

    Proteins exhibit a wide range of physical and chemical properties, including highly selective molecular recognition and catalysis, and are also key components in biological metabolic, catabolic, and signaling pathways. Given that proteins are well-structured and can now be rapidly synthesized, they are excellent targets for engineering of both molecular structure and biological function. Computational analysis of the protein design problem allows scientists to explore sequence space and systematically discover novel protein molecules. Nonetheless, the complexity of proteins, the subtlety of the determinants of folding, and the exponentially large number of possible sequences impede the search for peptide sequences compatible with a desired structure and function. Directed search algorithms, which directly identify a small number of sequences, have achieved some success in identifying sequences with desired structures and functions. Alternatively, one can adopt a probabilistic approach. Instead of a finite number of sequences, such calculations result in a probabilistic description of the sequence ensemble. In particular, by casting the formalism in the language of statistical mechanics, the site-specific amino acid probabilities of sequences compatible with a target structure may be readily identified. The computed probabilities are well suited both for de novo protein design of particular sequences and for combinatorial, library-based protein engineering. The computed site-specific amino acid profile may be converted to a nucleotide base distribution to allow assembly of a partially randomized gene library. The ability to readily synthesize such degenerate oligonucleotide sequences according to the prescribed distribution is key to constructing a biased peptide library genuinely reflective of the computational design. Herein we illustrate how a standard DNA synthesizer can be used, with only a slight modification to the synthesis protocol, to generate a pool of degenerate DNA sequences that encodes a predetermined amino acid distribution with high fidelity.
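The relationship between a nucleotide base distribution and the amino acid distribution it encodes can be illustrated in the forward direction. The following is a minimal sketch, not the authors' procedure: it assumes the standard genetic code and independent base mixtures at the three codon positions, and the function name `encoded_distribution` and the NNK example are illustrative choices that do not come from the abstract.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def encoded_distribution(base_probs):
    """Amino acid distribution encoded by a degenerate codon.

    base_probs is a list of three dicts (one per codon position), each
    mapping a base to its probability in the synthesis mixture.
    Positions are assumed to be independent (an assumption of this sketch).
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= base_probs[pos].get(base, 0.0)
        if p > 0.0:
            dist[aa] = dist.get(aa, 0.0) + p
    return dist


# Illustrative example: the NNK codon (N = equal A/C/G/T, K = equal G/T).
N = {b: 0.25 for b in "ACGT"}
K = {"G": 0.5, "T": 0.5}
nnk = encoded_distribution([N, N, K])
for aa, p in sorted(nnk.items(), key=lambda kv: -kv[1]):
    print(aa, round(p, 4))
```

Comparing such an encoded distribution with a target site-specific profile is one way to judge how faithfully a base mixture reflects a design; finding the base mixture that best matches a given profile is the harder inverse problem and is not attempted here.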

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and a strong modularity that makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimensions and to process non-vectorial data, and the natural framework they provide for integrating heterogeneous data, are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.

    Link Prediction in Complex Networks: A Survey

    Link prediction in complex networks has attracted increasing attention from both the physics and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as random-walk-based methods and maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanisms, and classification of partially labelled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms. Comment: 44 pages, 5 figures
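As a rough illustration of what a random-walk-based link predictor can look like, the sketch below scores each unconnected node pair by how likely a short random walk started at one node is to be at the other. The specific score (a degree-weighted t-step walk, often called a local random walk index), the function name, and the toy graph are assumptions made for this example; the abstract does not specify any particular formulation.

```python
from itertools import combinations


def local_random_walk_scores(edges, steps=3):
    """Rank non-adjacent node pairs of an undirected graph.

    score(x, y) = q_x * pi_xy(t) + q_y * pi_yx(t), where pi_xy(t) is the
    probability that a t-step random walk started at x is at y, and
    q_x = degree(x) / (2 * number_of_edges).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    total_degree = sum(len(adj[n]) for n in nodes)  # equals 2 * |E|

    def walk(start):
        # Propagate the walker's probability distribution for `steps` steps.
        dist = {n: 0.0 for n in nodes}
        dist[start] = 1.0
        for _ in range(steps):
            nxt = {n: 0.0 for n in nodes}
            for n, p in dist.items():
                if p:
                    share = p / len(adj[n])
                    for m in adj[n]:
                        nxt[m] += share
            dist = nxt
        return dist

    pi = {n: walk(n) for n in nodes}
    scores = {}
    for x, y in combinations(nodes, 2):
        if y not in adj[x]:  # only score pairs that are not yet linked
            qx = len(adj[x]) / total_degree
            qy = len(adj[y]) / total_degree
            scores[(x, y)] = qx * pi[x][y] + qy * pi[y][x]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy graph (hypothetical): a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
ranked = local_random_walk_scores(edges, steps=3)
for pair, score in ranked:
    print(pair, round(score, 4))
```

Pairs at the top of the ranking are the candidates for missing links; in practice such a ranking would be evaluated by hiding a fraction of known edges and checking how highly they are scored.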

    Design and Development of Software Tools for Bio-PEPA

    This paper surveys the design of software tools for the Bio-PEPA process algebra. Bio-PEPA is a high-level language for modelling biological systems such as metabolic pathways and other biochemical reaction networks. By providing tools for this modelling language, we hope to allow easier use of a range of simulators and model-checkers, thereby freeing the modeller from the responsibility of developing a custom simulator for the problem of interest. Further, by providing mappings to a range of different analysis tools, the Bio-PEPA language allows modellers to compare analysis results computed using independent numerical analysers, which enhances the reliability and robustness of those results.

    ADAM: Analysis of Discrete Models of Biological Systems Using Computer Algebra

    Background: Many biological systems are modeled qualitatively with discrete models, such as probabilistic Boolean networks, logical models, Petri nets, and agent-based models, with the goal of gaining a better understanding of the system. The computational complexity of analyzing the complete dynamics of these models grows exponentially in the number of variables, which impedes working with complex models. Although sophisticated algorithms exist to determine the dynamics of discrete models, their implementations usually require labor-intensive formatting of the model formulation, and they are often not accessible to users without programming skills. Efficient analysis methods are needed that are accessible to modelers and easy to use. Method: By converting discrete models into algebraic models, tools from computational algebra can be used to analyze their dynamics. Specifically, we propose a method to identify attractors of a discrete model that is equivalent to solving a system of polynomial equations, a long-studied problem in computer algebra. Results: We present a method for efficiently identifying attractors, and the web-based tool Analysis of Dynamic Algebraic Models (ADAM), which provides this and other analysis methods for discrete models. ADAM converts several discrete model types automatically into polynomial dynamical systems and analyzes their dynamics using tools from computer algebra. Based on extensive experimentation with both discrete models arising in systems biology and randomly generated networks, we found that the algebraic algorithms presented in this manuscript are fast for systems with the structure maintained by most biological systems, namely sparseness (while the number of nodes in a biological network may be quite large, each node is affected only by a small number of other nodes) and robustness (a small number of attractors).
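The idea of rewriting a Boolean model as a polynomial system can be sketched on a toy example. Over GF(2), AND becomes a product, OR becomes x + y + xy, and NOT becomes 1 + x, so fixed points are exactly the solutions of f_i(x) + x_i = 0 for every node i. The 3-node network below is hypothetical, and the sketch solves the equations by brute-force enumeration rather than with the computer algebra tools the abstract describes; it is not ADAM's implementation.

```python
from itertools import product

# Hypothetical 3-node Boolean network and its polynomial form over GF(2):
#   x1' = x1 AND x2   ->  x1*x2
#   x2' = x1 OR  x3   ->  x1 + x3 + x1*x3
#   x3' = NOT x2      ->  1 + x2


def step(state):
    """Synchronous update, evaluated as polynomials modulo 2."""
    x1, x2, x3 = state
    return (
        (x1 * x2) % 2,
        (x1 + x3 + x1 * x3) % 2,
        (1 + x2) % 2,
    )


def fixed_points(update, n):
    """States x with f(x) = x, i.e. solutions of f_i(x) + x_i = 0 over GF(2)."""
    return [s for s in product((0, 1), repeat=n) if update(s) == s]


def attractors(update, n):
    """All attractors (fixed points and longer cycles), found by following
    the trajectory of every state until it revisits a state."""
    found = set()
    for s in product((0, 1), repeat=n):
        seen, path = {}, []
        while s not in seen:
            seen[s] = len(path)
            path.append(s)
            s = update(s)
        cycle = path[seen[s]:]
        i = cycle.index(min(cycle))  # rotate to a canonical starting state
        found.add(tuple(cycle[i:] + cycle[:i]))
    return sorted(found)


print(fixed_points(step, 3))
for cycle in attractors(step, 3):
    print(len(cycle), cycle)
```

Enumeration visits all 2^n states, which is exactly the exponential cost the abstract points to; the appeal of the algebraic formulation is that sparse polynomial systems can often be solved without visiting every state.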

    Inherent limitations of probabilistic models for protein-DNA binding specificity

    The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site, and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally, probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.
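The non-linear relationship between affinity and binding probability can be made concrete with a small numerical sketch. It assumes additive per-position binding energies (in units of kT) and a standard two-state occupancy model, P(bound) = 1 / (1 + exp(E - mu)), where mu grows with the logarithm of the protein concentration. The energy values, the parameter mu, and the function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical per-position energies (units of kT) for a 4-bp site;
# the preferred base at each position has energy 0, any mismatch costs 2.
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 2.0, "T": 0.0},
]


def site_energy(site):
    """Total binding energy, summing independent position contributions."""
    return sum(col[base] for col, base in zip(ENERGY, site))


def pwm_probability(site):
    """Probabilistic model: product of independently normalized
    per-position Boltzmann probabilities."""
    p = 1.0
    for col, base in zip(ENERGY, site):
        z = sum(math.exp(-e) for e in col.values())
        p *= math.exp(-col[base]) / z
    return p


def occupancy(site, mu):
    """Two-state binding probability; larger mu means more protein."""
    return 1.0 / (1.0 + math.exp(site_energy(site) - mu))


def occupancy_ratio(mu, best="ACGT", other="ACGA"):
    """How strongly the best site is preferred over a one-mismatch site."""
    return occupancy(best, mu) / occupancy(other, mu)


pwm_ratio = pwm_probability("ACGT") / pwm_probability("ACGA")
print("probabilistic model ratio:", round(pwm_ratio, 3))
for mu in (-4.0, 0.0, 4.0):
    print("mu =", mu, "occupancy ratio:", round(occupancy_ratio(mu), 3))
```

In this sketch the probabilistic model assigns the same preference ratio regardless of concentration, while the occupancy ratio approaches 1 as mu increases because the best sites saturate; this is the regime (highest affinity sites, high protein concentration) that the abstract singles out.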