
    A multi-species functional embedding integrating sequence and network structure

    A key challenge to transferring knowledge between species is that different species have fundamentally different genetic architectures. Initial computational approaches to transfer knowledge across species have relied on measures of heredity such as genetic homology, but these approaches suffer from limitations. First, only a small subset of genes have homologs, limiting the amount of knowledge that can be transferred, and second, genes change or repurpose functions, complicating the transfer of knowledge. Many approaches address this problem by expanding the notion of homology by leveraging high-throughput genomic and proteomic measurements, such as through network alignment. In this work, we take a new approach to transferring knowledge across species by expanding the notion of homology through explicit measures of functional similarity between proteins in different species. Specifically, our kernel-based method, HANDL (Homology Assessment across Networks using Diffusion and Landmarks), integrates sequence and network structure to create a functional embedding in which proteins from different species are embedded in the same vector space. We show that inner products in this space and the vectors themselves capture functional similarity across species, and are useful for a variety of functional tasks. We present the first whole-genome method for predicting phenologs, generating many that were previously identified, but also predicting new phenologs supported by the biological literature. We also demonstrate that the HANDL embedding captures pairwise gene function, in that gene pairs with synthetic lethal interactions are significantly separated in HANDL space, and the direction of separation is conserved across species. Software for the HANDL algorithm is available at http://bit.ly/lrgr-handl. Published version.

    From Nonspecific DNA–Protein Encounter Complexes to the Prediction of DNA–Protein Interactions

    ©2009 Gao, Skolnick. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. doi:10.1371/journal.pcbi.1000341. DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing the protein's specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that it performs satisfactorily on protein models whose Cα root-mean-square deviation from the native structure is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in the search by a DNA-binding protein for its specific DNA targets.

    Benchmarking network propagation methods for disease gene identification

    In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes. Peer reviewed. Postprint (published version).
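    As a rough illustration of what "network propagation" from seed genes can look like, the sketch below runs a random walk with restart on a toy network and ranks the non-seed genes by their score. It is not one of the 12 algorithms benchmarked in the abstract; the gene names, the toy edges, the `restart` value and the function name `random_walk_with_restart` are all assumptions made for this example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected network (illustrative sketch).

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: set of nodes where the walk restarts (e.g. known disease genes).
    Assumes every node has at least one neighbour, so total score stays at 1.
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Each step: restart with probability `restart`, otherwise move
        # to a uniformly chosen neighbour of the current node.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain A - B - C - D - E, with A as the only seed.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
seed_genes = {"A"}

scores = random_walk_with_restart(toy_network, seed_genes)
# Rank candidate (non-seed) genes from highest to lowest propagated score.
ranking = sorted((g for g in toy_network if g not in seed_genes),
                 key=lambda g: scores[g], reverse=True)
print(ranking)
```

    On this chain the candidates closest to the seed rank highest. Real benchmarks such as the one described above would use genome-scale interaction networks and the validation schemes the authors describe, which this sketch does not attempt.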

    DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing

    Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths and angles. In this work, we present DiffPack, a torsional diffusion model that learns the joint distribution of side-chain torsional angles, the only degrees of freedom in side-chain packing, by diffusing and denoising on the torsional space. To avoid issues arising from simultaneous perturbation of all four torsional angles, we propose autoregressively generating the four torsional angles from χ1 to χ4 and training a diffusion model for each torsional angle. We evaluate the method on several benchmarks for protein side-chain packing and show that our method achieves improvements of 11.9% and 13.5% in angle accuracy on CASP13 and CASP14, respectively, with a significantly smaller model size (60x fewer parameters). Additionally, we show the effectiveness of our method in enhancing side-chain predictions in the AlphaFold2 model. Code will be available upon acceptance. Comment: Under review.

    A Review of Mathematical Models for the Formation of Vascular Networks

    Mainly two mechanisms are involved in the formation of blood vasculature: vasculogenesis and angiogenesis. The former consists of the formation of a capillary-like network from either a dispersed or a monolayered population of endothelial cells, reproducible also in vitro by specific experimental assays. The latter consists of the sprouting of new vessels from an existing capillary or post-capillary venule. Similar phenomena are also involved in the formation of the lymphatic system through a process generally called lymphangiogenesis. A number of mathematical approaches have analysed these phenomena. This paper reviews the different modelling procedures, with a special emphasis on their ability to reproduce the biological system and to predict measured quantities which describe the overall processes. A comparison between the different methods is also made, highlighting their specific features.

    Élőlények kollektív viselkedésének statisztikus fizikája = Statistical physics of the collective behaviour of organisms

    Experiments: We have carried out quantitative experiments on the collective motion of cells as a function of their density. A sharp transition could be observed from the random motility in sparse cultures to the flocking of dense islands of cells. Using ultra-light GPS devices developed by us, we have determined the existing hierarchical relations within a flock of 10 homing pigeons. Modelling: From the simulations of our new model of flocking we concluded that the information exchange between particles was maximal at the critical point, in which the interplay of such factors as the level of noise, the tendency to follow the direction and the acceleration of others results in large fluctuations. Analysis: We have proposed a novel link-density based approach to finding overlapping communities in large networks. The algorithm used for the implementation of this technique is very efficient for most real networks, and provides full statistics quickly. Correspondingly, we have developed a now-popular, user-friendly, freely downloadable software package for finding overlapping communities. Extending our method to the time-dependent regime, we found that large groups in evolving networks persist for longer if they are capable of dynamically altering their membership; thus, the ability to change group composition results in better adaptability. We also showed that knowledge of the time commitment of members to a given community can be used for estimating the community's lifetime.

    Synthetic Biology: A Bridge between Artificial and Natural Cells.

    Artificial cells are simple cell-like entities that possess certain properties of natural cells. In general, artificial cells are constructed using three parts: (1) biological membranes that serve as protective barriers, while allowing communication between the cells and the environment; (2) transcription and translation machinery that synthesize proteins based on genetic sequences; and (3) genetic modules that control the dynamics of the whole cell. Artificial cells are minimal and well-defined systems that can be more easily engineered and controlled than natural cells. Artificial cells can be used as biomimetic systems to study and understand the natural dynamics of cells with minimal interference from cellular complexity. However, there remain significant gaps between artificial and natural cells. How much information can we encode into artificial cells? What is the minimal number of factors that are necessary to achieve robust functioning of artificial cells? Can artificial cells communicate with their environments efficiently? Can artificial cells replicate, divide or even evolve? Here, we review synthetic biological methods that could shrink the gaps between artificial and natural cells. The closure of these gaps will lead to advancement in synthetic biology, cellular biology and biomedical applications.