    Prediction of protein interactions on HIV-1-human PPI data using a novel closure-based integrated approach

    Discovering Protein-Protein Interactions (PPI) is an interesting new challenge in computational biology. Identifying interactions among proteins has been shown to be useful for finding new drugs and preventing several kinds of diseases. The identification of interactions between HIV-1 proteins and human proteins is a particular PPI problem whose study might lead to the discovery of drugs and important interactions responsible for AIDS. We present the FIST algorithm for extracting hierarchical bi-clusters and minimal covers of association rules in one process. This algorithm is based on the frequent closed itemsets framework to efficiently generate a hierarchy of conceptual clusters and non-redundant sets of association rules with supporting object lists. Experiments conducted on an HIV-1 and human protein interaction dataset show that the approach efficiently identifies interactions previously predicted in the literature and can be used to predict new interactions based on previous biological knowledge.
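
The abstract above builds on the frequent closed itemsets framework. As a rough illustration of that notion only, the sketch below enumerates frequent closed itemsets by brute force on a toy dataset. It is not the FIST algorithm, and the item labels, the transactions, and the `min_support` threshold are hypothetical values chosen for the example. An itemset is closed when no proper superset has the same support.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all frequent closed itemsets.

    Brute-force illustration (exponential in the number of items);
    this is NOT the FIST algorithm from the abstract.
    """
    items = sorted(set().union(*transactions))

    # Count the support of every non-empty candidate itemset.
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                support[itemset] = count

    # Keep itemsets with no proper superset of equal support.
    # A superset with equal support is itself frequent, so it is in `support`.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == support[other] for other in support)
    }


# Hypothetical toy data: each transaction lists the items observed together.
transactions = [
    frozenset({"tat", "nef"}),
    frozenset({"tat", "nef", "env"}),
    frozenset({"tat"}),
    frozenset({"nef", "env"}),
]
closed = closed_frequent_itemsets(transactions, min_support=2)
```

Here `{"env"}` is frequent but not closed, because `{"nef", "env"}` occurs in exactly the same transactions. Dropping such itemsets is what lets closed-itemset methods produce non-redundant rule sets.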

    Computational investigations of polymerase enzymes: Structure, function, inhibition, and biotechnology

    DNA and RNA polymerases (Pols) are central to life, health, and biotechnology because they allow the flow of genetic information in biological systems. Importantly, Pol function and (de)regulation are linked to human diseases, notably cancer (DNA Pols) and viral infections (RNA Pols) such as COVID‐19. In addition, Pols are used in various applications such as synthesis of artificial genetic polymers and DNA amplification in molecular biology, medicine, and forensic analysis. Because of all of this, the field of Pols is an intense research area, in which computational studies contribute to elucidating experimentally inaccessible atomistic details of Pol function. In detail, Pols catalyze the replication, transcription, and repair of nucleic acids through the addition, via a nucleotidyl transfer reaction, of a nucleotide to the 3′‐end of the growing nucleic acid strand. Here, we analyze how computational methods, including force‐field‐based molecular dynamics, quantum mechanics/molecular mechanics, and free energy simulations, have advanced our understanding of Pols. We examine the complex interaction of chemical and physical events during Pol catalysis, like metal‐aided enzymatic reactions for nucleotide addition and large conformational rearrangements for substrate selection and binding. We also discuss the role of computational approaches in understanding the origin of Pol fidelity, that is, the ability of Pols to incorporate the correct nucleotide that forms a Watson–Crick base pair with the base of the template nucleic acid strand. Finally, we explore how computations can accelerate the discovery of Pol‐targeting drugs and engineering of artificial Pols for synthetic and biotechnological applications. This article is categorized under: Structure and Mechanism > Reaction Mechanisms and Catalysis; Structure and Mechanism > Computational Biochemistry and Biophysics; Software > Molecular Modelin

    Semantic models as metrics for kernel-based interaction identification

    Automatic detection of protein-protein interactions (PPIs) in biomedical publications is vital for efficient biological research. It also presents a host of new challenges for pattern recognition methodologies, some of which will be addressed by the research in this thesis. Proteins are the principal method of communication within a cell; hence, this area of research is strongly motivated by the needs of biologists investigating sub-cellular functions of organisms, diseases, and treatments. These researchers rely on the collaborative efforts of the entire field and communicate through experimental results published in reviewed biomedical journals. The substantial number of interactions detected by automated large-scale PPI experiments, combined with the ease of access to the digitised publications, has increased the number of results made available each day. The ultimate aim of this research is to provide tools and mechanisms to aid biologists and database curators in locating relevant information. As part of this objective, this thesis proposes, studies, and develops new methodologies that go some way to meeting this grand challenge. Pattern recognition methodologies are one approach that can be used to locate PPI sentences; however, the most accurate pattern recognition methods require a set of labelled examples to train on. For this particular task, the collection and labelling of training data is highly expensive. On the other hand, the digital publications provide a plentiful source of unlabelled data. The unlabelled data is used, along with word co-occurrence models, to improve classification using Gaussian processes, a probabilistic alternative to the state-of-the-art support vector machines. This thesis presents and systematically assesses the novel methods of using the knowledge implicitly encoded in biomedical texts and shows an improvement on the current approaches to PPI sentence detection.

    A binary interaction map between turnip mosaic virus and Arabidopsis thaliana proteomes

    Viruses are obligate intracellular parasites that have co-evolved with their hosts to establish an intricate network of protein-protein interactions. Here, we used high-throughput yeast two-hybrid screening to identify 378 novel protein-protein interactions between turnip mosaic virus (TuMV) and its natural host Arabidopsis thaliana. We identified the RNA-dependent RNA polymerase NIb as the viral protein with the largest number of contacts, including key salicylic acid-dependent transcription regulators. We verified a subset of 25 interactions in planta by bimolecular fluorescence complementation assays. We then constructed and analyzed a network comprising 399 TuMV-A. thaliana interactions together with intravirus and intrahost connections. In particular, we found that the host proteins targeted by TuMV are enriched in different aspects of plant responses to infections, are more connected and have an increased capacity to spread information throughout the cell proteome, display higher expression levels, and have been subject to stronger purifying selection than expected by chance. The proviral or antiviral role of ten host proteins was validated by characterizing the infection dynamics in the corresponding mutant plants, supporting a proviral role for the transcriptional regulator TGA1. Comparison with similar studies of animal viruses highlights shared fundamental features in their mode of action.
    We thank Francisca de la Iglesia and Paula Agudo for excellent technical assistance and the rest of the EvolSysVir lab for fruitful discussions. This work was supported by grants PID2019-103998GB-I00 and PGC2018-101410-B-I00 (Agencia Estatal de Investigacion - FEDER) to S.F.E. and G.R., respectively, and PROMETEO/2019/012 (Generalitat Valenciana) to S.F.E.
    Martínez, F.; Carrasco, JL.; Toft, C.; Hillung, J.; Giménez-Santamarina, S.; Yenush, L.; Rodrigo Tarrega, G.... (2023). A binary interaction map between turnip mosaic virus and Arabidopsis thaliana proteomes. Communications Biology. 6(1):1-18. https://doi.org/10.1038/s42003-023-04427-8

    When a biologist meets a chemist: in search of HIV-1 inhibitors (Kui bioloog kohtab keemikut: HIV-1 inhibiitorite otsingul)

    The electronic version of the dissertation does not contain the publications.
    HIV-1 is a pandemic virus, and the number of infected people is constantly increasing all over the world. There is no vaccine available, but about 30 compounds have been approved by the FDA for the treatment of HIV-infected patients. ART (antiretroviral therapy) consists of a combination of at least 3 inhibitors with different mechanisms of action. The main drawbacks of such treatment are the side effects of the drugs and the appearance of new resistant forms of the virus. The aim of this work was to find non-toxic compounds that can effectively suppress HIV-1 replication (both wild-type and resistant forms of the virus). We focused our efforts mostly on the discovery of novel reverse transcriptase inhibitors. Three groups of compounds were tested for their antiretroviral activity (acyclic thymine nucleoside analogues, bimorpholines and their derivatives, and saccharide hydrazones); we also experimentally verified the results of the previous in silico screening. As a result of this work, we developed an assay for screening HIV-1 inhibitors. This assay is based on the ViraPower Lentiviral Expression System (Invitrogen) and is simple, fast, and reliable, and it can be used outside BSL3 facilities. The most potent compound we have found so far acts as a non-nucleoside reverse transcriptase inhibitor (NNRTI) and has activity comparable to that of the known NNRTI nevirapine, but is unfortunately inactive against the resistant forms of the virus. This compound was found using a rational drug design strategy. Most importantly, the results of this work allowed us to improve the existing in silico screening method, which could result in more potent HIV-1 inhibitors in the future.

    Computational techniques for cell signaling

    Cells can be viewed as sophisticated machines that organize their constituent components and molecules to receive, process, and respond to signals. The goal of the scientist is to uncover both the individual operations underlying these processes and the mechanism of the emergent properties of interest that give rise to the various phenomena such as disease, development, recovery or aging. Cell signaling plays a crucial role in all of these areas. The complexity of biological processes coupled with the physical limitations of experiments to observe individual molecular components across small to large scales limits the knowledge that can be gleaned from direct observations. Mathematical modeling can be used to estimate parameters that are hidden or too difficult to observe in experiments, and it can make qualitative predictions that can distinguish between hypotheses of interest. Statistical analysis can be employed to explore the large amounts of data generated by modern experimental techniques such as sequencing and high-throughput screening, and it can integrate the observations from many individual experiments or even separate studies to generate new hypotheses. This dissertation employs mathematical and statistical analyses for three prominent aspects of cell signaling: the physical transfer of signaling molecules between cells, the intracellular protein machinery that organizes into pathways to process these signals, and changes in gene expression in response to cell signaling. Computational biology can be described as an applied discipline in that it aims to further the knowledge of a discipline that is distinct from itself. However, the richness of the problems encountered in biology requires continuous development of better methods equipped to handle the complexity, size, or uncertainty of the data, and to build in constraints motivated by the reality of the underlying biological system. 
In addition, better computational and mathematical methods are also needed to model the emergent behavior that arises from many components. The work presented in this dissertation fulfills both of these roles. We apply known and existing techniques to analyse experimental data and provide biological meaning, and we also develop new statistical and mathematical models that add to the knowledge and practice of computational biology. Much of cell signaling is initiated by signal transduction from the exterior, either by sensing the environmental conditions or by the reception of specific signals from other cells. The phenomena of most immediate concern to our species, that of human health and disease, are usually also generated from, and manifest in, our tissues and organs due to the interaction and signaling between cells. A modality of inter-cellular communication that was regarded earlier as an obscure phenomenon but has more recently come to the attention of the scientific community is that of tunneling nanotubes (TNs). TNs have been observed as thin (of the order of 100 nanometers) extensions from a cell to another closely located one. The formation of such structures along with the intercellular exchange of molecules through them, and their interaction with the cytoskeleton, could be involved in many important processes, such as tissue formation and cancer growth. We describe a simple model of passive transport of molecules between cells due to TNs. Building on a few basic assumptions, we derive parametrized, closed-form expressions to describe the concentration of transported molecules as a function of distance from a population of TN-forming cells. Our model predicts how the perfusion of molecules through the TNs is affected by the size of the transferred molecules, the length and stability of nanotube formation, and the differences between membrane-bound and cytosolic proteins. 
To our knowledge, this is the first published mathematical model of intercellular transfer through tunneling nanotubes. We envision that experimental observations will be able to confirm or improve the assumptions made in our model. Furthermore, quantifying the form of inter-cellular communication in the basic scenario envisioned in our model can help suggest ways to measure and investigate cases of possible regulation of either formation of tunneling nanotubes or transport through them. The next problem we focus on is uncovering how the interactions between the genes and proteins in a cell organize into pathways to process cell signals or perform other tasks. The ability to accurately model and deeply understand gene and protein interaction networks of various kinds can be very powerful for prioritizing candidate genes and predicting their role in various signaling pathways and processes. A popular technique for gene prioritization and function prediction is the graph diffusion kernel. We show how the graph diffusion kernel is mathematically similar to the Ising spin graph, a model popular in statistical physics but not usually employed on biological interaction networks. We develop a new method for calculating gene association based on the Ising spin model which is different from the methods common in either bioinformatics or statistical physics. We show that our method performs better than both the graph diffusion kernel and its commonly used equivalent in the Ising model. We present a theoretical argument for understanding its performance based on ideas of phase transitions on networks. We measure its performance by applying our method to link prediction on protein interaction networks. Unlike candidate gene prioritization or function prediction, link prediction does not depend on the existing annotation or characterization of genes for ground truth. It helps us to avoid the confounding noise and uncertainty in the network and annotation data. 
As a purely network analysis problem, it is well suited for comparing network analysis methods. Once we know that we are accurately modeling the interaction network, we can employ our model to solve other problems like gene prioritization using interaction data. We also apply statistical analysis for a specific instance of a cell signaling process: the drought response in Brassica napus, a plant of scientific and economic importance. Important changes in the cell physiology of guard cells are initiated by abscisic acid, an important phytohormone that signals water deficit stress. We analyse RNA-seq reads resulting from the sequencing of mRNA extracted from protoplasts treated with abscisic acid. We employ sequence analysis, statistical modeling, and the integration of cross-species network data to uncover genes, pathways, and interactions important in this process. We confirm what is known from other species and generate new gene and interaction candidates. By associating functional and sequence modification, we are also able to uncover evidence of evolution of gene specialization, a process that is likely widespread in polyploid genomes. This work has developed new computational methods and applied existing tools for understanding cellular signaling and pathways. We have applied statistical analysis to integrate expression, interactome, pathway, regulatory elements, and homology data to infer Brassica napus genes and their roles involved in drought response. For many of these findings, previous literature based on independent experiments in other species lends support. By relating the changes in regulatory elements, our RNA-seq results and common gene ancestry, we present evidence of its evolution in the context of polyploidy. Our work can provide a scientific basis for the pursuit of certain genes as targets of breeding and genetic engineering efforts for the development of drought tolerant oil crops. 
Building on ideas from statistical physics, we developed a new model of gene associations in networks. Using link prediction as a metric for the accuracy of modeling the underlying structure of a real network, we show that our model shows improved performance on real protein interaction networks. Our model of gene associations can be used to prioritize candidate genes for a disease or phenotype of interest. We also develop a mathematical model for a novel inter-cellular mode of biomolecule transfer. We relate hypotheses about the dynamics of TN formation, stability, and nature of molecular transport to quantitative predictions that may be tested by suitable experiments. In summary, this work demonstrates the application and development of computational analysis of cell signaling at the level of the transcriptome, the interactome, and physical transport.
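
The abstract above uses the graph diffusion kernel as its baseline for link prediction. The sketch below shows one standard form of that baseline, K = exp(-beta * L) with L the graph Laplacian, computed by a truncated Taylor series in plain Python. It does not reproduce the dissertation's Ising-based method, and the four-node path graph, the value of `beta`, and the number of series terms are assumptions made for illustration.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def diffusion_kernel(adj, beta=0.5, terms=40):
    """Approximate K = exp(-beta * L) with a truncated Taylor series.

    Adequate for small graphs and moderate beta; a production
    implementation would use a dedicated matrix exponential routine.
    """
    n = len(adj)
    m = [[-beta * x for x in row] for row in laplacian(adj)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        # After this step, term holds (-beta * L)^k / k!
        term = [[x / k for x in row] for row in matmul(term, m)]
        kernel = [
            [kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)
        ]
    return kernel


# Hypothetical interaction network: a path 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
kernel = diffusion_kernel(adj)
```

For link prediction, the kernel entry of a non-adjacent pair is used as its score. In this toy graph, the pair (0, 2), two steps apart, scores higher than the pair (0, 3), three steps apart.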

    Scalable Probabilistic Model Selection for Network Representation Learning in Biological Network Inference

    A biological system is a complex network of heterogeneous molecular entities and their interactions contributing to various biological characteristics of the system. Although biological networks provide both an elegant theoretical framework and a mathematical foundation to analyze, understand, and learn from complex biological systems, their reconstruction remains an important unsolved problem. Current biological networks are noisy, sparse and incomplete, limiting the ability to create a holistic view of the biological reconstructions and thus failing to provide a system-level understanding of the biological phenomena. Experimental identification of missing interactions is both time-consuming and expensive. Recent advancements in high-throughput data generation and significant improvement in computational power have led to novel computational methods to predict missing interactions. However, these methods still suffer from several unresolved challenges. It is challenging to extract information about interactions and incorporate that information into the computational model. Furthermore, the biological data are not only heterogeneous but also high-dimensional and sparse, which makes modeling from indirect measurements difficult. The heterogeneous nature and sparsity of biological data pose significant challenges to the design of deep neural network structures, which essentially use either empirical or heuristic model selection methods. These unscalable methods rely heavily on expertise and experimentation, a time-consuming and error-prone process, and are prone to overfitting. Furthermore, complex deep networks tend to be poorly calibrated, with high confidence on incorrect predictions. In this dissertation, we describe novel algorithms that address these challenges. 
In Part I, we design novel neural network structures to learn representations for biological entities and further expand the model to integrate heterogeneous biological data for biological interaction prediction. In Part II, we develop a novel Bayesian model selection method to infer the most plausible network structures warranted by data. We demonstrate that our methods achieve state-of-the-art performance on tasks across various domains, including interaction prediction. Experimental studies on various interaction networks show that our method makes accurate and calibrated predictions. Our novel probabilistic model selection approach enables the network structures to dynamically evolve to accommodate incrementally available data. In conclusion, we discuss the limitations and future directions for the proposed work.