    A comprehensive integrated drug similarity resource for in-silico drug repositioning and beyond.

    Drug similarity studies are driven by the hypothesis that similar drugs should display similar therapeutic actions and thus can potentially treat a similar constellation of diseases. Drug-drug similarity has been derived from a variety of direct and indirect sources of evidence and has frequently shown high predictive power in discovering validated repositioning candidates as well as in other in-silico drug development applications. Yet existing resources either have limited coverage or rely on an individual source of evidence, overlooking the wealth and diversity of drug-related data sources. Hence, there has been an unmet need for a comprehensive resource that integrates diverse drug-related information to derive multi-evidenced drug-drug similarities. We addressed this resource gap by compiling heterogeneous information for an exhaustive set of small-molecule drugs (a total of 10 367 in the current version) and systematically integrating multiple sources of evidence to derive a multi-modal drug-drug similarity network. The resulting database, 'DrugSimDB', currently includes 238 635 drug pairs with significant aggregated similarity and is complemented by an interactive, user-friendly web interface (http://vafaeelab.com/drugSimDB.html), which not only enables easy database access, search, filtering and export, but also provides a variety of complementary information on queried drugs and interactions. The integration approach can flexibly incorporate further drug information into the similarity network, providing an easily extendable platform. The source code for database compilation and construction is well documented and semi-automated, so the database can be upgraded at any time to account for new drugs and up-to-date drug information.
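As a rough illustration of what deriving a multi-evidence drug-drug similarity network can look like, the sketch below scores each drug pair separately per evidence source and then averages the scores. The abstract does not say how DrugSimDB aggregates its evidence, so the Tanimoto measure, the plain mean, the 0.5 threshold, and all drug and feature names here are assumptions made only for this example.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def aggregate_similarity(sources: dict, threshold: float = 0.5) -> dict:
    """Average per-source similarities for each drug pair; keep pairs >= threshold.

    `sources` maps an evidence name (e.g. "targets") to {drug: feature set}.
    A pair is scored only over the sources that describe both drugs.
    """
    drugs = sorted({d for feats in sources.values() for d in feats})
    network = {}
    for a, b in combinations(drugs, 2):
        scores = [
            tanimoto(feats[a], feats[b])
            for feats in sources.values()
            if a in feats and b in feats
        ]
        if scores and mean(scores) >= threshold:
            network[(a, b)] = mean(scores)
    return network


# Toy, made-up evidence (hypothetical drugs and features, not DrugSimDB data).
sources = {
    "targets": {"drugA": {"T1", "T2"}, "drugB": {"T1", "T2"}, "drugC": {"T9"}},
    "side_effects": {"drugA": {"nausea", "rash"}, "drugB": {"nausea"}, "drugC": {"rash"}},
}

network = aggregate_similarity(sources, threshold=0.5)
print(network)  # {('drugA', 'drugB'): 0.75}
```

Averaging only over the sources that cover both drugs is one simple way to cope with uneven coverage across data sources; a weighted or statistically calibrated combination would be an equally plausible design.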

    11th German Conference on Chemoinformatics (GCC 2015): Fulda, Germany, 8-10 November 2015.

    Scalable Probabilistic Model Selection for Network Representation Learning in Biological Network Inference

    A biological system is a complex network of heterogeneous molecular entities and their interactions, which together contribute to the system's biological characteristics. Although biological networks provide both an elegant theoretical framework and a mathematical foundation to analyze, understand, and learn from complex biological systems, their reconstruction remains an important unsolved problem. Current biological networks are noisy, sparse and incomplete, which limits the ability to form a holistic view of the reconstructions and thus to reach a system-level understanding of biological phenomena. Experimental identification of missing interactions is both time-consuming and expensive. Recent advances in high-throughput data generation and significant improvements in computational power have led to novel computational methods for predicting missing interactions. However, these methods still face several unresolved challenges. It is difficult to extract information about interactions and incorporate that information into the computational model. Furthermore, biological data are not only heterogeneous but also high-dimensional and sparse, which makes modeling from indirect measurements difficult. The heterogeneity and sparsity of biological data also pose significant challenges to the design of deep neural network structures, which essentially relies on either empirical or heuristic model selection methods. These unscalable methods depend heavily on expertise and experimentation, a time-consuming and error-prone process, and they are prone to overfitting. Moreover, complex deep networks tend to be poorly calibrated, showing high confidence in incorrect predictions. In this dissertation, we describe novel algorithms that address these challenges.
    In Part I, we design novel neural network structures to learn representations of biological entities and further expand the model to integrate heterogeneous biological data for biological interaction prediction. In Part II, we develop a novel Bayesian model selection method to infer the most plausible network structures warranted by the data. We demonstrate that our methods achieve state-of-the-art performance on tasks across various domains, including interaction prediction. Experimental studies on various interaction networks show that our method makes accurate and calibrated predictions. Our probabilistic model selection approach enables the network structures to evolve dynamically to accommodate incrementally available data. In conclusion, we discuss the limitations of the proposed work and future directions.

    Deep Learning for Embedding and Integrating Multimodal Biomedical Data

    Biomedical data is being generated at extremely high throughput and in high dimension by technologies in areas ranging from single-cell genomics, proteomics, and transcriptomics (cytometry, single-cell RNA and ATAC sequencing) to neuroscience and cognition (fMRI and PET) to pharmaceuticals (drug perturbations and interactions). These new and emerging technologies and the datasets they create give an unprecedented view into the workings of their respective biological entities. However, there is a large gap between the information contained in these datasets and the insights that current machine learning methods can extract from them. This is especially the case when multiple technologies can measure the same underlying biological entity or system. When the same system is analyzed separately from the different views gathered by different data modalities, patterns go unobserved if they only emerge from the joint, multi-dimensional representation of all modalities together. Through an interdisciplinary approach that emphasizes active collaboration with data domain experts, my research has developed models for data integration that extract important insights through the joint analysis of varied data sources. In this thesis, I discuss models that address this task of multi-modal data integration, especially generative adversarial networks (GANs) and autoencoders (AEs). My research has focused on using both of these models in a generative way for concrete problems in cutting-edge scientific applications, rather than focusing exclusively on the generation of high-resolution natural images. The research in this thesis is united around the idea of building models that can extract new knowledge from scientific data that is inaccessible to existing methods.

    Collective Multi-relational Network Mining

    Our world is becoming increasingly interconnected, and the study of networks and graphs is becoming more important than ever. Domains such as biological and pharmaceutical networks, online social networks, the World Wide Web, recommender systems, and scholarly networks are just a few examples that include explicit or implicit network structures. Most networks are formed between different types of nodes and contain different types of links. Leveraging these multi-relational and heterogeneous structures is an important factor in developing better models for these real-world networks. Another important aspect of developing models that make predictions about network entities, such as nodes or links, is the connections between those entities. These connections invalidate the i.i.d. assumptions that most traditional machine learning methods make about the data. Hence, unlike models for non-network data, where predictions about entities are made independently of each other, the interconnectivity of entities in networks should cause the information inferred about one entity to change the model's belief about other related entities. In this dissertation, I present models that can effectively leverage the multi-relational nature of networks and collectively make predictions on links and nodes. In both tasks, I empirically show the importance of considering multi-relational characteristics and collective predictions. In the first part, I present models that make predictions on nodes by leveraging the graph structure and the link generation sequence, and by making collective predictions. I apply the node classification methods to detect social spammers in evolving multi-relational social networks and show their effectiveness in identifying spammers without needing the textual content. In the second part, I present a generalized augmented multi-relational bi-typed network.
    I then propose a template for link inference models on these networks and show their application in pharmaceutical discoveries and recommender systems. In the third part, I show that my proposed collective link prediction model is an instance of a general graph-based prediction model that relies on a neighborhood graph for predictions. I then propose a framework that can dynamically adapt the neighborhood graph, based on the state of variables from intermediate inference results as well as the structural properties of the relations connecting them, to improve the predictive performance of the model.

    Biomedical knowledge discovery through continuous representations of multi-relational graphs (Descoberta de conhecimento biomédico através de representações continuas de grafos multi-relacionais)

    Knowledge graphs are multi-relational graph structures that organize data in a way that is not only queryable but also allows the inference of implicit knowledge by both humans and, particularly, machines. In recent years, new methods have been developed to maximize the knowledge that can be extracted from these structures, especially in the machine learning field. Knowledge graph embedding (KGE) strategies map the data of these graphs to a lower-dimensional space to facilitate downstream tasks such as link prediction or node classification. This work explores the capabilities and limitations of using these techniques to derive new knowledge from pre-existing biomedical networks, since biomedicine is a field that has not only seen efforts towards converting its large knowledge bases into knowledge graphs, but can also use the predictive capabilities of these models to accelerate research. To this end, several KGE models were studied, and a pipeline was created to obtain and train such models on different biomedical datasets. The results show that these models can make accurate predictions on some datasets, but that their performance can be hampered by some inherent characteristics of the networks. Additionally, with the knowledge acquired during this research, a notebook was created that aims to be an entry point for other researchers interested in exploring this field.
    Mestrado em Engenharia de Computadores e Telemátic
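To make the idea of link prediction with knowledge graph embeddings concrete, here is a minimal sketch of TransE-style scoring, in which a triple (head, relation, tail) is considered plausible when head + relation lands close to tail in the embedding space. The abstract does not name the KGE models that were studied, so TransE is used here only as a well-known example; the entity names, relation name, and hand-set 2-D vectors are hypothetical and untrained.

```python
import math


def transe_score(h, r, t):
    """TransE plausibility: negative Euclidean distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, entities, relations):
    """Rank every entity other than the head as a candidate tail, best first."""
    h, r = entities[head], relations[relation]
    candidates = [
        (name, transe_score(h, r, vec))
        for name, vec in entities.items()
        if name != head
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hand-set 2-D embeddings (hypothetical; a real model would learn these).
entities = {
    "drugX": (0.0, 0.0),
    "diseaseY": (1.0, 1.0),
    "geneZ": (-1.0, 2.0),
}
relations = {"treats": (1.0, 1.0)}

ranking = rank_tails("drugX", "treats", entities, relations)
print(ranking[0][0])  # diseaseY
```

In a trained model the vectors would be fitted so that observed triples score higher than corrupted ones; ranking candidate tails as above is then the usual way to propose missing links.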