6 research outputs found

    ProphNet: A generic prioritization method through propagation of information

    This article has been published as part of BMC Bioinformatics Volume 15 Supplement 1, 2014: Integrated Bio-Search: Selected Works from the 12th International Workshop on Network Tools and Applications in Biology (NETTAB 2012).
    [Background] Prioritization methods have become a useful tool for mining large amounts of data to suggest promising hypotheses in early research stages. In particular, network-based prioritization tools use a network representation of the interactions between different biological entities to identify novel indirect relationships. However, current network-based prioritization tools are strongly tailored to specific domains of interest (e.g. gene-disease prioritization) and cannot handle networks with more than two types of entities (e.g. genes and diseases). The direct application of these methods to new prioritization tasks is therefore limited.
    [Results] This work presents ProphNet, a generic network-based prioritization tool that can integrate an arbitrary number of interrelated biological entities to accomplish any prioritization task. We tested the performance of ProphNet against leading network-based prioritization methods, namely rcNet and DomainRBF, for gene-disease and domain-disease prioritization, respectively. ProphNet achieved a significant improvement in sensitivity and specificity on both tasks. We also applied ProphNet to disease-gene prioritization for Alzheimer's disease, Diabetes Mellitus Type 2 and Breast Cancer to validate the results and identify putative candidate genes involved in these diseases.
    [Conclusions] ProphNet works on top of any heterogeneous network, integrating information on different types of biological entities to rank entities of a specific type according to their degree of relationship with a query set of entities of another type. Our method works by propagating information across data networks and measuring the correlation between the propagated values for a query set and a target set of entities. ProphNet is available at: http://genome2.ugr.es/prophnet. A Matlab implementation of the algorithm is also available at the website.
    This work was part of projects P08-TIC-4299 of J. A., Sevilla and TIN2009-13489 of DGICT, Madrid. It was also supported by Plan Propio de Investigación, University of Granada.
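    The propagate-then-correlate idea can be sketched as follows. This is an illustrative toy example only: the graph, the restart parameter `alpha`, and the choice of Pearson correlation are assumptions, not ProphNet's actual design.

```python
# Illustrative sketch: random-walk-with-restart propagation on a toy network,
# then correlation between propagated values and candidate target sets.
# All parameters and the graph itself are hypothetical.

def propagate(adj, seeds, alpha=0.8, iters=100):
    """Random-walk-with-restart style propagation from a set of seed nodes."""
    n = len(adj)
    deg = [sum(row[j] for row in adj) for j in range(n)]  # column sums
    restart = [1.0 / len(seeds) if v in seeds else 0.0 for v in range(n)]
    x = restart[:]
    for _ in range(iters):
        spread = [sum(adj[i][j] * x[j] / deg[j] for j in range(n) if deg[j])
                  for i in range(n)]
        x = [alpha * s + (1 - alpha) * r for s, r in zip(spread, restart)]
    return x

def pearson(a, b):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

# Toy undirected network: nodes 0, 1, 2 form a triangle; 3 and 4 trail off node 2.
adj = [[0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [0, 0, 1, 0, 1],
       [0, 0, 0, 1, 0]]

x = propagate(adj, seeds={0})
# Candidate target sets are encoded as indicator vectors; the set closer to
# the query correlates more strongly with the propagated values.
score_near = pearson(x, [0, 1, 1, 0, 0])  # nodes adjacent to the query
score_far = pearson(x, [0, 0, 0, 0, 1])   # a distant node
```

    In a heterogeneous setting the same propagation would run across several interconnected networks rather than a single one.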

    Novel therapeutics for complex diseases from genome-wide association data

    The development of novel therapies is essential to lower the burden of complex diseases. The purpose of this study is to identify novel therapeutics for complex diseases using bioinformatic methods. Bioinformatic tools such as candidate gene prediction tools enable the identification of disease genes by finding potential candidate genes linked to genetic markers of the disease. Candidate gene prediction tools can only identify candidates for further research; they do not identify disease genes directly. Integrating drug-target datasets with candidate gene datasets can identify novel potential therapeutics suitable for repositioning in clinical trials. Drug repositioning can save valuable time and money spent on the therapeutic development of complex diseases.
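    The integration step described above amounts to joining two datasets on shared genes. A minimal sketch, in which every gene symbol and drug-target pair is hypothetical, not real pharmacology:

```python
# Hypothetical toy data: the gene symbols and drug-target pairs below are
# illustrative placeholders, not real pharmacological facts.
candidate_genes = {"GENE_A", "GENE_B", "GENE_C"}   # from a candidate gene prediction tool
drug_targets = {                                    # from a drug-target dataset
    "drug_1": {"GENE_A", "GENE_X"},
    "drug_2": {"GENE_Y"},
    "drug_3": {"GENE_B", "GENE_C"},
}

# A drug becomes a repositioning candidate if it already targets
# at least one predicted disease gene.
repositioning_candidates = {
    drug: sorted(targets & candidate_genes)
    for drug, targets in drug_targets.items()
    if targets & candidate_genes
}
```

    Here `drug_2` is filtered out because none of its targets overlap the candidate gene set.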

    Data-driven knowledge discovery in polycystic kidney disease

    This work uses data derived from genomics and transcriptomics to further develop our understanding of polycystic kidney disease and to identify novel drugs for its treatment.

    Investigation of vertex centralities in human gene-disease networks

    Studying associations among genes and diseases provides an important avenue for better understanding genetic disorders, phenotypes and other complex diseases. Research has shown that many complex human diseases cannot be attributed to a particular gene but to a set of interacting genes. The effect of a specific gene on multiple diseases is called pleiotropy, and the interaction of several genes contributing to a specific disease is called epistasis. In addition, many human genetic disorders and diseases are known to be related to each other through frequently observed co-occurrences. Studying the correlations among multiple diseases helps us better understand the common genetic background of diseases and develop new drugs that treat them more effectively while avoiding side effects. Meanwhile, network science has seen growing application to modeling complex biological systems, and it can be a powerful tool for elucidating the correlations among multiple human diseases as well as the interactions among associated genes.
    In this thesis, known disease-gene associations are represented as a weighted bipartite network. From it, two new networks are extracted: a weighted human disease network capturing the correlations among diseases, and a weighted gene network capturing the interactions among genes. We propose two new centrality measures for the weighted human disease network and the weighted gene network, and we compare them with the centralities most commonly used in biological networks, including degree, closeness, and betweenness. The results show that our new centrality measures identify more important vertices: removing their top-ranked vertices leads to a faster decline in network efficiency. The key diseases and genes we identify hold the potential to help better understand the genetic background and etiologies of complex human diseases.
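    The two building blocks of this evaluation, projecting the bipartite disease-gene network onto a disease network and measuring how vertex removal degrades global efficiency, can be sketched as below. The associations are invented, and unweighted shortest paths are a simplification of the thesis's weighted setting.

```python
from itertools import combinations

# Toy disease-gene associations (hypothetical identifiers).
assoc = {
    "disease_1": {"g1", "g2", "g3"},
    "disease_2": {"g2", "g3"},
    "disease_3": {"g3", "g4"},
    "disease_4": {"g4", "g5"},
}

# Project onto a weighted human disease network: two diseases are linked
# with a weight equal to the number of genes they share.
disease_net = {}
for d1, d2 in combinations(assoc, 2):
    shared = len(assoc[d1] & assoc[d2])
    if shared:
        disease_net[(d1, d2)] = shared

def efficiency(nodes, edges):
    """Global efficiency: mean inverse shortest-path length over node pairs."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    order = list(adj)
    total, pairs = 0.0, 0
    for i, src in enumerate(order):
        dist, frontier = {src: 0}, [src]
        while frontier:  # breadth-first search from src
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        for tgt in order[i + 1:]:
            pairs += 1
            if tgt in dist:
                total += 1 / dist[tgt]  # unreachable pairs contribute 0
    return total / pairs if pairs else 0.0

nodes, edges = list(assoc), list(disease_net)
base = efficiency(nodes, edges)
# A vertex is "important" when its removal causes a large drop in efficiency.
eff_without = {
    d: efficiency([n for n in nodes if n != d],
                  [e for e in edges if d not in e])
    for d in nodes
}
```

    In this toy network, removing the bridging disease_3 hurts efficiency far more than removing the peripheral disease_4, which is the intuition behind the removal-based evaluation.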

    Learning Logical Rules from Knowledge Graphs

    Ph.D. (Integrated) Thesis.
    Expressing and extracting regularities in multi-relational data, where data points are interrelated and heterogeneous, requires well-designed knowledge representation. Knowledge Graphs (KGs), a graph-based representation of multi-relational data, have a rapidly growing presence in industry and academia, where many real-world applications and research efforts are enabled or augmented by incorporating KGs. However, due to the way KGs are constructed, they are inherently noisy and incomplete. In this thesis, we focus on developing logic-based graph reasoning systems that use logical rules to infer missing facts for the completion of KGs. Unlike most rule learners, which primarily mine abstract rules containing no constants, we are particularly interested in learning instantiated rules that contain constants, because they can represent meaningful patterns and correlations that abstract rules cannot express. Including instantiated rules often leads to exponential growth of the search space, so optimization strategies are needed to balance scalability against expressivity. To this end, we propose GPFL, a probabilistic rule learning system optimized to mine instantiated rules through a novel two-stage rule generation mechanism. Through experiments, we demonstrate that GPFL not only performs competitively on knowledge graph completion but is also much more efficient than existing methods at mining instantiated rules. With GPFL, we also reveal overfitting instantiated rules and provide detailed analyses of their impact on system performance. We then propose RHF, a generic framework for constructing rule hierarchies from a given set of rules.
    We demonstrate through experiments that RHF and the hierarchical pruning techniques it enables yield significant reductions in runtime and rule-set size by pruning unpromising rules. Finally, to test the practicability of rule learning systems, we develop Ranta, a novel drug repurposing system that relies on logical rules as features to make interpretable inferences. Ranta outperforms existing methods by a large margin in predictive performance and can make reasonable repurposing suggestions with interpretable evidence.
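    The difference between abstract and instantiated rules can be illustrated with a naive rule grounding routine. The triples and the rule below are invented placeholders; GPFL's actual two-stage generation mechanism is far more elaborate than this sketch.

```python
# Toy KG as (subject, relation, object) triples; names are illustrative only.
kg = {
    ("aspirin", "contains", "nsaid"),
    ("ibuprofen", "contains", "nsaid"),
    ("aspirin", "treats", "headache"),
}

def matches(atom, triple, binding):
    """Unify a rule atom with a KG triple. Terms starting with '?' are
    variables; everything else is a constant. Returns the extended
    variable binding, or None if unification fails."""
    new = dict(binding)
    for term, value in zip(atom, triple):
        if term.startswith("?"):
            if new.get(term, value) != value:
                return None
            new[term] = value
        elif term != value:
            return None
    return new

def infer(kg, body_atoms, head):
    """Naive rule application: enumerate all groundings of the body,
    then substitute each binding into the head to predict new facts."""
    bindings = [{}]
    for atom in body_atoms:
        bindings = [b2 for b in bindings for t in kg
                    if (b2 := matches(atom, t, b)) is not None]
    return {tuple(b.get(t, t) for t in head) for b in bindings}

# Instantiated rule: treats(?x, headache) <- contains(?x, nsaid).
# The constants "headache" and "nsaid" are what make the rule instantiated;
# an abstract rule would use only variables in both body and head.
predicted = infer(
    kg,
    body_atoms=[("?x", "contains", "nsaid")],
    head=("?x", "treats", "headache"),
)
```

    The rule recovers the known fact about aspirin and additionally predicts a missing fact about ibuprofen, which is exactly the completion behaviour rule learners exploit.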

    Towards Personalized Medicine: Computational Approaches to Support Drug Design and Clinical Decision Making

    The future looks bright for a clinical practice that tailors the therapy with the best efficacy and highest safety to each patient. Substantial funding has resulted in technological advances in patient-centered data acquisition, particularly of genetic data. Yet the challenge of translating these data into clinical practice remains open. To support drug target characterization, we developed a global maximum entropy-based method that predicts protein-protein complexes, including the three-dimensional structure of their interface, from sequence data. To further speed up the drug development process, we present methods that reposition drugs with established safety profiles to new indications by leveraging paths in cellular interaction networks. We validated both methods on known data, demonstrating their ability to recapitulate known protein complexes and drug-indication pairs, respectively. After studying the extent and characteristics of genetic variation with a predicted impact on protein function across 60,607 individuals, we showed that most patients carry variants in drug-related genes. However, for the majority of variants, the impact on drug efficacy remains unknown. To inform personalized treatment decisions, it is thus crucial to first collate knowledge from open data sources about known variant effects and then close the knowledge gaps for variants whose effect on drug binding is still uncharacterized. Here, we built an automated annotation pipeline for patient-specific variants whose value we illustrate for a set of patients with hepatocellular carcinoma. We further developed a molecular modeling protocol to predict changes in binding affinity for proteins with genetic variants, which we evaluated for several clinically relevant protein kinases. Overall, we expect that each presented method has the potential to advance personalized medicine by closing knowledge gaps about protein interactions and genetic variation in drug-related genes.
    To reach clinical applicability, challenges with data availability need to be overcome, and prediction performance should be validated experimentally.
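    The idea of repositioning drugs by leveraging paths in cellular interaction networks can be sketched with a plain breadth-first search over a toy heterogeneous network. All node names and edges below are hypothetical, and real methods score paths rather than merely finding them.

```python
from collections import deque

# Toy heterogeneous network (hypothetical edges): drugs bind proteins,
# proteins interact, and proteins are associated with indications.
edges = [
    ("drug_A", "protein_1"),
    ("protein_1", "protein_2"),
    ("protein_2", "disease_X"),
    ("drug_B", "protein_3"),
]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def shortest_path(src, dst):
    """Plain BFS; a short drug-to-disease path suggests a repositioning
    hypothesis worth follow-up, while no path yields no hypothesis."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:  # walk predecessors back to the source
                path.append(u)
                u = prev[u]
            return path[::-1]
        for w in adj.get(u, ()):
            if w not in prev:
                prev[w] = u
                queue.append(w)
    return None

path_a = shortest_path("drug_A", "disease_X")  # connected via two proteins
path_b = shortest_path("drug_B", "disease_X")  # no path in this toy network
```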