ProphNet: A generic prioritization method through propagation of information
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 1, 2014: Integrated Bio-Search: Selected Works from the 12th International Workshop on Network Tools and Applications in Biology (NETTAB 2012).

Background:
Prioritization methods have become a useful tool for mining large amounts of data to suggest promising hypotheses in early research stages. In particular, network-based prioritization tools use a network representation of the interactions between different biological entities to identify novel indirect relationships. However, current network-based prioritization tools are strongly tailored to specific domains of interest (e.g. gene-disease prioritization) and cannot handle networks with more than two types of entities (e.g. genes and diseases). The direct application of these methods to new prioritization tasks is therefore limited.

Results:
This work presents ProphNet, a generic network-based prioritization tool that can integrate an arbitrary number of interrelated biological entities to accomplish any prioritization task. We compared the performance of ProphNet with that of leading network-based prioritization methods, namely rcNet and DomainRBF, for gene-disease and domain-disease prioritization, respectively. ProphNet achieved a significant improvement in sensitivity and specificity for both tasks. We also applied ProphNet to disease-gene prioritization for Alzheimer's disease, Diabetes Mellitus Type 2 and Breast Cancer to validate the results and identify putative candidate genes involved in these diseases.

Conclusions:
ProphNet works on top of any heterogeneous network, integrating information on different types of biological entities to rank entities of a specific type according to their degree of relationship with a query set of entities of another type. The method propagates information across data networks and measures the correlation between the propagated values for the query and target sets of entities. ProphNet is available at http://genome2.ugr.es/prophnet. A Matlab implementation of the algorithm is also available on the website. This work was part of projects P08-TIC-4299 of J. A., Sevilla and TIN2009-13489 of DGICT, Madrid. It was also supported by Plan Propio de Investigación, University of Granada.
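The propagate-then-correlate idea described above can be sketched with a random-walk-with-restart style diffusion. This is a minimal illustration, not ProphNet's actual algorithm: the toy network, the `propagate` helper and the restart parameter `alpha` are all assumptions made for the example.

```python
import numpy as np

def propagate(adj, seed, alpha=0.5, iters=50):
    """Random-walk-with-restart style propagation: spread the seed
    signal over the network while restarting at the seed nodes."""
    # Column-normalize the adjacency matrix so scores are redistributed
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    w = adj / col_sums
    p = seed.copy()
    for _ in range(iters):
        p = alpha * (w @ p) + (1 - alpha) * seed
    return p

# Toy 5-node network: nodes 0-2 form a cluster, nodes 3-4 hang off it
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)

query    = np.array([1.0, 0, 0, 0, 0])  # query set: node 0
target_a = np.array([0, 1.0, 0, 0, 0])  # candidate close to the query
target_b = np.array([0, 0, 0, 0, 1.0])  # candidate far from the query

pq = propagate(adj, query)
# Rank candidates by correlating their propagated profiles with the query's
score_a = np.corrcoef(pq, propagate(adj, target_a))[0, 1]
score_b = np.corrcoef(pq, propagate(adj, target_b))[0, 1]
```

The nearby candidate yields the higher correlation score, which is the ranking signal the method exploits.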
Novel therapeutics for complex diseases from genome-wide association data
The development of novel therapies is essential to lower the burden of complex diseases. The purpose of this study is to identify novel therapeutics for complex diseases using bioinformatic methods. Candidate gene prediction tools identify potential disease genes by linking them to genetic markers of the disease; however, they only suggest candidates for further research and do not identify disease genes directly. Integrating drug-target datasets with candidate gene datasets can identify novel potential therapeutics suitable for repositioning in clinical trials. Drug repositioning can save valuable time and money spent on the therapeutic development of complex diseases.
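The integration step described above can be sketched as a simple overlap query between a drug-target table and a set of predicted candidate genes. All drug and gene names below are hypothetical placeholders, not data from the study.

```python
# Hypothetical drug-target dataset (drug -> set of known target genes)
drug_targets = {
    "drug_A": {"TNF", "IL6"},
    "drug_B": {"APP", "PSEN1"},
    "drug_C": {"INS", "IRS1"},
}

# Candidate genes a prediction tool linked to the disease's genetic markers
candidate_genes = {"APP", "IRS1", "MAPT"}

# A drug becomes a repositioning candidate if any of its known
# targets is among the predicted disease genes.
repositioning_candidates = {
    drug: targets & candidate_genes
    for drug, targets in drug_targets.items()
    if targets & candidate_genes
}
print(repositioning_candidates)
# drug_A has no overlap; drug_B hits APP, drug_C hits IRS1
```

Real pipelines would also weight candidates by evidence strength, but the core operation is this join between the two datasets.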
Data-driven knowledge discovery in polycystic kidney disease
This work uses data derived from genomics and transcriptomics to further develop our understanding of polycystic kidney disease and to identify novel drugs for its treatment.
Investigation of vertex centralities in human gene-disease networks
Studying associations among genes and diseases provides an important avenue for
a better understanding of genetic-related disorders, phenotypes and other complex
diseases. Research has shown that many complex human diseases cannot be attributed
to a particular gene, but to a set of interacting genes. The effect of a specific
gene on multiple diseases is called pleiotropy, and the contribution of interactions
among several genes to a specific disease is called epistasis. In addition, many human genetic
disorders and diseases are known to be related to each other through frequently
observed co-occurrences. Studying the correlations among multiple diseases helps us
better understand the common genetic background of diseases and develop new drugs
that can treat them more effectively and avoid side effects. Meanwhile, network science
has seen an increase in applications to model complex biological systems, and
can be a powerful tool to elucidate the correlations of multiple human diseases as
well as interactions among associated genes. In this thesis, known disease-gene associations
are represented using a weighted bipartite network. Subsequently, two new
networks are extracted. One is the weighted human disease network to show the
correlations of diseases, and the other is the weighted gene network to capture the
interactions among genes. We propose two new centrality measures for the weighted
human disease network and the weighted gene network. We evaluate these centrality
measures and compare them with the centralities most commonly used in biological
networks, including degree, closeness, and betweenness. The results show that
our new centrality measures identify more important vertices: removing the
top-ranked vertices they select leads to a steeper decline in network efficiency.
The key diseases and genes we identified may help to better understand the
genetic background and etiologies of complex human diseases.
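The construction described above, projecting a bipartite disease-gene network into a weighted disease network and ranking its vertices, can be sketched as follows. The associations are toy examples, and weighted degree (strength) stands in for the thesis's new centrality measures, which are not specified here.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical disease-gene associations (a tiny bipartite network)
assoc = {
    "Alzheimer": {"APP", "PSEN1", "APOE"},
    "Diabetes":  {"INS", "IRS1", "APOE"},
    "Obesity":   {"LEP", "INS"},
}

# Project onto a weighted disease network: two diseases are linked
# with weight equal to the number of associated genes they share
disease_net = defaultdict(dict)
for d1, d2 in combinations(assoc, 2):
    shared = len(assoc[d1] & assoc[d2])
    if shared:
        disease_net[d1][d2] = shared
        disease_net[d2][d1] = shared

# Weighted degree (strength) as a simple centrality for ranking
strength = {d: sum(disease_net[d].values()) for d in assoc}
```

The same projection applied to genes (linking genes that share diseases) yields the weighted gene network; closeness and betweenness can then be computed on either projection for comparison.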
Learning Logical Rules from Knowledge Graphs
Ph.D. (Integrated) Thesis

Expressing and extracting regularities in multi-relational data, where data points are interrelated
and heterogeneous, requires well-designed knowledge representation. Knowledge Graphs (KGs),
as a graph-based representation of multi-relational data, have seen a rapidly growing presence in
industry and academia, where many real-world applications and academic research are either
enabled or augmented through the incorporation of KGs. However, due to the way KGs are
constructed, they are inherently noisy and incomplete. In this thesis, we focus on developing
logic-based graph reasoning systems that utilize logical rules to infer missing facts for the
completion of KGs. Unlike most rule learners that primarily mine abstract rules that contain
no constants, we are particularly interested in learning instantiated rules that contain constants
due to their ability to represent meaningful patterns and correlations that cannot be expressed
by abstract rules. The inclusion of instantiated rules often leads to exponential growth in the
search space. Therefore, it is necessary to develop optimization strategies that balance
scalability and expressivity. To this end, we propose GPFL, a probabilistic rule learning
system optimized to mine instantiated rules through the implementation of a novel two-stage
rule generation mechanism. Through experiments, we demonstrate that GPFL not only performs
competitively on knowledge graph completion but is also much more efficient than existing
methods at mining instantiated rules. With GPFL, we also reveal the problem of
overfitting instantiated rules and provide detailed analyses of their impact on
system performance. We then propose RHF,
a generic framework for constructing rule hierarchies from a given set of rules. We demonstrate
through experiments that with RHF and the hierarchical pruning techniques enabled by it,
significant reductions in runtime and rule size are observed due to the pruning of unpromising
rules. Finally, to test the practicality of rule learning systems, we develop Ranta, a novel
drug repurposing system that relies on logical rules as features to make interpretable inferences.
Ranta outperforms existing methods by a large margin in predictive performance and can make
reasonable repurposing suggestions with interpretable evidence.
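The distinction between abstract and instantiated rules, and the standard confidence score used to rank them, can be illustrated on a toy knowledge graph. The triples and rules below are hypothetical and do not reproduce GPFL's actual mining procedure.

```python
# A toy KG as (subject, relation, object) triples; all facts are
# illustrative, not real pharmacological data.
kg = {
    ("aspirin", "targets", "PTGS2"),
    ("ibuprofen", "targets", "PTGS2"),
    ("paracetamol", "targets", "TRPA1"),
    ("PTGS2", "associated_with", "inflammation"),
    ("TRPA1", "associated_with", "pain"),
    ("aspirin", "treats", "inflammation"),
    ("paracetamol", "treats", "pain"),
}

def confidence(groundings):
    """Standard rule confidence: the fraction of body groundings
    (x, z) for which the head triple (x, treats, z) also holds."""
    pairs = list(groundings)
    support = sum((x, "treats", z) in kg for x, z in pairs)
    return support / len(pairs) if pairs else 0.0

# Abstract rule (no constants):
#   treats(X, Z) <- targets(X, Y), associated_with(Y, Z)
abstract = [(s1, o2)
            for s1, r1, o1 in kg if r1 == "targets"
            for s2, r2, o2 in kg if r2 == "associated_with" and s2 == o1]

# Instantiated rule (contains the constants PTGS2 and inflammation):
#   treats(X, inflammation) <- targets(X, PTGS2)
instantiated = [(s, "inflammation")
                for s, r, o in kg if r == "targets" and o == "PTGS2"]

conf_abstract = confidence(abstract)          # 2/3: ibuprofen grounding fails
conf_instantiated = confidence(instantiated)  # 1/2
```

Enumerating instantiated rules means considering every constant in the KG as a potential rule component, which is why the search space grows so quickly and pruning strategies like RHF's hierarchies matter.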
Towards Personalized Medicine: Computational Approaches to Support Drug Design and Clinical Decision Making
The future looks bright for a clinical practice that tailors
therapy with the best efficacy and highest safety to each patient. Substantial
amounts of funding have resulted in technological advances regarding
patient-centered data acquisition --- particularly genetic data. Yet, the
challenge of translating this data into clinical practice remains open.
To support drug target characterization, we developed a global maximum
entropy-based method that predicts protein-protein complexes including the
three-dimensional structure of their interface from sequence data. To further
speed up the drug development process, we present methods to reposition drugs
with established safety profiles to new indications leveraging paths in
cellular interaction networks. We validated both methods on known data,
demonstrating their ability to recapitulate known protein complexes and
drug-indication pairs, respectively.
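The path-based repositioning idea can be sketched as a shortest-path search over a small heterogeneous network. The nodes and edges below are hypothetical, and the thesis's actual method may score and filter paths quite differently.

```python
from collections import deque

# Hypothetical heterogeneous interaction network as an adjacency dict:
# drug-target, protein-protein, and gene-disease edges mixed together
network = {
    "drug_X":    ["EGFR"],               # drug-target edge
    "EGFR":      ["drug_X", "KRAS"],     # protein-protein edge
    "KRAS":      ["EGFR", "disease_Y"],  # gene-disease edge
    "disease_Y": ["KRAS"],
}

def shortest_path(net, start, goal):
    """Breadth-first search for a shortest path linking a drug to a
    candidate indication through the interaction network."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in net.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path(network, "drug_X", "disease_Y")
# A short mechanistic path suggests drug_X as a repositioning candidate
```

A short path through cellular interactions provides a mechanistic hypothesis for why an established drug might act on a new indication.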
After studying the extent and characteristics of genetic variation with a
predicted impact on protein function across 60,607 individuals, we showed that
most patients carry variants in drug-related genes. However, for the majority
of variants, their impact on drug efficacy remains unknown. To inform
personalized treatment decisions, it is thus crucial to first collate knowledge
from open data sources about known variant effects and to then close the
knowledge gaps for variants whose effect on drug binding is still not
characterized. Here, we built an automated annotation pipeline for
patient-specific variants whose value we illustrate for a set of patients with
hepatocellular carcinoma. We further developed a molecular modeling protocol to
predict changes in binding affinity in proteins carrying genetic variants, which we
evaluated for several clinically relevant protein kinases.
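The annotation pipeline described above can be sketched as a lookup of patient-specific variants against collated knowledge of known effects, with uncharacterized variants flagged for downstream modeling. All entries below are illustrative placeholders, not real curated data.

```python
# Hypothetical knowledge base of known variant effects, e.g. collated
# from open pharmacogenomics resources (illustrative entries only)
known_effects = {
    ("EGFR", "T790M"): "reduced binding of first-generation inhibitors",
    ("BRAF", "V600E"): "constitutive kinase activation",
}

# Variants called for one patient, as (gene, protein change) pairs
patient_variants = [("EGFR", "T790M"), ("TP53", "R175H")]

annotated, unknown = [], []
for gene, change in patient_variants:
    effect = known_effects.get((gene, change))
    if effect:
        annotated.append((gene, change, effect))
    else:
        # No curated effect: flag for downstream characterization,
        # e.g. a binding-affinity prediction protocol
        unknown.append((gene, change))
```

The split mirrors the two halves of the approach: collate what open data sources already know, then route the remaining knowledge gaps to predictive modeling.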
Overall, we expect that each presented method has the potential to advance
personalized medicine by closing knowledge gaps about protein interactions and
genetic variation in drug-related genes. To reach clinical applicability,
challenges with data availability need to be overcome and prediction
performance should be validated experimentally.