KnetMiner - An integrated data platform for gene mining and biological knowledge discovery
Hassani-Pak K. KnetMiner - An integrated data platform for gene mining and biological knowledge discovery. Bielefeld: Universität Bielefeld; 2017.
Discovery of novel genes that control important phenotypes and diseases is one of the key challenges in biological sciences. Now, in the post-genomics era, scientists have access to a vast range of genomes, genotypes, phenotypes and ‘omics data which, when used systematically, can help them gain new insights and make faster discoveries. However, the volume and diversity of such unintegrated data are often seen as a burden that only those with specialist bioinformatics skills (but often only minimal specialist biological knowledge) can penetrate. New tools are therefore required to allow researchers to connect, explore and compare large-scale datasets to identify the genes and pathways that control important phenotypes and diseases in plants, animals and humans.
KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software tools for integrating and visualising large biological datasets. The software mines the myriad databases that describe an organism’s biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes and publications, with the aim of providing leads for scientists investigating the molecular basis of a particular trait. The KnetMiner approach is based on 1) integration of heterogeneous, complex and interconnected biological information into a knowledge graph; 2) text mining to enrich the knowledge graph with novel relations extracted from the literature; 3) graph queries of varying depths to find paths between genes and evidence nodes; 4) an evidence-based gene-ranking algorithm that combines graph theory and information theory; and 5) fast search and interactive knowledge visualisation techniques. Overall, [KnetMiner](http://knetminer.rothamsted.ac.uk) is a publicly available resource that helps scientists trawl diverse biological databases for clues to design better crop varieties and understand diseases. Its key strength is that it includes the end user in the "interactive" knowledge discovery process, with the goal of supporting human intelligence with machine intelligence.
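Steps 3 and 4 (depth-limited graph queries from genes to evidence nodes, then a rank that mixes graph reachability with information content) can be illustrated with a toy sketch. This is not KnetMiner's actual gene-rank algorithm: the `ev:` prefix for evidence nodes, the adjacency-list graph and the IDF-style weight log(N / genes linked to the evidence) are all assumptions chosen to show the idea that evidence reached by few genes is more informative.

```python
import math
from collections import deque


def reachable_evidence(graph, gene, max_depth):
    """Breadth-first search: evidence nodes within `max_depth` hops of `gene`."""
    seen = {gene}
    frontier = deque([(gene, 0)])
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the query depth
        for neighbour in graph.get(node, ()):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            if neighbour.startswith("ev:"):  # hypothetical naming convention
                found.add(neighbour)
            frontier.append((neighbour, depth + 1))
    return found


def rank_genes(graph, genes, query_evidence, max_depth=2):
    """Rank genes by the query evidence they reach; evidence shared by many genes counts less."""
    links = {g: reachable_evidence(graph, g, max_depth) for g in genes}
    # Number of candidate genes linked to each evidence node.
    freq = {}
    for evidence in links.values():
        for e in evidence:
            freq[e] = freq.get(e, 0) + 1
    n = len(genes)
    scores = {
        g: sum(math.log(n / freq[e]) for e in evidence & query_evidence)  # IDF-style weight
        for g, evidence in links.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, undirected knowledge graph (adjacency lists); all node names are made up.
graph = {
    "gene:A": ["ev:drought", "path:P1"],
    "gene:B": ["path:P1"],
    "gene:C": ["ev:yield"],
    "path:P1": ["gene:A", "gene:B", "ev:yield"],
    "ev:drought": ["gene:A"],
    "ev:yield": ["path:P1", "gene:C"],
}
ranking = rank_genes(graph, ["gene:A", "gene:B", "gene:C"], {"ev:drought", "ev:yield"})
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy graph `gene:A` ranks first because it alone reaches the rare `ev:drought` node within two hops, while `ev:yield` is reachable from every gene and so contributes log(1) = 0.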
Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach
The convergence of exponentially advancing technologies is driving medical research, yielding life-changing discoveries. By contrast, repeated failures of high-profile drugs against Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This pattern of failure has prompted researchers to grapple with their beliefs about the aetiology of Alzheimer's. The growing realisation that amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates a reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of an integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency and context-specificity of the data. There is therefore a need for agile methods and approaches that efficiently interrogate and use existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data, with much of the emphasis placed on quality, reliability and context-specificity. The work showcases the benefit of integrating well-curated, disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered when harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs, and their regulatory roles in diseases and genes, from the biomedical literature.
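The thesis's text-mining methodology is not described in detail here; as a rough sketch of the basic step, miRNA mentions can be recognised with a pattern and paired with gene symbols that occur in the same sentence. The regular expression, the `miR-<number>` normalisation and the sentence-level co-occurrence rule are simplifying assumptions (real systems normalise against miRBase and handle forms such as `let-7` that this pattern ignores), and the example sentence and `GENE1` are invented.

```python
import re

# Hypothetical pattern: optional three-letter species prefix (e.g. "hsa-"), then
# "miR" or "microRNA", a number with optional letter, and an optional arm ("-5p"/"-3p").
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|microRNA)-?(\d+[a-z]?)(?:-[35]p)?\b",
    re.IGNORECASE,
)


def extract_mirnas(text):
    """Return normalised miRNA names (e.g. 'miR-155') in order of first mention."""
    names = []
    for match in MIRNA_PATTERN.finditer(text):
        name = "miR-" + match.group(1).lower()
        if name not in names:
            names.append(name)
    return names


def mirna_gene_pairs(text, genes):
    """Sentence-level co-occurrence of miRNAs with a given list of gene symbols."""
    pairs = set()
    for sentence in re.split(r"(?<=[.;!?])\s+", text):
        for mirna in extract_mirnas(sentence):
            for gene in genes:
                if re.search(rf"\b{re.escape(gene)}\b", sentence):
                    pairs.add((mirna, gene))
    return pairs


text = ("Levels of hsa-miR-155-5p and microRNA-132 differed between groups. "
        "miR-155 was predicted to target GENE1.")
print(extract_mirnas(text))              # ['miR-155', 'miR-132']
print(mirna_gene_pairs(text, ["GENE1"]))  # {('miR-155', 'GENE1')}
```

Co-occurrence only proposes candidate relations; deciding whether a sentence actually states a regulatory role needs further classification.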
To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed that makes explicit the annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach for generating gene regulatory networks has been developed. The work presented here has demonstrated its capability to identify testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects on Alzheimer's disease, Parkinson's disease and epilepsy.
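The abstract does not say how the new gene regulatory network approach works. For orientation only, the sketch below shows a common baseline that such approaches build on: a correlation (co-expression) network, where genes are linked when their expression profiles have |Pearson r| above a threshold. The threshold of 0.9, the gene names and the expression values are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_edges(expression, threshold=0.9):
    """Undirected edges between genes whose |r| reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Made-up expression values for four hypothetical genes across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [1.0, 3.0, 1.0, 3.0],
    "geneD": [4.0, 3.0, 2.0, 1.0],
}
edges = coexpression_edges(expression)
print(edges)
```

Correlation edges are undirected and do not by themselves imply regulation; a regulatory network additionally needs direction, for example from transcription-factor annotations.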
Systems approaches to drug repositioning
PhD Thesis
Drug discovery has, overall, become less fruitful and more costly, despite vastly increased biomedical knowledge and evolving approaches to Research and Development (R&D). One complementary approach to drug discovery is drug repositioning, which focusses on identifying novel uses for existing drugs. Because it focusses on drugs that have already reached the market, drug repositioning has the potential to reduce both the time and the cost of getting a disease treatment to those who need it. Many marketed examples of repositioned drugs were found through serendipitous or rational observations, highlighting the need for more systematic methodologies. Systems approaches have the potential to enable novel methods for understanding the action of therapeutic compounds, but they require an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks are mined manually: a skilled person identifies portions of the graph that indicate relationships between drugs and highlights possible repositioning opportunities. However, this approach does not scale. Automated procedures are required to mine integrated networks systematically for these subgraphs and bring them to the attention of the user. The aim of this project was to develop novel computational methods that use data integration to identify new therapeutic uses for existing drugs, with a particular focus on active small molecules.
A framework for integrating disparate data relevant to drug repositioning, the Drug Repositioning Network Integration Framework (DReNInF), was developed as part of this work. It comprises a high-level ontology, the Drug Repositioning Network Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite of parsers; and a generic semantic graph integration platform. The framework enables the production of integrated networks with strict semantics; such networks are important in, but not exclusive to, drug repositioning. DReNInF was then used to create Drug Repositioning Network Integration (DReNIn), a semantically rich Resource Description Framework (RDF) dataset. A Web-based front end was developed for this dataset, including a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying it.
To automate the mining of drug repositioning datasets, a formal framework for defining semantic subgraphs was established and a method for Drug Repositioning Semantic Mining (DReSMin) was developed. DReSMin is an algorithm that mines semantically rich networks for occurrences of a given semantic subgraph. It identifies instances of complex semantic subgraphs containing data about putative drug repositioning opportunities in a computationally tractable fashion, scaling close to linearly with the size of the network data.
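To make "occurrences of a given semantic subgraph" concrete, the sketch below enumerates matches of a small typed pattern in a typed network using naive backtracking with type-based candidate filtering. This is not DReSMin itself and makes no claim to its near-linear scaling; the node types (`Drug`, `Target`, `Disease`), edge types (`binds`, `treats`) and all node names are hypothetical.

```python
def match_semantic_subgraph(node_types, edges, pattern_types, pattern_edges):
    """Enumerate injective mappings of pattern variables to graph nodes that respect
    node types and typed edges (naive backtracking, not DReSMin itself)."""
    edge_set = set(edges)  # (source, edge_type, target) triples
    # Filter candidates by semantic type before searching.
    candidates = {
        var: [node for node, t in node_types.items() if t == vtype]
        for var, vtype in pattern_types.items()
    }
    order = list(pattern_types)
    matches = []

    def consistent(assign):
        # Every pattern edge whose endpoints are both assigned must exist in the graph.
        return all(
            (assign[s], etype, assign[t]) in edge_set
            for s, etype, t in pattern_edges
            if s in assign and t in assign
        )

    def extend(i, assign):
        if i == len(order):
            matches.append(dict(assign))
            return
        var = order[i]
        for node in candidates[var]:
            if node in assign.values():
                continue  # keep the mapping injective
            assign[var] = node
            if consistent(assign):
                extend(i + 1, assign)
            del assign[var]

    extend(0, {})
    return matches


# Hypothetical network: two drugs share a target, and one of them treats a disease.
node_types = {"drugA": "Drug", "drugB": "Drug", "T1": "Target", "D1": "Disease"}
edges = [("drugA", "binds", "T1"), ("drugB", "binds", "T1"), ("drugB", "treats", "D1")]

# Pattern: d1 and d2 bind the same target t, and d2 treats disease x.
pattern_types = {"d1": "Drug", "d2": "Drug", "t": "Target", "x": "Disease"}
pattern_edges = [("d1", "binds", "t"), ("d2", "binds", "t"), ("d2", "treats", "x")]

matches = match_semantic_subgraph(node_types, edges, pattern_types, pattern_edges)
# A match where d1 does not already treat x suggests a candidate repositioning (d1, x).
hypotheses = [(m["d1"], m["x"]) for m in matches if (m["d1"], "treats", m["x"]) not in edges]
print(hypotheses)  # [('drugA', 'D1')]
```

Filtering candidates by type before searching is what keeps even this naive matcher usable on small graphs; a production miner would also order variables and index edges by type to prune earlier.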
The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated. A total of 9,643,061 putative D-T interactions were identified and ranked, and a strong correlation was observed between highly scored associations and those supported by the literature. The 20 top-ranked associations were analysed in more detail: 14 were found to be novel and six to be supported by the literature. It was also shown that this approach prioritises known D-T interactions better than other state-of-the-art methodologies.
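The abstract does not name the metric used to compare prioritisation of known D-T interactions. Two common choices are sketched below under that assumption: ROC AUC in its Mann-Whitney form (the probability that a known pair outscores an unknown one) and precision at k. By the second measure, six literature-supported pairs among the top 20 would give 6/20 = 0.3. The scores and the "known" set in the example are hypothetical.

```python
def roc_auc(scores, known):
    """Mann-Whitney estimate of ROC AUC: the probability that a known interaction is
    scored above an unknown one (ties count half)."""
    pos = [s for pair, s in scores.items() if pair in known]
    neg = [s for pair, s in scores.items() if pair not in known]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, known, k):
    """Fraction of the k top-scored pairs that are already known."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(pair in known for pair in top) / k


# Hypothetical drug-target scores and a hypothetical set of literature-supported pairs.
scores = {("d1", "t1"): 0.9, ("d1", "t2"): 0.8, ("d2", "t1"): 0.4, ("d2", "t3"): 0.2}
known = {("d1", "t1"), ("d2", "t1")}
print(roc_auc(scores, known))            # 0.75
print(precision_at_k(scores, known, 2))  # 0.5
```

The pairwise loop is quadratic and only suitable for illustration; for millions of pairs the same AUC is computed from the rank sum after a single sort.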
The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also investigated. Because target-based approaches are used heavily in drug discovery, a systematic method for ranking Gene-Disease (G-D) associations is necessary. Although methods already exist to collect, integrate and score these associations, the resulting scores are often not a reliable reflection of expert knowledge. Therefore, an integrated, data-driven approach to drug repositioning based on Bayesian statistics was developed and applied to rank 309,885 G-D associations using existing knowledge. The ranked associations were then integrated with other biological data to produce a semantically rich drug discovery network. Using this network, diseases of the central nervous system (CNS) were shown to be an area of interest. The network was then systematically mined for semantic subgraphs that capture novel Dr-D relations. A total of 275,934 Dr-D associations were identified and ranked, and those more likely to be side effects were filtered out.
Work presented here includes novel tools and algorithms to enable research within the field of drug repositioning. DReNIn, for example, includes data that previous comparable datasets relevant to drug repositioning have neglected, such as clinical trial data and drug indications. Furthermore, the dataset can easily be extended using DReNInF to include future data as and when it becomes available, such as G-D association directionality (i.e. whether a mutation is loss-of-function or gain-of-function). Unlike other algorithms and approaches developed for drug repositioning, DReSMin can be used to infer any type of association captured in the target semantic network. Moreover, the approaches presented here should be applicable more generally to other fields that require algorithms for integrating and mining semantically rich networks.
Engineering and Physical Sciences Research Council (EPSRC) and GS
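The Bayesian model used in this thesis to rank G-D associations is not specified in the abstract. One simple Bayesian ranking, assumed here purely for illustration, treats each association as having `supporting` positive evidence items out of `total` and ranks by the posterior mean under a Beta(alpha, beta) prior. The evidence counts, gene names and the uniform prior alpha = beta = 1 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneDiseaseEvidence:
    gene: str
    disease: str
    supporting: int  # hypothetical: evidence items supporting the association
    total: int       # hypothetical: all evidence items examined for this pair


def posterior_mean(e, alpha=1.0, beta=1.0):
    """Posterior mean of the support probability under a Beta(alpha, beta) prior."""
    return (e.supporting + alpha) / (e.total + alpha + beta)


def rank(evidence, alpha=1.0, beta=1.0):
    """Order associations by posterior mean, highest first."""
    return sorted(evidence, key=lambda e: posterior_mean(e, alpha, beta), reverse=True)


evidence = [
    GeneDiseaseEvidence("geneA", "diseaseX", supporting=1, total=1),
    GeneDiseaseEvidence("geneB", "diseaseX", supporting=9, total=10),
    GeneDiseaseEvidence("geneC", "diseaseX", supporting=0, total=3),
]
for e in rank(evidence):
    print(e.gene, round(posterior_mean(e), 3))
```

Raw proportions would place geneA (1/1) above geneB (9/10); the prior shrinks the thinly supported association toward 0.5, so geneB (10/12) ranks above geneA (2/3).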
Evolutionary Genomics: Statistical and Computational Methods
This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, its chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.
Literature mining and network analysis in Biology
This thesis presents OnTheFly2.0, a versatile web-based tool dedicated to the extraction and subsequent analysis of biomedical terms from individual files. OnTheFly2.0 supports multiple file formats and can handle several files simultaneously. Through integration of the EXTRACT tagging service, it performs Named Entity Recognition (NER) for genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes and Gene Ontology terms, and generates pop-up windows that provide concise, context-related information about each identified term, with links to various databases.
Once named entities such as proteins, genes and chemicals are identified, they can be explored further through functional and publication enrichment analysis, or associated with diseases and with protein domains reported in protein family databases. Finally, protein-protein and protein-chemical associations can be visualized as interactive networks generated from the STRING and STITCH services, respectively. OnTheFly2.0 currently supports 197 species and is available at http://onthefly.pavlopouloslab.info
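The abstract does not state which statistic OnTheFly2.0's enrichment analysis uses. A one-sided hypergeometric test is a common choice for this kind of analysis and is sketched below as an assumption: given a background of N genes, K of them annotated with a term, and n identified genes of which k carry the term, the p-value is P(X >= k). The background, term names and identified genes are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: n items drawn from N, of which K are annotated."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(identified, annotations, background):
    """One-sided enrichment p-value per annotation term (no multiple-testing correction)."""
    identified = identified & background
    N, n = len(background), len(identified)
    results = []
    for term, members in annotations.items():
        members = members & background
        k = len(identified & members)
        if k:
            results.append((term, k, hypergeom_sf(k, n, len(members), N)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical background of 20 genes and two made-up annotation terms.
background = {f"g{i}" for i in range(20)}
annotations = {"termX": {"g0", "g1", "g2", "g3", "g4"}, "termY": {"g10", "g11"}}
identified = {"g0", "g1", "g2", "g15"}  # e.g. genes tagged in one document
results = enrichment(identified, annotations, background)
for term, k, p in results:
    print(term, k, round(p, 4))
```

Here three of the four identified genes carry `termX`, against 5 of 20 in the background, giving p = 155/4845 (about 0.032).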
Weighting indirect relations to elucidate the direct association of SNP-disease by use of SPARQL queries
BioMed Central; Journal of Biomedical Semantics; National Centre for Biomedical Ontology; Scottish Informatics and Computer Science Alliance; University of Edinburgh Informatics; et al.
6th International Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013), 10 December 2013
One of the current issues in bioinformatics is identifying the genomic variations that underlie complex diseases. There are millions of genetic variations, as well as environmental factors, that may cause human diseases. The Semantic Web interlinks diverse data, which may reveal many hidden relations and can be used for personalized medicine. This requires discovering relationships between phenotypes and genotypes in order to answer how an individual's genotype affects their health. Additionally, by identifying genomic variations based on an individual's genotype, we can predict the response to a selected drug therapy and suggest treatments or drug regimens accordingly. A personalized medicine knowledgebase can interlink genotypic variations and their possible somatic changes that affect drug targets, in order to select the best treatments and drug regimens for individuals. Such a knowledgebase may help to identify the factors that best explain the association between genotype and phenotype. We used SPARQL queries to weight the factors that link genotype and phenotype through indirect relationships, as well as the paths formed by those relationships. A personalized medicine knowledgebase built with the presented approach can serve both of these purposes.
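The authors implemented their weighting with SPARQL queries, and the abstract does not give the scheme. As a rough illustration of weighting indirect relations, the sketch below assumes each relation carries a weight in [0, 1], scores a path by the product of its weights, and combines all paths from a SNP to a disease with a noisy-OR. The graph, its weights and names such as `snp1` and `diseaseX` are hypothetical, and Python is used here only for consistency with the other examples.

```python
def path_weights(graph, source, target, max_hops=3):
    """Yield (path, weight) for simple directed paths from source to target with at most
    max_hops edges; a path's weight is the product of its edge weights."""
    stack = [(source, [source], 1.0)]
    while stack:
        node, path, weight = stack.pop()
        if node == target:
            yield path, weight
            continue
        if len(path) - 1 == max_hops:
            continue  # hop limit reached
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no cycles)
                stack.append((nxt, path + [nxt], weight * w))


def indirect_score(graph, source, target, max_hops=3):
    """Combine all path weights with a noisy-OR: 1 - prod(1 - weight)."""
    miss = 1.0
    for _, weight in path_weights(graph, source, target, max_hops):
        miss *= 1.0 - weight
    return 1.0 - miss


# Hypothetical weighted relations (weights in [0, 1]) from a SNP towards a disease.
graph = {
    "snp1": {"geneA": 0.8, "geneB": 0.5},
    "geneA": {"pathway1": 0.5, "diseaseX": 0.5},
    "geneB": {"diseaseX": 0.4},
    "pathway1": {"diseaseX": 0.5},
}
for path, weight in path_weights(graph, "snp1", "diseaseX"):
    print(" -> ".join(path), round(weight, 3))
print(round(indirect_score(graph, "snp1", "diseaseX"), 3))  # 0.616
```

The three paths have weights 0.4, 0.2 and 0.2, so the combined score is 1 - 0.6 × 0.8 × 0.8 = 0.616; the noisy-OR rewards several independent lines of indirect evidence without letting the score exceed 1.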