4 research outputs found
Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining
Biomedical knowledge is growing at an astounding pace, with the majority of this
knowledge represented as scientific publications. Text mining tools and
methods represent automatic approaches for extracting hidden patterns and
trends from this semi-structured and unstructured data. In biomedical text
mining, Literature Based Discovery (LBD) is the process of automatically
discovering novel associations between medical terms otherwise mentioned in
disjoint literature sets. LBD approaches have proven successful in reducing the
discovery time of potential associations that are hidden in the vast amount of
scientific literature. The process focuses on creating concept profiles for
medical terms such as a disease or symptom and connecting them with a drug and
treatment based on the statistical significance of the shared profiles. This
knowledge discovery approach, introduced in 1989, remains a core task in
text mining. Currently, the two approaches based on the ABC principle, namely open
discovery and closed discovery, are the most explored in the LBD process. This review
starts with a general introduction to text mining, followed by biomedical text
mining, and introduces various literature resources such as MEDLINE, UMLS, MeSH,
and SemMedDB. This is followed by a brief introduction to the core ABC principle
and its two associated approaches, open discovery and closed discovery, in the LBD
process. This review also discusses deep learning applications in LBD by
reviewing the role of transformer models and neural-network-based LBD models
and their future aspects. Finally, it reviews the key biomedical discoveries
generated through LBD approaches in biomedicine and concludes with the current
limitations and future directions of LBD.
Comment: 43 Pages, 5 Figures, 4 Table
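
The abstract above names the ABC principle and its open and closed discovery modes without showing them. The following is a minimal sketch, not the review's own method: it assumes the common reading of the ABC model in which terms A and C never co-occur directly but are both linked to intermediate terms B. The toy corpus, the term names, and the helper functions (`build_cooccurrence`, `open_discovery`, `closed_discovery`) are all hypothetical, and plain co-occurrence stands in for the statistical significance testing the abstract mentions.

```python
from collections import defaultdict


def build_cooccurrence(docs):
    """Map each term to the set of terms it shares at least one document with."""
    neighbors = defaultdict(set)
    for terms in docs:
        for term in terms:
            neighbors[term].update(terms - {term})
    return dict(neighbors)


def open_discovery(neighbors, a):
    """Start from A only: find C terms reachable through some B but never seen with A.

    Returns a dict mapping each candidate C to the set of bridging B terms.
    """
    direct = neighbors.get(a, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbors.get(b, set()):
            if c != a and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


def closed_discovery(neighbors, a, c):
    """Start from a hypothesised A-C pair: return the B terms linked to both."""
    return neighbors.get(a, set()) & neighbors.get(c, set())


# Hypothetical toy corpus: each document is the set of terms it mentions.
docs = [
    {"drug_x", "protein_y"},
    {"protein_y", "disease_z"},
    {"drug_x", "pathway_w"},
    {"pathway_w", "disease_z"},
    {"drug_x", "symptom_v"},
]

neighbors = build_cooccurrence(docs)
open_result = open_discovery(neighbors, "drug_x")
closed_result = closed_discovery(neighbors, "drug_x", "disease_z")
```

In this toy corpus, open discovery from `drug_x` proposes `disease_z` because two intermediate terms bridge them, and closed discovery on the pair returns those same two bridges. A real system would rank candidates with a significance measure instead of treating every co-occurrence as equal.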
Exploiting Latent Features of Text and Graphs
As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings (semantically rich vectors of latent features) to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers, as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system, and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically-based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs, and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state-of-the-art in hypergraph partitioning quality. Having newfound intuition of graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns data-driven ranking criteria derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.
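
To make the idea of ranking hypotheses from embeddings concrete, here is a small sketch that orders candidate terms by cosine similarity to a source term. This is not the ranking used by Moliere or Agatha, whose criteria the abstract does not specify. Cosine similarity is swapped in as a simple stand-in, and the two-dimensional vectors, the term names, and the functions `cosine` and `rank_candidates` are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_candidates(embeddings, source, candidates):
    """Return (candidate, score) pairs sorted from most to least similar to source."""
    scored = [(c, cosine(embeddings[source], embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; real systems learn vectors with hundreds of dimensions.
embeddings = {
    "gene_a": [1.0, 0.0],
    "disorder_b": [0.9, 0.1],
    "disorder_c": [0.0, 1.0],
}

ranked = rank_candidates(embeddings, "gene_a", ["disorder_c", "disorder_b"])
```

Here `disorder_b` ranks first because its vector points in nearly the same direction as that of `gene_a`, while `disorder_c` is orthogonal and scores zero. A learned ranking, as the abstract describes for Agatha, would replace the fixed similarity function with a trained model.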
In Search of a Common Thread: Enhancing the LBD Workflow with a view to its Widespread Applicability
Literature-Based Discovery (LBD) research focuses on discovering implicit knowledge
linkages in existing scientific literature to provide impetus to innovation and research
productivity. Despite significant advancements in LBD research, previous studies contain
several open problems and shortcomings that are hindering its progress. The overarching
goal of this thesis is to address these issues, not only to enhance the discovery
component of LBD, but also to shed light on new directions that can further strengthen
the existing understanding of the LBD workflow. In accordance with this goal, the thesis
aims to enhance the LBD workflow with a view to ensuring its widespread applicability.
The goal of widespread applicability is twofold. Firstly, it relates to the adaptability of
the proposed solutions to a diverse range of problem settings. These problem settings
are not necessarily application areas that are closely related to the LBD context, but
could include a wide range of problems beyond the typical scope of LBD, which has traditionally
been applied to scientific literature. Adapting the LBD workflow to problems
outside the typical scope of LBD is a worthwhile goal, since the intrinsic objective of
LBD research, which is discovering novel linkages in text corpora, is valid across a vast
range of problem settings.
Secondly, the idea of widespread applicability also denotes the capability of the proposed
solutions to be executed in new environments. These 'new environments' are various
academic disciplines (i.e., cross-domain knowledge discovery) and publication languages
(i.e., cross-lingual knowledge discovery). The application of LBD models to new environments
is timely, since the massive growth of the scientific literature has engendered
huge challenges to academics, irrespective of their domain.
This thesis is divided into five main research objectives that address the following topics:
literature synthesis, the input component, the discovery component, reusability, and
portability. The objective of the literature synthesis is to address the gaps in existing
LBD reviews by conducting the first systematic literature review. The input component
section aims to provide generalised insights on the suitability of various input types in the
LBD workflow, focusing on their role and potential impact on the information retrieval
cycle of LBD.
The discovery component section aims to intermingle two research directions that have
been under-investigated in the LBD literature, 'modern word embedding techniques'
and 'temporal dimension', by proposing diachronic semantic inferences. Their potential
positive influence on knowledge discovery is verified through both direct and indirect
uses. The reusability section aims to present a new, distinct viewpoint on these LBD
models by verifying their reusability in a timely application area using a methodical reuse
plan. The last section, portability, proposes an interdisciplinary LBD framework that
can be applied to new environments. While highly cost-efficient and easily pluggable,
this framework also gives rise to a new perspective on knowledge discovery through its
generalisable capabilities.
Succinctly, this thesis presents novel and distinct viewpoints to accomplish five main
research objectives, enhancing the existing understanding of the LBD workflow. The
thesis offers new insights which future LBD research could further explore and expand
to create more efficient, widely applicable LBD models to enable broader community
benefits.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202