Using Knowledge Graphs to enhance the utility of Curated Document Databases
The research presented in this thesis is directed at the generation, maintenance and querying of Curated Document Databases (CDDs) stored as literature knowledge graphs. Literature knowledge graphs are graphs where the vertices represent documents and concepts, and the edges provide links between concepts, and between concepts and documents. The central motivation for the work was to provide CDD administrators with a useful mechanism for creating and maintaining literature knowledge graph represented CDDs, and for end users to utilise them. The central research question is "What are some appropriate techniques that can be used for generating, maintaining and utilizing literature knowledge graphs to support the concept of CDDs?". The thesis thus addresses three issues associated with literature knowledge graphs: (i) their construction, (ii) their maintenance so that their utility can be continued, and (iii) the querying of such knowledge graphs. With respect to the first issue, the Open Information Extraction for Knowledge Graph Construction (OIE4KGC) approach is proposed, founded on the idea of using open information extraction. Two open information extraction tools were compared, the RnnOIE tool and the Leolani tool; the RnnOIE tool was found to be effective for the generation of triples from clinical trial documents. With respect to the second issue, two approaches are proposed for maintaining knowledge graph represented CDDs: the CN approach and the Knowledge Graph And BERT Ranking (GRAB-Rank) approach. The first proposed approach used a feature vector representation, and the second a hybrid domain-specific document embedding, which combines a Bidirectional Encoder Representations from Transformers (BERT) embedding with a knowledge graph embedding. This proposed embedding was used for document representation in a LETOR (learning-to-rank) model; the idea was to rank a set of potential documents.
The GRAB-Rank embedding based LETOR approach was found to be effective. For the third identified issue, the standard solution is to represent both the query to be addressed and the documents in the knowledge graph in a manner that allows the documents to be ranked with respect to the query. The solution proposed for this was to utilize a hybrid embedding for query resolution. Two forms of hybrid embedding were utilized: (i) a Continuous Bag-Of-Words (CBOW) embedding combined with a graph embedding, and (ii) BERT and Sci-BERT embeddings combined with a graph embedding. The evaluation indicates that the CBOW embedding combined with the graph embedding was effective.
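The hybrid-embedding idea above can be sketched in a few lines. This is an illustrative outline only, not the thesis's implementation: the function names (`concat`, `rank`) and the toy vectors are assumptions standing in for learned BERT/CBOW and knowledge graph embeddings.

```python
import math

def concat(text_vec, graph_vec):
    """Hybrid embedding: concatenate a text embedding (e.g. from BERT or
    CBOW) with a knowledge graph embedding of the same document."""
    return text_vec + graph_vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query_vec, doc_vecs):
    """Rank documents by similarity to the query; most similar first."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy 2+2 dimensional vectors standing in for learned embeddings.
docs = [concat([0.9, 0.1], [0.2, 0.8]),
        concat([0.1, 0.9], [0.8, 0.2])]
query = concat([1.0, 0.0], [0.0, 1.0])
print(rank(query, docs))  # → [0, 1]
```

A real LETOR model would learn a scoring function over these hybrid vectors rather than use raw cosine similarity, but the representation step is the same.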
Faithful Embeddings for EL++ Knowledge Bases
Recently, increasing effort has been put into learning continuous
representations for symbolic knowledge bases (KBs). However, these approaches
either only embed
the data-level knowledge (ABox) or suffer from inherent limitations when
dealing with concept-level knowledge (TBox), i.e., they cannot faithfully model
the logical structure present in the KBs. We present BoxEL, a geometric KB
embedding approach that allows for better capturing the logical structure
(i.e., ABox and TBox axioms) in the description logic EL++. BoxEL models
concepts in a KB as axis-parallel boxes that are suitable for modeling concept
intersection, entities as points inside boxes, and relations between
concepts/entities as affine transformations. We show theoretical guarantees
(soundness) of BoxEL for preserving logical structure. Namely, the learned
model of BoxEL embedding with loss 0 is a (logical) model of the KB.
Experimental results on (plausible) subsumption reasoning and a real-world
application to protein-protein interaction prediction show that BoxEL
outperforms traditional knowledge graph embedding methods as well as
state-of-the-art EL++ embedding approaches.
Comment: Published in ISWC'2
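The geometric reading of the abstract can be illustrated with a minimal sketch, assuming boxes are given as (lower-corner, upper-corner) pairs. The function names and the per-axis scale-and-shift relation map are illustrative simplifications, not BoxEL's actual parameterisation or loss.

```python
def inside(point, box):
    """An entity (point) is an instance of a concept (box) iff the point
    lies inside the box along every axis."""
    lo, hi = box
    return all(l <= p <= h for p, l, h in zip(point, lo, hi))

def subsumed(box_a, box_b):
    """Concept subsumption A ⊑ B holds iff box A is contained in box B."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    return all(bl <= al and ah <= bh
               for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

def intersect(box_a, box_b):
    """Concept intersection A ⊓ B of axis-parallel boxes is again an
    axis-parallel box (empty if any lower bound exceeds its upper bound)."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    lo = [max(a, b) for a, b in zip(alo, blo)]
    hi = [min(a, b) for a, b in zip(ahi, bhi)]
    return (lo, hi)

def apply_relation(point, scale, offset):
    """Relations map entities via an affine transformation; restricted here
    to a diagonal (per-axis) scale plus shift for simplicity."""
    return [s * p + o for p, s, o in zip(point, scale, offset)]

parent = ([0.0, 0.0], [1.0, 1.0])
child = ([0.2, 0.2], [0.5, 0.5])
print(subsumed(child, parent))   # → True
print(inside([0.3, 0.3], child))  # → True
```

The soundness claim in the abstract corresponds to these checks holding exactly when the geometric loss reaches zero.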
Development of a Knowledge Graph Embeddings Model for Pain
Pain is a complex concept that can interconnect with other concepts such as a
disorder that might cause pain, a medication that might relieve pain, and so
on. To fully understand the context of pain experienced by either an individual
or across a population, we may need to examine all concepts related to pain and
the relationships between them. This is especially useful when modeling pain
that has been recorded in electronic health records. Knowledge graphs represent
concepts and their relations by an interlinked network, enabling semantic and
context-based reasoning in a computationally tractable form. These graphs can,
however, be too large for efficient computation. Knowledge graph embeddings
help to resolve this by representing the graphs in a low-dimensional vector
space. These embeddings can then be used in various downstream tasks such as
classification and link prediction. The various relations associated with pain
which are required to construct such a knowledge graph can be obtained from
external medical knowledge bases such as SNOMED CT, a hierarchical systematic
nomenclature of medical terms. A knowledge graph built in this way could be
further enriched with real-world examples of pain and its relations extracted
from electronic health records. This paper describes the construction of such
knowledge graph embedding models of pain concepts, extracted from the
unstructured text of mental health electronic health records, combined with
external knowledge created from relations described in SNOMED CT, and their
evaluation on a subject-object link prediction task. The performance of the
models was compared with baseline models.
Comment: Accepted at AMIA 2023, New Orlean
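As an illustration of the subject-object link prediction task described above, the sketch below scores triples with a TransE-style translation distance, a common baseline family rather than the specific models evaluated in the paper; the toy embeddings and the `may_treat` relation vector are invented for illustration.

```python
import math

def transe_score(head, relation, tail):
    """TransE-style plausibility score: a smaller ||h + r - t|| means the
    triple (head, relation, tail) is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))

def predict_object(head, relation, candidates):
    """Subject-object link prediction: pick the candidate tail entity whose
    embedding best completes the triple."""
    return min(candidates, key=lambda n: transe_score(head, relation, candidates[n]))

# Toy 2-d embeddings; real ones would be learned from the SNOMED CT-derived
# graph plus relations extracted from health records.
emb = {"pain": [0.1, 0.2], "headache": [0.4, 0.1], "paracetamol": [0.9, 0.9]}
rel_may_treat = [0.8, 0.7]  # hypothetical "may_treat" relation vector
tails = {k: v for k, v in emb.items() if k != "pain"}
print(predict_object(emb["pain"], rel_may_treat, tails))  # → paracetamol
```

Evaluation on the link prediction task would then rank all candidate tails per test triple and report ranking metrics over the held-out set.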
Knowledge Relation Rank Enhanced Heterogeneous Learning Interaction Modeling for Neural Graph Forgetting Knowledge Tracing
Recently, knowledge tracing models have been applied in educational data
mining; one example is the Self-Attention Knowledge Tracing model (SAKT), which
models the relationship between exercises and knowledge concepts (KCs).
However, relation modeling in traditional knowledge tracing models only
considers the static question-knowledge and knowledge-knowledge relationships,
and treats these relationships as equally important. Such relation modeling
struggles to avoid the influence of subjective labeling, and it considers the
relationships between exercises and KCs, or between KCs themselves, in
isolation. In this work, a novel knowledge tracing model, named Knowledge
Relation Rank Enhanced Heterogeneous Learning Interaction Modeling for Neural
Graph Forgetting Knowledge Tracing (NGFKT), is proposed to reduce the impact of
subjective labeling by calibrating the skill relation matrix and the Q-matrix,
and to apply a Graph Convolutional Network (GCN) to model the heterogeneous
interactions between students, exercises, and skills.
Specifically, the skill relation matrix and Q-matrix are generated by the
Knowledge Relation Importance Rank Calibration (KRIRC) method. Then the
calibrated skill relation matrix, Q-matrix, and the heterogeneous interactions
are treated as the input of the GCN to generate the exercise embedding and
skill embedding. Next, the exercise embedding, skill embedding, item
difficulty, and contingency table are incorporated to generate an exercise
relation matrix as the inputs of the Position-Relation-Forgetting attention
mechanism. Finally, the Position-Relation-Forgetting attention mechanism is
applied to make the predictions. Experiments are conducted on the two public
educational datasets, and the results indicate that the NGFKT model outperforms
all baseline models in terms of AUC, ACC, and Performance Stability (PS).
Comment: 11 pages, 3 figure
Contextualized Structural Self-supervised Learning for Ontology Matching
Ontology matching (OM) entails the identification of semantic relationships
between concepts within two or more knowledge graphs (KGs) and serves as a
critical step in integrating KGs from various sources. Recent advancements in
deep OM models have harnessed the power of transformer-based language models
and the advantages of knowledge graph embedding. Nevertheless, these OM models
still face persistent challenges, such as a lack of reference alignments,
runtime latency, and graph structures left unexplored within an end-to-end
framework. In this study, we introduce a novel self-supervised learning OM
framework with input ontologies, called LaKERMap. This framework capitalizes on
the contextual and structural information of concepts by integrating implicit
knowledge into transformers. Specifically, we aim to capture multiple
structural contexts, encompassing both local and global interactions, by
employing distinct training objectives. To assess our methods, we utilize the
Bio-ML datasets and tasks. The findings from our innovative approach reveal
that LaKERMap surpasses state-of-the-art systems in terms of alignment quality
and inference time. Our models and code are available at:
https://github.com/ellenzhuwang/lakermap
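For contrast with a learned system such as LaKERMap, a minimal embedding-based OM baseline can be sketched as below; the greedy nearest-neighbour strategy, the similarity threshold, and the toy vectors are assumptions for illustration, not values or methods from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two concept embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def match(src_emb, tgt_emb, threshold=0.8):
    """Greedy OM baseline: align each source concept with its most similar
    target concept, keeping only pairs above a similarity threshold."""
    alignments = []
    for s, sv in src_emb.items():
        best = max(tgt_emb, key=lambda t: cosine(sv, tgt_emb[t]))
        if cosine(sv, tgt_emb[best]) >= threshold:
            alignments.append((s, best))
    return alignments

# Toy embeddings for concepts from two ontologies.
src = {"Heart": [1.0, 0.0]}
tgt = {"Coeur": [0.9, 0.1], "Lung": [0.0, 1.0]}
print(match(src, tgt))  # → [('Heart', 'Coeur')]
```

Systems like the one described above improve on this baseline by injecting structural context into the embeddings themselves, so that similarity reflects graph neighbourhoods as well as surface meaning.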