65,086 research outputs found
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in real
world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs considering edge semantics. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embedding on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by considering
edge-types into node embedding learning in heterogeneous graphs,
\textbf{edge2vec}\ significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and in the real world context of biomedical
knowledge discovery applicability.Comment: 10 page
Clustering as an Evaluation Protocol for Knowledge Embedding Representation of Categorised Multi-relational Data in the Clinical Domain
Learning knowledge representation is an increasingly important technology
applicable in many domain-specific machine learning problems. We discuss the
effectiveness of traditional Link Prediction or Knowledge Graph Completion
evaluation protocol when embedding knowledge representation for categorised
multi-relational data in the clinical domain. Link prediction uses to split the
data into training and evaluation subsets, leading to loss of information along
training and harming the knowledge representation model accuracy. We propose a
Clustering Evaluation Protocol as a replacement alternative to the
traditionally used evaluation tasks. We used embedding models trained by a
knowledge embedding approach which has been evaluated with clinical datasets.
Experimental results with Pearson and Spearman correlations show strong
evidence that the novel proposed evaluation protocol is pottentially able to
replace link prediction
Using Knowledge Graphs to enhance the utility of Curated Document Databases
The research presented in this thesis is directed at the generation, maintenance and query ing of Curated Document Databases (CDDs) stored as literature knowledge graphs. Liter ature knowledge graphs are graphs where the vertices represent documents and concepts; and the edges provided links between concepts, and concepts and documents. The central motivation for the work was to provide CDD administrators with a useful mechanism for creating and maintaining literature knowledge graph represented CDDs, and for end users to utilise them. The central research question is “What are some appropriate techniques that can be used for generating, maintaining and utilizing literature knowledge graphs to support the concept of CDDs?”. The thesis thus addresses three issues associated with literature knowledge graphs: (i) their construction, (ii) their maintenance so that their utility can be continued, and (iii) the querying of such knowledge graphs. With respect to the first issue, the Open Information Extraction for Knowledge Graph Construction (OIE4KGC) approach is proposed founded on the idea of using open information extrac tion. Two open information extraction tools were compared, the RnnOIE tool and the Leolani tool. The RnnOIE tool was found to be effective for generation of triples from clinical trial documents. With respect to the second issue two approaches are proposed for maintaining knowledge graph represented CDDs; the CN approach and the Knowledge Graph And BERT Ranking (GRAB-Rank) approach. The first proposed approach used a feature vector representation; and the second a unique hybrid domain specific document embedding. The hybrid domain-specific document embedding combines a Bidirectional En coder Representations from Transformers embedding with a knowledge graph embedding. This proposed embedding was used for document representation in a LETOR model. The idea was to rank a set of potential documents. The Grab-Rank embedding based LETOR approach was found to be effective. For the third identified issue the standard solution is to represent both the query to be addressed and the documents in the knowledge graph in a manner that will allow the documents to be ranked with respect to the query. The solution proposed for this was to utilize a hybrid embedding for query resolution. Two forms of embedding are utilized for query resolution: (i) a Continuous Bag-Of-Words embedding was combined with graph embedding and (ii) for the second BERT and Sci-BERT em bedding were combined with graph embedding. The evaluation indicates that the CBOW embedding combined with graph embedding was found to be effective
Deep Learning for the Generation of Heuristics in Answer Set Programming: A Case Study of Graph Coloring
Answer Set Programming (ASP) is a well-established declarative AI formalism for knowledge representation and reasoning. ASP systems were successfully applied to both industrial and academic problems. Nonetheless, their performance can be improved by embedding domain-specific heuristics into their solving process. However, the development of domain-specific heuristics often requires both a deep knowledge of the domain at hand and a good understanding of the fundamental working principles of the ASP solvers. In this paper, we investigate the use of deep learning techniques to automatically generate domain-specific heuristics for ASP solvers targeting the well-known graph coloring problem. Empirical results show that the idea is promising: the performance of the ASP solver wasp can be improved
Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports
The way we analyse clinical texts has undergone major changes over the last
years. The introduction of language models such as BERT led to adaptations for
the (bio)medical domain like PubMedBERT and ClinicalBERT. These models rely on
large databases of archived medical documents. While performing well in terms
of accuracy, both the lack of interpretability and limitations to transfer
across languages limit their use in clinical setting. We introduce a novel
light-weight graph-based embedding method specifically catering radiology
reports. It takes into account the structure and composition of the report,
while also connecting medical terms in the report through the multi-lingual
SNOMED Clinical Terms knowledge base. The resulting graph embedding uncovers
the underlying relationships among clinical terms, achieving a representation
that is better understandable for clinicians and clinically more accurate,
without reliance on large pre-training datasets. We show the use of this
embedding on two tasks namely disease classification of X-ray reports and image
classification. For disease classification our model is competitive with its
BERT-based counterparts, while being magnitudes smaller in size and training
data requirements. For image classification, we show the effectiveness of the
graph embedding leveraging cross-modal knowledge transfer and show how this
method is usable across different languages
- …