    edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

    Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and for its applicability in the real-world context of biomedical knowledge discovery.

    Clustering as an Evaluation Protocol for Knowledge Embedding Representation of Categorised Multi-relational Data in the Clinical Domain

    Learning knowledge representation is an increasingly important technology applicable to many domain-specific machine learning problems. We discuss the effectiveness of the traditional Link Prediction or Knowledge Graph Completion evaluation protocol when embedding knowledge representation for categorised multi-relational data in the clinical domain. Link prediction requires splitting the data into training and evaluation subsets, leading to loss of information during training and harming the accuracy of the knowledge representation model. We propose a Clustering Evaluation Protocol as an alternative to the traditionally used evaluation tasks. We used embedding models trained by a knowledge embedding approach that has been evaluated with clinical datasets. Experimental results with Pearson and Spearman correlations show strong evidence that the proposed evaluation protocol is potentially able to replace link prediction.

    Using Knowledge Graphs to enhance the utility of Curated Document Databases

    The research presented in this thesis is directed at the generation, maintenance and querying of Curated Document Databases (CDDs) stored as literature knowledge graphs. Literature knowledge graphs are graphs where the vertices represent documents and concepts, and the edges provide links between concepts, and between concepts and documents. The central motivation for the work was to provide CDD administrators with a useful mechanism for creating and maintaining CDDs represented as literature knowledge graphs, and for end users to utilise them. The central research question is “What are some appropriate techniques that can be used for generating, maintaining and utilizing literature knowledge graphs to support the concept of CDDs?”. The thesis thus addresses three issues associated with literature knowledge graphs: (i) their construction, (ii) their maintenance so that their utility can be continued, and (iii) the querying of such knowledge graphs. With respect to the first issue, the Open Information Extraction for Knowledge Graph Construction (OIE4KGC) approach is proposed, founded on the idea of using open information extraction. Two open information extraction tools were compared, the RnnOIE tool and the Leolani tool. The RnnOIE tool was found to be effective for the generation of triples from clinical trial documents. With respect to the second issue, two approaches are proposed for maintaining CDDs represented as knowledge graphs: the CN approach and the Knowledge Graph And BERT Ranking (GRAB-Rank) approach. The first approach used a feature vector representation, and the second a unique hybrid domain-specific document embedding. The hybrid domain-specific document embedding combines a Bidirectional Encoder Representations from Transformers (BERT) embedding with a knowledge graph embedding. This proposed embedding was used for document representation in a LETOR model. The idea was to rank a set of potential documents. The GRAB-Rank embedding-based LETOR approach was found to be effective. For the third issue, the standard solution is to represent both the query to be addressed and the documents in the knowledge graph in a manner that allows the documents to be ranked with respect to the query. The solution proposed for this was to utilize a hybrid embedding for query resolution. Two forms of hybrid embedding were evaluated: (i) a Continuous Bag-Of-Words (CBOW) embedding combined with a graph embedding, and (ii) BERT and Sci-BERT embeddings combined with a graph embedding. The evaluation indicates that the CBOW embedding combined with the graph embedding was effective.

    Deep Learning for the Generation of Heuristics in Answer Set Programming: A Case Study of Graph Coloring

    Answer Set Programming (ASP) is a well-established declarative AI formalism for knowledge representation and reasoning. ASP systems have been successfully applied to both industrial and academic problems. Nonetheless, their performance can be improved by embedding domain-specific heuristics into their solving process. However, the development of domain-specific heuristics often requires both a deep knowledge of the domain at hand and a good understanding of the fundamental working principles of ASP solvers. In this paper, we investigate the use of deep learning techniques to automatically generate domain-specific heuristics for ASP solvers, targeting the well-known graph coloring problem. Empirical results show that the idea is promising: the performance of the ASP solver wasp can be improved.

    Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports

    The way we analyse clinical texts has undergone major changes over the last years. The introduction of language models such as BERT led to adaptations for the (bio)medical domain like PubMedBERT and ClinicalBERT. These models rely on large databases of archived medical documents. While they perform well in terms of accuracy, both their lack of interpretability and their limited transfer across languages restrict their use in clinical settings. We introduce a novel light-weight graph-based embedding method specifically catering to radiology reports. It takes into account the structure and composition of the report, while also connecting medical terms in the report through the multi-lingual SNOMED Clinical Terms knowledge base. The resulting graph embedding uncovers the underlying relationships among clinical terms, achieving a representation that is more understandable for clinicians and clinically more accurate, without reliance on large pre-training datasets. We show the use of this embedding on two tasks, namely disease classification of X-ray reports and image classification. For disease classification, our model is competitive with its BERT-based counterparts, while being orders of magnitude smaller in size and training data requirements. For image classification, we show the effectiveness of the graph embedding in leveraging cross-modal knowledge transfer and show how this method is usable across different languages.