
    An effective disease risk indicator tool

    Each mixture of deficient molecular families of a specific disease induces the disease at a different time frame in the future. Based on this, we propose a novel methodology for personalizing a person’s level of future susceptibility to a specific disease by inferring the mixture of his/her molecular families whose combined deficiencies are likely to induce the disease. We implemented the methodology in a working system called DRIT, which consists of the following components: logic inferencer, information extractor, risk indicator, and interrelationship between molecular families modeler. The information extractor takes advantage of the exponential increase of biomedical literature to extract the common biomarkers that test positive among most patients with a specific disease. The logic inferencer transforms the hierarchical interrelationships between the molecular families of a disease into rule-based specifications. The interrelationship between molecular families modeler models the hierarchical interrelationships between the molecular families whose biomarkers were extracted by the information extractor. It employs the specification rules and the inference rules of predicate logic to infer as many probable deficient molecular families as possible for a person, based on the person’s few molecular families whose biomarkers tested positive in medical screening. The risk indicator outputs a risk indicator value that reflects a person’s level of future susceptibility to the disease. We evaluated DRIT by comparing it experimentally with a comparable method. Results revealed a marked improvement.
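    To make the inference step concrete, the following is a minimal Python sketch of the kind of forward chaining the logic inferencer performs. The molecular-family names, rules, and risk formula are illustrative assumptions; the abstract does not specify DRIT's actual rule syntax or risk computation.

```python
# Hypothetical sketch of DRIT-style inference: given a few molecular
# families whose biomarkers tested positive, forward-chain over rules
# that encode hierarchical interrelationships between families.
# Family names and rules are illustrative, not taken from the paper.

RULES = [
    # (antecedent families, consequent family): if all antecedents are
    # deficient, infer the consequent is probably deficient too.
    ({"family_A", "family_B"}, "family_C"),
    ({"family_C"}, "family_D"),
]

DISEASE_FAMILIES = {"family_A", "family_B", "family_C", "family_D", "family_E"}

def infer_deficient(detected):
    """Forward-chain until no new deficient family can be derived."""
    deficient = set(detected)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in RULES:
            if antecedents <= deficient and consequent not in deficient:
                deficient.add(consequent)
                changed = True
    return deficient

def risk_indicator(detected):
    """Toy risk value: share of the disease's families inferred deficient."""
    return len(infer_deficient(detected) & DISEASE_FAMILIES) / len(DISEASE_FAMILIES)

print(risk_indicator({"family_A", "family_B"}))  # 0.8 under these toy rules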

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge, often in the form of knowledge graphs. Here we survey the progress made in the last year by systems that use formally represented knowledge to address data science problems in both the clinical and biological domains, as well as approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. (Comment: manuscript 43 pages with 3 tables; supplemental material 43 pages with 3 tables.)

    Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

    To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence-based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive, and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist that focus specifically on the detection of a limited set of relation types. Systems biology needs generic approaches that detect a multitude of relation types and can also process large text corpora, but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural-network-based Semantic Role Labeling (SRL) program, for the large-scale extraction of semantic relations from the biomedical literature. A comparison of the processing times of SENNA and other SRL systems or syntactic parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank)-conforming SRL program currently available. A total of 89 million biomedical sentences were tagged with SENNA on a 100-node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences, resulting in precision/recall values of 0.71/0.43. We show that both the accuracy and the processing speed of the proposed semantic relation extraction approach are sufficient for its large-scale application to biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized, computationally demanding NLP approaches.
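    As an illustration of the extraction step, here is a hedged sketch of mapping PropBank-style predicate-argument frames to relation triples. The frame format below is a simplified stand-in, not SENNA's actual output; a real pipeline would first parse SENNA's tagging into such frames.

```python
# Illustrative sketch of turning PropBank-style SRL output into relation
# triples. The input format is a simplified stand-in, not SENNA's actual
# column-based output.

from typing import Dict, List, Tuple

def extract_relations(srl_frames: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Map each predicate-argument frame to an (agent, relation, patient) triple."""
    triples = []
    for frame in srl_frames:
        agent, patient = frame.get("ARG0"), frame.get("ARG1")
        if agent and patient:  # keep only frames with both core arguments
            triples.append((agent, frame["predicate"], patient))
    return triples

# One frame as SRL might label "TNF-alpha activates NF-kB in macrophages":
frames = [{"predicate": "activates", "ARG0": "TNF-alpha",
           "ARG1": "NF-kB", "ARGM-LOC": "in macrophages"}]
print(extract_relations(frames))  # [('TNF-alpha', 'activates', 'NF-kB')]
```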

    A sequence-length sensitive approach to learning biological grammars using inductive logic programming.

    This thesis investigates whether the ideas behind compression principles, such as the Minimum Description Length, can help improve the process of learning biological grammars from protein sequences using Inductive Logic Programming (ILP). Contrary to most traditional ILP learning problems, biological sequences often vary greatly in length. This variation in length is an important feature of biological sequences which should not be ignored by ILP systems. However, we have identified that some ILP systems do not take into account the length of examples when evaluating their proposed hypotheses. During the learning process, many ILP systems use clause evaluation functions to assign a score to induced hypotheses, estimating their quality and effectively influencing the search. Traditionally, clause evaluation functions do not take into account the length of the examples covered by the clause. We propose L-modification, a way of modifying existing clause evaluation functions so that they take into account the length of the examples from which they learn. An empirical study was undertaken to investigate whether significant improvements can be achieved by applying L-modification to a standard clause evaluation function. Furthermore, we investigated more generally how ILP systems cope with the length of examples in training data. We show that our L-modified clause evaluation function outperforms our benchmark function in every experiment we conducted, demonstrating that L-modification is a useful concept. We also show that the length of the examples in the training data used by ILP systems has an undeniable impact on the results.
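    The abstract does not give the exact form of L-modification, but the underlying idea can be sketched as follows: weight each covered example by its length instead of counting examples uniformly. The scoring functions and toy sequences below are illustrative assumptions, one plausible form of the modification rather than the thesis's definition.

```python
# Hedged sketch of the idea behind L-modification: instead of scoring a
# clause by how many positive/negative examples it covers, weight each
# covered example by its sequence length.

def coverage_score(pos_covered, neg_covered):
    """Standard length-blind evaluation: covered positives minus negatives."""
    return len(pos_covered) - len(neg_covered)

def l_modified_score(pos_covered, neg_covered):
    """Length-sensitive variant: each example counts by its length."""
    return sum(len(seq) for seq in pos_covered) - sum(len(seq) for seq in neg_covered)

pos = ["MKTAYIAKQR", "MK"]         # toy protein sequences of very different length
neg = ["MKTAYIA"]
print(coverage_score(pos, neg))    # 1: both positives count equally
print(l_modified_score(pos, neg))  # 5: the long positive dominates the score
```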

    Computational and human-based methods for knowledge discovery over knowledge graphs

    The modern world has evolved alongside a huge growth in the exploitation of data and information. Daily, increasing volumes of data from various sources and in various formats are stored, making it challenging to manage and integrate them to discover new knowledge. The appropriate use of data in various sectors of society, such as education, healthcare, e-commerce, and industry, provides advantages for decision support in these areas. However, knowledge discovery becomes challenging since data may come from heterogeneous sources with important information hidden. Thus, new approaches are required that adapt to the challenges of knowledge discovery in such heterogeneous data environments. The semantic web and knowledge graphs (KGs) are becoming increasingly relevant on the road to knowledge discovery. This thesis tackles the problem of knowledge discovery over KGs built from heterogeneous data sources. We provide a neuro-symbolic artificial intelligence system that integrates symbolic and sub-symbolic frameworks to exploit the semantics encoded in a KG and its structure. The symbolic system relies on existing deductive-database approaches to make explicit the implicit knowledge encoded in a KG. The proposed deductive database, DSDS, can derive new statements for ego networks given an abstract target prediction; DSDS thus minimizes data sparsity in KGs. In addition, a sub-symbolic system relies on knowledge graph embedding (KGE) models. KGE models are commonly applied in the KG completion task to represent the entities of a KG in a low-dimensional vector space. However, KGE models are known to suffer from data sparsity, and the symbolic system helps overcome this limitation. The proposed approach discovers knowledge given a target prediction in a KG and extracts unknown implicit information related to that prediction. As a proof of concept, we implemented the neuro-symbolic system on top of a KG for lung cancer to predict the effectiveness of polypharmacy treatments. The symbolic system implements a deductive system that deduces pharmacokinetic drug-drug interactions encoded in a set of rules through a Datalog program. Additionally, the sub-symbolic system predicts treatment effectiveness using a KGE model, which preserves the KG structure. An ablation study on the components of our approach was conducted, considering state-of-the-art KGE methods. The observed results provide evidence for the benefits of the neuro-symbolic integration of our approach, where the neuro-symbolic system exhibits improved results for an abstract target prediction. The results improve because the symbolic system increases the prediction capacity of the sub-symbolic system. Moreover, the proposed neuro-symbolic artificial intelligence system is evaluated in Industry 4.0 (I4.0), demonstrating its effectiveness in determining relatedness among standards and analyzing their properties to detect unknown relations in the I4.0KG. The results achieved allow us to conclude that the proposed neuro-symbolic approach improves the prediction capability of KGE models for an abstract target prediction by minimizing data sparsity in KGs.
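    A schematic sketch of the symbolic-then-sub-symbolic pipeline described above: a Datalog-style rule first materializes implicit triples, reducing sparsity, and the densified KG is then handed to a KGE model. The entities, the symmetry rule, and the embedding step are illustrative placeholders, not the thesis's actual rules or data.

```python
# Schematic of the neuro-symbolic loop: a Datalog-style rule materializes
# implicit triples (reducing sparsity), and the densified KG is then used
# to train a KGE model. All names here are illustrative placeholders.

kg = {("drugA", "interactsWith", "drugB"),
      ("drugB", "interactsWith", "drugC")}

def apply_rules(triples):
    """One toy rule: interactsWith is symmetric; add the inverse triples."""
    inferred = {(o, p, s) for (s, p, o) in triples if p == "interactsWith"}
    return triples | inferred

dense_kg = apply_rules(kg)           # symbolic step: explicit + implicit triples
print(len(kg), "->", len(dense_kg))  # 2 -> 4

# Sub-symbolic step: train any KGE model (e.g. TransE) on dense_kg instead
# of kg, so link prediction also sees the deduced facts.
```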

    Personizing the prediction of future susceptibility to a specific disease

    A traceable biomarker is a member of a disease’s molecular pathway. A disease may be associated with several molecular pathways. Each different combination of these molecular pathways, to which detected traceable biomarkers belong, may be indicative of the elicitation of the disease at a different time frame in the future. Based on this notion, we introduce a novel methodology for personalizing an individual’s degree of future susceptibility to a specific disease. We implemented the methodology in a working system called Susceptibility Degree to a Disease Predictor (SDDP). For a specific disease d, let S be the set of molecular pathways to which traceable biomarkers detected from most patients of d belong. For the same disease d, let S′ be the set of molecular pathways to which traceable biomarkers detected from a certain individual belong. SDDP is able to infer the subset S′′ ⊆ {S−S′} of undetected molecular pathways for the individual. Thus, SDDP can infer undetected molecular pathways of a disease for an individual based on the few molecular pathways detected from that individual. SDDP can also help in inferring the combination of molecular pathways in the set {S′+S′′} whose traceable biomarkers are collectively indicative of the disease. SDDP is composed of the following four components: information extractor, interrelationship between molecular pathways modeler, logic inferencer, and risk indicator. The information extractor takes advantage of the exponential increase of biomedical literature to automatically extract the common traceable biomarkers for a specific disease. The interrelationship between molecular pathways modeler models the hierarchical interrelationships between the molecular pathways of the traceable biomarkers. The logic inferencer transforms the hierarchical interrelationships between the molecular pathways into rule-based specifications. It employs the specification rules and the inference rules of predicate logic to infer as many undetected molecular pathways of a disease as possible for an individual. The risk indicator outputs a risk indicator value that reflects the individual’s degree of future susceptibility to the disease. We evaluated SDDP by comparing it experimentally with other methods. Results revealed a marked improvement.
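    The set relations in the abstract can be illustrated with a small sketch: given made-up pathway names, one hand-written hierarchy rule stands in for the logic inferencer, deriving S′′ ⊆ {S−S′} and a toy risk value over {S′+S′′}. Nothing below is taken from SDDP itself.

```python
# Toy illustration of the sets in the abstract: S is the disease's pathway
# set, S_prime what was detected for an individual, and S_pp the undetected
# pathways inferred via one hand-written rule standing in for the inferencer.

S = {"p53", "MAPK", "PI3K", "Wnt"}   # pathways associated with disease d (made up)
S_prime = {"p53"}                     # pathways detected for the individual

def infer_undetected(detected, candidates):
    """Stand-in for predicate-logic inference over the pathway hierarchy."""
    rules = {("p53",): {"MAPK"}}      # toy rule: p53 deficiency implies MAPK
    inferred = set()
    for antecedent, consequents in rules.items():
        if set(antecedent) <= detected:
            inferred |= consequents & candidates
    return inferred

S_pp = infer_undetected(S_prime, S - S_prime)  # S'' subset of {S - S'}
indicative = S_prime | S_pp                     # the set {S' + S''}
risk = len(indicative) / len(S)                 # toy risk indicator in [0, 1]
print(S_pp, risk)                               # {'MAPK'} 0.5
```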

    NLProlog: Reasoning with Weak Unification for Question Answering in Natural Language

    Rule-based models are attractive for various tasks because they inherently lead to interpretable and explainable decisions and can easily incorporate prior knowledge. However, such systems are difficult to apply to problems involving natural language, due to its linguistic variability. In contrast, neural models can cope very well with ambiguity by learning distributed representations of words and their composition from data, but they lead to models that are difficult to interpret. In this paper, we describe a model combining neural networks with logic programming in a novel manner for solving multi-hop reasoning tasks over natural language. Specifically, we propose to use a Prolog prover which we extend to utilize a similarity function over pretrained sentence encoders. We fine-tune the representations for the similarity function via backpropagation. This leads to a system that can apply rule-based reasoning to natural language and induce domain-specific rules from training data. We evaluate the proposed system on two different question answering tasks, showing that it outperforms two baselines -- BiDAF (Seo et al., 2016a) and FastQA (Weissenborn et al., 2017b) -- on a subset of the WikiHop corpus and achieves competitive results on the MedHop data set (Welbl et al., 2017). (Comment: ACL 2019.)
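    The paper's central mechanism, weak unification, can be sketched minimally: two symbols unify not by exact match but with a score derived from the cosine similarity of their embeddings. The toy vectors below stand in for a pretrained sentence encoder (which NLProlog fine-tunes via backpropagation); the symbol names and threshold are assumptions for illustration.

```python
# Minimal sketch of weak unification: symbols unify with a score from the
# cosine similarity of their embeddings rather than by exact match. The
# toy vectors stand in for a pretrained sentence encoder.

import math

EMB = {  # placeholder embeddings; a real system would encode the text
    "grandfather_of": [0.9, 0.1],
    "grandpa_of":     [0.85, 0.2],
    "born_in":        [0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def weak_unify(sym1, sym2, threshold=0.5):
    """Return a unification score, or None if the symbols are too dissimilar."""
    score = 1.0 if sym1 == sym2 else cosine(EMB[sym1], EMB[sym2])
    return score if score >= threshold else None

print(weak_unify("grandfather_of", "grandpa_of"))  # high score, ~0.99
print(weak_unify("grandfather_of", "born_in"))     # None: below threshold
```

    In the full system, such per-step scores are aggregated along a proof (e.g. by taking the minimum), so the prover can rank alternative multi-hop proofs by overall plausibility.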