536 research outputs found

    Provenance-Centered Dataset of Drug-Drug Interactions

    Get PDF
    Over the years several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each one of these approaches is properly statistically validated, they do not take into consideration the overlap between them as one of their decision making variables. In this paper we present LInked Drug-Drug Interactions (LIDDI), a public nanopublication-based RDF dataset with trusty URIs that encompasses some of the most cited prediction methods and sources to provide researchers a resource for leveraging the work of others into their prediction methods. As one of the main issues to overcome the usage of external resources is their mappings between drug names and identifiers used, we also provide the set of mappings we curated to be able to compare the multiple sources we aggregate in our dataset.Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201

    Knowledge-based Biomedical Data Science 2019

    Full text link
    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 table

    Concept graphs: Applications to biomedical text categorization and concept extraction

    Get PDF
    As science advances, the underlying literature grows rapidly providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured and, thus, extracting relevant or novel information could be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships of common domain concepts. Consequently, automating learning tasks could be reinforced with those knowledge bases through constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification. This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and relationships among them. The proposed representation preserves those relationships and allows using the structural features of graphs in text mining and learning algorithms. Those features emphasize the significance of the underlying relationship information that exists in the text content behind the interrelated topics and concepts of a text document. The experiments presented in this study include text categorization and concept extraction applied on biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of the aforementioned applications. The discussed techniques can be used in creating and maintaining digital libraries through enhancing indexing, retrieval, and management of documents as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics

    Utilizing Knowledge Bases In Information Retrieval For Clinical Decision Support And Precision Medicine

    Get PDF
    Accurately answering queries that describe a clinical case and aim at finding articles in a collection of medical literature requires utilizing knowledge bases in capturing many explicit and latent aspects of such queries. Proper representation of these aspects needs knowledge-based query understanding methods that identify the most important query concepts as well as knowledge-based query reformulation methods that add new concepts to a query. In the tasks of Clinical Decision Support (CDS) and Precision Medicine (PM), the query and collection documents may have a complex structure with different components, such as disease and genetic variants that should be transformed to enable an effective information retrieval. In this work, we propose methods for representing domain-specific queries based on weighted concepts of different types whether exist in the query itself or extracted from the knowledge bases and top retrieved documents. Besides, we propose an optimization framework, which allows unifying query analysis and expansion by jointly determining the importance weights for the query and expansion concepts depending on their type and source. We also propose a probabilistic model to reformulate the query given genetic information in the query and collection documents. We observe significant improvement of retrieval accuracy will be obtained for our proposed methods over state-of-the-art baselines for the tasks of clinical decision support and precision medicine

    Integrating Medical Ontology and Pseudo Relevance Feedback For Medical Document Retrieval

    Get PDF
    The purpose of this thesis is to undertake and improve the accuracy of locating the relevant documents from a large amount of Electronic Medical Data (EMD). The unique goal of this research is to propose a new idea for using medical ontology to find an easy and more reliable approach for patients to have a better understanding of their diseases and also help doctors to find and further improve the possible methods of diagnosis and treatments. The empirical studies were based on the dataset provided by CLEF focused on health care data. In this research, I have used Information Retrieval to find and obtain relevant information within the large amount of data sets provided by CLEF. I then used ranking functionality on the Terrier platform to calculate and evaluate the matching documents in the collection of data sets. BM25 was used as the base normalization method to retrieve the results and Pseudo Relevance Feedback weighting model to retrieve the information regarding patients health history and medical records in order to find more accurate results. I then used Unified Medical Language System to develop indexing of the queries while searching on the Internet and looking for health related documents. UMLS software was actually used to link the computer system with the health and biomedical terms and vocabularies into classify tools; it works as a dictionary for the patients by translating the medical terms. Later I would like to work on using medical ontology to create a relationship between the documents regarding the medical data and my retrieved results

    Learning Conceptual-Contextual Embeddings for Medical Text

    Full text link
    External knowledge is often useful for natural language understanding tasks. We introduce a contextual text representation model called Conceptual-Contextual (CC) embeddings, which incorporates structured knowledge into text representations. Unlike entity embedding methods, our approach encodes a knowledge graph into a context model. CC embeddings can be easily reused for a wide range of tasks just like pre-trained language models. Our model effectively encodes the huge UMLS database by leveraging semantic generalizability. Experiments on electronic health records (EHRs) and medical text processing benchmarks showed our model gives a major boost to the performance of supervised medical NLP tasks

    Advanced Methods for Entity Linking in the Life Sciences

    Get PDF
    The amount of knowledge increases rapidly due to the increasing number of available data sources. However, the autonomy of data sources and the resulting heterogeneity prevent comprehensive data analysis and applications. Data integration aims to overcome heterogeneity by unifying different data sources and enriching unstructured data. The enrichment of data consists of different subtasks, amongst other the annotation process. The annotation process links document phrases to terms of a standardized vocabulary. Annotated documents enable effective retrieval methods, comparability of different documents, and comprehensive data analysis, such as finding adversarial drug effects based on patient data. A vocabulary allows the comparability using standardized terms. An ontology can also represent a vocabulary, whereas concepts, relationships, and logical constraints additionally define an ontology. The annotation process is applicable in different domains. Nevertheless, there is a difference between generic and specialized domains according to the annotation process. This thesis emphasizes the differences between the domains and addresses the identified challenges. The majority of annotation approaches focuses on the evaluation of general domains, such as Wikipedia. This thesis evaluates the developed annotation approaches with case report forms that are medical documents for examining clinical trials. The natural language provides different challenges, such as similar meanings using different phrases. The proposed annotation method, AnnoMap, considers the fuzziness of natural language. A further challenge is the reuse of verified annotations. Existing annotations represent knowledge that can be reused for further annotation processes. AnnoMap consists of a reuse strategy that utilizes verified annotations to link new documents to appropriate concepts. Due to the broad spectrum of areas in the biomedical domain, different tools exist. The tools perform differently regarding a particular domain. This thesis proposes a combination approach to unify results from different tools. The method utilizes existing tool results to build a classification model that can classify new annotations as correct or incorrect. The results show that the reuse and the machine learning-based combination improve the annotation quality compared to existing approaches focussing on the biomedical domain. A further part of data integration is entity resolution to build unified knowledge bases from different data sources. A data source consists of a set of records characterized by attributes. The goal of entity resolution is to identify records representing the same real-world entity. Many methods focus on linking data sources consisting of records being characterized by attributes. Nevertheless, only a few methods can handle graph-structured knowledge bases or consider temporal aspects. The temporal aspects are essential to identify the same entities over different time intervals since these aspects underlie certain conditions. Moreover, records can be related to other records so that a small graph structure exists for each record. These small graphs can be linked to each other if they represent the same. This thesis proposes an entity resolution approach for census data consisting of person records for different time intervals. The approach also considers the graph structure of persons given by family relationships. For achieving qualitative results, current methods apply machine-learning techniques to classify record pairs as the same entity. The classification task used a model that is generated by training data. In this case, the training data is a set of record pairs that are labeled as a duplicate or not. Nevertheless, the generation of training data is a time-consuming task so that active learning techniques are relevant for reducing the number of training examples. The entity resolution method for temporal graph-structured data shows an improvement compared to previous collective entity resolution approaches. The developed active learning approach achieves comparable results to supervised learning methods and outperforms other limited budget active learning methods. Besides the entity resolution approach, the thesis introduces the concept of evolution operators for communities. These operators can express the dynamics of communities and individuals. For instance, we can formulate that two communities merged or split over time. Moreover, the operators allow observing the history of individuals. Overall, the presented annotation approaches generate qualitative annotations for medical forms. The annotations enable comprehensive analysis across different data sources as well as accurate queries. The proposed entity resolution approaches improve existing ones so that they contribute to the generation of qualitative knowledge graphs and data analysis tasks
    • …
    corecore