
    An Automated Method to Enrich and Expand Consumer Health Vocabularies Using GloVe Word Embeddings

    Clear language makes communication easier between any two parties. However, a layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical jargon, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. Many of these vocabularies are built manually or semi-automatically, which requires large investments of time and human effort and consequently slows their growth. In this dissertation, we present an automatic method to enrich existing concepts in a medical ontology with additional laymen terms and to expand the number of concepts in the ontology that have associated laymen terms by covering concepts that currently have none. Our work has the benefit of being applicable to vocabularies in any domain. Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. We improve these vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. By performing iterative feedback using GloVe’s candidate terms, we can boost the number of word occurrences in the co-occurrence matrix, allowing our approach to work with a smaller training corpus. Our novel algorithms and GloVe were evaluated using two laymen datasets from the National Library of Medicine (NLM): the Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary. For our first goal, enriching concepts, the results show that GloVe was able to find new laymen terms with an F-score of 48.44%.
Our best algorithm, which enhanced the corpus with synonyms from WordNet, outperformed GloVe with a relative F-score improvement of 25%. For our second goal, expanding the number of concepts with related laymen’s terms, our synonym-enhanced GloVe outperformed GloVe with a relative F-score improvement of 63%. The results of the system were in general promising, and the approach can be applied not only to enrich and expand laymen vocabularies for medicine but to any domain ontology, given an appropriate corpus for the domain. Our approach is applicable to narrow domains that may not have the huge training corpora typically used with word embedding approaches. In essence, by incorporating an external source of linguistic information, WordNet, and expanding the training corpus, we are getting more out of our training corpus. Our system can help build an application in which patients can read their physician's letters with better understanding. Moreover, the output of this system can be used to improve the results of healthcare search engines, entity recognition systems, and many others.
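
    The idea of boosting co-occurrence counts by adding synonyms to the corpus can be illustrated with a minimal sketch. This is not the dissertation's implementation: the synonym table is a hypothetical stand-in for WordNet lookups, the function names are invented for illustration, and the counts are plain window counts rather than the distance-weighted counts GloVe uses.

```python
from collections import Counter

# Hypothetical synonym table standing in for WordNet lookups (assumption).
SYNONYMS = {"tummy": ["belly"], "ache": ["pain"]}


def augment(sentences, synonyms):
    """Return the original sentences plus one copy per single-word synonym swap."""
    out = [list(s) for s in sentences]
    for s in sentences:
        for i, word in enumerate(s):
            for syn in synonyms.get(word, []):
                out.append(s[:i] + [syn] + s[i + 1:])
    return out


def cooccurrence(sentences, window=2):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for s in sentences:
        for i, word in enumerate(s):
            for j in range(i + 1, min(i + 1 + window, len(s))):
                counts[tuple(sorted((word, s[j])))] += 1
    return counts


corpus = [["my", "tummy", "ache", "is", "bad"]]
base = cooccurrence(corpus)
boosted = cooccurrence(augment(corpus, SYNONYMS))

# The pair ("ache", "belly") is absent from the raw corpus but present after augmentation.
print(base[("ache", "belly")], boosted[("ache", "belly")])
```

    In this toy setting the augmented corpus contributes context for words such as "belly" that the raw corpus never mentions, which is the effect the abstract describes for small training corpora.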

    UMLS-based analysis of medical terminology coverage for tags in diabetes-related blogs

    There is a well-known terminology disparity between laypeople and health professionals. Using the Unified Medical Language System (UMLS), this exploratory study examines the terminology usage of laypeople, focusing on diabetes. We explain the analysis pipeline of extracting laypeople’s medical terms and matching them to the existing medical controlled vocabulary system. The preliminary results show the promise of using the UMLS and Tumblr data for such analysis.

    Doctor of Philosophy

    The use of the various complementary and alternative medicine (CAM) modalities for the management of chronic illnesses is widespread and still on the rise. Unfortunately, tools to support consumers in seeking information on the efficacy of these treatments are sparse and incomplete. The goals of this work were to understand consumers' needs in acquiring CAM information, assess currently available information resources, and investigate informatics methods to provide a foundation for the development of CAM information resources. This dissertation consists of four studies. The first was a quantitative study that aimed to assess the feasibility of delivering CAM-drug interaction information through a web-based application. This study resulted in an 85% participation rate, and 33% of those patients reported the use of CAMs that had potential interactions with their conventional treatments. The next study aimed to assess online CAM information resources that provide information on drug-herb interactions to consumers. None of the sites scored high on the combination of completeness and accuracy, and all sites were beyond the reading level recommended by the US Department of Health and Human Services. The third study investigated information-seeking behaviors for CAM information using an existing cohort of cancer survivors. The study showed that patients in the cohort continued to use CAM well into survivorship. Patients felt very much on their own in dealing with issues outside of direct treatment, which often resulted in a search for options and CAM use. Finally, a study was conducted to investigate two methods to semi-automatically extract CAM treatment relations from the biomedical literature. The methods rely on a database (SemMedDB) of semantic relations extracted from PubMed abstracts.
This study demonstrated that SemMedDB can be used to reduce manual effort, but review of the extracted sentences is still necessary because mean precision was low (23.7% and 26.4%). In summary, this dissertation provided greater insight into consumer information needs for CAM. Our findings provide an opportunity to leverage existing resources to improve the information-seeking experience for consumers through high-quality online tools, potentially moving them beyond the reliance on anecdotal evidence in the decision-making process for CAM.

    The interlinking theorization of management concepts: Cohesion and semantic equivalence in management knowledge

    This article develops the idea of 'interlinking theorization' in the context of management knowledge. We explain how management concepts are theorized through their direct co-occurrence with other management concepts, on the one hand, and their embeddedness in general business vocabulary, on the other. Conceptually, we extend a semantic network approach to vocabularies and suggest both cohesion between management concepts (i.e. a clustering in bundles) and their semantic equivalence (i.e. similar patterns of connectivity to general business vocabulary indicating specific types) as core dimensions of interlinking theorization. Empirically, we illustrate and further develop our conceptual model with data collected from magazines targeting management practitioners in the Austrian public sector. Our article contributes to existing literature by extending theorization to include different kinds of relationships between management concepts and focusing on direct and indirect relations across populations of management concepts as characteristics of the overall 'architecture' of management knowledge.
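
    The notion of semantic equivalence as "similar patterns of connectivity to general business vocabulary" can be sketched numerically. The example below is only an illustration under stated assumptions: the concepts and counts are invented, and cosine similarity is one plausible measure of profile similarity, not necessarily the measure used in the article.

```python
import math

# Hypothetical counts (assumption): how often each management concept
# co-occurs with words from the general business vocabulary.
PROFILES = {
    "balanced scorecard": {"performance": 4, "measure": 3, "budget": 1},
    "benchmarking": {"performance": 8, "measure": 6, "budget": 2},
    "mission statement": {"values": 5, "citizen": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse connectivity profiles."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


same_type = cosine(PROFILES["balanced scorecard"], PROFILES["benchmarking"])
other_type = cosine(PROFILES["balanced scorecard"], PROFILES["mission statement"])
print(round(same_type, 3), round(other_type, 3))
```

    Two concepts with proportional profiles score close to 1 even if they never co-occur directly, which is what separates equivalence (indirect relations) from cohesion (direct co-occurrence) in the abstract's terms.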

    A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

    Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyze complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed by using a biased topic modeling method. This framework improves and extends existing topic modeling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, and scores that indicate the presence of ADR are generated. A case control study has been performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage for each medication category during pregnancy. In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information from Twitter timelines, such as diseases, symptoms, treatments, and effects, is summarized by the topic modeling processes, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories. Currently, this process is done based on laboratory results and reported cases. Finally, a multi-dimensional text data warehouse (MTD) to manage the output from the topic modeling is proposed. Some attempts have also been made to incorporate topic structure (ontology) and the MTD hierarchy.
Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach for drug safety early warning. Dissertation/Thesis: Doctoral Dissertation, Computer Science 201

    Vaccine semantics: Automatic methods for recognizing, representing, and reasoning about vaccine-related information

    Post-marketing management and decision-making about vaccines builds on the early detection of safety concerns and changes in public sentiment, accurate access to established evidence, and the ability to promptly quantify effects and verify hypotheses about vaccine benefits and risks. A variety of resources provide relevant information, but they use different representations, which makes rapid evidence generation and extraction challenging. This thesis presents automatic methods for interpreting heterogeneously represented vaccine information. Part I evaluates social media messages for monitoring vaccine adverse events and public sentiment, using automatic methods for information recognition. Parts II and III develop and evaluate automatic methods and res

    Semantic and pragmatic characterization of learning objects

    Doctoral thesis. Informatics Engineering. Universidade do Porto. Faculdade de Engenharia. 201