92 research outputs found

    Concept graphs: Applications to biomedical text categorization and concept extraction

    As science advances, the underlying literature grows rapidly, providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured and, thus, extracting relevant or novel information can be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships of common domain concepts. Consequently, automating learning tasks could be reinforced with those knowledge bases through constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification. This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and relationships among them. The proposed representation preserves those relationships and allows using the structural features of graphs in text mining and learning algorithms. Those features emphasize the significance of the underlying relationship information that exists in the text content behind the interrelated topics and concepts of a text document. The experiments presented in this study include text categorization and concept extraction applied to biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of the aforementioned applications. The discussed techniques can be used in creating and maintaining digital libraries through enhancing indexing, retrieval, and management of documents as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics.

    An Automated Method to Enrich and Expand Consumer Health Vocabularies Using GloVe Word Embeddings

    Clear language makes communication easier between any two parties. However, a layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical jargon, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. Many of the presented vocabularies are built manually or semi-automatically, which requires large investments of time and human effort and consequently slows the growth of these vocabularies. In this dissertation, we present an automatic method to enrich existing concepts in a medical ontology with additional laymen terms and also to expand the number of concepts in the ontology that do not have associated laymen terms. Our work has the benefit of being applicable to vocabularies in any domain. Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. We improve these vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. By performing iterative feedback using GloVe’s candidate terms, we can boost the number of word occurrences in the co-occurrence matrix, allowing our approach to work with a smaller training corpus. Our novel algorithms and GloVe were evaluated using two laymen datasets from the National Library of Medicine (NLM), the Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary. For our first goal, enriching concepts, the results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Our best algorithm, which enhanced the corpus with synonyms from WordNet, outperformed GloVe with a relative F-score improvement of 25%. For our second goal, expanding the number of concepts with related laymen’s terms, our synonym-enhanced GloVe outperformed GloVe with a relative F-score improvement of 63%. The results of the system were in general promising and can be applied not only to enrich and expand laymen vocabularies for medicine but also to any ontology for a domain, given an appropriate corpus for the domain. Our approach is applicable to narrow domains that may not have the huge training corpora typically used with word embedding approaches. In essence, by incorporating an external source of linguistic information, WordNet, and expanding the training corpus, we are getting more out of our training corpus. Our system can help build an application in which patients can read their physician's letters with clearer understanding. Moreover, the output of this system can be used to improve the results of healthcare search engines, entity recognition systems, and many others.

    Ontology and text mining: methods and applications for hypertrophic cardiomyopathy and beyond

    In this thesis we describe a number of contributions across the deeply interlinked domains of ontology, text mining, and prognostic modelling. We explore and evaluate ontology interoperability, and develop new methods for synonym expansion and negation detection in biomedical text. In addition to evaluating these pieces of work individually, we use them to form the basis of a text mining pipeline that can identify and phenotype patients across a clinical text record, which is used to reveal hundreds of University Hospitals Birmingham patients diagnosed with hypertrophic cardiomyopathy who are unknown to the specialist clinic. The work culminates in the text mining results being used to enable prognostic modelling of complication development in patients with hypertrophic cardiomyopathy, finding that routine blood markers, in addition to already well-known variables, are powerful predictors.

    Semantic and pragmatic characterization of learning objects

    Doctoral thesis. Informatics Engineering. Universidade do Porto. Faculdade de Engenharia. 201

    Translational drug interaction study using text mining technology

    Indiana University-Purdue University Indianapolis (IUPUI). Drug-drug interaction (DDI) is one of the major causes of adverse drug reaction (ADR) and has been demonstrated to threaten public health. It causes an estimated 195,000 hospitalizations and 74,000 emergency room visits each year in the USA alone. Current DDI research aims to investigate different scopes of drug interactions: molecular level of pharmacogenetics interaction (PG), pharmacokinetics interaction (PK), and clinical pharmacodynamics consequences (PD). All three types of experiments are important, but they play different roles in DDI research. As diverse disciplines and varied studies are involved, interaction evidence is often not available across all three types of evidence, which creates knowledge gaps that hinder both DDI and pharmacogenetics research. In this dissertation, we proposed to distinguish the three types of DDI evidence (in vitro PK, in vivo PK, and clinical PD studies) and identify all knowledge gaps in experimental evidence for them. This is a collective intelligence effort, whereby a text mining tool will be developed for the large-scale mining and analysis of drug-interaction information such that it can be applied to retrieve, categorize, and extract the information of DDI from published literature available on PubMed. To this end, three tasks will be done in this research work. First, the needed lexica, ontology, and corpora for distinguishing three different types of studies were prepared. Despite the lexica prepared in this work, a comprehensive dictionary for drug metabolites or reactions, which is critical to in vitro PK study, is still lacking in public databases. Thus, second, a named entity recognition tool will be proposed to identify drug metabolites and reactions in free text. Third, text mining tools for retrieving DDI articles and extracting DDI evidence are developed. In this work, the knowledge gaps across all three types of DDI evidence can be identified, and the gaps between knowledge of molecular mechanisms underlying DDI and their clinical consequences can be closed with the result of DDI prediction using the retrieved drug-gene interaction information, such that we can exemplify how the tools and methods can advance DDI pharmacogenetics research.

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Day-to-day operations management in hospital units is difficult due to continuously varying situations, several actors involved and a vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process. Peer reviewed

    Biological knowledge management and gene network analysis: a heuristic road to System Biology

    In order to understand the molecular basis of living cells and organisms, biologists over the past decades have been studying life's core molecular players: the genes. Most genes have a specific function, a role they play in the collective task of developing a cell and supporting all the aspects of keeping it alive. These genes do not perform their function randomly. Instead, after billions of years of evolution, nature's trial-and-error process, they have become parts of an utterly complex and intricate network, an interconnected mesh of genes that comprises signal detection cascades, enzymatic reactions, control mechanisms, etc. Over several past decades, experimental molecular biologists have sought mainly to study these genes via a one-by-one approach. However, with the advent of high-throughput experimental techniques, the number-crunching power of computers, and the realisation that many biological functions are the result of interactions between genes or their proteins, Biology's related field of Systems Biology has emerged. Here, one tries to combine the dispersed information produced by many researchers, in integrated assemblies called gene networks. Our research comprises the development of two new methods for improved information integration in the field of molecular Systems Biology. The first one aims to support an approach to acquire insights in the dynamics of gene networks (the behaviour of gene activities over time), called 'modelling and simulation' of genetic regulatory networks. Our second new method approaches the problem of how to collect and manage the information necessary to compose such genetic networks in the first place, based on scattered information in a dispersed and increasingly fast growing body of publications. These two methods form two separate parts in this thesis (chapters 2-4, and chapters 5-7). Chapter 1, section 1.3 provides an introductory, complete overview of this thesis. 
It is intended as a light introduction to my doctoral research, presented in an informal and entertaining way, and mainly addressed to my friends and family. It forms an introduction for the laymen to our work and the concepts that are important for this thesis. Chapters 2, 3 and 4 constitute Part 1 of this thesis. Chapter 2 gives a review of the various formalisms for modelling and simulation of gene networks, as a thorough background for our work presented in the following chapter. Chapter 3 describes SIM-plex, our new software tool that forms a bridge between a mathematical gene network modelling formalism, and the biologist, who usually is more an expert in the biology behind the gene network than a mathematician can ever be. It shields off the mathematics in a new way so as to enable biologists to experiment with modelling and simulation themselves. Chapter 4 describes the various applications that SIM-plex was used for. The research described in Part 2 of this thesis, chapters 5, 6 and 7, emerged from our own need for a better management of biological information. We experienced this necessity while we were building a larger genetic network for the Arabidopsis cell cycle, and it forms a general problem in biology. Chapter 5 gives a background of the currently existing methods for harvesting literature information, but comes to the conclusion that no existing automated or manual method displays sufficient potential to capture the largest part of information from literature in a structured way. In chapter 6, we describe our bold proposal of a new method to tackle this problem: MineMap, a community-based manual text-curation initiative. We describe the various aspects required to make such a project possible, based on our own experiences with our prototype application MineMap. This research is organised in a 'heuristic' way, in the sense that we built a first sketch and a working solution that also generated experiences for improvements in a next design. 
While chapter 6 describes our new ideas and concrete implementations in considerable detail, chapter 7 then illustrates the core concept behind MineMap.

    Usability analysis of contending electronic health record systems

    In this paper, we report measured usability of two leading EHR systems during procurement. A total of 18 users participated in paired-usability testing of three scenarios: ordering and managing medications by an outpatient physician, medicine administration by an inpatient nurse and scheduling of appointments by nursing staff. Audio, screen capture, satisfaction ratings, task success and errors made were recorded during testing. We found a clear difference between the systems for percentage of successfully completed tasks, two different satisfaction measures and perceived learnability when looking at the results over all scenarios. We conclude that usability should be evaluated during procurement and the difference in usability between systems could be revealed even with fewer measures than were used in our study. © 2019 American Psychological Association Inc. All rights reserved. Peer reviewed

    Molecular pathogenesis of Helicobacter hepaticus induced liver disease

    Thesis (Ph. D. in Molecular and Systems Bacterial Pathogenesis)--Massachusetts Institute of Technology, Biological Engineering Division, 2005. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references. Helicobacter hepaticus infection of A/JCr mice is a model of liver cancer resulting from chronic active inflammation. We monitored hepatic global gene expression profiles and correlated them to histological liver lesions in H. hepaticus infected and control male A/JCr mice at 3 months, 6 months, and 1 year of age. We used an Affymetrix-based oligonucleotide microarray platform on the premise that a specific genetic expression signature at isolated time points would be indicative of disease status. Model based expression index comparisons generated by dChip yielded consistent profiles of differential gene expression for H. hepaticus infected male mice with progressive liver disease versus uninfected control mice within each age group. Linear discriminant analysis and principal component analysis allowed segregation of mice based on combined age and lesion status, or age alone. Up-regulated genes present throughout the 12 month study involved inflammation, tissue repair, and host immune function. Upregulation of putative tumor and proliferation markers correlated with advancing hepatocellular dysplasia. Transcriptionally down-regulated genes in mice with liver lesions included those related to peroxisome proliferator, cholesterol, and steroid metabolism pathways. Transcriptional profiling of hepatic genes documented gene expression signatures in the livers of H. hepaticus infected male A/JCr mice with chronic progressive hepatitis and preneoplastic liver lesions, complemented the histopathological diagnosis, and suggested molecular targets for the monitoring and intervention of disease progression prior to the onset of hepatocellular neoplasia. Our laboratory, in collaboration with Professors Suerbaum and Schauer, recently identified a 70kb genomic island in Helicobacter hepaticus strain ATCC 51488 as a putative pathogenicity island (HhPAI) (Suerbaum et al, PNAS, 2003). This region within H. hepaticus contains genes HH0233-HH0302, a differential GC content, several long tandem repeats but no flanking repeats, and three components of a type IV secretion system (T4SS). A/JCr mice were experimentally infected with three naturally occurring strains of H. hepaticus including the type strain H. hepaticus ATCC 51488 strain (Hh 3B1) isolated from A/JCr mice, MIT 96-1809 (Hh NET) isolated from mice shipped from the Netherlands, and MIT-96-284 (Hh G) isolated from mice acquired from Germany. Hh NET (missing most of the HhPAI) infected male A/JCr mice exhibited a significantly lower prevalence (p<.05) of hepatic lesions at 6 months post infection than Hh 3B1 with an intact HhPAI. Hh G also has a large segment of the genomic island deleted, but not as many genes are deleted as compared to Hh NET. Hh G also demonstrated a lower prevalence of hepatic lesions. This variable pathological effect was evident in male mice only. The severity of chronic active inflammation in the liver of the H. hepaticus infected A/JCr mice depended on H. hepaticus liver colonization levels. The in vivo results support the presence of the HhPAI as a legitimate virulence determinant and predictor of severity of liver lesions in H. hepaticus infected A/JCr male mice. To further determine the differences in virulence of the H. hepaticus strains Hh 3B1, Hh NET, Hh G and an isogenic mutant H. ... by Samuel R. Boutin. Ph.D. in Molecular and Systems Bacterial Pathogenesis