9 research outputs found

    Doctor of Philosophy

    Biomedical data are a rich source of information and knowledge. Not only are they useful for direct patient care, but they may also offer answers to important population-based questions. Creating an environment where advanced analytics can be performed against biomedical data is nontrivial, however. Biomedical data are currently scattered across multiple heterogeneous systems, and integrating these data is a bigger task than humans can realistically do by hand; automatic biomedical data integration is therefore highly desirable but has never been fully achieved. This dissertation introduces new algorithms that were devised to support automatic and semiautomatic integration of heterogeneous biomedical data. The new algorithms combine data mining and biomedical informatics techniques to create "concept bags" that are used to compute similarity between data elements in the same way that "word bags" are compared in data mining. Concept bags are composed of controlled medical vocabulary concept codes that are extracted from text using named-entity recognition software. To test the new algorithm, three biomedical text similarity use cases were examined: automatically aligning data elements between heterogeneous data sets, determining degrees of similarity between medical terms using a published benchmark, and determining similarity between ICU discharge summaries. The method is highly configurable, and five different versions were tested. The concept bag method performed particularly well at aligning data elements and outperformed the compared algorithms by more than 5%. Another configuration that included hierarchical semantics performed particularly well at matching medical terms, meeting or exceeding 30 of 31 other published results using the same benchmark. Results for the third scenario, computing ICU discharge summary similarity, were less successful. Correlations between multiple methods were low, including between terminologists.
The concept bag algorithms performed consistently and comparatively well and appear to be viable options for multiple scenarios. New applications of the method and ideas for improving the algorithm are discussed as future work, including several performance enhancements, configuration-based enhancements, and concept vector weighting using the TF-IDF formulas.
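
The comparison of concept bags described in this abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure the dissertation uses, so the Jaccard and cosine measures below are assumptions chosen because they are common for word bags. The concept codes (`C_GLUCOSE` and so on) are hypothetical placeholders, not real vocabulary codes, and the named-entity recognition step that would produce them is omitted.

```python
import math
from collections import Counter


def jaccard(bag_a, bag_b):
    """Jaccard similarity between two concept bags, treated as sets."""
    a, b = set(bag_a), set(bag_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tfidf_vectors(bags):
    """Weight each concept by term frequency times log(N / document frequency).

    This is one common TF-IDF variant; the dissertation's exact formula
    is not given in the abstract.
    """
    n = len(bags)
    doc_freq = Counter(code for bag in bags for code in set(bag))
    vectors = []
    for bag in bags:
        term_freq = Counter(bag)
        vectors.append(
            {code: term_freq[code] * math.log(n / doc_freq[code])
             for code in term_freq}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(weight * v.get(code, 0.0) for code, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical concept bags for three data elements.
element_a = ["C_GLUCOSE", "C_BLOOD", "C_MEASUREMENT"]
element_b = ["C_GLUCOSE", "C_SERUM", "C_MEASUREMENT"]
element_c = ["C_HEART_RATE", "C_MEASUREMENT"]

vec_a, vec_b, vec_c = tfidf_vectors([element_a, element_b, element_c])
print(jaccard(element_a, element_b))
print(cosine(vec_a, vec_b), cosine(vec_a, vec_c))
```

In this toy data the concept shared by all three elements receives a TF-IDF weight of zero, so it no longer contributes to similarity, which is the intuition behind weighting concept vectors.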

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Exploring genomic medicine using integrative biology

    Thesis (Ph.D.)--Harvard-MIT Division of Health Sciences and Technology, 2004. Includes bibliographical references (p. 215-227). Instead of focusing on the cell, or the genotype, or on any single measurement modality, using integrative biology allows us to think holistically and horizontally. A disease like diabetes can lead to myocardial infarction, nephropathy, and neuropathy; to study diabetes in genomic medicine would require reasoning from a disease to all its various complications to the genome and back. I am studying the process of intersecting nearly comprehensive data sets in molecular biology, across three representative modalities (microarrays, RNAi, and quantitative trait loci) out of the more than 30 available today. This is difficult because the semantics and context of each experiment performed become more important, necessitating detailed knowledge of the biological domain. I addressed this problem by using all public microarray data from NIH, unifying 50 million expression measurements with standard gene identifiers and representing the experimental context of each using the Unified Medical Language System, a vocabulary of over 1 million concepts. I created an automated system to join data sets related by experimental context. I evaluated this system by finding genes significantly involved in multiple experiments directly and indirectly related to diabetes and adipogenesis and found genes known to be involved in these diseases and processes. As a model first step into integrative biology, I then took known quantitative trait loci in the rat involved in glucose metabolism and built an expert system to explain possible biological mechanisms for these genetic data using the modeled genomic data. The system I have created can link diseases from the ICD-9 billing code level down to the genetic, genomic, and molecular level.
In a sense, this is the first automated system built to study the new field of genomic medicine. By Atul Janardhan Butte, Ph.D.
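
The idea of joining data sets related by experimental context can be sketched as follows. This is an illustrative assumption, not the thesis's implementation: each data set is taken to be annotated with a set of concept codes, and two data sets are linked when they share at least one code. The data set names and concept codes are hypothetical, and only direct sharing is shown; indirect relations through a concept hierarchy are out of scope here.

```python
from collections import defaultdict
from itertools import combinations


def link_by_context(annotations):
    """Link data sets that share at least one context concept.

    annotations: dict mapping a data set id to a set of concept codes.
    Returns a dict mapping (id_a, id_b) pairs to their shared concepts.
    """
    # Inverted index: concept code -> data sets annotated with it.
    index = defaultdict(set)
    for dataset_id, concepts in annotations.items():
        for code in concepts:
            index[code].add(dataset_id)

    # Every pair of data sets under the same concept is linked.
    links = defaultdict(set)
    for code, dataset_ids in index.items():
        for id_a, id_b in combinations(sorted(dataset_ids), 2):
            links[(id_a, id_b)].add(code)
    return dict(links)


# Hypothetical annotations for three experiments.
annotations = {
    "experiment_1": {"C_DIABETES", "C_ADIPOSE_TISSUE"},
    "experiment_2": {"C_DIABETES", "C_SKELETAL_MUSCLE"},
    "experiment_3": {"C_LIVER"},
}

for pair, shared in link_by_context(annotations).items():
    print(pair, sorted(shared))
```

The inverted index avoids comparing every pair of data sets directly, which matters when the number of annotated experiments is large.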