    A new clustering method for detecting rare senses of abbreviations in clinical notes

    Abbreviations are widely used in clinical documents, and they are often ambiguous. Building a list of possible senses (also called a sense inventory) for each ambiguous abbreviation is the first step toward automatically identifying the correct meanings of abbreviations in given contexts. Clustering-based methods have been used to detect senses of abbreviations from a clinical corpus [1]. However, rare senses remain challenging, and existing algorithms do not detect them reliably. In this study, we developed a new two-phase clustering algorithm called Tight Clustering for Rare Senses (TCRS) and applied it to sense generation for abbreviations in clinical text. Using manually annotated sense inventories for a set of 13 ambiguous clinical abbreviations, we evaluated TCRS against the existing Expectation Maximization (EM) clustering algorithm for sense generation at two levels of annotation cost (10 vs. 20 instances per abbreviation). Our results showed that, with similar annotation effort (about 20 instances), the TCRS-based method detected 85% of senses on average, while the EM-based method found only 75%. Further analysis demonstrated that the improvement from TCRS came mainly from additionally detected rare senses, indicating its usefulness for building more complete sense inventories of clinical abbreviations.
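    The abstract reports the share of gold-standard senses each method detected "on average" but does not spell out how the average is taken. A minimal sketch, assuming a macro average (coverage per abbreviation, then the mean over abbreviations); it illustrates only the evaluation arithmetic, not TCRS or EM clustering, and the abbreviations and senses are invented placeholders rather than data from the study.

```python
# Hypothetical sketch: macro-averaged sense coverage across abbreviations.
# The inventories below are invented placeholders, not data from the study.


def sense_coverage(gold: set[str], detected: set[str]) -> float:
    """Fraction of gold-standard senses that appear among the detected senses."""
    if not gold:
        raise ValueError("gold inventory must be non-empty")
    return len(gold & detected) / len(gold)


def mean_coverage(gold_by_abbr: dict[str, set[str]],
                  detected_by_abbr: dict[str, set[str]]) -> float:
    """Average per-abbreviation coverage, weighting each abbreviation equally."""
    scores = [sense_coverage(gold, detected_by_abbr.get(abbr, set()))
              for abbr, gold in gold_by_abbr.items()]
    return sum(scores) / len(scores)


gold = {
    "PA": {"physician assistant", "pulmonary artery", "posteroanterior"},
    "RA": {"rheumatoid arthritis", "right atrium"},
}
detected = {
    "PA": {"physician assistant", "pulmonary artery"},
    "RA": {"rheumatoid arthritis", "right atrium"},
}
print(f"{mean_coverage(gold, detected):.3f}")  # (2/3 + 2/2) / 2 -> 0.833
```

    Weighting abbreviations equally means one with many rare senses counts as much as one with two common senses; pooling all senses into a single micro average would weight them differently.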

    A critical review of PASBio's argument structures for biomedical verbs

    BACKGROUND: Propositional representations of biomedical knowledge are a critical component of most aspects of semantic mining in biomedicine. However, the proper set of propositions has yet to be determined. Recently, the PASBio project proposed a set of propositions and argument structures for biomedical verbs. This initial set of representations presents an opportunity for evaluating the suitability of predicate-argument structures as a scheme for representing verbal semantics in the biomedical domain. Here, we quantitatively evaluate several dimensions of the initial PASBio propositional structure repository. RESULTS: We propose a number of metrics and heuristics related to arity, role labelling, argument realization, and corpus coverage for evaluating large-scale predicate-argument structure proposals. We evaluate the metrics and heuristics by applying them to PASBio 1.0. CONCLUSION: PASBio demonstrates the suitability of predicate-argument structures for representing aspects of the semantics of biomedical verbs. Metrics related to theta-criterion violations and to the distribution of arguments are able to detect flaws in semantic representations, given a set of predicate-argument structures and a relatively small corpus annotated with them.

    The entropy of words: learnability and expressivity across more than 1000 languages

    The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics and the language sciences more generally. Information theory gives us the tools to measure precisely the average amount of choice associated with words: the word entropy. Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style and estimation method, as well as non-independence of words in co-text. We present two main findings. First, word entropies display relatively narrow, unimodal distributions: no language in our sample has a unigram entropy of less than six bits/word. We argue that this is in line with information-theoretic models of communication. Languages are held in a narrow range by two fundamental pressures, word learnability and word expressivity, with a potential bias towards expressivity. Second, there is a strong linear relationship between unigram entropies and entropy rates. The entropy difference between words with and without co-textual information is narrowly distributed around ca. three bits/word. In other words, knowing the preceding text reduces the uncertainty of words by roughly the same amount across the languages of the world.
    Peer Reviewed. Postprint (published version).
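    Unigram word entropy is the Shannon entropy of the word-frequency distribution, H = -Σ p(w) log2 p(w), measured in bits/word. A minimal sketch of the simplest (plug-in, relative-frequency) estimator follows, together with the Miller-Madow bias correction as one illustration of how the choice of estimator shifts the result on small samples. The toy sentence is invented, whitespace tokenization is an assumption, and neither function is claimed to be the estimator used in the paper; entropy rates, which condition on preceding text, need different estimators not shown here.

```python
import math
from collections import Counter


def unigram_entropy_bits(tokens: list[str]) -> float:
    """Plug-in (maximum-likelihood) unigram entropy in bits/word:
    H = -sum over word types w of p(w) * log2 p(w), with p(w) = count(w) / N."""
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


def miller_madow_bits(tokens: list[str]) -> float:
    """Plug-in estimate plus the Miller-Madow correction of (K - 1) / (2N) nats,
    converted to bits; K = observed word types, N = number of tokens."""
    k = len(set(tokens))
    n = len(tokens)
    return unigram_entropy_bits(tokens) + (k - 1) / (2 * n * math.log(2))


# Invented toy text; whitespace tokenization is an assumption.
tokens = "the cat saw the dog and the dog saw the cat".split()
print(f"plug-in:      {unigram_entropy_bits(tokens):.3f} bits/word")
print(f"Miller-Madow: {miller_madow_bits(tokens):.3f} bits/word")
```

    On a text this short the plug-in estimate is biased downward, which is why dependence on text size and estimation method is a practical concern.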

    Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

    While the biomedical informatics community widely acknowledges the utility of domain ontologies, many barriers to their effective use remain. One important requirement of domain ontologies is that they achieve a high degree of coverage of domain concepts and concept relationships. However, developing these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating ontology enrichment from free-text documents. In this dissertation, I applied these methodologies to biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This review, the first of its kind, was published in the Journal of Biomedical Informatics. Second, using clinical free-text documents, I compared the effectiveness of three methods representing two different approaches: symbolic (the Hearst method) and statistical (the Church and Lin methods). Third, I developed a methodological framework for evaluating and comparing Ontology Learning (OL) methods; it supports both types of OL approach and all three methods. The significance of this work is as follows: 1) The results of the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method yielded average new-concept acceptance rates of 21% and 11%, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%. As a result of this study (published in Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough to analyze the performance of ontology enrichment methods in many domains, thus expediting automation and minimizing the likelihood that key concepts and relationships will be missed as domain knowledge evolves.
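    The Hearst method compared above is, in its classic form, a set of lexico-syntactic patterns (for example "NP such as NP, NP and NP") whose matches suggest hyponym/hypernym pairs. A deliberately simplified sketch of that one pattern follows; it treats each term as a single word, an assumption standing in for proper noun-phrase chunking, and it is not the dissertation's implementation. The example sentence is invented.

```python
import re

# Simplified Hearst-style "X such as A, B and C" pattern (hypothetical sketch).
# A "term" is a single word; real systems match full noun phrases.
# The lookahead keeps the conjunctions "and"/"or" from being read as terms.
WORD = r"(?!(?:and|or)\b)[A-Za-z][A-Za-z-]*"
SUCH_AS = re.compile(
    rf"\b(?P<hyper>{WORD}),? such as "
    rf"(?P<hypos>{WORD}(?:, {WORD})*(?:,? (?:and|or) {WORD})?)"
)
# Separators between listed hyponyms: ", ", " and ", ", and ", " or ", ", or ".
SEPARATOR = re.compile(r",? (?:and|or) |, ")


def hearst_such_as(sentence: str) -> list[tuple[str, str]]:
    """Return candidate (hyponym, hypernym) pairs found by the pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hyper")
        for hyponym in SEPARATOR.split(match.group("hypos")):
            pairs.append((hyponym, hypernym))
    return pairs


print(hearst_such_as(
    "The report mentions neoplasms such as adenocarcinoma, lymphoma and melanoma."
))
# [('adenocarcinoma', 'neoplasms'), ('lymphoma', 'neoplasms'), ('melanoma', 'neoplasms')]
```

    A fuller pipeline might also lemmatize the hypernym ("neoplasms" to "neoplasm") and check candidates against the target ontology before proposing them to curators.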

    Handoffs in Hospitals: A review of the literature on information exchange while transferring patient responsibility or control

    This document reviews the full collection of literature on hospital handoffs and is referenced by shorter publications. Researchers may see abstracts at http://www.connotea.org/user/signout. Access to the full text of the articles may be requested by contacting the authors.
    Background: In hospitals, handoffs are episodes in which control of, or responsibility for, a patient passes from one health professional to another, and in which important information about the patient is also exchanged. In view of the growing interest in improving handoff processes, and the need for guidance in arriving at standardized handoff procedures, a review of the research on handoffs is provided. Methods: The authors have attempted to identify all research treatments of hospital handoffs involving medical personnel published in English through July 2008. Results: Findings from the literature are organized into six themes: 1) the definition of 'handoff'; 2) the functions of handoffs; 3) the challenges and difficulties of handing off; 4) the costs and benefits of standardization; 5) possible protocols for standardizing handoffs; and 6) questions needing answers, and methods of research. Conclusions: The large body of relevant literature shows that handoff is highly sensitive to variations in context, is essential for multiple important functions within a hospital that range far beyond patient safety, and is subject to difficult tensions that necessarily attend efforts to standardize action within a highly differentiated hospital setting. In addition, there is little empirical evidence regarding the magnitude of the impact of handoffs on patient safety and service quality, making the potential gains and complications from standardization uncertain.
    Robert Wood Johnson Foundation
    http://deepblue.lib.umich.edu/bitstream/2027.42/61498/1/Handoffs_in_Hospitals_Literature_Review_081014.pd

    Towards semantic interpretation of clinical narratives with ontology-based text mining

    In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have demonstrated the feasibility of natural language processing for information extraction from clinical narratives. However, no study has focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and the wide range of conditions that may be associated with different anatomical entities. In this thesis, we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, the high regularity of the sublanguage used in knee MRI reports allowed us to model its processing with a set of sophisticated lexico-semantic rules and minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology was expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base that is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. The gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with a precision of 98.00%, recall of 97.63% and F-measure of 97.81%, values in line with human-like performance. To demonstrate the utility of formally structuring clinical narratives and its possible applications in epidemiology, we describe an implementation of KneeBase, a web-based information retrieval system that supports complex searches over the results obtained via KneeTex. It is the structured nature of the extracted information that allows queries to encode not only search terms but also relationships between them (e.g. between clinical findings and anatomical locations). This is of particular value for large-scale epidemiology studies based on qualitative evidence, whose main bottleneck is the manual inspection of many text documents. The two systems presented in this dissertation, KneeTex and KneeBase, operate in a specific domain but illustrate generic principles for the rapid development of clinical text mining systems. The key enabler of such systems is the existence of an appropriate ontology. To tackle this issue, we proposed a strategy for ontology expansion, which proved effective in fast-tracking the development of our information extraction and retrieval systems.
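    The three KneeTex figures are mutually consistent if the F-measure is the balanced F1 score, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). Assuming that definition (the abstract does not state the weighting), the reported value can be recomputed from the other two; `f_measure` is an illustrative helper, not part of KneeTex.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Reported KneeTex figures: P = 98.00%, R = 97.63%.
print(f"{f_measure(0.9800, 0.9763):.4f}")  # 0.9781
```

    The result matches the reported 97.81%, which supports (but does not prove) that the balanced F1 was used.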

    Patient Handoffs between Emergency Department and Inpatient Physicians: A Qualitative Study to Inform Standardization of Practice and Organization Theory

    This dissertation is motivated by two problems. First, existing literature characterizes patient handoff as an information transfer activity in which safety and quality are compromised by practice variation. This has prompted a movement to standardize practice. However, existing research has not closely examined how practice variations may be responses to situational and organizational factors, or evidence of involved parties accomplishing important functions beyond information transfer. Consequently, standardization efforts run at least two risks: overlooking opportunities for improvement, and engendering negative unintended consequences. Second, although roughly 50% of all hospitalized patients are handed off from emergency departments to inpatient units, such handoffs are significantly understudied. I conducted a two-year ethnographic study of handoffs occurring between Emergency Department and General Medicine physicians when patients were admitted to a highly specialized tertiary-referral teaching hospital. Using theoretical sampling informed by a Grounded Theory methodology, I conducted observations (n=349 hours) and semi-structured interviews (n=48), and recorded handoff conversations (n=48). I analyzed the data by means of immersion, various qualitative coding approaches, and memo writing. Findings are organized in three chapters. First, I challenge the dominant model of handoff as information transfer by demonstrating that physicians actively construct understandings of their patients over time, as they encounter, interpret, assemble, and reassemble information through socially interactive processes within particular contexts and situations. Consequently, multiple understandings of a single patient are not only possible but likely. Second, I characterize admission handoffs as negotiations, situated within entangled webs of motives and concerns that produce ambiguities. Involved parties must navigate these ambiguities as they develop their differing understandings of patients, resolve conflicts over approaches to care, and reach agreement on additional work. Third, I show that boundaries between units are ongoing, effortful accomplishments, re-enacted through interactive negotiations. Over time, these negotiations have the potential to shift boundaries and alter the divisions of labor in the hospital, with potential consequences for organizational outcomes. Recommendations for practical improvements and further research are presented.
    Ph.D. Information, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/86293/1/bhilligo_1.pd