13 research outputs found

    Applying unsupervised methods to identify and group relevant terms in clinical documents (Nem felügyelt módszerek alkalmazása releváns kifejezések azonosítására és csoportosítására klinikai dokumentumokban)

    Processing clinical documents produced in hospital settings has recently become one of the central research areas of language technology. However, off-the-shelf tools built for general-language texts either cannot be applied to specialized medical texts or perform poorly on them. Moreover, in many tasks, identifying domain terms and determining the relationships between them is a crucial step, yet these tasks can only be solved with the help of external lexical resources, thesauri, and ontologies. For smaller languages such as Hungarian, such knowledge bases are not available. As a result, annotating and organizing the information in the texts requires the work of human experts. In this paper we show how statistical methods can transform raw documents into a preprocessed, partially structured form that makes this human work easier. The modules, which rely solely on the corpus, recognize and resolve abbreviations, identify multiword terms, and determine their similarity. Finally, we created a higher-level representation of the texts in which each term is replaced by the identifier of a cluster formed on the basis of similarity; this simplifies the texts and makes it possible to determine the general form of frequently recurring patterns.

    Volume Table of Contents

    Understanding Patient Safety Reports via Multi-label Text Classification and Semantic Representation

    Medical errors are the results of problems in health care delivery. One of the key steps to eliminate errors and improve patient safety is patient safety event reporting. A patient safety report may record a number of critical factors that are involved in health care when incidents, near misses, and unsafe conditions occur. Therefore, clinicians and risk management can generate actionable knowledge by harnessing useful information from reports. To date, efforts have been made to establish a nationwide reporting and error analysis mechanism. The increasing volume of reports has been driving improvement in quantity measures of patient safety. For example, statistical distributions of errors across types of error and health care settings have been well documented. Nevertheless, a shift to quality measures is in high demand. In a health care system, errors are likely to occur if one or more components (e.g., procedures, equipment, etc.) that are intrinsically associated go wrong. However, our understanding of which of these components are connected, and how, is limited for at least two reasons. Firstly, patient safety reports present difficulties in aggregate analysis since they are large in volume and complicated in semantic representation. Secondly, an efficient and clinically valuable mechanism to identify and categorize these components is absent. I strive to make my contribution by investigating the multi-labeled nature of patient safety reports. To facilitate clinical implementation, I propose that machine learning and semantic information of reports, e.g., semantic similarity between terms, can be used jointly to perform automated multi-label classification. My work is divided into three specific aims. In the first aim, I developed a patient safety ontology to enhance semantic representation of patient safety reports. The ontology supports a number of applications including automated text classification.
    In the second aim, I evaluated multi-label text classification algorithms on patient safety reports. The results identified a set of productive algorithms with balanced predictive power and efficiency. In the third aim, to improve the performance of text classification, I developed a framework for incorporating semantic similarity into kernel-based multi-label text classification. Semantic similarity values produced by different semantic representation models are evaluated in the classification tasks. Both ontology-based and distributional semantic similarity exerted a positive influence on classification performance, but the latter showed significant efficiency in terms of the measure of semantic similarity. Our work provides insights into the nature of patient safety reports, namely that a report can be labeled with the multiple components (e.g., different procedures, settings, error types, and contributing factors) it contains. Multi-labeled reports hold promise to disclose system vulnerabilities since they provide insight into the intrinsically correlated components of health care systems. I demonstrated the effectiveness and efficiency of automated multi-label text classification embedded with semantic similarity information on patient safety reports. The proposed solution has the potential to be integrated with existing reporting systems, significantly reducing the workload of aggregate report analysis.

    Doctor of Philosophy

    Dissertation. Domain adaptation of natural language processing systems is challenging because it requires human expertise. While manual effort is effective in creating a high quality knowledge base, it is expensive and time consuming. Clinical text adds another layer of complexity to the task due to privacy and confidentiality restrictions that hinder the ability to share training corpora among different research groups. Semantic ambiguity is a major barrier for effective and accurate concept recognition by natural language processing systems. In my research I propose an automated domain adaptation method that utilizes sublanguage semantic schema for all-word word sense disambiguation of clinical narrative. According to the sublanguage theory developed by Zellig Harris, domain-specific language is characterized by a relatively small set of semantic classes that combine into a small number of sentence types. Previous research relied on manual analysis to create language models that could be used for more effective natural language processing. Building on previous semantic type disambiguation research, I propose a method of resolving semantic ambiguity utilizing automatically acquired semantic type disambiguation rules applied on clinical text ambiguously mapped to a standard set of concepts. This research aims to provide an automatic method to acquire Sublanguage Semantic Schema (S3) and apply this model to disambiguate terms that map to more than one concept with different semantic types. The research is conducted using unmodified MetaMap version 2009, a concept recognition system provided by the National Library of Medicine, applied on a large set of clinical text. The project includes creating and comparing models, which are based on unambiguous concept mappings found in seventeen clinical note types. The effectiveness of the final application was validated through a manual review of a subset of processed clinical notes using recall, precision and F-score metrics.

    The Prevalence and Effects of Scientific Agreement and Disagreement in Media

    Disagreement is inherent to the production of scientific knowledge, but its communication can erode the credibility of science in the eyes of the public. This tension pervades all science communication; however, under conditions of uncertainty it is most vital to act on knowledge about which experts are certain. To better understand how the public responds to scientific agreement and disagreement, this dissertation investigates three questions that spring from previous work on the subject. It explores how much scientific disagreement the public is exposed to, how disagreement affects trust in science, and whether motivated perceptions of disagreement can be corrected by scientific agreement messages. The first study uses multiple computer-assisted content analytic methods to reveal that, in the last thirty years of climate change newspaper coverage, the prevalence of scientific agreement and disagreement has declined but denial messages have increased. The second study examines the effects of civil and uncivil scientific disagreement on a range of science attitudes in an online experiment. I find that, compared to agreement messages, disagreement and incivility negatively affect not only attention to and evaluation of scientific topics but also trust in science and perceptions about the value of science. The final experiment reveals that agreement messages are insufficient to persuade those motivated by political identities of scientifically supported positions on climate change. It also highlights that debate about the efficacy of consensus messages in extant research comes in part from the choice by some researchers to pretest climate attitude measures. In sum, people are frequently exposed to messages about scientific disagreement in news, these messages negatively affect both issue attitudes and broader views about science, and agreement messages are not sufficient to reduce motivated perceptions of scientific disagreement on politicized issues.
    Understanding the ways in which the public responds to scientific disagreements is important because scientists have an ethical obligation to be honest about uncertainties. Additionally, increasingly competitive political and media systems are likely to amplify scientific disagreements in the public eye. Though trust in science remains high among the US public, this work shows that disagreement messages, amplified by politicization, can have consequences beyond a single issue context, with implications for public perceptions about the value of scientific knowledge in social and political life.
    PhD, Communication, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163254/1/sbchinn_1.pd

    Contradictions Between How Students Are Taught to Write And What They Are Expected To Read In General Education Courses

    This study explored the relationship between how students are taught to write in first-year English composition classes and what they are expected to read as part of the general education requirements at a large publicly funded university in the southeast (PLUS), in order to determine whether a gap exists between the two. If a gap is found to exist between the preparation of students and their ability to read material that has been assigned by the teaching faculty, these students are less likely to be considered information literate by any rubric. This study uses a mixed-methods approach. Content analysis is employed to examine the assigned readings students encounter, and interviews are conducted to explore how students make sense of the academic writings assigned in general education classes. Research questions included (1) What are the overall structures of both (a) instruction composition and (b) scholarly journal articles assigned for reading in subsequent general education classes in the disciplines of psychology and history at PLUS? (2) How can these structures be identified? (3) What are the top-level structural patterns of composition within these two academic disciplines and how do they differ? and (4) Do these differences create contradictions between how students are taught to write in freshman composition courses and the composition of the journal articles they are expected to read in their required general education classes? Thirty-one texts taken from general education syllabi were analyzed for incidence and placement of specific structural elements such as topic sentences and signal words. This study also explored perceptions of these differences from the standpoint of college students. Interviews of twenty-two students were conducted using Dervin’s Sense-Making Methodology. These interviews were analyzed in terms of situations, gaps, bridges, and outcomes, as well as thematic concepts that consistently arose during the interviews.
    Significant differences existed between readings from English Composition classes and assigned scholarly journal articles in history and psychology in incidence and placement of topic sentences, use of signal words or phrases, and readability. In addition, thematic analysis of the interviews of students found that they experienced gaps between their expectations of text composition and their experience reading assigned journal articles.

    XI. Magyar Számítógépes Nyelvészeti Konferencia

