7 research outputs found

    Performance and error analysis of three part of speech taggers on health texts

    Increasingly, natural language processing (NLP) techniques are being developed and utilized in a variety of biomedical domains. Part of speech tagging is a critical step in many NLP applications. Currently, we are developing an NLP tool for text simplification. As part of this effort, we set out to evaluate several part of speech (POS) taggers. We selected 120 sentences (2375 tokens) from a corpus of six types of diabetes-related health texts and asked human reviewers to tag each word in these sentences to create a "Gold Standard." We then tested each of the three POS taggers against the "Gold Standard." One tagger (dTagger) had been trained on health texts and the other two (MaxEnt and Curran & Clark) were trained on general news articles. We analyzed the errors and placed them into five categories: systematic, close, subtle, difficult source, and other. The three taggers have relatively similar rates of success: dTagger, MaxEnt, and Curran & Clark had 87%, 89% and 90% agreement with the gold standard, respectively. These rates of success are lower than published rates for these taggers. This is probably due to our testing them on a corpus that differs significantly from their training corpora. The taggers made different errors: the dTagger, which had been trained on a set of medical texts (MedPost), made fewer errors on medical terms than MaxEnt and Curran & Clark. The latter two taggers performed better on non-medical terms and we found the difference between their performance and that of dTagger was statistically significant. Our findings suggest that the three POS taggers have similar correct tagging rates, though they differ in the types of errors they make. For the task of text simplification, we are inclined to perform additional training of the Curran & Clark tagger with the MedPost corpus because both the fine-grained tagging provided by this tool and the correct recognition of medical terms are equally important.

    Comprehension Profile of Patient Education Materials in Endocrine Care

    Introduction. The internet is an ever-evolving resource to improve healthcare literacy among patients. The nature of the internet can make it difficult to condense educational materials in a manner applicable to a worldwide patient audience. Within the realm of endocrinology, there is a lack of a comprehensive analysis regarding these pathologies in addition to educational materials related to their medical workup and management. The aim of this study is to assess contemporary online patient education material in endocrinology and management of care. Methods. Analysis of the readability of 1500 unique online education materials was performed utilizing 7 readability measures: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index Readability Formula (FOG), Simple Measure of Gobbledygook Index (SMOG), Coleman-Liau Index (CLI), automated readability index (ARI), and Linsear Write Formula (LWF). Results. The average grade level readability scores from 6 measures (FKGL, FOG, SMOG, CLI, ARI, LWF) were ≥11, which corresponds to a reading level at or above the 11th grade. The average FRE between adrenal, diabetes and thyroid-related education material ranged from “fairly difficult” to “very difficult”. Conclusions. The current readability of contemporary online endocrine education material does not meet current readability recommendations for appropriate comprehension of the general audience.
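As an illustration of two of the measures named in this abstract, the following is a minimal sketch of the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. It is not the study's own tooling: the function names are hypothetical, and the word, sentence, and syllable counts are assumed to be supplied by the caller, since syllable counting itself requires a dictionary or heuristic.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a 100-word passage (illustrative only).
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
```

Both formulas depend only on average sentence length and average syllables per word, which is why long sentences and polysyllabic medical terms push patient materials toward higher grade levels.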

    Health Literacy Analytics of Accessible Patient Resources in Cardiovascular Medicine: What Are Patients Wanting to Know?

    Introduction. There remains an increasing utilization of internet-based resources as a first line for medical knowledge. Among patients with cardiovascular disease, these resources are often relied upon for numerous diagnostic and therapeutic modalities. However, the reliability of this information is not fully understood. The aim of this study is to provide a descriptive profile on the literacy quality, readability, and transparency of publicly available educational resources in cardiology. Methods. The frequently asked questions and associated online educational articles on common cardiovascular diagnostic and therapeutic interventions were investigated using publicly available data from the Google RankBrain machine learning algorithm after applying inclusion and exclusion criteria. Independent raters evaluated questions for Rothwell’s Classification and readability calculations. Results. Collectively, 520 questions and articles were evaluated across 13 cardiac interventions, resulting in 3120 readability scores. The source of articles was most frequently from academic institutions followed by commercial sources. A vast majority of questions were classified as “Fact” at 57.0% (n = 395), and questions regarding “technical details” of each intervention were the most common subclassification at 56.3% (n = 293). Conclusions. The investigation demonstrates through its findings that patients are most often using online search query programs to seek information regarding specific knowledge of each cardiovascular intervention rather than form evaluation of the intervention. Additionally, these online patient educational resources continue to not meet grade-level reading recommendations.

    The multidimensional kidney transplant self-management scale : development and psychometric testing

    Indiana University-Purdue University Indianapolis (IUPUI)
    Poor long-term kidney transplant outcomes are a significant problem in the U.S. Interventions must focus on preserving allograft function by managing modifiable risk factors. An instrument capable of identifying problems with post-kidney transplant self-management behaviors may enable the design and testing of self-management interventions. This study’s purpose was to test the psychometric properties of the new Kidney Transplant Self-Management Scale (KT–SM). The Zimmerman framework adapted for kidney transplant self-management guided the cross-sectional study. A total of 153 kidney recipients recruited from Facebook® completed the Self-Efficacy for Managing Chronic Disease (SEMCD), Patient Activation Measure (PAM), Kidney Transplant Questionnaire (KTQ), and KT–SM Scale instruments via a REDCap® survey. Most participants were female (65%), White (81.7%), and middle-aged (M = 46.7; SD = 12.4 years) with a history of dialysis (73%) and received a kidney transplant an average of 6.58 years previously (SD = 6.7). Exploratory factor analysis results supported the 16-item KT–SM Scale as a multidimensional scale with five domains with loadings ranging between .39 and .89: medication adherence, protecting kidney, cardiovascular risk reduction, ownership, and skin cancer prevention. Internal consistency reliability for the total scale (Cronbach’s α = .84) and five domains ranged from .71 to .83. The total and domains were positively correlated, ranging from r = .51 to .76, p = .01. Criterion-related validity was evidenced by significant correlations of KT–SM and domains with SEMCD (r = .22 to .53, p = .01), PAM (r = .31 to .52, p = .01), and the overall KTQ (r = .20 to .32, p = .01) except for one KT–SM domain: protecting kidney. Construct validity was evaluated using multivariate regression analysis. The linear combination of age, patient activation, and self-efficacy explained 45% of the variance in KT–SM behaviors; 47% of the variance in KTQ (measuring quality of life) was predicted by age, comorbidity, and self-efficacy. These findings provide beginning evidence of reliability and validity for the newly developed KT–SM scale. Instruments like this may provide a means to capture the self-management behaviors of the kidney transplant population, which is critical for future work on interventions.
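The internal consistency statistic reported in this abstract, Cronbach's alpha, can be sketched as below. This is a minimal illustration of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and not the study's analysis code. The function name `cronbach_alpha` and the sample scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a scale.

    `items` holds one list per scale item; each inner list contains that
    item's scores for the same respondents, in the same order.
    """
    k = len(items)
    # Total scale score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


if __name__ == "__main__":
    # Hypothetical responses: 3 items answered by 5 respondents.
    example = [
        [4, 3, 5, 2, 4],
        [4, 2, 5, 3, 4],
        [3, 3, 4, 2, 5],
    ]
    print(round(cronbach_alpha(example), 3))
```

Alpha approaches 1 when items vary together across respondents, so the variance of the total score is large relative to the summed item variances.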

    Simplifying, reading, and machine translating health content: an empirical investigation of usability

    Text simplification, through plain language (PL) or controlled language (CL), is adopted to increase readability, comprehension and machine translatability of (health) content. Cochrane is a non-profit organisation where volunteer authors summarise and simplify health-related English texts on the impact of treatments and interventions into plain language summaries (PLS), which are then disseminated online to the lay audience and translated. Cochrane’s simplification approach is non-automated, and involves the manual checking and implementation of different sets of PL guidelines, which can be an unsatisfactory, challenging and time-consuming task. This thesis examined if using the Acrolinx CL checker to automatically and consistently check PLS for readability and translatability issues would increase the usability of Cochrane’s simplification approach and, more precisely: (i) authors’ satisfaction; and (ii) authors’ effectiveness in terms of readability, comprehensibility, and machine translatability into Spanish. Data on satisfaction were collected from twelve Cochrane authors by means of the System Usability Scale and follow-up preference questions. Readability was analysed through the computational tool Coh-Metrix. Evidence on comprehensibility was gathered through ratings and recall protocols produced by lay readers, both native and non-native speakers of English. Machine translatability was assessed in terms of adequacy and fluency with forty-one Cochrane contributors, all native speakers of Spanish. Authors seemed to welcome the introduction of Acrolinx, and the adoption of this CL checker reduced word length, sentence length, and syntactic complexity. No significant impact on comprehensibility and machine translatability was identified. We observed that reading skills and characteristics other than simplified language (e.g. formatting) might influence comprehension. Machine translation quality was relatively high, with mainly style issues. This thesis presented an environment that could boost volunteer authors’ satisfaction and foster their adoption of simple language. We also discussed strategies to increase the accessibility of online health content among lay readers with different skills and language backgrounds.

    Enhancing the Communication of Law: a cross-disciplinary investigation applying information technology

    Law is pervasive in culture. It is a form of communication between government and citizens. When effective, it is a tool of government policy. If poorly designed, law results in unnecessary costs to society. Impediments to understanding of the law limit and distort democratic participation. Yet, historically, the law has been inaccessible to most. Thus enhancing the communication of law is an important and standing problem. Much work has been done (for example through the plain language movement) to improve the communication of law. Nonetheless, the law remains largely unreadable to non-legal users. This thesis applies information technology to investigate and enhance the communication of law. To this end, this thesis focusses on four main areas. To improve the readability of law, it must be better described as a form of language. Corpus linguistics is applied for this purpose. A linguistic description of contract language arose from this work, which, along with the corpus itself, has been made available to the research community. The thesis also describes work for the automatic classification of text in legal contracts by legal function. Reliable measures for the readability of law are needed, but they do not exist. To develop such measures, gold standard data is needed to evaluate possible measures. To create this gold standard data, the research engaged citizen scientists, in the form of the online “crowd”. However, methods for creating and using such user assessments for readability are rudimentary. The research therefore investigated, developed and applied a number of methods for collecting user ratings of readability in an online environment. Also, the research applied machine learning to investigate and identify linguistic factors that are specifically associated with language difficulty of legislative sentences. This resulted in recommendations for improving legislative readability. A parallel line of investigation concerned the application of visualization to enhance the communication of law. Visualization engages human visual perception and its parallel processing capacities for the communication of law. The research applied computational tools: natural language processing, graph characteristics and data driven algorithms. It resulted in prototype tools for automatically visualizing definition networks and automating the visualization of selected contract clauses. Also, the work has fostered an investigation of the nature of law itself. A “law as” framework is used to query the nature of law and illuminate law in new ways. The framework is re-assessed as a tool for the experimental investigation of law. This results in an enhanced description of law, applying a number of investigatory frames: law; communication; document; information; computation; design and complex systems theory. It also provides a contrastive study with traditional theories of law, demonstrating how traditional theories can be extended in the light of these multidisciplinary results. In sum, this thesis reports a body of work advancing the existing knowledge base and state of the art in respect of application of computational techniques to enhancing the communication of law.