
    Real-time classifiers from free-text for continuous surveillance of small animal disease

    A wealth of information of epidemiological importance is held within unstructured narrative clinical records. Text mining provides computational techniques for extracting usable information from the language used to communicate between humans, including the spoken and written word. The aim of this work was to develop text-mining methodologies capable of rendering the large volume of information within veterinary clinical narratives accessible for research and surveillance purposes. The free-text records collated within the dataset of the Small Animal Veterinary Surveillance Network formed the development material and target of this work. The efficacy of pre-existing clinician-assigned coding applied to the dataset was evaluated, and the nature of the notation and vocabulary used in documenting consultations was explored and described. Consultation records were pre-processed to improve human and software readability, and software was developed to redact incidental identifiers present within the free text. An automated system able to classify records for the presence of clinical signs, utilising only information present within the free-text record, was developed with the aim of facilitating timely detection of spatio-temporal trends in clinical signs. Clinician-assigned main-reason-for-visit coding provided a poor summary of the large quantity of information exchanged during a veterinary consultation, and the nature of the coding and questionnaire triggering further obfuscated information. Delineation of the previously undocumented veterinary clinical sublanguage identified common themes and their manner of documentation; this was key to the development of programmatic methods. A rule-based classifier using logically chosen dictionaries, sequential processing and data-masking redacted identifiers while maintaining the research usability of records. 
Highly sensitive and specific free-text classification was achieved by applying classifiers for individual clinical signs within a context-sensitive scaffold, which permitted or prohibited matching depending on the clinical context in which a clinical sign was documented. The mean sensitivity achieved within an unseen test dataset was 98.17% (74.47, 99.9) and the mean specificity 99.94% (77.1, 100.0). When used in combination to identify animals with any of a combination of gastrointestinal clinical signs, the sensitivity achieved was 99.44% (95% CI: 98.57, 99.78) and the specificity 99.74% (95% CI: 99.62, 99.83). This work illustrates the importance, utility and promise of free-text classification of clinical records and provides a framework within which this is possible whilst respecting the confidentiality of client and clinician.
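The combination described above, per-sign dictionaries applied inside a context-sensitive scaffold that blocks matches in prohibited contexts such as negation, can be sketched as follows. The sign terms and blocking patterns here are illustrative stand-ins, not the actual SAVSNET lexicons or scaffold rules.

```python
import re

# Illustrative sign dictionaries; the real system used much richer lexicons.
SIGN_TERMS = {
    "vomiting": ["vomit", "vomiting", "emesis"],
    "diarrhoea": ["diarrhoea", "diarrhea", "loose stool"],
}
# Contexts in which a dictionary match should be prohibited (hypothetical examples).
BLOCKING_CONTEXTS = [r"\bno\b", r"\bnot\b", r"\bhistory of\b", r"\bruled out\b"]

def classify(narrative: str) -> dict:
    """Return {sign: bool}, flagging each sign found in a permitted context."""
    text = narrative.lower()
    result = {}
    for sign, terms in SIGN_TERMS.items():
        found = False
        for term in terms:
            for m in re.finditer(re.escape(term), text):
                # Inspect a short window before the match for a blocking context.
                window = text[max(0, m.start() - 25):m.start()]
                if not any(re.search(ctx, window) for ctx in BLOCKING_CONTEXTS):
                    found = True
        result[sign] = found
    return result

print(classify("Owner reports vomiting overnight; no diarrhoea seen."))
# → {'vomiting': True, 'diarrhoea': False}
```

The scaffold idea is that the same lexical match ("diarrhoea") is accepted or rejected purely on the surrounding context, which is what drives the high specificity reported.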

    Contributions to information extraction for spanish written biomedical text

    Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages that are currently underexploited, despite their potential applications in improving healthcare experiences, supporting trainee education, or enabling biomedical research, for example. To automatically transform those contents into relevant, structured information, advanced Natural Language Processing (NLP) mechanisms are required. In NLP, this task is known as Information Extraction. Our work takes place within this growing field of clinical NLP for the Spanish language, as we tackle three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification. Specifically, we study the different approaches and their transferability in two corpora, one synthetic and the other authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or external Named Entity Recognition tools. Although technically naive, it performs on par with more evolved systems, and does not exhibit a considerable deviation from other approaches that rely on oracle terms. Finally, we present and exploit a new corpus of real health records manually annotated with negation and uncertainty information: NUBes. This corpus is the basis for two sets of experiments, one on cue and scope detection, and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying levels of sophistication and novelty, which reflects the rapid advancement of the field.
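The cue and scope detection task mentioned above can be illustrated with a minimal rule-based sketch for Spanish clinical text. The cue list and the "scope runs to the next punctuation" heuristic are simplifying assumptions for illustration; NUBes itself was used to train and evaluate statistical models, not rules like these.

```python
# Hypothetical Spanish negation cues; real cue inventories are larger.
NEGATION_CUES = {"no", "sin", "niega", "descarta"}

def detect_negation(sentence: str):
    """Return (cues, scopes); each scope runs from a cue to the next punctuation."""
    # Crude tokenisation that separates trailing punctuation.
    tokens = sentence.lower().replace(",", " ,").replace(".", " .").split()
    cues, scopes = [], []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATION_CUES:
            cues.append(tokens[i])
            scope = []
            j = i + 1
            while j < len(tokens) and tokens[j] not in {",", ".", ";"}:
                scope.append(tokens[j])
                j += 1
            scopes.append(" ".join(scope))
            i = j
        else:
            i += 1
    return cues, scopes

cues, scopes = detect_negation("Paciente sin fiebre, niega dolor torácico.")
# cues == ['sin', 'niega']; scopes == ['fiebre', 'dolor torácico']
```

Assertion classification then builds on this: a biomedical term falling inside a detected scope is labelled negated (or uncertain) rather than affirmed.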

    Modelling the cognitive quality of student contributions to online discussion forums

    Understanding how students can develop their critical thinking skills and engage in social knowledge construction through discussion with their peers is important for both educators and researchers. As asynchronous online discussion forums become increasingly common across educational settings of all kinds, there is a growing need to identify the characteristics of effective discussions that are associated with learning gains. Such findings can inform the way discussion-based assignments are framed and assessed and can provide evidence about the efficacy of instructional interventions. While many messages are purely social in nature, others demonstrate intellectual engagement with the subject matter of the course, to a greater or lesser extent -- the cognitive quality of the message. However, it is not straightforward to measure cognitive quality. Previous research has defined cognitive engagement based on the visible learning behaviours of students, and has identified distinct phases of cognitive presence commonly seen in collaborative online discussions among groups of participants. Little prior work has brought together insights from both individual learning behaviours and group discussion dynamics, a gap this thesis aims to fill. This thesis introduces a two-dimensional measure of cognitive quality, making use of constructs from two well-supported educational frameworks: the Interactive-Constructive-Active-Passive framework and the Community of Inquiry framework. Using a pseudonymised set of messages that were labelled using both frameworks, the thesis explores how attributes of the dialogue were correlated with cognitive quality. Message quality was found to depend more on the nested discussion structure than on chronological order. As previously seen with other frameworks, the same messages tended to be identified as high-quality by both frameworks, while there was more variation among mid- and lower-quality messages. 
The thesis goes on to investigate the potential moderating effects of two instructional interventions: assigning roles to students within the asynchronous online discussions, and an external facilitation intervention introducing guidelines that aimed to enhance the quality of students' self-regulation. Using a novel network analytic approach, the external facilitation was observed to moderate the associations between the frameworks, while no such change was seen with the role assignment. Finally, the thesis finds that the order in which students took on the assigned roles had minimal impact on the cognitive quality of their contributions to the discussion. This thesis contributes new, actionable findings about the factors that influence the cognitive quality of student contributions to asynchronous online discussions and concludes with a discussion of future research directions.

    Natural language processing (NLP) for clinical information extraction and healthcare research

    Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of the complex interactions between aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced an overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. No significant differences in the genetic burden of rare and potentially damaging variants were observed between individuals with and without unscheduled admissions, or between individuals on monotherapy and polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. 
The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics, pave the way for wider research.
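The precision, recall, and F1 figures quoted for ExECT v2 follow the standard definitions over true-positive, false-positive, and false-negative extraction counts. The counts below are hypothetical, chosen only to illustrate the arithmetic, and are not the thesis's actual validation counts.

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from extraction error counts."""
    precision = tp / (tp + fp)          # fraction of extracted items that are correct
    recall = tp / (tp + fn)             # fraction of gold items that were extracted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts for illustration only.
p, r, f = prf(tp=90, fp=10, fn=15)
# → p = 0.90, r ≈ 0.857, f ≈ 0.878
```

F1 sits between precision and recall and penalises imbalance between them, which is why it is the usual single summary figure for extraction pipelines.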

    Use of natural language processing in psychiatry: a systematic scoping review

    Background: The use of artificial intelligence (AI) is receiving ever-increasing attention, including in healthcare. One method that appears promising is natural language processing (NLP), which can be used to analyse written text, for example text in electronic patient records. The purpose of this study is to examine the research that has been done on the use of natural language processing for analysing electronic records from patients with severe mental disorders, such as affective disorders and psychotic disorders. The overarching aim is to gain an impression of whether any of this research focuses on improving patients' health. Material and method: A systematic scoping review was conducted. The literature search was performed in a single database for medical research, PubMed, with the search terms "psychiatry", "electronic medical records" and "natural language processing". The search was not restricted by date. To be included in the study, an article had to be empirical, have performed analyses on free-text record data, have used electronic records from psychiatric patients with psychotic and/or affective disorders, and be written in English. Results: The literature search yielded a total of 211 unique articles, of which 37 met the inclusion criteria of the scoping review and were examined further. Most of the studies were conducted in the United Kingdom and the USA. The size of the study populations varied widely, from a few hundred to several hundred thousand patients. Little of the research addressed specific document types from the patient record, such as discharge summaries or admission notes. The aims of the studies varied considerably but could be divided into some common categories: 1) identification of information from records, 2) quantitative studies of the population or the records, 3) selection of patients into cohorts, and 4) risk assessment. 
Interpretation: More basic research is needed before natural language processing technology for the analysis of electronic records will contribute to improving the health of psychiatric patients.

    Improving Problem-Oriented Policing with Natural Language Processing

    The policing approach known as problem-oriented policing (POP) was outlined by Herman Goldstein in 1979. Despite POP being shown to be an effective method of reducing crime, it is difficult to implement because of the high analytical burden that accompanies it. This analytical burden is centred on understanding the mechanism by which a crime took place. One of the factors that contributes to this high burden is that much of the required information is stored in free-text data, which has traditionally not been in a format suitable for aggregate analysis. However, advances in machine learning, in particular natural language processing, are lowering the barriers to extracting information from free-text data. This thesis explores the potential for pre-trained language models (PTMs) to efficiently unlock the information in police crime free-text data. PTMs are a class of machine learning model that are ‘pre-trained’ to recognise the meaning of language, which allows them to interrogate large quantities of free-text data. Thanks to this pre-training, PTMs can be adapted to specific natural language processing tasks with much less effort. Efficiently unlocking the information in police free-text crime data should reduce the analytical burden for POP. In turn, the lower analytical burden should facilitate the wider adoption of POP. The thesis concludes that the evidence suggests PTMs are potentially an efficient method for extracting useful information from police free-text data.
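The adaptation idea described above, reusing a fixed pre-trained representation and fitting only a lightweight task-specific layer on a small labelled set, can be sketched conceptually. Everything here is a toy stand-in: the bag-of-words "encoder" and nearest-centroid "head" are illustrative assumptions, and real PTMs such as transformer models learn far richer representations than this.

```python
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a frozen pre-trained encoder (bag of words)."""
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> int:
    """Overlap between two bags of words."""
    return sum((a & b).values())

def fit_centroids(labelled):
    """'Adapt' a head: build one word-count centroid per crime category."""
    centroids = {}
    for text, label in labelled:
        centroids.setdefault(label, Counter()).update(embed(text))
    return centroids

def predict(centroids, text):
    """Assign the category whose centroid overlaps the note most."""
    return max(centroids, key=lambda lab: similarity(centroids[lab], embed(text)))

# Hypothetical labelled crime notes, for illustration only.
train = [
    ("window smashed and laptop stolen from vehicle", "theft from vehicle"),
    ("purse snatched from victim on high street", "robbery"),
]
model = fit_centroids(train)
print(predict(model, "car window broken, bag stolen"))
# → theft from vehicle
```

The point of the sketch is the division of labour: the encoder is fixed and reusable across tasks, so only the small head needs task-specific labelled data, which is what makes PTMs cheap to adapt relative to training from scratch.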