33 research outputs found

    Getting More out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics.

    This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors’ own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

    Natural language processing and semantic technologies. The application on Brand Rain and Anpro21

    This paper presents the application and results of research on natural language processing and semantic technologies at Brand Rain and Anpro21. The related projects are described, and the benefits of transferring the research and the newly developed technologies are presented. All of this research has been applied to Brand Rain's monitoring and reputation-calculation tool.

    Knowledge Capture from Multiple Online Sources with the Extensible Web Retrieval Toolkit (eWRT)

    Knowledge capture approaches in the age of massive Web data require robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data, both unstructured and structured. This paper addresses this requirement by introducing the Extensible Web Retrieval Toolkit (eWRT), a modular Python API for retrieving social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia. eWRT has been released as an open source library under GNU GPLv3. It includes classes for caching and data management, and provides low-level text processing capabilities including language detection, phonetic string similarity measures, and string normalization.

    Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch

    The advancement of Natural Language Processing (NLP) allows the process of deriving information from large volumes of text to be automated, making text-based resources more discoverable and useful. The attention is turned to one of the most important, but traditionally difficult to access, resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. The paper presents the development and evaluation of a Named Entity Recognition system for Dutch archaeological grey literature, targeted at extracting mentions of artefacts, archaeological features, materials, places and time entities. The role of domain vocabulary is discussed for the development of a KOS-driven NLP pipeline, which is evaluated against a Gold Standard, human-annotated corpus.

    Predictors of severe relapse in pregnant women with psychotic or bipolar disorders

    Pregnancy in women with severe mental illness is associated with adverse outcomes for mother and infant. There are limited data on prevalence and predictors of relapse in pregnancy. A historical cohort study using anonymised comprehensive electronic health records from secondary mental health care linked with national maternity data was carried out. Women with a history of serious mental illness who were pregnant (2007–2011), and in remission at the start of pregnancy, were studied; severe relapse was defined as admission to acute care or self-harm. Predictors of relapse were analysed using random effects logistic regression to account for repeated measures in women with more than one pregnancy in the study period. In 454 pregnancies (389 women) there were 58 (24%) relapses in women with non-affective psychoses and 25 (12%) in women with affective psychotic or bipolar disorders. Independent predictors of relapse included non-affective psychosis (adjusted OR = 2.03; 95% CI = 1.16–3.54), number of recent admissions (1.37; 1.03–1.84), recent self-harm (2.24; 1.15–4.34), substance use (2.15; 1.13–4.08), smoking (2.52; 1.26–5.02) and non-white ethnicity (black ethnicity: 2.37; 1.23–4.57, mixed/other ethnicity: 2.94; 1.32–6.56). Women on no regular medication throughout the first trimester were also at greater risk of relapse in pregnancy (1.99; 1.05–3.75). There was no interaction between severity of illness and medication status as relapse predictors. Therefore, women with non-affective psychosis and a higher number of recent acute admissions are at significant risk of severe relapse in pregnancy. Continuation of medication in women with severe mental illness who become pregnant may be protective.