11 research outputs found

    Cross-sectional analysis of UK research studies in 2015: results from a scoping project with the UK Health Research Authority

    Objectives To determine whether data on research studies held by the UK Health Research Authority (HRA) could be summarised automatically with minimal manual intervention. There are numerous initiatives to reduce research waste by improving the design, conduct, analysis and reporting of clinical studies. However, quantitative data on the characteristics of clinical studies and the impact of the various initiatives are limited. Design Feasibility study, using 1 year of data. Setting We worked with the HRA on a pilot study using research applications submitted for UK-wide ethical review. We extracted information held in anonymised XML files by the Integrated Research Application System (IRAS) and the HRA Assessment Review Portal (HARP) into a single dataset. Research applications from 2014 to 2016 were provided. We used standard text extraction methods to assess information held in free-text fields, and simple descriptive methods to summarise the research activities we extracted. Participants Not applicable (records-based study). Interventions Not applicable. Primary and secondary outcome measures Feasibility of extraction and processing. Results We successfully imported 1775 non-duplicate research applications from the XML files into a single database. Of these, 963 were randomised controlled trials and 812 were other studies. Most studies received a favourable opinion. There was limited patient and public involvement in the studies. Most, but not all, studies were planned for publication of results. Novel study designs (eg, adaptive and Bayesian designs) were infrequently reported. Conclusions We have demonstrated that the data submitted from IRAS to the HRA and its HARP system are accessible and can be queried for information. We strongly encourage the development of fully resourced collaborative projects to further this work. This would aid understanding of how study characteristics change over time and across therapeutic areas, as well as the progress of initiatives to improve the quality and relevance of research studies.
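    The kind of pipeline the abstract describes (anonymised XML applications imported into one queryable database) can be sketched roughly as below. The field names, sample record, and schema here are invented for illustration and are not the actual IRAS or HARP structure:

    ```python
    import sqlite3
    import xml.etree.ElementTree as ET

    # Hypothetical IRAS-style record; the real element names are assumptions.
    SAMPLE_XML = """<application>
      <iras_id>100001</iras_id>
      <title>A randomised trial of X versus Y</title>
      <study_type>Randomised controlled trial</study_type>
      <opinion>Favourable</opinion>
    </application>"""

    def parse_application(xml_text):
        """Flatten one application's XML into a plain record."""
        root = ET.fromstring(xml_text)
        return {field: root.findtext(field)
                for field in ("iras_id", "title", "study_type", "opinion")}

    def load_into_db(records):
        """Import parsed records into a single SQLite table for querying;
        the primary key drops duplicate applications on import."""
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE applications "
                    "(iras_id TEXT PRIMARY KEY, title TEXT, "
                    "study_type TEXT, opinion TEXT)")
        con.executemany("INSERT OR IGNORE INTO applications "
                        "VALUES (:iras_id, :title, :study_type, :opinion)",
                        records)
        return con

    con = load_into_db([parse_application(SAMPLE_XML)])
    n_rct, = con.execute("SELECT COUNT(*) FROM applications "
                         "WHERE study_type LIKE 'Randomised%'").fetchone()
    print(n_rct)  # -> 1
    ```

    With all 1775 applications loaded this way, counts such as the 963 randomised controlled trials become simple SQL aggregations.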

    Identifying Smoking Status From Implicit Information In Medical Discharge Summaries

    Human annotators and natural language applications can identify smoking status from discharge summaries with high accuracy when explicit evidence of smoking status is present in the summary. We explore the possibility of identifying smoking status from discharge summaries after these smoking terms have been removed. We present results using a Naïve Bayes classifier on a smoke-blind set of discharge summaries and compare its performance with that of human annotators on the same dataset.

    Using Implicit Information to Identify Smoking Status in Smoke-blind Medical Discharge Summaries

    As part of the 2006 i2b2 NLP Shared Task, we explored two methods for determining the smoking status of patients from their hospital discharge summaries when explicit smoking terms were present and when those same terms were removed. We developed a simple keyword-based classifier to determine smoking status from de-identified hospital discharge summaries. We then developed a Naïve Bayes classifier to determine smoking status from the same records after all smoking-related words had been manually removed (the smoke-blind dataset). The performance of the Naïve Bayes classifier was compared with that of three human annotators on a subset of the same training dataset (n = 54 records) and against the evaluation dataset (n = 104 records). The rule-based classifier was able to accurately extract smoking status from hospital discharge summaries when they contained explicit smoking words. On the smoke-blind dataset, where explicit smoking cues are not available, two Naïve Bayes systems performed less well than the rule-based classifier, but similarly to three expert human annotators.
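    A multinomial Naïve Bayes text classifier of the kind used in these two studies can be sketched from scratch. The toy "smoke-blind" summaries, implicit cues, and labels below are invented, and the real systems' features and preprocessing are not shown:

    ```python
    import math
    from collections import Counter

    def tokenize(text):
        return text.lower().split()

    def train(docs, labels, alpha=1.0):
        """Fit multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
        classes = sorted(set(labels))
        log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
        word_counts = {c: Counter() for c in classes}
        for doc, lab in zip(docs, labels):
            word_counts[lab].update(tokenize(doc))
        vocab = {w for counter in word_counts.values() for w in counter}
        log_lik = {}
        for c in classes:
            total = sum(word_counts[c].values()) + alpha * len(vocab)
            log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total)
                          for w in vocab}
            log_lik[c]["__unk__"] = math.log(alpha / total)  # unseen words
        return log_prior, log_lik

    def predict(model, doc):
        """Pick the class maximising log prior + summed log likelihoods."""
        log_prior, log_lik = model
        scores = {c: log_prior[c] + sum(
                      log_lik[c].get(w, log_lik[c]["__unk__"])
                      for w in tokenize(doc))
                  for c in log_prior}
        return max(scores, key=scores.get)

    # Toy smoke-blind summaries: explicit smoking words removed, so the model
    # must rely on implicit cues (diagnoses, medications). All data invented.
    docs = ["copd exacerbation nicotine patch ordered chest wheeze",
            "copd chronic cough nicotine replacement therapy",
            "appendectomy recovery no respiratory complaints",
            "fracture repair ambulating well no cough"]
    labels = ["smoker", "smoker", "non-smoker", "non-smoker"]

    model = train(docs, labels)
    print(predict(model, "chronic cough nicotine patch"))  # -> smoker
    ```

    The classifier never sees an explicit smoking term; it learns that words such as diagnosis and medication mentions co-occur with each status, which is exactly the implicit signal the smoke-blind task targets.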

    SWATAC: A Sentiment Analyzer Using One-Vs-Rest Logistic Regression

    This paper describes SWATAC, a system built for SemEval-2015's Task 10 Subtask B, namely the Message Polarity Classification Task. Given a tweet, the system classifies the sentiment as either positive, negative, or neutral. Several preprocessing tasks such as negation detection, spell checking, and tokenization are performed to enhance lexical information. The features are then augmented with external sentiment lexicons. Classification is done with Logistic Regression using a one-vs-rest configuration. For the test runs, the system was trained using only the provided training tweets. The classifier was successful, with an F1 score of 58.43 on the official 2015 test data, and an F1 score of 66.64 on the Twitter 2014 progress data.
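    One-vs-rest classification trains one binary logistic-regression model per sentiment class and predicts the class whose model scores highest. A minimal sketch under invented assumptions: the two features per tweet (positive- and negative-lexicon hit counts) and the toy data stand in for the paper's much richer feature set:

    ```python
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train_binary(X, y, epochs=1000, lr=0.1):
        """Plain stochastic-gradient-descent logistic regression (w, b)."""
        w, b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
                err = p - yi  # gradient of log loss w.r.t. the logit
                w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
                b -= lr * err
        return w, b

    def train_ovr(X, labels):
        """One-vs-rest: one binary classifier per sentiment class."""
        return {c: train_binary(X, [1 if lab == c else 0 for lab in labels])
                for c in set(labels)}

    def predict(models, x):
        """Return the class whose binary model assigns highest probability."""
        def score(model):
            w, b = model
            return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        return max(models, key=lambda c: score(models[c]))

    # Invented 2-feature tweets: (positive-lexicon hits, negative-lexicon hits).
    X = [(3, 0), (2, 0), (0, 3), (0, 2), (1, 1), (0, 0)]
    labels = ["positive", "positive", "negative", "negative",
              "neutral", "neutral"]

    models = train_ovr(X, labels)
    print(predict(models, (4, 0)))  # -> positive
    ```

    The one-vs-rest arrangement keeps each model a simple binary problem while still covering the three-way positive/negative/neutral decision.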

    UCD-PN: classification of semantic relations between nominals using WordNet and web counts

    For our system we use the SMO implementation of a support vector machine provided with the WEKA machine learning toolkit. As with all machine learning approaches, the most important step is to choose a set of features that reliably helps to predict the label of the example. We used 76 features drawn from two very different knowledge sources. The first 48 features are boolean values indicating whether or not each of the nominals in the sentence is linked to certain other words in the WordNet hypernym and meronym networks. The remaining 28 features are web frequency counts for the two nominals joined by certain common prepositions and verbs. Our system performed well on all but two of the relations: theme-tool and origin-entity.
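    The two-source feature design can be sketched as below: boolean indicators for lexical links plus scaled web co-occurrence counts, fed to a linear SVM. Everything here is invented for illustration: the cue words, joining phrases, counts, and link table are not from the paper, and a Pegasos-style sub-gradient trainer stands in for WEKA's SMO:

    ```python
    import random

    def features(noun1, noun2, links, web_counts):
        """Toy stand-in for the paper's 76 features: boolean link indicators
        for each nominal plus scaled web counts for joining phrases."""
        cues = ["container", "instrument", "cause"]   # hypothetical cue words
        joiners = ["of", "in", "causes"]              # hypothetical patterns
        bools = [1.0 if c in links.get(n, ()) else 0.0
                 for n in (noun1, noun2) for c in cues]
        counts = [web_counts.get((noun1, j, noun2), 0) / 100.0 for j in joiners]
        return bools + counts

    def train_linear_svm(X, y, lam=0.01, epochs=500):
        """Pegasos-style sub-gradient linear SVM; labels must be in {-1, +1}."""
        w = [0.0] * len(X[0])
        t = 0
        rng = random.Random(0)
        data = list(zip(X, y))
        for _ in range(epochs):
            rng.shuffle(data)
            for xi, yi in data:
                t += 1
                eta = 1.0 / (lam * t)
                margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
                w = [(1.0 - eta * lam) * wj for wj in w]   # regularisation
                if margin < 1:                              # hinge sub-gradient
                    w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
        return w

    # Invented knowledge sources: WordNet-like links and web phrase counts.
    hypernym_links = {"virus": {"cause"}, "germ": {"cause"},
                      "box": {"container"}, "flu": set(),
                      "toy": set(), "cold": set()}
    web_counts = {("virus", "causes", "flu"): 90,
                  ("germ", "causes", "cold"): 70,
                  ("box", "of", "toy"): 80}

    # Binary slice of the task: cause-effect (+1) vs another relation (-1).
    X = [features("virus", "flu", hypernym_links, web_counts),
         features("box", "toy", hypernym_links, web_counts)]
    y = [+1, -1]
    w = train_linear_svm(X, y)
    score = sum(wj * xj for wj, xj in
                zip(w, features("germ", "cold", hypernym_links, web_counts)))
    print("cause-effect" if score > 0 else "other")  # -> cause-effect
    ```

    The point of the mixed representation is that the boolean WordNet features supply taxonomic evidence while the web counts supply distributional evidence, and the SVM weights the two sources jointly.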