13 research outputs found

    Cross-sectional analysis of UK research studies in 2015: results from a scoping project with the UK Health Research Authority

    Objectives: To determine whether data on research studies held by the UK Health Research Authority (HRA) could be summarised automatically with minimal manual intervention. There are numerous initiatives to reduce research waste by improving the design, conduct, analysis and reporting of clinical studies. However, quantitative data on the characteristics of clinical studies and the impact of the various initiatives are limited.
    Design: Feasibility study, using 1 year of data.
    Setting: We worked with the HRA on a pilot study using research applications submitted for UK-wide ethical review. We extracted information held in anonymised XML files by the Integrated Research Application System (IRAS) and the HRA Assessment Review Portal (HARP) into a single dataset. Research applications from 2014 to 2016 were provided. We used standard text extraction methods to assess information held in free-text fields. We used simple, descriptive methods to summarise the research activities that we extracted.
    Participants: Not applicable (records-based study).
    Interventions: Not applicable.
    Primary and secondary outcome measures: Feasibility of extraction and processing.
    Results: We successfully imported 1775 non-duplicate research applications from the XML files into a single database. Of these, 963 were randomised controlled trials and 812 were other studies. Most studies received a favourable opinion. There was limited patient and public involvement in the studies. Most, but not all, studies were planned for publication of results. Novel study designs (eg, adaptive and Bayesian designs) were infrequently reported.
    Conclusions: We have demonstrated that the data submitted from IRAS to the HRA and its HARP system are accessible and can be queried for information. We strongly encourage the development of fully resourced collaborative projects to further this work. This would aid understanding of how study characteristics change over time and across therapeutic areas, as well as the progress of initiatives to improve the quality and relevance of research studies

    SWATCS65: Sentiment Classification Using An Ensemble Of Class Projects

    This paper presents the SWATCS65 ensemble classifier used to identify the sentiment of tweets. The classifier was trained and tested using data provided by SemEval-2015, Task 10, Subtask B, with the goal of labeling the sentiment of an entire tweet. The ensemble was constructed from 26 classifiers, each written by a group of one to three undergraduate students in the Fall 2014 offering of a natural language processing course at Swarthmore College. Each of the classifiers was designed independently, though much of the early structure was provided by in-class lab assignments. There was high variability in the final performance of these classifiers, which were combined using a weighted voting scheme, with weights correlated with each classifier's performance under 5-fold cross-validation on the provided training data. The system performed very well, achieving an F1 score of 61.89
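    The weighted voting scheme mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below simply assumes that each classifier's weight is its cross-validated score; the function name `weighted_vote` and the example scores are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per classifier
    weights: one non-negative weight per classifier
             (assumed here to be a cross-validated score)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


# Made-up illustration values, not results from the paper.
votes = ["positive", "negative", "negative", "neutral"]
cv_scores = [0.62, 0.35, 0.30, 0.20]
winner = weighted_vote(votes, cv_scores)
print(winner)
```

    In this toy case two weaker classifiers that agree outvote a single stronger one, which is the behaviour that distinguishes weighted voting from simply trusting the best classifier.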

    Identifying Smoking Status From Implicit Information In Medical Discharge Summaries

    Human annotators and natural language applications are able to identify smoking status from discharge summaries with high accuracy when explicit evidence regarding the patient's smoking status is present in the summary. We explore the possibility of identifying smoking status from discharge summaries when the explicit smoking terms have been removed. We present results using a Naïve Bayes classifier on a smoke-blind set of discharge summaries and compare this to the performance of human annotators on the same dataset

    Pinch Hitting: A Graphical History Of Yankees WAR


    SwatCS: Combining Simple Classifiers With Estimated Accuracy

    This paper is an overview of the SwatCS system submitted to SemEval-2013 Task 2A: Contextual Polarity Disambiguation. The sentiment of individual phrases within a tweet is labeled using a combination of classifiers trained on a range of lexical features. The classifiers are combined by estimating the accuracy of the classifiers on each tweet. Performance is measured when using only the provided training data, and separately when including external data

    SWATAC: A Sentiment Analyzer Using One-Vs-Rest Logistic Regression

    This paper describes SWATAC, a system built for SemEval-2015’s Task 10 Subtask B, namely the Message Polarity Classification Task. Given a tweet, the system classifies the sentiment as either positive, negative, or neutral. Several preprocessing tasks such as negation detection, spell checking, and tokenization are performed to enhance lexical information. The features are then augmented with external sentiment lexicons. Classification is done with Logistic Regression using a one-vs-rest configuration. For the test runs, the system was trained using only the provided training tweets. The classifier was successful, with an F1 score of 58.43 on the official 2015 test data, and an F1 score of 66.64 on the Twitter 2014 progress data
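    One of the preprocessing steps named in this abstract, negation detection, is often implemented by tagging the tokens that follow a negation word. The sketch below shows that common approach; it is an assumption for illustration, not a description of SWATAC's actual implementation, and the negation word list, the `_NEG` suffix, and the function name `mark_negation` are hypothetical.

```python
import re

# Small illustrative list; a real system would likely use a longer one.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
# A single punctuation token ends the scope of a negation.
CLAUSE_END = re.compile(r"^[.,:;!?]$")


def mark_negation(tokens):
    """Append '_NEG' to every token between a negation word and the next punctuation token."""
    marked = []
    in_scope = False
    for token in tokens:
        lowered = token.lower()
        if CLAUSE_END.match(token):
            in_scope = False
            marked.append(token)
        elif lowered in NEGATION_WORDS or lowered.endswith("n't"):
            in_scope = True
            marked.append(token)
        elif in_scope:
            marked.append(token + "_NEG")
        else:
            marked.append(token)
    return marked


tokens = "i do n't like this movie , but the cast is great".split()
marked = mark_negation(tokens)
print(" ".join(marked))
```

    Tagging tokens this way lets a bag-of-words classifier treat "like" and "like_NEG" as different features.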

    Computational Semantic Analysis Of Language: SemEval-2007 And Beyond


    SW-AG: Local Context Matching For English Lexical Substitution

    We present two systems that pick the ten most appropriate substitutes for a marked word in a test sentence. The first system scores candidates based on how frequently their local contexts match that of the marked word. The second system, an enhancement to the first, incorporates cosine similarity using unigram features. The core of both systems bypasses intermediate sense selection. Our results show that a knowledge-light, direct method for scoring potential replacements is viable
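    The cosine similarity over unigram features mentioned in this abstract can be sketched as follows. How the SW-AG systems build and weight their context vectors is not stated here, so the sketch assumes raw unigram counts; the function name `cosine_similarity` and the example contexts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw unigram counts as features."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical contexts: the marked word's sentence and two candidates' typical contexts.
target_context = "the bright student solved the problem".split()
candidate_contexts = {
    "clever": "the clever student solved the puzzle".split(),
    "shiny": "the shiny coin lay on the floor".split(),
}
ranked = sorted(
    candidate_contexts,
    key=lambda word: cosine_similarity(target_context, candidate_contexts[word]),
    reverse=True,
)
print(ranked)
```

    Candidates whose contexts share more words with the target sentence rank higher, without any intermediate word-sense selection step.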

    SWASH: A Naive Bayes Classifier For Tweet Sentiment Identification

    This paper describes a sentiment classification system designed for SemEval-2015, Task 10, Subtask B. The system employs a constrained, supervised text categorization approach. Firstly, since thorough preprocessing of tweet data was shown to be effective in previous SemEval sentiment classification tasks, various preprocessing steps were introduced to enhance the quality of lexical information. Secondly, a Naive Bayes classifier is used to detect tweet sentiment. The classifier is trained only on the training data provided by the task organizers. The system makes use of external human-generated lists of positive and negative words at several steps throughout classification. The system produced an overall F-score of 59.26 on the official test set
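    For readers unfamiliar with the technique, a Naive Bayes text classifier can be sketched in a few lines. This is a generic multinomial Naive Bayes with add-one smoothing, shown under the assumption of simple bag-of-words features; it is not the SWASH system itself, and the function names and toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Collect per-label document and word counts from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the label with the highest log-probability under add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data for illustration only.
training = [
    ("great fun".split(), "positive"),
    ("love this".split(), "positive"),
    ("awful boring".split(), "negative"),
    ("hate this".split(), "negative"),
]
model = train_naive_bayes(training)
print(classify("love fun".split(), model))
```

    Working in log space avoids numerical underflow when many small probabilities are multiplied.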

    SWAT-MP: The SemEval-2007 Systems For Task 5 And Task 14

    In this paper, we describe our two SemEval-2007 entries. Our first entry, for Task 5: Multilingual Chinese-English Lexical Sample Task, is a supervised system that decides the most appropriate English translation of a Chinese target word. This system uses a combination of Naïve Bayes, nearest neighbor cosine, decision lists, and latent semantic analysis. Our second entry, for Task 14: Affective Text, is a supervised system that annotates headlines using a predefined list of emotions. This system uses synonym expansion and matches lemmatized unigrams in the test headlines against a corpus of hand-annotated headlines