Exploring colorectal cancer patients' perceptions of the quality of their care
This report discusses the local situation in Halton with regard to colorectal cancer care by exploring how patients perceived the quality of their care. Widnes Primary Care Grou
Social Measurement and Causal Inference with Text
The digital age has dramatically increased access to large-scale collections of digitized text documents. These corpora include, for example, digital traces from social media, decades of archived news reports, and transcripts of spoken interactions in political, legal, and economic spheres. For social scientists, this new widespread data availability has potential for improved quantitative analysis of relationships between language use and human thought, actions, and societal structure. However, the large-scale nature of these collections means that traditional manual approaches to analyzing content are extremely costly and do not scale. Furthermore, incorporating unstructured text data into quantitative analysis is difficult due to texts’ high-dimensional nature and linguistic complexity.
This thesis blends (a) the computational strengths of natural language processing (NLP) and machine learning to automate and scale up quantitative text analysis with (b) two themes central to social scientific studies but often under-addressed in NLP: measurement (creating quantifiable summaries of empirical phenomena) and causal inference (estimating the effects of interventions). First, we address measuring class prevalence in document collections; we contribute a generative probabilistic modeling approach to prevalence estimation and show empirically that our model is more robust to shifts in class priors between training and inference. Second, we examine cross-document entity-event measurement; we contribute an empirical pipeline and a novel latent disjunction model to identify the names of civilians killed by police from our corpus of web-scraped news reports. Third, we gather and categorize applications that use text to reduce confounding from causal estimates and contribute a list of open problems as well as guidance about data processing and evaluation decisions in this area. Finally, we contribute a new causal research design to estimate the natural indirect and direct effects of social group signals (e.g. race or gender) on conversational outcomes with separate aspects of language as causal mediators; this chapter is motivated by a theoretical case study of U.S. Supreme Court oral arguments and the effect of an advocate’s gender on interruptions from justices. We conclude by discussing the relationship between measurement and causal inference with text and future work at this intersection.
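To make the prevalence-estimation problem concrete, here is a minimal sketch of a standard baseline correction, often called adjusted classify-and-count. It is not the generative model contributed by the thesis; the function name, the rates, and the counts below are all hypothetical and chosen only for illustration. The idea is that the raw fraction of documents predicted positive is biased when class priors shift, and can be corrected using the classifier's true and false positive rates measured on held-out labeled data.

```python
def adjusted_count(predictions, tpr, fpr):
    """Estimate positive-class prevalence from hard 0/1 predictions.

    predictions: iterable of 0/1 labels predicted on the target collection.
    tpr, fpr: true and false positive rates measured on held-out labeled data.
    """
    predictions = list(predictions)
    if not predictions:
        raise ValueError("need at least one prediction")
    if tpr <= fpr:
        raise ValueError("the correction requires tpr > fpr")
    # Naive classify-and-count: fraction of documents predicted positive.
    observed = sum(predictions) / len(predictions)
    # Invert observed = p * tpr + (1 - p) * fpr to solve for the prevalence p.
    corrected = (observed - fpr) / (tpr - fpr)
    # Clip to a valid proportion, since sampling noise can push it outside [0, 1].
    return min(1.0, max(0.0, corrected))


# Hypothetical numbers: 1000 documents, 260 of them predicted positive.
preds = [1] * 260 + [0] * 740
estimate = adjusted_count(preds, tpr=0.8, fpr=0.1)
print(round(estimate, 3))
```

The naive estimate here would be 0.26, while the corrected estimate is lower because a tenth of the negatives are expected to be flagged by mistake.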
Nationality Classification Using Name Embeddings
Nationality identification unlocks important demographic information, with
many applications in biomedical and sociological research. Existing name-based
nationality classifiers use name substrings as features and are trained on
small, unrepresentative sets of labeled names, typically extracted from
Wikipedia. As a result, these methods achieve limited performance and cannot
support fine-grained classification.
We exploit the phenomenon of homophily in communication patterns to learn name
embeddings, a new representation that encodes gender, ethnicity, and
nationality which is readily applicable to building classifiers and other
systems. Through our analysis of 57M contact lists from a major Internet
company, we are able to design a fine-grained nationality classifier covering
39 groups representing over 90% of the world population. In an evaluation
against other published systems over 13 common classes, our F1 score (0.795) is
substantially better than our closest competitor, Ethnea (0.580). To the best of
our knowledge, this is the most accurate, fine-grained nationality classifier
available.
As a social media application, we apply our classifiers to the followers of
major Twitter celebrities over six different domains. We demonstrate stark
differences in the ethnicities of the followers of Trump and Obama, and in the
sports and entertainments favored by different groups. Finally, we identify an
anomalous political figure whose presumably inflated following appears largely
incapable of reading the language he posts in.
Comment: 10 pages, 9 figures, 4 tables, accepted by CIKM 2017. Demo and free
API: www.name-prism.co
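The homophily idea in the abstract above can be illustrated with a toy sketch: names that tend to appear in the same contact lists end up with similar co-occurrence vectors. This is a deliberately simplified stand-in, not the embedding method used in the paper, and the contact lists, names, and helper functions are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(contact_lists):
    """Map each name to a Counter of the names it shares a contact list with."""
    vectors = defaultdict(Counter)
    for names in contact_lists:
        unique = set(names)
        for name in unique:
            for other in unique:
                if other != name:
                    vectors[name][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy contact lists; real data would be far larger and noisier.
contact_lists = [
    ["Giulia", "Marco", "Luca"],
    ["Marco", "Luca", "Sofia"],
    ["Giulia", "Luca", "Sofia"],
    ["Yuki", "Haruto", "Sakura"],
    ["Haruto", "Sakura", "Ren"],
]
vecs = cooccurrence_vectors(contact_lists)
same_group = cosine(vecs["Giulia"], vecs["Sofia"])
cross_group = cosine(vecs["Giulia"], vecs["Yuki"])
print(same_group > cross_group)
```

In this toy data the two clusters never share a list, so cross-cluster similarity is zero; a classifier could then be trained on such vectors, which is the step the sketch leaves out.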
REVISITING RECOGNIZING TEXTUAL ENTAILMENT FOR EVALUATING NATURAL LANGUAGE PROCESSING SYSTEMS
Recognizing Textual Entailment (RTE) began as a unified framework to evaluate the reasoning capabilities of Natural Language Processing (NLP) models. In recent years, RTE has evolved in the NLP community into a task that researchers focus on developing models for. This thesis revisits the tradition of RTE as an evaluation framework for NLP models, especially in the era of deep learning.
Chapter 2 provides an overview of different approaches to evaluating NLP systems, discusses prior RTE datasets, and argues why many of them do not serve as satisfactory tests to evaluate the reasoning capabilities of NLP systems. Chapter 3 presents a new large-scale diverse collection of RTE datasets (DNC) that tests how well NLP systems capture a range of semantic phenomena that are integral to understanding human language. Chapter 4 demonstrates how the DNC can be used to evaluate reasoning capabilities of NLP models. Chapter 5 discusses the limits of RTE as an evaluation framework by illuminating how existing datasets contain biases that may enable crude modeling approaches to perform surprisingly well.
The remaining aspects of the thesis focus on issues raised in Chapter 5. Chapter 6 addresses issues in prior RTE datasets focused on paraphrasing and presents a high-quality test set that can be used to analyze how robust RTE systems are to paraphrases. Chapter 7 demonstrates how modeling approaches that target biases, e.g. adversarial learning, can enable RTE models to overcome the biases discussed in Chapter 5. Chapter 8 applies these methods to the task of discovering emergency needs during disaster events.
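As an illustration of how dataset biases can reward crude models, the sketch below trains a classifier that never looks at the premise and predicts the label from hypothesis words alone. The abstract does not say which crude approaches the thesis studies, so treating a hypothesis-only baseline as the example is an assumption, and the training pairs, labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace after dropping full stops."""
    return set(text.lower().replace(".", "").split())


def train_hypothesis_only(examples):
    """Count label occurrences per hypothesis word; premises are ignored."""
    word_label = defaultdict(Counter)
    label_counts = Counter()
    for _premise, hypothesis, label in examples:
        label_counts[label] += 1
        for word in tokenize(hypothesis):
            word_label[word][label] += 1
    return word_label, label_counts


def predict(model, hypothesis):
    """Vote over the labels seen with each hypothesis word during training."""
    word_label, label_counts = model
    votes = Counter()
    for word in tokenize(hypothesis):
        votes.update(word_label.get(word, {}))
    if not votes:
        # No known words: fall back to the most frequent training label.
        return label_counts.most_common(1)[0][0]
    return votes.most_common(1)[0][0]


# Hypothetical toy data in which negation words leak the label.
train = [
    ("A man is playing a guitar.", "A man is not playing music.", "contradiction"),
    ("A dog runs in the park.", "The dog is not moving.", "contradiction"),
    ("A woman is reading.", "Nobody is reading.", "contradiction"),
    ("Two kids are playing outside.", "Some children are outdoors.", "entailment"),
    ("A chef is cooking pasta.", "Someone is preparing food.", "entailment"),
    ("A man is sleeping.", "A person is resting.", "entailment"),
]
model = train_hypothesis_only(train)
print(predict(model, "The cat is not asleep."))
print(predict(model, "Someone is outdoors."))
```

If a model like this scores well above the majority-class rate on a real dataset, the labels can be partly guessed without any reasoning over premise and hypothesis together, which is the kind of bias an evaluation framework needs to control for.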
Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective
Machine Learning models are being deployed as parts of real-world systems
with the upsurge of interest in artificial intelligence. The design,
implementation, and maintenance of such systems are challenged by real-world
environments that produce larger amounts of heterogeneous data and users
requiring increasingly faster responses with efficient resource consumption.
These requirements push prevalent software architectures to the limit when
deploying ML-based systems. Data-oriented Architecture (DOA) is an emerging
concept that equips systems better for integrating ML models. DOA extends
current architectures to create data-driven, loosely coupled, decentralised,
open systems. Even though papers on deployed ML-based systems do not mention
DOA, their authors made design decisions that implicitly follow DOA. The
reasons why, how, and the extent to which DOA is adopted in these systems are
unclear. Implicit design decisions limit the practitioners' knowledge of DOA to
design ML-based systems in the real world. This paper answers these questions
by surveying real-world deployments of ML-based systems. The survey shows the
design decisions of the systems and the requirements these satisfy. Based on
the survey findings, we also formulate practical advice to facilitate the
deployment of ML-based systems. Finally, we outline open challenges to
deploying DOA-based systems that integrate ML models.
Comment: Under revie
Controlled trial of hypnotherapy as a treatment for irritable bowel syndrome
Nineteenth-century philosophy and anatomy regarded the nervous system as the only pathway of communication between the brain and body, but research in the field of psychoneuroimmunology (PNI) has now provided evidence to prove the age-old belief that there is a connection between the mind (or mental/emotional states) and the body. Researchers in PNI have shown that the communication between the nervous and immune systems is bi-directional (i.e. there is a psychological reaction to physical disease and a somatic presentation of psychological disorders) and that the immune system, the autonomic nervous system, the endocrine system and the neuropeptide systems all communicate with each other by means of chemicals called messenger molecules or ligands. This paper outlines research into the treatment of Irritable Bowel Syndrome (IBS) with hypnotherapy, taking into account the mind-body connection and treating both the patient’s physiological and emotional/psychological symptoms rather than treating the physiological symptoms only; in other words, it takes a more holistic approach to the treatment of IBS. IBS is probably the most common functional gastrointestinal disorder encountered by both gastroenterologists and physicians in primary care. It is estimated that from 10% to 25% of the general population suffer from this condition and that it comprises about 30-50% of the gastroenterologists’ workload, yet the aetiology of IBS is unknown and, so far, there is no cure. Researchers are beginning to view IBS as a multi-faceted disorder in which there appears to be a disturbance in the interaction between the intestines, brain, and autonomic nervous system, resulting in an alteration in the regulation of bowel motility and/or sensory function. Most researchers agree that a subset of IBS sufferers have a visceral hypersensitivity of the gut or, more specifically, an increased perception of sensations in the gut.
To date, studies of IBS have proposed previous gastroenteritis, small intestine bacterial overgrowth, psychosocial factors, a genetic contribution, and an imbalance of neurotransmitters as either possible causes of, or contributors to, the development of IBS. It is generally agreed that a patient’s emotional response to stress can exacerbate the condition. In section 1 of the thesis, the introduction, a detailed description and background appropriate to the study undertaken are provided, including aspects of epidemiology, diagnostic symptom criteria and clinical relevance of the Irritable Bowel Syndrome. Previous studies of various forms of treatment for IBS are discussed with the main emphasis being on treatment with hypnotherapy. All these therapies have concentrated on either mind or body treatments whereas this study demonstrates how hypnotherapy, and the use of imagery, addresses both mind and body. Finally, the rationale for the current study and the specific aims of the thesis are outlined. In section 2, the methodology and assessment instruments used in the clinical trial are discussed, as well as recruitment processes, research plan and timetable, and treatment schedule. Statistical analyses are provided and the main outcome measures of the clinical trial, its limitations and scientific implications are addressed.
Quantitative Assessment of Factors in Sentiment Analysis
Sentiment can be defined as a tendency to experience certain emotions in relation to a particular object or person. Sentiment may be expressed in writing, in which case determining that sentiment algorithmically is known as sentiment analysis. Sentiment analysis is often applied to Internet texts such as product reviews, websites, blogs, or tweets, where automatically determining published feeling towards a product or service is very useful to marketers or opinion analysts. The main goal of sentiment analysis is to identify the polarity of natural language text.
This thesis sets out to examine quantitatively the factors that have an effect on sentiment analysis. The factors that are commonly used in sentiment analysis are text features, sentiment lexica or resources, and the machine learning algorithms employed. The main aim of this thesis is to investigate systematically the interaction between sentiment analysis factors and machine learning algorithms in order to improve sentiment analysis performance as compared to the opinions of human assessors. A software system known as TJP was designed and developed to support this investigation.
The research reported here has three main parts. Firstly, the role of data pre-processing was investigated with TJP using a combination of features together with publicly available datasets. This considers the relationship and relative importance of superficial text features such as emoticons, n-grams, negations, hashtags, repeated letters, special characters, slang, and stopwords. The resulting statistical analysis suggests that a combination of all of these features achieves better accuracy with the dataset, and has a considerable effect on system performance.
Secondly, the effect of human marked up training data was considered, since this is required by supervised machine learning algorithms. The results gained from TJP suggest that training data greatly augments sentiment analysis performance. However, the combination of training data and sentiment lexica seems to provide optimal performance. Nevertheless, one particular sentiment lexicon, AFINN, contributed better than others in the absence of training data, and therefore would be appropriate for unsupervised approaches to sentiment analysis.
Finally, the performance of two sophisticated ensemble machine learning algorithms was investigated. Both the Arbiter Tree and Combiner Tree were chosen since neither of them has previously been used with sentiment analysis. The objective here was to demonstrate their applicability and effectiveness compared to that of the leading single machine learning algorithms, Naïve Bayes and Support Vector Machines. The results showed that whilst either can be applied to sentiment analysis, the Arbiter Tree ensemble algorithm achieved better accuracy than either the Combiner Tree or any single machine learning algorithm.
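The lexicon-only setting discussed in this abstract, where no labeled training data is available, can be sketched as a simple scorer that sums word valences and flips the sign after a negation. This is not the TJP system, and the lexicon below is a hypothetical toy with made-up integer scores rather than actual AFINN entries; the function name and negation rule are likewise assumptions.

```python
import re

# Hypothetical toy lexicon with integer valences; not real AFINN values.
TOY_LEXICON = {"good": 3, "great": 3, "love": 3, "bad": -3, "terrible": -3, "hate": -3}
NEGATIONS = {"not", "no", "never"}


def lexicon_polarity(text):
    """Classify text as positive, negative, or neutral using lexicon scores only."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            # Remember the negation until the next sentiment-bearing word.
            negate = True
            continue
        value = TOY_LEXICON.get(token, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("I love this phone, it is great"))
print(lexicon_polarity("This is not a good phone"))
print(lexicon_polarity("The box arrived on Tuesday"))
```

A supervised pipeline would instead feed features such as n-grams, emoticons, and negation flags into a trained classifier, with lexicon scores as one additional feature.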