13,937 research outputs found
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
We survey 146 papers analyzing "bias" in NLP systems, finding that their
motivations are often vague, inconsistent, and lacking in normative reasoning,
despite the fact that analyzing "bias" is an inherently normative process. We
further find that these papers' proposed quantitative techniques for measuring
or mitigating "bias" are poorly matched to their motivations and do not engage
with the relevant literature outside of NLP. Based on these findings, we
describe the beginnings of a path forward by proposing three recommendations
that should guide work analyzing "bias" in NLP systems. These recommendations
rest on a greater recognition of the relationships between language and social
hierarchies, encouraging researchers and practitioners to articulate their
conceptualizations of "bias"---i.e., what kinds of system behaviors are
harmful, in what ways, to whom, and why, as well as the normative reasoning
underlying these statements---and to center work around the lived experiences
of members of communities affected by NLP systems, while interrogating and
reimagining the power relations between technologists and such communities
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Detecting Real-World Influence Through Twitter
In this paper, we investigate the issue of detecting the real-life influence
of people based on their Twitter account. We propose an overview of common
Twitter features used to characterize such accounts and their activity, and
show that these are inefficient in this context. In particular, retweets and
followers numbers, and Klout score are not relevant to our analysis. We thus
propose several Machine Learning approaches based on Natural Language
Processing and Social Network Analysis to label Twitter users as Influencers or
not. We also rank them according to a predicted influence level. Our proposals
are evaluated over the CLEF RepLab 2014 dataset, and outmatch state-of-the-art
ranking methods.Comment: 2nd European Network Intelligence Conference (ENIC), Sep 2015,
Karlskrona, Swede
Polyglot: Distributed Word Representations for Multilingual NLP
Distributed word representations (word embeddings) have recently contributed
to competitive performance in language modeling and several NLP tasks. In this
work, we train word embeddings for more than 100 languages using their
corresponding Wikipedias. We quantitatively demonstrate the utility of our word
embeddings by using them as the sole features for training a part of speech
tagger for a subset of these languages. We find their performance to be
competitive with near state-of-art methods in English, Danish and Swedish.
Moreover, we investigate the semantic features captured by these embeddings
through the proximity of word groupings. We will release these embeddings
publicly to help researchers in the development and enhancement of multilingual
applications.Comment: 10 pages, 2 figures, Proceedings of Conference on Computational
Natural Language Learning CoNLL'201
New Methods, Current Trends and Software Infrastructure for NLP
The increasing use of `new methods' in NLP, which the NeMLaP conference
series exemplifies, occurs in the context of a wider shift in the nature and
concerns of the discipline. This paper begins with a short review of this
context and significant trends in the field. The review motivates and leads to
a set of requirements for support software of general utility for NLP research
and development workers. A freely-available system designed to meet these
requirements is described (called GATE - a General Architecture for Text
Engineering). Information Extraction (IE), in the sense defined by the Message
Understanding Conferences (ARPA \cite{Arp95}), is an NLP application in which
many of the new methods have found a home (Hobbs \cite{Hob93}; Jacobs ed.
\cite{Jac92}). An IE system based on GATE is also available for research
purposes, and this is described. Lastly we review related work.Comment: 12 pages, LaTeX, uses nemlap.sty (included
An evaluation of the 'Living with Cancer' project: Using neuro-linguistic programming techniques to maximise the coping strategies of carers and patients living with cancer in Ellesmere Port
The 'Living with cancer' project aimed to improve communication, relationship, and coping skills of health professionals, cancer patients and their carers in Ellesmere Port. The project delivered NLP Diploma training to 55 health professionals between April 2001-March 2004.Commissioned by Chester and Halton NHS Trust and funded through the New Opportunities Fund
- …