Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Verifying baselines for crisis event information classification on Twitter
Social media are rich information sources during and in the aftermath of crisis events such as earthquakes and terrorist attacks. Despite myriad challenges, with the right tools, significant insight can be gained which can assist emergency responders and related applications. However, most extant approaches are incomparable, using bespoke definitions, models, datasets and even evaluation metrics. Furthermore, it is rare that code, trained models, or exhaustive parametrisation details are made openly available. Thus, even confirmation of self-reported performance is problematic; authoritatively determining the state of the art (SOTA) is essentially impossible. Consequently, to begin addressing such endemic ambiguity, this paper seeks to make three contributions: 1) the replication and results confirmation of a leading (and generalisable) technique; 2) testing straightforward modifications of the technique likely to improve performance; and 3) the extension of the technique to a novel and complementary type of crisis-relevant information to demonstrate its generalisability.
Using Twitter to learn about the autism community
Considering the rising socio-economic burden of autism spectrum disorder
(ASD), timely and evidence-driven public policy decision making and
communication of the latest guidelines pertaining to the treatment and
management of the disorder are crucial. Yet evidence suggests that policy makers
and medical practitioners do not always have a good understanding of the
practices and relevant beliefs of ASD-afflicted individuals' carers, who often
follow questionable recommendations and adopt advice poorly supported by
scientific data. The key goal of the present work is to explore the idea that
Twitter, as a highly popular platform for information exchange, could be used
as a data-mining source to learn about the population affected by ASD: their
behaviour, concerns, needs, etc. To this end, using a large data set of over 11
million harvested tweets as the basis for our investigation, we describe a
series of experiments which examine a range of linguistic and semantic aspects
of messages posted by individuals interested in ASD. Our findings, the first of
their nature in the published scientific literature, strongly motivate
additional research on this topic and present a methodological basis for
further work.
Comment: Social Network Analysis and Mining, 201
Fully Automated Fact Checking Using External Sources
Given the constantly growing proliferation of false claims online in recent
years, there has also been growing research interest in automatically
distinguishing false rumors from factually true claims. Here, we propose a
general-purpose framework for fully automatic fact checking using external
sources, tapping the potential of the entire Web as a knowledge source to
confirm or reject a claim. Our framework uses a deep neural network with LSTM
text encoding to combine semantic kernels with task-specific embeddings that
encode a claim together with potentially relevant text fragments from
the Web, taking the source reliability into account. The evaluation results
show good performance on two different tasks and datasets: (i) rumor detection
and (ii) fact checking of the answers to a question in community question
answering forums.
Comment: RANLP-201