Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Echoes of power: Language effects and power differences in social interaction
Understanding social interaction within groups is key to analyzing online
communities. Most current work focuses on structural properties: who talks to
whom, and how such interactions form larger network structures. The
interactions themselves, however, generally take place in the form of natural
language (either spoken or written), and one could reasonably suppose that
signals manifested in language might also provide information about roles,
status, and other aspects of the group's dynamics. To date, however, finding
such domain-independent language-based signals has been a challenge.
Here, we show that in group discussions power differentials between
participants are subtly revealed by how much one individual immediately echoes
the linguistic style of the person they are responding to. Starting from this
observation, we propose an analysis framework based on linguistic coordination
that can be used to shed light on power relationships and that works
consistently across multiple types of power, including a more "static" form
of power based on status differences, and a more "situational" form of power in
which one individual experiences a type of dependence on another. Using this
framework, we study how conversational behavior can reveal power relationships
in two very different settings: discussions among Wikipedians and arguments
before the U.S. Supreme Court.
Comment: v3 is the camera-ready for the Proceedings of WWW 2012. Changes from
v2 include additional technical analysis. See
http://www.cs.cornell.edu/~cristian/www2012 for data and more inf
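
The abstract above describes echoing of linguistic style without giving a formula. The following is a minimal sketch of one plausible way to quantify it, not the authors' implementation. It assumes a score of the form P(reply shows a marker | the utterance replied to shows it) minus P(reply shows the marker). The marker set `ARTICLES`, the helper names, and the toy exchanges are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical marker class: a small set of function words (articles).
ARTICLES: Set[str] = {"a", "an", "the"}


def has_marker(utterance: str, marker: Set[str] = ARTICLES) -> bool:
    """Return True if the utterance contains at least one marker word."""
    return any(tok.strip(".,!?;:").lower() in marker for tok in utterance.split())


def coordination(exchanges: List[Tuple[str, str]],
                 marker: Set[str] = ARTICLES) -> float:
    """Echoing score over (utterance, reply) pairs.

    Assumed definition (not taken from the abstract):
    P(reply has marker | utterance has marker) - P(reply has marker).
    """
    if not exchanges:
        raise ValueError("need at least one exchange")
    triggered = [(u, r) for u, r in exchanges if has_marker(u, marker)]
    if not triggered:
        raise ValueError("no utterance exhibits the marker")
    p_cond = sum(has_marker(r, marker) for _, r in triggered) / len(triggered)
    p_base = sum(has_marker(r, marker) for _, r in exchanges) / len(exchanges)
    return p_cond - p_base


# Toy (utterance, reply) pairs, invented for illustration only.
exchanges = [
    ("The court will hear the case.", "The case is ready."),
    ("Please proceed.", "Thank you."),
    ("Is an appeal pending?", "Yes, an appeal was filed."),
    ("Go on.", "Nothing further."),
]
score = coordination(exchanges)
```

Under this assumed definition, a positive score means the replier uses the marker more often right after the other speaker used it than they do in general.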
Recruiting from the network: discovering Twitter users who can help combat Zika epidemics
Tropical diseases like Chikungunya and Zika have come to
prominence in recent years as the cause of serious, long-lasting,
population-wide health problems. In large countries like Brazil, traditional
disease prevention programs led by health authorities have not been
particularly effective. We explore the hypothesis that monitoring and analysis
of social media content streams may effectively complement such efforts.
Specifically, we aim to identify selected members of the public who are likely
to be sensitive to virus combat initiatives that are organised in local
communities. Focusing on Twitter and on the topic of Zika, our approach
involves (i) training a classifier to select topic-relevant tweets from the
Twitter feed, and (ii) discovering the top users who are actively posting
relevant content about the topic. We may then recommend these users as the
prime candidates for direct engagement within their community. In this short
paper we describe our analytical approach and prototype architecture, discuss
the challenges of dealing with noisy and sparse signals, and present encouraging
preliminary results.
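
Steps (i) and (ii) above can be sketched as follows. This is an illustration under stated assumptions, not the authors' prototype. The abstract does not name the classifier or the ranking criterion, so the sketch uses a tiny Naive Bayes model with add-one smoothing and ranks users by their number of relevant tweets. The class `TinyNaiveBayes`, the labels, the user names, and all tweets are hypothetical toy data.

```python
import math
from collections import Counter
from typing import List, Tuple

PUNCT = ".,!?#@:;"


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, texts: List[str], labels: List[str]) -> "TinyNaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*(self.word_counts[c] for c in self.classes))
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(text):
                score += math.log(
                    (self.word_counts[c][tok] + 1) / (total + len(self.vocab))
                )
            scores[c] = score
        return max(scores, key=scores.get)


# Step (i): train on a toy labeled sample (invented for illustration).
train_texts = [
    "zika mosquito outbreak in the city",
    "remove standing water to stop zika mosquito",
    "great football match tonight",
    "new phone release today",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = TinyNaiveBayes().fit(train_texts, train_labels)

# Step (ii): count relevant tweets per user and rank (assumed criterion).
stream: List[Tuple[str, str]] = [
    ("ana", "zika mosquito breeding near my street"),
    ("ana", "standing water attracts mosquito"),
    ("bob", "football tonight with friends"),
    ("carla", "zika outbreak warning issued"),
    ("bob", "new phone arrived today"),
]
relevant_by_user = Counter(
    user for user, text in stream if model.predict(text) == "relevant"
)
top_users = relevant_by_user.most_common(2)
```

A real deployment would need a far larger labeled sample and a ranking that accounts for the noisy, sparse signal the abstract mentions.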
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
The development of summarization research has been significantly hampered by
the costly acquisition of reference summaries. This paper proposes an effective
way to automatically collect large numbers of news-related multi-document
summaries with reference to social media's reactions. We utilize two types of
social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to
cluster documents into different topic sets. Also, a tweet with a hyper-link
often highlights certain key points of the corresponding document. We
synthesize a linked document cluster to form a reference summary which can
cover most key points. To this aim, we adopt the ROUGE metrics to measure the
coverage ratio, and develop an Integer Linear Programming solution to discover
the sentence set reaching the upper bound of ROUGE. Since we allow summary
sentences to be selected from both documents and high-quality tweets, the
generated reference summaries could be abstractive. Both informativeness and
readability of the collected summaries are verified by manual judgment. In
addition, we train a Support Vector Regression summarizer on DUC generic
multi-document summarization benchmarks. With the collected data as an extra
training resource, the performance of the summarizer improves substantially on all the
test sets. We release this dataset for further research.
Comment: 7 pages, 1 figure in AAAI 201
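
The sentence-selection idea above can be illustrated with a small sketch. It is not the paper's Integer Linear Programming formulation: for readability it swaps in an exhaustive search over sentence subsets, and it replaces ROUGE with a simplified set-based unigram recall against the linked tweets. The function names, the sentence budget `max_sentences`, and the toy sentences and tweets are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def unigrams(text: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {t.strip(".,!?").lower() for t in text.split()} - {""}


def coverage(selected: Sequence[str], reference: Set[str]) -> float:
    """Fraction of reference unigrams covered (simplified stand-in for ROUGE-1 recall)."""
    if not reference:
        raise ValueError("reference must contain at least one unigram")
    covered: Set[str] = set()
    for sentence in selected:
        covered |= unigrams(sentence)
    return len(covered & reference) / len(reference)


def best_subset(sentences: List[str], tweets: List[str],
                max_sentences: int) -> Tuple[List[str], float]:
    """Exhaustively search subsets of up to max_sentences sentences.

    A stand-in for the ILP solver: it finds the same kind of optimum on
    tiny inputs but does not scale to real document clusters.
    """
    reference: Set[str] = set()
    for tweet in tweets:
        reference |= unigrams(tweet)
    best: Tuple[str, ...] = ()
    best_score = 0.0
    for k in range(1, max_sentences + 1):
        for subset in combinations(sentences, k):
            score = coverage(subset, reference)
            if score > best_score:
                best, best_score = subset, score
    return list(best), best_score


# Toy document sentences and linked tweets, invented for illustration.
sentences = [
    "Floods hit the coastal town on Monday.",
    "Rescue teams evacuated hundreds of residents.",
    "The mayor thanked volunteers.",
]
tweets = ["Floods hit coastal town", "hundreds of residents evacuated"]
summary, score = best_subset(sentences, tweets, max_sentences=2)
```

Exhaustive search grows exponentially with the number of sentences, which is why an optimization formulation such as the ILP described above is the practical choice at scale.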