4,059 research outputs found
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
Computational fact checking from knowledge networks
Traditional fact checking by expert journalists cannot keep up with the
enormous volume of information that is now generated online. Computational fact
checking may significantly enhance our ability to evaluate the veracity of
dubious information. Here we show that the complexities of human fact checking
can be approximated quite well by finding the shortest path between concept
nodes under properly defined semantic proximity metrics on knowledge graphs.
Framed as a network problem this approach is feasible with efficient
computational techniques. We evaluate this approach by examining tens of
thousands of claims related to history, entertainment, geography, and
biographical information using a public knowledge graph extracted from
Wikipedia. Statements independently known to be true consistently receive
higher support via our method than do false ones. These findings represent a
significant step toward scalable computational fact-checking methods that may
one day mitigate the spread of harmful misinformation
Weblog For Teachers And Students
Educators are always looking for interesting and innovative ways in which to improve the communications skills of their students and in communicating with their students. This is even more important when the students are studying online. Weblogs, originally a simple web browser based system for updating websites, have developed into powerful web based content management systems, capable of acting as a platform for learning that can accelerate the development of expertise learner behaviour. There is a trend to utilize the benefits of the e-learning as a mechanism to improve learning performance of campus-based students. Whilst traditional methods, such as face-to-face lectures, tutorials, and mentoring, remain dominant in the educational sector, universities are investing heavily in learning technologies, to facilitate improvements with respect to the quality of learning. This study explores into the role of weblogs in supporting teachers during their teaching practice and the key factors determining their engagement with weblogs. Underlying our study is an integrative approach that puts weblogs alongside with other popular media in use. An online community was intentionally built with weblogs to facilitate reflection and social interaction among dispersed teachers. In parallel, multiple channels of communication were employed for peer interaction of students. Weblogs were perceived as valuable in relieving isolation, documenting their experience, and expressing their personal feelings. This study sustains our conviction that the integrative approach is vital to have a comprehensive picture of interaction among a community. Our study deepens the insights into the distinct benefit of weblogs as educational media and informs the future development of an online community with weblogs. Weblogs provide a chance for students to experience writing in a public space where their work can have real value both for their classmates and for a wider community. This prepares students for a networked world where communication is essential and often social and where writing can have consequences
Clustering Memes in Social Media
The increasing pervasiveness of social media creates new opportunities to
study human social behavior, while challenging our capability to analyze their
massive data streams. One of the emerging tasks is to distinguish between
different kinds of activities, for example engineered misinformation campaigns
versus spontaneous communication. Such detection problems require a formal
definition of meme, or unit of information that can spread from person to
person through the social network. Once a meme is identified, supervised
learning methods can be applied to classify different types of communication.
The appropriate granularity of a meme, however, is hardly captured from
existing entities such as tags and keywords. Here we present a framework for
the novel task of detecting memes by clustering messages from large streams of
social data. We evaluate various similarity measures that leverage content,
metadata, network features, and their combinations. We also explore the idea of
pre-clustering on the basis of existing entities. A systematic evaluation is
carried out using a manually curated dataset as ground truth. Our analysis
shows that pre-clustering and a combination of heterogeneous features yield the
best trade-off between number of clusters and their quality, demonstrating that
a simple combination based on pairwise maximization of similarity is as
effective as a non-trivial optimization of parameters. Our approach is fully
automatic, unsupervised, and scalable for real-time detection of memes in
streaming data.Comment: Proceedings of the 2013 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining (ASONAM'13), 201
Language in Our Time: An Empirical Analysis of Hashtags
Hashtags in online social networks have gained tremendous popularity during
the past five years. The resulting large quantity of data has provided a new
lens into modern society. Previously, researchers mainly rely on data collected
from Twitter to study either a certain type of hashtags or a certain property
of hashtags. In this paper, we perform the first large-scale empirical analysis
of hashtags shared on Instagram, the major platform for hashtag-sharing. We
study hashtags from three different dimensions including the temporal-spatial
dimension, the semantic dimension, and the social dimension. Extensive
experiments performed on three large-scale datasets with more than 7 million
hashtags in total provide a series of interesting observations. First, we show
that the temporal patterns of hashtags can be categorized into four different
clusters, and people tend to share fewer hashtags at certain places and more
hashtags at others. Second, we observe that a non-negligible proportion of
hashtags exhibit large semantic displacement. We demonstrate hashtags that are
more uniformly shared among users, as quantified by the proposed hashtag
entropy, are less prone to semantic displacement. In the end, we propose a
bipartite graph embedding model to summarize users' hashtag profiles, and rely
on these profiles to perform friendship prediction. Evaluation results show
that our approach achieves an effective prediction with AUC (area under the ROC
curve) above 0.8 which demonstrates the strong social signals possessed in
hashtags.Comment: WWW 201
- …