Citizens and Institutions as Information Prosumers. The Case Study of Italian Municipalities on Twitter
This paper addresses changes in public communication following the advent of Internet social networking tools and emerging Web 2.0 technologies, which provide new ways of sharing information and knowledge. In particular, public administrations are called upon to reinvent the governance of public affairs and to update the means of interacting with their communities. The paper develops an analysis of the distribution, diffusion and performance of the official Twitter profiles adopted by Italian municipalities (comuni) up to November 2013. It aims to identify the patterns of spatial distribution and the drivers of the diffusion of Twitter profiles, and to measure the performance of the profiles through an aggregated index, called the Twitter performance index (Twiperindex), which evaluates the profiles' activity with reference to the gravitational areas of the municipalities in order to enable comparisons between municipalities with different demographic sizes and functional roles. The results show that only a small portion of innovative municipalities have adopted Twitter to enhance e-participation and e-governance, and that the drivers of the diffusion seem to be related either to past experiences and existing conditions (i.e. civic networks, digital infrastructures) developed over time or to strong local community awareness. The better performances are achieved mainly by small and medium-sized municipalities. The phenomenon is very new and fluid, so this analysis should be considered a first step in ongoing research that aims to grasp the dynamics of these new means of public communication.
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
A Continuously Growing Dataset of Sentential Paraphrases
A major challenge in paraphrase research is the lack of parallel corpora. In
this paper, we present a new method to collect large-scale sentential
paraphrases from Twitter by linking tweets through shared URLs. The main
advantage of our method is its simplicity, as it gets rid of the classifier or
human in the loop needed to select data before annotation and subsequent
application of paraphrase identification algorithms in the previous work. We
present the largest human-labeled paraphrase corpus to date of 51,524 sentence
pairs and the first cross-domain benchmarking for automatic paraphrase
identification. In addition, we show that more than 30,000 new sentential
paraphrases can be easily and continuously captured every month at ~70%
precision, and demonstrate their utility for downstream NLP tasks through
phrasal paraphrase extraction. We make our code and data freely available. Comment: 11 pages, accepted to EMNLP 201
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, high accuracy supervisory signal and multilanguage
sentiment prediction while respecting every privacy request applicable. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
Can Real Social Epistemic Networks Deliver the Wisdom of Crowds?
In this paper, we explain and showcase the promising methodology of testimonial network analysis and visualization for experimental epistemology, arguing that it can be used to gain insights and answer philosophical questions in social epistemology. Our use case is the epistemic community that discusses vaccine safety primarily in English on Twitter. In two studies, we show, using both statistical analysis and exploratory data visualization, that there is almost no neutral or ambivalent discussion of vaccine safety on Twitter. Roughly half the accounts engaging with this topic are pro-vaccine, while the other half are con-vaccine. We also show that these two camps rarely engage with one another, and that the con-vaccine camp has greater epistemic reach and receptivity than the pro-vaccine camp. In light of these findings, we question whether testimonial networks as they are currently constituted on popular fora such as Twitter are living up to their promise of delivering the wisdom of crowds. We conclude by pointing to directions for further research in digital social epistemology.
Infectivity Enhances Prediction of Viral Cascades in Twitter
Models of contagion dynamics, originally developed for infectious diseases,
have proven relevant to the study of information, news, and political opinions
in online social systems. Modelling diffusion processes and predicting viral
information cascades are important problems in network science. Yet, many
studies of information cascades neglect the variation in infectivity across
different pieces of information. Here, we employ early-time observations of
online cascades to estimate the infectivity of distinct pieces of information.
Using simulations and data from real-world Twitter retweets, we demonstrate
that these estimated infectivities can be used to improve predictions about the
virality of an information cascade. Developing our simulations to mimic the
real-world data, we consider the effect of the limited effective time for
transmission of a cascade and demonstrate that a simple model for slow but
non-negligible decay of the infectivity captures the essential properties of
retweet distributions. These results demonstrate the interplay between the
intrinsic infectivity of a tweet and the complex network environment within
which it diffuses, strongly influencing the likelihood of becoming a viral
cascade. Comment: 16 pages, 10 figure