7,603 research outputs found

    Citizens and Institutions as Information Prosumers. The Case Study of Italian Municipalities on Twitter

    This paper addresses changes in public communication following the advent of Internet social networking tools and the emerging web 2.0 technologies, which provide new ways of sharing information and knowledge. In particular, public administrations are called upon to reinvent the governance of public affairs and to update the means of interacting with their communities. The paper analyses the distribution, diffusion and performance of the official Twitter profiles adopted by Italian municipalities (comuni) up to November 2013. It aims to identify the patterns of spatial distribution and the drivers of the diffusion of Twitter profiles, and to measure the performance of the profiles through an aggregated index, the Twitter performance index (Twiperindex). The index evaluates the profiles' activity with reference to the gravitational areas of the municipalities, so that the activity of municipalities with different demographic sizes and functional roles can be compared. The results show that only a small portion of innovative municipalities have adopted Twitter to enhance e-participation and e-governance, and that the drivers of the diffusion seem to be related either to past experiences and existing conditions (i.e. civic networks, digital infrastructures) developed over time or to strong local community awareness. The better performances are achieved mainly by small and medium-sized municipalities. The phenomenon is very new and fluid, so this analysis should be considered a first step in ongoing research that aims to grasp the dynamics of these new means of public communication.

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale, high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology to Arabic tweets resulted in EveTAR, the first freely available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.
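
    The abstract reports inter-annotator agreement as a Kappa value but does not say which variant was used. As a rough illustration only, the sketch below assumes two annotators and Cohen's kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the sample labels are hypothetical and are not taken from EveTAR.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: exactly two annotators; the source abstract does not
    state which kappa variant its 0.71 figure refers to.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

    With more than two annotators per tweet, a multi-rater statistic such as Fleiss' kappa would be the usual choice instead.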

    A Continuously Growing Dataset of Sentential Paraphrases

    A major challenge in paraphrase research is the lack of parallel corpora. In this paper, we present a new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs. The main advantage of our method is its simplicity, as it gets rid of the classifier or human in the loop needed to select data before annotation and subsequent application of paraphrase identification algorithms in the previous work. We present the largest human-labeled paraphrase corpus to date of 51,524 sentence pairs and the first cross-domain benchmarking for automatic paraphrase identification. In addition, we show that more than 30,000 new sentential paraphrases can be easily and continuously captured every month at ~70% precision, and demonstrate their utility for downstream NLP tasks through phrasal paraphrase extraction. We make our code and data freely available. Comment: 11 pages, accepted to EMNLP 201
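
    The idea of linking tweets through shared URLs can be sketched as follows. This is a minimal illustration, not the authors' released code: the regular expression, the function name `candidate_pairs`, and the sample tweets are all assumptions made for the example. Tweets that contain the same URL are grouped, the URL is stripped, and every pair of distinct sentences within a group becomes a candidate paraphrase pair for later annotation.

```python
import re
from collections import defaultdict
from itertools import combinations

# Simplified URL matcher (assumption); real tweets would need
# short-link expansion and more careful tokenization.
URL_PATTERN = re.compile(r"https?://\S+")


def candidate_pairs(tweets):
    """Pair up tweet texts that share at least one URL."""
    by_url = defaultdict(list)
    for text in tweets:
        urls = set(URL_PATTERN.findall(text))
        # Remove the URLs so only the sentence itself is compared.
        sentence = URL_PATTERN.sub("", text).strip()
        for url in urls:
            # Skip empty texts and exact duplicates (e.g. plain retweets).
            if sentence and sentence not in by_url[url]:
                by_url[url].append(sentence)
    pairs = []
    for sentences in by_url.values():
        pairs.extend(combinations(sentences, 2))
    return pairs


# Hypothetical tweets; the first two link to the same article.
tweets = [
    "Storm closes the main bridge downtown https://example.com/a",
    "Downtown bridge shut because of the storm https://example.com/a",
    "New phone released today https://example.com/b",
]
pairs = candidate_pairs(tweets)
for left, right in pairs:
    print(left, "|", right)
```

    Candidate pairs produced this way are not guaranteed to be paraphrases; per the abstract, they still go through annotation or a paraphrase identification model.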

    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal and multilanguage sentiment prediction while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including tweets' visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysis

    BCS SGAI SMA 2013: the BCS SGAI workshop on social media analysis


    Can Real Social Epistemic Networks Deliver the Wisdom of Crowds?

    In this paper, we explain and showcase the promising methodology of testimonial network analysis and visualization for experimental epistemology, arguing that it can be used to gain insights and answer philosophical questions in social epistemology. Our use case is the epistemic community that discusses vaccine safety primarily in English on Twitter. In two studies, we show, using both statistical analysis and exploratory data visualization, that there is almost no neutral or ambivalent discussion of vaccine safety on Twitter. Roughly half the accounts engaging with this topic are pro-vaccine, while the other half are con-vaccine. We also show that these two camps rarely engage with one another, and that the con-vaccine camp has greater epistemic reach and receptivity than the pro-vaccine camp. In light of these findings, we question whether testimonial networks as they are currently constituted on popular fora such as Twitter are living up to their promise of delivering the wisdom of crowds. We conclude by pointing to directions for further research in digital social epistemology.

    Infectivity Enhances Prediction of Viral Cascades in Twitter

    Models of contagion dynamics, originally developed for infectious diseases, have proven relevant to the study of information, news, and political opinions in online social systems. Modelling diffusion processes and predicting viral information cascades are important problems in network science. Yet, many studies of information cascades neglect the variation in infectivity across different pieces of information. Here, we employ early-time observations of online cascades to estimate the infectivity of distinct pieces of information. Using simulations and data from real-world Twitter retweets, we demonstrate that these estimated infectivities can be used to improve predictions about the virality of an information cascade. Developing our simulations to mimic the real-world data, we consider the effect of the limited effective time for transmission of a cascade and demonstrate that a simple model for slow but non-negligible decay of the infectivity captures the essential properties of retweet distributions. These results demonstrate the interplay between the intrinsic infectivity of a tweet and the complex network environment within which it diffuses, strongly influencing the likelihood of becoming a viral cascade. Comment: 16 pages, 10 figures