241 research outputs found

Dynamics of conflicts in Wikipedia

Author: A Capocci
A Halavais
A Kittur
A Vázquez
AK Laird
AL Barabási
András Kornai
András Rung
Attila Szolnoki
B Adler
B Suh
BQ Vuong
D Laniado
DG Champernowne
DM Wilkinson
DW McDonald
F Ortega
F Tyers
FB Viegas
H Zha
J Giles
J Leskovec
J Ratkiewicz
J Schneider
J Voss
János Kertész
K Samson
K Smets
KI Goh
L Buriol
M Hu
M Karsai
M Potthast
M Strube
O Medelyan
P Massa
R Kimmons
R Sumi
RL Rivest
Robert Sumi
S Javanmardi
S Vajna
SKS Sharoff
SP Ponzetto
T Gowers
T Yasseri
Taha Yasseri
U Brandes
V Zlatić
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases eventually leading to consensus from those cases where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by few editors only.

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

SZTAKI Publication Repository

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

FigShare

Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter

Author: Cresci Stefano
Lillo Fabrizio
Regoli Daniele
Tardelli Serena
Tesconi Maurizio
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/07/2018

Microblogs are increasingly exploited for predicting prices and traded volumes of stocks in financial markets. However, it has been demonstrated that much of the content shared in microblogging platforms is created and publicized by bots and spammers. Yet, the presence (or lack thereof) and the impact of fake stock microblogs have never systematically been investigated before. Here, we study 9M tweets related to stocks of the 5 main financial markets in the US. By comparing tweets with financial data from Google Finance, we highlight important characteristics of Twitter stock microblogs. More importantly, we uncover a malicious practice - referred to as cashtag piggybacking - perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones. Among the findings of our study is that as much as 71% of the authors of suspicious financial tweets are classified as bots by a state-of-the-art spambot detection algorithm. Furthermore, 37% of them were suspended by Twitter a few months after our investigation. Our results call for the adoption of spam and bot detection techniques in all studies and applications that exploit user-generated content for predicting the stock market.

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Localized Events in Social Media Streams: Detection, Tracking, and Recommendation

Author: Abdelhaq Hamed
Publication venue
Publication date: 01/01/2016

From the recent proliferation of social media channels to the immense amount of user-generated content, an increasing interest in social media mining is currently being witnessed. Messages continuously posted via these channels report a broad range of topics from daily life to global and local events. As a consequence, this has opened new opportunities for mining event information crucial in many application domains, especially in increasing the situational awareness in critical scenarios. Interestingly, many of these messages are enriched with location information, due to the widespread use of mobile devices and the recent advancements of today’s location acquisition techniques. This enables location-aware event mining, i.e., the detection and tracking of localized events. In this thesis, we propose novel frameworks and models that digest social media content for localized event detection, tracking, and recommendation. We first develop KeyPicker, a framework to extract and score event-related keywords in an online fashion, accounting for high levels of noise, temporal heterogeneity and outliers in the data. Then, LocEvent is proposed to incrementally detect and track events using a 4-stage procedure. That is, LocEvent receives the keywords extracted by KeyPicker, identifies local keywords, spatially clusters them, and finally scores the generated clusters. For each detected event, a set of descriptive keywords, a location, and a time interval are estimated at a fine-grained resolution. In addition to the sparsity of geo-tagged messages, people sometimes post about events far away from an event’s location. Such spatial problems are handled by novel spatial regularization techniques, namely, graph- and gazetteer-based regularization. To ensure scalability, we utilize a hierarchical spatial index in addition to a multi-stage filtering procedure that gradually suppresses noisy words and considers only event-related ones for complex spatial computations.
As for recommendation applications, we propose an event recommender system built upon model-based collaborative filtering. Our model is able to suggest events to users, taking into account a number of contextual features including the social links between users, the topical similarities of events, and the spatio-temporal proximity between users and events. To realize this model, we employ and adapt matrix factorization, which allows for uncovering latent user-event patterns. Our proposed features contribute to directing the learning process towards recommendations that better suit the taste of users, in particular when new users have very sparse (or even no) event attendance history. To evaluate the effectiveness and efficiency of our proposed approaches, extensive comparative experiments are conducted using datasets collected from social media channels. Our analysis of the experimental results reveals the superiority and advantages of our frameworks over existing methods in terms of the relevancy and precision of the obtained results.

Heidelberger Dokumentenserver

Niche as a determinant of word fate in online groups

Author: A Baronchelli
A Dijksterhuis
A Hotho
Adilson E. Motter
C Cattuto
C Eble
CD Manning
D Crystal
D Fisher
D Jablonski
D Nettle
D Sornette
D Watts
DJ Hruschka
DM Abrams
DW Nickerson
E Lieberman
Eduardo G. Altmann
EG Altmann
EM Rogers
Enrico Scalas
EV Clark
G Hardin
G Lupyan
G Smitherman
G Szabo
HP Grice
I Trestian
J Kleinberg
J Munat
J-B Michel
J-P Onnela
Janet B. Pierrehumbert
JF Fontanari
K Kuiper
K Lerman
KW Church
L Milroy
L Steels
M Foote
M Pagel
M Seshadri
MA Serrano
MC González
MH Davis
ML Salganik
NL Komarova
P Chesley
P Eckert
P Wexler
Q Lu
R Crane
R Schifanella
R Torres Cacoullos
RA Blythe
RD Malmgren
RK Colwell
RV Solé
S Fortunato
S Kirby
S Wasserman
S Wichmann
W Kruskal
W Labov
Y Neuman
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the size of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity. Comment: Supporting Information is available here: http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0019009.s00

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe

On cross-domain social semantic learning

Author: Deb Roy Suman
Publication venue: 'University of Missouri Libraries'
Publication date

Approximately 2.4 billion people are now connected to the Internet, generating massive amounts of data through laptops, mobile phones, sensors and other electronic devices or gadgets. Not surprisingly then, ninety percent of the world's digital data was created in the last two years. This massive explosion of data provides tremendous opportunity to study, model and improve conceptual and physical systems from which the data is produced. It also permits scientists to test pre-existing hypotheses in various fields with large scale experimental evidence. Thus, developing computational algorithms that automatically explore this data is the holy grail of the current generation of computer scientists. Making sense of this data algorithmically can be a complex process, specifically due to two reasons. Firstly, the data is generated by different devices, capturing different aspects of information, and resides in different web resources/platforms on the Internet. Therefore, even if two pieces of data bear singular conceptual similarity, their generation, format and domain of existence on the web can make them seem considerably dissimilar. Secondly, since humans are social creatures, the data often possesses inherent but murky correlations, primarily caused by the causal nature of direct or indirect social interactions. This drastically alters what algorithms must now achieve, necessitating intelligent comprehension of the underlying social nature and semantic contexts within the disparate domain data and a quantifiable way of transferring knowledge gained from one domain to another. Finally, the data is often encountered as a stream and not as static pages on the Internet. Therefore, we must learn and re-learn as the stream propagates. The main objective of this dissertation is to develop learning algorithms that can identify specific patterns in one domain of data which can consequently augment predictive performance in another domain.
The research explores the existence of specific data domains which can function in synergy with another and, more importantly, proposes models to quantify the synergetic information transfer among such domains. We include large-scale data from various domains in our study: social media data from Twitter, multimedia video data from YouTube, video search query data from Bing Videos, natural language search queries from the web, Internet resources in the form of web logs (blogs) and spatio-temporal social trends from Twitter. Our work presents a series of solutions to address the key challenges in cross-domain learning, particularly in the field of social and semantic data. We propose the concept of bridging media from disparate sources by building a common latent topic space, which represents one of the first attempts toward answering sociological problems using cross-domain (social) media. This allows information transfer between social and non-social domains, fostering real-time socially relevant applications. We also engineer a concept network from the semantic web, called semNet, that can assist in identifying concept relations and modeling information granularity for robust natural language search. Further, by studying spatio-temporal patterns in this data, we can discover categorical concepts that stimulate collective attention within user groups. Includes bibliographical references (pages 210-214).

University of Missouri: MOspace

Estimating the semantic type of events using location features from Flickr

Author: Dhoedt Bart
Schockaert S
Van Canneyt Steven
Publication venue
Publication date: 01/01/2014

Crossref

Ghent University Academic Bibliography

BCS SGAI SMA 2013: the BCS SGAI workshop on social media analysis

Author
Publication venue: M. Jeusfeld
Publication date: 01/01/2013

Portsmouth University Research Portal (Pure)