A multi-terabyte relational database for geo-tagged social network data

Author: Dániel Kondor
Gábor Vattay
István Csabai
János Szüle
József Stéger
László Dobos
Tamás Bodnár
Tamás Hanyecz
Tamás Sebők
Zsófia Kallus
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

ELTE Digital Institutional Repository (EDIT)

Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh

Author: Bodor András
Budavári Tamás
Csabai István
Dobos László
Kondor Dániel
Szalay Alexander S.
Vattay Gábor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

We present a case study about the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Because the HTM library lacks certain features, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering in spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten, and HTM-based spatial joins run about a hundred times faster. Comment: appears in Proceedings of the 26th International Conference on Scientific and Statistical Database Management (2014)
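
The pre-filtering idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' HTM implementation: a square grid stands in for HTM triangles, a disc stands in for a complex geographic region, and the names CELL, classify_cells and in_region are hypothetical. Each cell is classified once as fully or partially covered by the region, so points in fully covered cells are accepted without the exact geometric test.

```python
from math import floor, hypot

CELL = 1.0  # grid cell size; a hypothetical stand-in for an HTM triangle


def classify_cells(cx, cy, r):
    """Precompute cell status for a disc region centred at (cx, cy).

    Returns a dict mapping (i, j) to 'full' or 'partial'.
    Cells that do not touch the disc are simply absent.
    """
    status = {}
    lo_i, hi_i = floor((cx - r) / CELL), floor((cx + r) / CELL)
    lo_j, hi_j = floor((cy - r) / CELL), floor((cy + r) / CELL)
    for i in range(lo_i, hi_i + 1):
        for j in range(lo_j, hi_j + 1):
            x0, y0 = i * CELL, j * CELL
            x1, y1 = x0 + CELL, y0 + CELL
            # If the farthest corner is inside, the whole cell is inside
            # (a disc is convex).
            far = max(hypot(x - cx, y - cy) for x in (x0, x1) for y in (y0, y1))
            # Nearest point of the cell to the disc centre.
            nx, ny = min(max(cx, x0), x1), min(max(cy, y0), y1)
            near = hypot(nx - cx, ny - cy)
            if far <= r:
                status[(i, j)] = "full"
            elif near <= r:
                status[(i, j)] = "partial"
    return status


def in_region(x, y, status, cx, cy, r):
    """Return (is_inside, exact_test_used) for a point."""
    kind = status.get((floor(x / CELL), floor(y / CELL)))
    if kind is None:
        return False, False   # rejected by the pre-filter alone
    if kind == "full":
        return True, False    # accepted by the pre-filter alone
    # Only points in partially covered cells need the exact test.
    return hypot(x - cx, y - cy) <= r, True
```

In this sketch the exact test is cheap, so the saving is only illustrative; the approach pays off when the exact test is an expensive point-in-polygon check against a detailed boundary.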

arXiv.org e-Print Archive

Crossref

ELTE Digital Institutional Repository (EDIT)

Video Pandemics: Worldwide Viral Spreading of Psy's Gangnam Style Video

Author: A Vespignani
BE Wiggins
D Balcan
D Brockmann
J Szüle
M Barthelemy
MEJ Newman
V Isham
Z Kallus
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/07/2017

Viral videos can reach global penetration by traveling through international channels of communication, much as real diseases spread from a well-localized source. In past centuries, disease fronts propagated concentrically from the source of the outbreak via the short-range human contact network. The emergence of long-distance air travel changed these ancient patterns. Recently, however, Brockmann and Helbing have shown that concentric propagation waves can be reinstated if propagation time and distance are measured in the underlying air-travel network weighted by flight time and travel volume. Here, we adopt this method for the analysis of viral meme propagation in Twitter messages, and define a similar weighted network distance in the communication network connecting countries and states of the world. We recover a wave-like behavior on average and assess the randomizing effect of non-locality of spreading. We show that similar results can be recovered from Google Trends data as well. Comment: 10 pages
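
As a rough illustration of the weighted network distance mentioned above, the sketch below uses the effective-distance form introduced by Brockmann and Helbing, d = 1 - ln P, where P is the fraction of traffic leaving one node that goes to another, and takes the shortest path over these link lengths. Whether the Twitter study uses exactly this form is an assumption; the abstract only says the distance is "similar". The function name and the toy flux table are hypothetical.

```python
import heapq
from math import log


def effective_distances(flux, source):
    """Shortest-path effective distance from `source` to every reachable node.

    flux[n][m] is the traffic volume from n to m. Each link n -> m gets
    length 1 - ln(P), with P = flux[n][m] / (total traffic leaving n).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist.get(n, float("inf")):
            continue  # stale heap entry
        out = flux.get(n, {})
        total = sum(out.values())
        for m, f in out.items():
            if f <= 0:
                continue  # no traffic, no link
            step = 1.0 - log(f / total)
            if d + step < dist.get(m, float("inf")):
                dist[m] = d + step
                heapq.heappush(heap, (d + step, m))
    return dist


# Toy example: most traffic from A goes to B, and all traffic from B goes to C,
# so C is "closer" to A via B than via the weak direct link.
toy_flux = {"A": {"B": 90, "C": 10}, "B": {"C": 100}}
toy_dist = effective_distances(toy_flux, "A")
```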

arXiv.org e-Print Archive

Crossref

Scaling in Words on Twitter

Author: Bokányi Eszter
Kondor Dániel
Vattay Gábor
Publication date: 01/01/2019

Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the Metropolitan and Micropolitan Statistical Areas of the United States. We observe a slightly superlinear urban scaling with city population for the total volume of tweets and words created in a city. We then find that a certain core vocabulary follows the same scaling relationship as the bulk text, but most words are sensitive to city size, exhibiting super- or sublinear urban scaling. For both regimes we can offer a plausible explanation based on the meaning of the words. We also show that the parameters of Zipf's law and Heaps' law on Twitter differ from those of other texts, and that the exponent of Zipf's law changes with city size.
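
Urban scaling of this kind is usually written as Y = Y0 * N^beta, where N is the city population and Y the measured quantity; beta > 1 is superlinear and beta < 1 sublinear. A minimal sketch of estimating beta by ordinary least squares on log-transformed values follows. The fitting procedure is an assumption (the abstract does not say how the exponents were estimated), and the function name and the data are synthetic and hypothetical.

```python
from math import log


def fit_scaling_exponent(populations, volumes):
    """Fit log(Y) = log(Y0) + beta * log(N) by ordinary least squares.

    Returns (beta, log_y0).
    """
    xs = [log(n) for n in populations]
    ys = [log(v) for v in volumes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    return beta, my - beta * mx


# Synthetic data generated with a known exponent, for illustration only.
toy_pop = [1e4, 1e5, 1e6, 1e7]
toy_vol = [2.0 * n ** 1.1 for n in toy_pop]
toy_beta, toy_log_y0 = fit_scaling_exponent(toy_pop, toy_vol)
```

Applied per word instead of to the total volume, the same fit would give one exponent per word, which is how words could be sorted into super- and sublinear groups.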

arXiv.org e-Print Archive

Repository of the Academy's Library

SQL or NoSQL?:contrasting approaches to the storage, manipulation and analysis of spatio-temporal online social network data

Author: A. Levenshus
B. Warf
B.C. Till
C. Chamley
C. Licoppe
C.-H. Lee
D. Boyd
D. Goldberg
D.C. Mutz
D.F. D’Souza
E. Bahir
E.F. Codd
F.A.Y. Chang
H. Campbell
H. Cunningham
J. Lees-Marshment
J. Lin
J.W. Crampton
K.K.-Y. Lee
L. Humphreys
L. Spinsanti
M. Batty
M.F. Goodchild
M.W. Wilson
N. Andrienko
P. Bernstein
R. Kosala
R. Wilken
R.K. Polat
R.M. Bond
S. Greengard
S. Hong
S. Shekhar
S. Stieglitz
S. Takaragawa
S. Wang
S.W. Campbell
Y. Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Portsmouth University Research Portal (Pure)

Location-aware online learning for top-k recommendation

Author: Abernethy
András A. Benczúr
Bakshy
Bao
Barabasi
Berjani
Chen
Cheng
Cremonesi
Deshpande
Diaz-Aviles
Dobos
Erzsébet Frigó
Gao
Harvey
Hu
Júlia Pap
Kuo
Kurashima
Kwak
Kywe
Lee
Levandoski
Levente Kocsis
Ma
Ma
Mocanu
Pilászy
Pilászy
Preparata
Pálovics
Pálovics
Péter Szalai
Róbert Pálovics
Schein
Shardanand
Symeonidis
Vazquez
Ye
Ye
Zangerle
Zheng
Zheng
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

We address the problem of recommending highly volatile items to users, where both items and users may have ambiguous locations that change over time. The three main ingredients of our method are (1) using online machine learning for the highly volatile items; (2) learning the personalized importance of hierarchical geolocation (for example, town, region, country, continent); and (3) modeling temporal relevance by counting recent items with an exponential decay in recency. For (1), we consider a time-aware setting, where evaluation by traditional measures is cumbersome since we have different top recommendations at different times. We describe a time-aware framework based on individual item discounted gain. For (2), we observe that trends and geolocation turn out to be more important than personalized user preferences: user-item and content-item matrix factorization improve in combination with our geo-trend learning methods, but on their own they are greatly inferior to our location-based models. In fact, since our best-performing methods are based on spatiotemporal data, they are applicable in the user cold-start setting as well and perform even better than content-based cold-start methods. Finally, for (3), we estimate the probability that an item will be viewed from its previous views to obtain a powerful model that combines item popularity and recency. To generate realistic data for measuring our new methods, we rely on Twitter messages with known GPS location and treat hashtags as items that we recommend users include in their next message. © 2016 Elsevier B.V.
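
Ingredient (3) above, counting recent items with an exponential decay in recency, can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's model: the class name DecayedCounter, the time constant tau and the scoring rule (each past occurrence contributes exp(-(now - t) / tau)) are all hypothetical choices.

```python
from math import exp


class DecayedCounter:
    """Per-item popularity score where each occurrence decays exponentially."""

    def __init__(self, tau):
        self.tau = tau    # decay time constant, in the same units as timestamps
        self.score = {}   # item -> score as of its last observation
        self.last = {}    # item -> timestamp of its last observation

    def value(self, item, now):
        """Decayed score of `item` at time `now` (0.0 if never seen)."""
        if item not in self.score:
            return 0.0
        return self.score[item] * exp(-(now - self.last[item]) / self.tau)

    def observe(self, item, now):
        """Record one occurrence of `item` at time `now`."""
        self.score[item] = self.value(item, now) + 1.0
        self.last[item] = now

    def top_k(self, k, now):
        """The k items with the highest decayed score at time `now`."""
        ranked = sorted(self.score, key=lambda it: self.value(it, now), reverse=True)
        return ranked[:k]
```

Because the score is updated incrementally, each observation costs constant time, which suits an online setting; an item seen often long ago can still be outranked by one seen once very recently.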

Crossref

SZTAKI Publication Repository

Spatially embedded real-world networks: Structure and dynamics from global to urban scales

Author: Kallus Zsófia
Publication date: 01/01/2019

ELTE Digital Institutional Repository (EDIT)

Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States

Author: Bokányi Eszter
Csabai István
Dobos László
Kondor Dániel
Sebők Tamás
Stéger József
Vattay Gábor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Recently, numerous approaches have emerged in the social sciences to exploit the opportunities made possible by the vast amounts of data generated by online social networks (OSNs). Having access to information about users on such a scale opens up a range of possibilities, all without the limitations associated with often slow and expensive paper-based polls. A question that remains to be satisfactorily addressed, however, is how demography is represented in OSN content. Here, we study language use in the US using a corpus of text compiled from over half a billion geo-tagged messages from the online microblogging platform Twitter. Our intention is to reveal the most important spatial patterns in language use in an unsupervised manner and relate them to demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented with the Robust Principal Component Analysis (RPCA) methodology. We find spatially correlated patterns that can be interpreted based on the words associated with them. The main language features can be related to slang use, urbanization, travel, religion and ethnicity, the patterns of which are shown to correlate plausibly with traditional census data. Our findings thus validate the concept of demography being represented in OSN language use and show that the traits observed are inherently present in the word frequencies without any previous assumptions about the dataset. Thus, they could form the basis of further research focusing on the evaluation of demographic data estimation from other big data sources, or on the dynamical processes that result in the patterns found here.

arXiv.org e-Print Archive

Repository of the Academy's Library

ELTE Digital Institutional Repository (EDIT)