Do we really need to catch them all? A new User-guided Social Media Crawling method

Author: Boldt Martin
Bródka Piotr
Erlandsson Fredrik
Johnson Henric
Publication venue: MDPI AG
Publication date: 01/01/2017

With the growing use of popular social media services like Facebook and Twitter, it is challenging to collect all content from the networks without access to the core infrastructure or paying for it. Thus, if all content cannot be collected, one must consider which data are most important. In this work we present a novel User-guided Social Media Crawling method (USMC) that is able to collect data from social media, utilizing the wisdom of the crowd to decide the order in which user-generated content should be collected to cover as many user interactions as possible. USMC is validated by crawling 160 public Facebook pages, containing content from 368 million users including 1.3 billion interactions, and it is compared with two other crawling methods. The results show that it is possible to cover approximately 75% of the interactions on a Facebook page by sampling just 20% of its posts, and at the same time reduce the crawling time by 53%. In addition, the social network constructed from the 20% sample contains more than 75% of the users and edges compared to the social network created from all posts, and it has a similar degree distribution.
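The prioritization idea in the USMC abstract can be illustrated with a small sketch. It assumes, hypothetically, that each post's interaction count (for example likes plus comments) is known before the post's full content is fetched, and it simply crawls posts in descending order of that count. `Post`, `prioritize` and `coverage` are illustrative names, not the authors' implementation, and the real USMC ordering may differ.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    # Hypothetical: interaction count (e.g. likes + comments) visible up front
    interactions: int


def prioritize(posts):
    """Order posts so the most-interacted-with content is crawled first."""
    return sorted(posts, key=lambda p: p.interactions, reverse=True)


def coverage(posts, fraction):
    """Share of all interactions covered by crawling the top `fraction` of posts."""
    ordered = prioritize(posts)
    # Number of posts we can afford to fetch (at least one)
    k = max(1, round(len(ordered) * fraction))
    total = sum(p.interactions for p in ordered)
    covered = sum(p.interactions for p in ordered[:k])
    return covered / total if total else 0.0
```

On a toy page with interaction counts 100, 50, 20, 10, 5, 5, 4, 3, 2 and 1, crawling the top 20% of posts (two posts) covers 150 of 200 interactions, i.e. 0.75. These numbers are invented for illustration and are not taken from the paper.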

arXiv.org e-Print Archive

Blekinge Institute of Technology

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Network Sampling: From Static to Streaming Graphs

Author: Ahmed Nesreen K.
Kompella Ramana
Neville Jennifer
Publication date: 13/11/2012

Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, populations and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Our experimental results indicate that our proposed family of sampling methods more accurately preserves the underlying properties of the graph for both static and streaming graphs. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.
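Graph induction can be sketched as follows for the static setting: edges are drawn uniformly at random until a node budget is reached, and then every original edge between two sampled nodes is added back. This follows the general idea of induced edge sampling named in the abstract; the function name, the `node_fraction` parameter and the stopping rule are assumptions, and the paper's streaming variants are not shown.

```python
import random


def induced_edge_sample(edges, node_fraction, seed=None):
    """Static edge sampling followed by graph induction (illustrative sketch).

    Returns (sampled_nodes, induced_edges). The node count may overshoot the
    target by one, because both endpoints of the last drawn edge are kept.
    """
    rng = random.Random(seed)
    nodes = {u for e in edges for u in e}
    target = max(1, int(node_fraction * len(nodes)))

    # Step 1: draw edges in random order, collecting their endpoints
    sampled_nodes = set()
    order = list(edges)
    rng.shuffle(order)
    for u, v in order:
        if len(sampled_nodes) >= target:
            break
        sampled_nodes.update((u, v))

    # Step 2 (induction): keep every original edge whose endpoints were both sampled
    induced = [(u, v) for u, v in edges if u in sampled_nodes and v in sampled_nodes]
    return sampled_nodes, induced
```

The induction step is what distinguishes this from plain edge sampling: edges that were never drawn but connect sampled nodes are recovered, which tends to keep the sample denser and closer to the original graph's local structure.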

arXiv.org e-Print Archive

CiteSeerX

Crawling Facebook for Social Network Analysis Purposes

Author: Catanese Salvatore
De Meo Pasquale
Ferrara Emilio
Fiumara Giacomo
Provetti Alessandro
Publication venue: ACM
Publication date: 01/01/2011

We describe our work in the collection and analysis of massive data describing the connections between participants in online social networks. Alternative approaches to social network data collection are defined and evaluated in practice against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data are anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, including degree distribution, centrality measures, scaling laws and distribution of friendship.
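A budgeted breadth-first crawl is one common way to collect a friendship graph of the kind described above, and a degree distribution can then be tallied from the collected edges. The sketch assumes a hypothetical `get_friends(user)` callable standing in for whatever fetches a profile's friend list; it does not reproduce the authors' crawlers or their privacy safeguards.

```python
from collections import Counter, deque


def bfs_crawl(seed, get_friends, max_nodes):
    """Breadth-first crawl from `seed`, enqueuing at most `max_nodes` profiles.

    Returns (visited, edges). Edges may point to profiles that were
    discovered but never crawled because the budget was exhausted.
    """
    visited = {seed}
    queue = deque([seed])
    edges = set()
    while queue:
        user = queue.popleft()
        for friend in get_friends(user):
            # Store each undirected edge once, with endpoints in sorted order
            edges.add(tuple(sorted((user, friend))))
            if friend not in visited and len(visited) < max_nodes:
                visited.add(friend)
                queue.append(friend)
    return visited, edges


def degree_distribution(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())
```

For example, with an in-memory adjacency dict `graph`, `bfs_crawl(1, graph.__getitem__, 10)` crawls the whole toy graph, while a smaller budget stops enqueuing new profiles once it is reached.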

arXiv.org e-Print Archive

CiteSeerX

Crossref

CogPrints Cognitive Sciences Eprint Archive

Implementation of Web-Based Respondent-Driven Sampling among Men who Have Sex with Men in Vietnam

Author: AD Smith
Anna Thorson
C Wejnert
D Colby
D Heckathorn
D Zhang
DA Ngo
DD Heckathorn
DJ Watts
E Volz
Fredrik Liljeros
G Mumtaz
J Bauermeister
JA Bauermeister
L Bengtsson
LA Goodman
LG Johnston
Linus Bengtsson
Martin Camitz
MJ Salganik
MY Iguchi
Nguyen Le Hoang
Patricia Kissinger
Quoc Cuong Nguyen
R Magnani
S Baral
S Goel
SD Frost
Tuan Anh Nguyen
X Lu
Xin Lu
XY Ma
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Objective: Lack of representative data about hidden groups, like men who have sex with men (MSM), hinders an evidence-based response to the HIV epidemics. Respondent-driven sampling (RDS) was developed to overcome sampling challenges in studies of populations like MSM for which sampling frames are absent. Internet-based RDS (webRDS) can potentially circumvent limitations of the original RDS method. We aimed to implement and evaluate webRDS among a hidden population.

Methods and Design: This cross-sectional study took place from 18 February to 12 April 2011 among MSM in Vietnam. Inclusion criteria were men, aged 18 and above, who had ever had sex with another man and were living in Vietnam. Participants were invited by an MSM friend, logged in, and answered a survey. Participants could recruit up to four MSM friends. We evaluated the system by its success in generating sustained recruitment and the degree to which the sample compositions stabilized with increasing sample size.

Results: Twenty starting participants generated 676 participants over 24 recruitment waves. Analyses did not show evidence of bias due to ineligible participation. Estimated mean age was 22 years and 82% came from the two large metropolitan areas. Thirty-two of the 63 provinces were represented. The median number of sexual partners during the last six months was two. The sample composition stabilized well for 16 out of 17 variables.

Conclusion: Results indicate that webRDS could be implemented at a low cost among Internet-using MSM in Vietnam. WebRDS may be a promising method for sampling of Internet-using MSM and other hidden groups.

Key words: Respondent-driven sampling, Online sampling, Men who have sex with men, Vietnam, Sexual risk behavior
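The coupon-limited recruitment process described in this abstract (starting participants recruit up to four friends each, and recruits do the same in successive waves) can be mimicked with a toy simulation. The social graph, the `accept_prob` redemption probability and the assumption that coupons go to randomly chosen, not-yet-enrolled friends are all hypothetical; this is not the webRDS system itself.

```python
import random


def simulate_rds(graph, seeds, coupons=4, accept_prob=0.5, seed=None):
    """Toy respondent-driven recruitment over an adjacency dict.

    Each participant hands out at most `coupons` coupons to distinct friends
    who are not yet enrolled; each coupon is redeemed with `accept_prob`.
    Returns {participant: wave}, where starting participants are wave 0.
    """
    rng = random.Random(seed)
    wave = {s: 0 for s in seeds}
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for person in frontier:
            # Friends who could still be invited (not enrolled so far)
            candidates = [f for f in graph.get(person, []) if f not in wave]
            rng.shuffle(candidates)
            # Coupons given to friends who do not redeem them are simply lost
            for friend in candidates[:coupons]:
                if rng.random() < accept_prob:
                    wave[friend] = wave[person] + 1
                    next_frontier.append(friend)
        frontier = next_frontier
    return wave
```

In a run, `max(simulate_rds(...).values())` gives the number of recruitment waves reached, the quantity the study reports as 24 for its real sample; the toy model says nothing about that real figure.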

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Benchmarking API Costs of Network Sampling Strategies

Author: Coscia Michele
Rossi Luca
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/12/2018

Crossref

The IT University of Copenhagen's Repository