Network Model Selection for Task-Focused Attributed Network Inference

Author: Berger-Wolf Tanya Y.
Brugere Ivan
Kanich Chris
Publication date: 16/09/2017

Networks are models representing relationships between entities. Often these relationships are given explicitly; otherwise we must learn a representation that generalizes and predicts observed behavior in the underlying individual data (e.g. attributes or labels). Whether given or inferred, the choice of representation affects subsequent tasks and questions on the network. This work addresses model selection for evaluating network representations inferred from data, with an emphasis on fundamental predictive tasks on networks. We present a modular methodology using general, interpretable network models, task neighborhood functions found across domains, and several criteria for robust model selection. We demonstrate our methodology on three online user activity datasets and show that selecting the network model for the appropriate network task, rather than for an alternate task, increases performance by an order of magnitude in our experiments.
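As a rough illustration of task-focused model selection, the sketch below builds several candidate networks from node attributes and keeps the one that best supports a node-level prediction task. Everything specific here is an assumption made for illustration and is not taken from the paper: the k-nearest-neighbor network model, the majority-vote label task, and the hypothetical function names `knn_network`, `task_accuracy`, and `select_network`.

```python
from collections import Counter


def knn_network(features, k):
    """Link each node to its k nearest nodes by squared Euclidean distance."""
    neighbors = {}
    for i, fi in enumerate(features):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(features)
            if j != i
        )
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


def task_accuracy(neighbors, labels):
    """Fraction of nodes whose label matches the majority label of their neighbors."""
    correct = 0
    for node, nbrs in neighbors.items():
        majority = Counter(labels[j] for j in nbrs).most_common(1)[0][0]
        correct += majority == labels[node]
    return correct / len(neighbors)


def select_network(features, labels, ks):
    """Score one candidate network per k and return (best_k, best_accuracy)."""
    scores = {k: task_accuracy(knn_network(features, k), labels) for k in ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


# Toy data (hypothetical): two well-separated attribute clusters.
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = ["a", "a", "a", "b", "b", "b"]
best_k, best_acc = select_network(features, labels, ks=[2, 5])
```

Each node is scored only from its neighbors' labels, so the score behaves like a leave-one-out estimate. A fuller treatment would likely evaluate on held-out data and compare against alternate tasks.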

arXiv.org e-Print Archive

Crossref

Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization

Author: Allen David
Compton Ryan
Jurgens David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Geographically annotated social media is extremely valuable for modern information retrieval. However, researchers who can access only publicly visible data quickly find that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly visible Twitter data. Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure which is effective at removing outlying errors. Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets. Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan, David Jurgens, and David Allen. "Geotagging one hundred million twitter accounts with total variation minimization." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 201
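The idea of inferring a user's location from their friends' locations, and of using ego-network dispersion to filter unreliable estimates, can be sketched for a single user as follows. This is a minimal sketch under stated assumptions, not the paper's distributed algorithm: the estimate is taken as the friend location with the smallest total distance to the others (a medoid), dispersion is the median distance from that estimate to the friends, and the names `haversine_km`, `infer_location`, and the 100 km threshold are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def infer_location(friend_locations, max_dispersion_km=100.0):
    """Return (location, dispersion_km); location is None if no estimate is accepted."""
    if not friend_locations:
        return None, None
    # Assumed estimator: the friend location minimizing total distance to all friends.
    best = min(
        friend_locations,
        key=lambda c: sum(haversine_km(c, f) for f in friend_locations),
    )
    # Assumed dispersion measure: median distance from the estimate to the friends.
    dispersion = median(haversine_km(best, f) for f in friend_locations)
    if dispersion > max_dispersion_km:
        return None, dispersion  # ego network too spread out to trust
    return best, dispersion


# Hypothetical coordinates: three friends near Los Angeles, one in New York.
clustered = [(34.05, -118.24), (34.06, -118.25), (34.04, -118.23), (40.71, -74.00)]
location, dispersion = infer_location(clustered)

# Friends spread across Los Angeles, New York, and London.
scattered = [(34.05, -118.24), (40.71, -74.00), (51.50, -0.12)]
rejected, wide_dispersion = infer_location(scattered)
```

In the clustered case the single distant friend does not move the estimate away from the cluster, which is the kind of robustness a median-style estimator is meant to provide. In the scattered case the dispersion check rejects the estimate.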

arXiv.org e-Print Archive

CiteSeerX

Crossref

Exact detection of direct links in networks of interacting dynamical units

Author: Baptista Murilo Da Silva
Bianco-Martinez Ezequiel
Grebogi Celso
Marti Arturo C
Masoller Cristina
Rubido Obrer Nicolas
Publication venue: 'IOP Publishing'
Publication date: 01/09/2014

Authors NR, EB-M, CG, and MSB acknowledge the Scottish Universities Physics Alliance (SUPA). EB-M and MSB also acknowledge the Engineering and Physical Sciences Research Council (EPSRC) project Ref. EP/I032606/1. ACM and CM acknowledge the LINC project (FP7-PEOPLE-2011-ITN, grant no. 289447). ACM also acknowledges PEDECIBA and CSIC (Uruguay). CM also acknowledges grant FIS2012–37655-C02–01 from the Spanish MCI, grant 2009 SGR 1168, and the ICREA Academia programme from the Generalitat de Catalunya. Peer reviewed. Publisher PD

arXiv.org e-Print Archive

Aberdeen University Research

UPCommons. Portal del coneixement obert de la UPC

Network Model Selection Using Task-Focused Minimum Description Length

Author: Berger-Wolf Tanya Y.
Brugere Ivan
Publication date: 01/01/2018

Networks are fundamental models for data used in practically every application domain. In most instances, several implicit or explicit choices about the network definition affect how the underlying data is translated into a network representation, and in turn the questions that can be asked about the underlying system. Users of downstream network data may not even be aware of these choices or their impacts. We propose a task-focused network model selection methodology which addresses several key challenges. Our approach constructs network models from underlying data and uses minimum description length (MDL) criteria for selection. Our methodology measures efficiency, a general and comparable measure of the network's performance on a local (i.e. node-level) predictive task of interest. Selection on efficiency favors parsimonious (e.g. sparse) models to avoid overfitting and can be applied across arbitrary tasks and representations. We show stability, sensitivity, and significance testing in our methodology.
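To make the notion of selecting on efficiency concrete, the sketch below scores each candidate network by correct predictions per bit needed to describe it and picks the highest-scoring one. The encoding (each edge listed as two node IDs) and the efficiency formula are simplifying assumptions for illustration only, not the paper's definitions, and the names `description_length_bits`, `efficiency`, and `select_model` are hypothetical.

```python
import math


def description_length_bits(num_nodes, edges):
    """Bits needed to list every edge as a pair of node IDs (assumed encoding)."""
    bits_per_node_id = math.log2(num_nodes)
    return len(edges) * 2 * bits_per_node_id


def efficiency(num_correct, num_nodes, edges):
    """Correct task predictions per bit of model description (assumed definition)."""
    bits = description_length_bits(num_nodes, edges)
    return num_correct / bits if bits > 0 else 0.0


def select_model(candidates, num_nodes):
    """candidates maps a model name to (edges, num_correct); return the most efficient name."""
    return max(
        candidates,
        key=lambda name: efficiency(candidates[name][1], num_nodes, candidates[name][0]),
    )


# Hypothetical candidates on 8 nodes: the dense model gets one more prediction
# right but needs three times as many edges to describe.
sparse_edges = [(0, 1), (2, 3), (4, 5), (6, 7)]
dense_edges = [(i, j) for i in range(4) for j in range(4, 7)]
candidates = {"sparse": (sparse_edges, 6), "dense": (dense_edges, 7)}
chosen = select_model(candidates, num_nodes=8)
```

Under these assumptions the sparse model costs 24 bits against 72 for the dense one, so the small accuracy gain of the dense model does not justify its description length.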

arXiv.org e-Print Archive

Crossref