Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given, or we must learn a representation which
generalizes and predicts observed behavior in underlying individual data (e.g.
attributes or labels). Whether given or inferred, choosing the best
representation affects subsequent tasks and questions on the network. This work
uses model selection to evaluate network representations inferred from data,
focusing on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that network model selection for the appropriate network task
vs. an alternate task increases performance by an order of magnitude in our
experiments.
Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization
Geographically annotated social media is extremely valuable for modern
information retrieval. However, when researchers can only access
publicly-visible data, one quickly finds that social media users rarely publish
location information. In this work, we provide a method which can geolocate the
overwhelming majority of active Twitter users, independent of their location
sharing preferences, using only publicly-visible Twitter data.
Our method infers an unknown user's location by examining their friends'
locations. We frame the geotagging problem as an optimization over a social
network with a total variation-based objective and provide a scalable and
distributed algorithm for its solution. Furthermore, we show how a robust
estimate of the geographic dispersion of each user's ego network can be used as
a per-user accuracy measure which is effective at removing outlying errors.
Leave-many-out evaluation shows that our method is able to infer location for
101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag
over 80% of public tweets.
Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan,
David Jurgens, and David Allen. "Geotagging one hundred million twitter
accounts with total variation minimization." Big Data (Big Data), 2014 IEEE
International Conference on. IEEE, 201
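The idea of inferring a user's location from friends' locations, with a dispersion value as a per-user accuracy measure, can be illustrated with a small sketch. This is not the paper's distributed algorithm or its total variation objective. It is a single-user simplification that assumes the estimate is the friend location minimizing the summed great-circle distance to all friends (a medoid), and that dispersion is the median distance from that estimate to the friends. The function names `haversine_km` and `estimate_location` are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_location(friend_locations):
    """Return (estimate, dispersion_km) for one user (illustrative assumption).

    estimate: the friend location with the smallest summed distance to all
    friend locations (a medoid, robust to a few far-away friends).
    dispersion_km: median distance from the estimate to the friend locations;
    a large value suggests the estimate should not be trusted.
    """
    if not friend_locations:
        return None, None
    best = min(
        friend_locations,
        key=lambda c: sum(haversine_km(c, f) for f in friend_locations),
    )
    dispersion = median(haversine_km(best, f) for f in friend_locations)
    return best, dispersion
```

With three friends clustered in one city and one friend far away, the estimate stays inside the cluster and the dispersion remains small, so a threshold on the dispersion could be used to discard users whose friends are widely scattered.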
Exact detection of direct links in networks of interacting dynamical units
Authors NR, EB-M, CG, and MSB acknowledge the Scottish Universities Physics Alliance (SUPA). EB-M and MSB also acknowledge the Engineering and Physical Science Research Council (EPSRC) project Ref. EP/I032 606/1. ACM and CM acknowledge the LINC project (FP7-PEOPLE-2011-ITN, grant no. 289447). ACM also acknowledges PEDECIBA and CSIC (Uruguay). CM also acknowledges grant FIS2012–37655-C02–01 from the Spanish MCI, grant 2009 SGR 1168, and the ICREA Academia programme from the Generalitat de Catalunya.
Peer reviewed. Publisher PD
Network Model Selection Using Task-Focused Minimum Description Length
Networks are fundamental models for data used in practically every
application domain. In most instances, several implicit or explicit choices
about the network definition impact the translation of underlying data to a
network representation, and the subsequent question(s) about the underlying
system being represented. Users of downstream network data may not even be
aware of these choices or their impacts. We propose a task-focused network
model selection methodology which addresses several key challenges. Our
approach constructs network models from underlying data and uses minimum
description length (MDL) criteria for selection. Our methodology measures
efficiency, a general and comparable measure of the network's performance on a
local (i.e. node-level) predictive task of interest. Selection on efficiency
favors parsimonious (e.g. sparse) models to avoid overfitting and can be
applied across arbitrary tasks and representations. We show stability,
sensitivity, and significance testing in our methodology.
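The abstract does not define efficiency, so the following is only a toy sketch of why selecting on a performance-per-bit score favors sparse models. It assumes, purely for illustration, that a network's description length is the cost of listing each edge as a pair of node ids, and that efficiency is the number of correct node-level predictions divided by that cost. The names `description_length_bits`, `efficiency`, and `select_model` are hypothetical and are not taken from the paper.

```python
import math


def description_length_bits(n_nodes, edges):
    """Bits to list every edge as two node ids (a crude, assumed encoding)."""
    if n_nodes < 2:
        return 0.0
    return len(edges) * 2 * math.log2(n_nodes)


def efficiency(n_correct, n_nodes, edges):
    """Correct predictions per bit of network description (assumed definition)."""
    bits = description_length_bits(n_nodes, edges)
    return n_correct / bits if bits > 0 else 0.0


def select_model(candidates, n_nodes):
    """Pick the candidate name with the highest efficiency.

    candidates maps a model name to (edges, n_correct), where n_correct is the
    number of correct node-level predictions obtained using that network.
    """
    return max(
        candidates,
        key=lambda name: efficiency(candidates[name][1], n_nodes, candidates[name][0]),
    )
```

Under these assumptions, a dense network that gains one extra correct prediction at twice the encoding cost scores lower than a sparser network, which is the parsimony effect the abstract describes.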