Online Social Network Friends and Spatio-temporal Proximity of Their Geotagged Photos – A Case Study of Flickr Data
This empirical study aims to analyze relationships between online social network (OSN) friends and the spatio-temporal proximity of their geotagged photos, using Flickr data as a case study. First, this study analyzes whether Flickr friends tend to post geotagged photos that are closer to each other in space and time than those posted by Flickr non-friends. Second, this study investigates whether the number of geotagged photos posted by users is related to the distance and time difference between their geotagged photos. Third, this study examines the spatial distributions of geotagged photos of Flickr friends within specific distance intervals to further understand the geographic meaning of Flickr users’ geotagging activities. Findings of this study can improve our understanding of the relationship between users’ virtual friendships and their physical activities. These insights can support future research on location-based services, location-based OSN searches, and location-based online marketing.
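A minimal sketch of the first research question (are friends' photos closer in space and time than non-friends'?), assuming each photo is a `(lat, lon, unix_seconds)` tuple and friendships are stored as unordered user pairs. The helper names and the choice of the closest photo pair as the proximity measure are illustrative assumptions, not the study's actual measures.

```python
import math
from itertools import combinations
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def pair_proximity(photos_a, photos_b):
    # Smallest distance (km) and smallest time gap (hours) over all photo pairs.
    # The two minima may come from different photo pairs.
    min_km = min(haversine_km(a[0], a[1], b[0], b[1]) for a in photos_a for b in photos_b)
    min_h = min(abs(a[2] - b[2]) / 3600 for a in photos_a for b in photos_b)
    return min_km, min_h

def compare_friends(photos, friends):
    # photos: user -> list of photos; friends: set of frozenset({u, v}).
    groups = {True: [], False: []}
    for u, v in combinations(sorted(photos), 2):
        groups[frozenset((u, v)) in friends].append(pair_proximity(photos[u], photos[v]))
    # Mean (distance km, time gap h) for friend and non-friend pairs.
    return {
        ("friends" if is_friend else "non_friends"): (mean(p[0] for p in g), mean(p[1] for p in g))
        for is_friend, g in groups.items() if g
    }
```

Using the closest pair keeps the measure robust to users with very different photo counts, which matters given the second research question about photo volume.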
Predicting human mobility through the assimilation of social media traces into mobility models
Predicting human mobility flows at different spatial scales is challenged by
the heterogeneity of individual trajectories and the multi-scale nature of
transportation networks. As vast amounts of digital traces of human behaviour
become available, an opportunity arises to improve mobility models by
integrating into them proxy data on mobility collected by a variety of digital
platforms and location-aware services. Here we propose a hybrid model of human
mobility that integrates a large-scale publicly available dataset from a
popular photo-sharing system with the classical gravity model, under a stacked
regression procedure. We validate the performance and generalizability of our
approach using two ground-truth datasets on air travel and daily commuting in
the United States: using two different cross-validation schemes, we show that
the hybrid model affords enhanced mobility prediction at both spatial scales. Comment: 17 pages, 10 figures
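The hybrid idea can be sketched as a two-level stacked regression: a log-linear gravity model $T_{ij} = k P_i^{a} P_j^{b} d_{ij}^{-g}$ produces out-of-fold predictions, and a second-level regression combines them with a Flickr-derived flow for the same origin-destination pair. This is a minimal sketch under assumptions: `proxy` is a hypothetical array of positive photo-derived flows, and the log-space linear combination at level 1 is an illustrative choice, not necessarily the authors' exact procedure.

```python
import numpy as np

def gravity_design(pop_o, pop_d, dist):
    # Log-linear gravity model: log T_ij = log k + a log P_i + b log P_j - g log d_ij
    return np.column_stack([np.ones_like(dist), np.log(pop_o), np.log(pop_d), np.log(dist)])

def fit_gravity(pop_o, pop_d, dist, flow):
    # Ordinary least squares in log space; returns [log k, a, b, -g].
    coef, *_ = np.linalg.lstsq(gravity_design(pop_o, pop_d, dist), np.log(flow), rcond=None)
    return coef

def predict_gravity(coef, pop_o, pop_d, dist):
    return np.exp(gravity_design(pop_o, pop_d, dist) @ coef)

def stacked_fit(pop_o, pop_d, dist, proxy, flow, n_folds=5, seed=0):
    # Level 0: out-of-fold gravity predictions, so the level-1 weights are not
    # fit on predictions the gravity model has already seen.
    n = len(flow)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    oof = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef = fit_gravity(pop_o[train], pop_d[train], dist[train], flow[train])
        oof[test] = predict_gravity(coef, pop_o[test], pop_d[test], dist[test])
    # Level 1: log T ~ w0 + w1 log(gravity prediction) + w2 log(proxy flow).
    Z = np.column_stack([np.ones(n), np.log(oof), np.log(proxy)])
    w, *_ = np.linalg.lstsq(Z, np.log(flow), rcond=None)
    # Refit the gravity model on all pairs for use at prediction time.
    return fit_gravity(pop_o, pop_d, dist, flow), w

def stacked_predict(gravity_coef, w, pop_o, pop_d, dist, proxy):
    g = predict_gravity(gravity_coef, pop_o, pop_d, dist)
    Z = np.column_stack([np.ones_like(g), np.log(g), np.log(proxy)])
    return np.exp(Z @ w)
```

Out-of-fold predictions are the usual safeguard in stacking; fitting level 1 on in-sample gravity predictions would overweight the gravity model.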
Level Playing Field for Million Scale Face Recognition
Face recognition is perceived as a solved problem; however, when tested at the
million scale, it exhibits dramatic variation in accuracy across different
algorithms. Are the algorithms very different? Is access to good/big
training data their secret weapon? Where should face recognition improve? To
address those questions, we created a benchmark, MF2, that requires all
algorithms to be trained on the same data and tested at the million scale. MF2 is
a public large-scale set with 672K identities and 4.7M photos, created with the
goal of leveling the playing field for large-scale face recognition. We contrast our
results with findings from the other two large-scale benchmarks, the MegaFace
Challenge and MS-Celeb-1M, where groups were allowed to train on any
private/public/big/small set. Some key discoveries: 1) algorithms trained on
MF2 were able to achieve state-of-the-art results comparable to algorithms
trained on massive private sets; 2) some outperformed themselves once trained
on MF2; and 3) invariance to aging suffers from low accuracies, as in MegaFace,
identifying the need for larger age variations, possibly within identities, or
adjustment of algorithms in future tests.
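Testing "at the million scale" typically means identification against a large set of distractor faces. A minimal sketch of a MegaFace-style rank-1 metric, assuming precomputed face embeddings, cosine similarity, and one true gallery match per probe; the function names and chunk size are illustrative assumptions, not the MF2 evaluation code.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def rank1_identification(probes, matches, distractors, chunk=100_000):
    # probes[i] and matches[i] show the same identity; distractors are other people.
    # A probe counts as correct if its true match outscores every distractor.
    p, m, g = l2_normalize(probes), l2_normalize(matches), l2_normalize(distractors)
    true_scores = np.sum(p * m, axis=1)
    best = np.full(len(p), -np.inf)
    # Scan distractors in chunks so a million-row gallery never needs a full
    # probes x distractors similarity matrix in memory.
    for start in range(0, len(g), chunk):
        block = p @ g[start:start + chunk].T
        best = np.maximum(best, block.max(axis=1))
    return float(np.mean(true_scores > best))
```

Because only the running maximum per probe is kept, memory grows with the chunk size rather than with the number of distractors.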
Folks in Folksonomies: Social Link Prediction from Shared Metadata
Web 2.0 applications have attracted a considerable amount of attention
because their open-ended nature allows users to create light-weight semantic
scaffolding to organize and share content. To date, the interplay of the social
and semantic components of social media has been only partially explored. Here
we focus on Flickr and Last.fm, two social media systems in which we can relate
the tagging activity of the users with an explicit representation of their
social network. We show that a substantial level of local lexical and topical
alignment is observable among users who lie close to each other in the social
network. We introduce a null model that preserves user activity while removing
local correlations, allowing us to disentangle the actual local alignment
between users from statistical effects due to the assortative mixing of user
activity and centrality in the social network. This analysis suggests that
users with similar topical interests are more likely to be friends, and
therefore semantic similarity measures among users based solely on their
annotation metadata should be predictive of social links. We test this
hypothesis on the Last.fm data set, confirming that the social network
constructed from semantic similarity captures actual friendship more accurately
than Last.fm's suggestions based on listening patterns. Comment: http://portal.acm.org/citation.cfm?doid=1718487.171852
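The link-prediction test can be sketched as: build a tag-frequency vector per user, score every user pair by cosine similarity, and check how many top-ranked pairs are actual friends. Cosine similarity over raw tag counts is an assumed, common choice here; the abstract does not say which semantic similarity measures were used, and the helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse tag-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_links(user_tags: dict[str, Counter], k: int) -> list[tuple[str, str]]:
    # Rank all user pairs by semantic similarity and return the top k as predicted links.
    scored = [(cosine(user_tags[u], user_tags[v]), u, v)
              for u, v in combinations(sorted(user_tags), 2)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(u, v) for _, u, v in scored[:k]]

def precision_at_k(predicted, friends: set[frozenset]) -> float:
    # Fraction of predicted pairs that are real friendships.
    hits = sum(1 for u, v in predicted if frozenset((u, v)) in friends)
    return hits / len(predicted) if predicted else 0.0
```

For example, users tagging mostly `photo`/`street` pair up with each other rather than with users tagging `jazz`, which is the kind of local alignment the study reports.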
Node similarity as a basic principle behind connectivity in complex networks
How are people linked in a highly connected society? Since a power-law
(scale-free) node-degree distribution can be observed in many networks, the power law
might be seen as a universal characteristic of networks. But this study of
communication in the Flickr online social network reveals that power-law
node-degree distributions are restricted to only sparsely connected networks.
More densely connected networks, by contrast, show an increasing divergence
from a power law. This work shows that this observation is consistent with the
classic idea from the social sciences that similarity is the driving factor behind
communication in social networks. The strong relation between communication
strength and node similarity was confirmed by analyzing the Flickr
network. It is also shown that node similarity as a network formation model can
reproduce the characteristics of different network densities and hence can be
used as a model for describing the topological transition from weakly to
strongly connected societies. Comment: 6 pages in Journal of Data Mining & Digital Humanities (2015)
jdmdh:3
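To illustrate similarity as a formation rule, here is a minimal sketch assuming each node has a random attribute vector and two nodes link when their cosine similarity reaches a threshold; lowering the threshold yields a denser network whose degree distribution can then be compared. This generic threshold model and the names `similarity_network` and `degree_histogram` are assumptions for illustration, not the paper's exact formation model.

```python
import numpy as np

def similarity_network(features: np.ndarray, threshold: float) -> np.ndarray:
    # Link every pair of nodes whose cosine similarity is at least `threshold`.
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = x @ x.T
    sim = (sim + sim.T) / 2  # enforce exact symmetry despite rounding
    adj = sim >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

def degree_histogram(adj: np.ndarray) -> np.ndarray:
    # counts[k] = number of nodes with degree k.
    return np.bincount(adj.sum(axis=1))
```

Sweeping `threshold` downward moves the model from sparse to dense, which is the axis along which the study reports divergence from a power law.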
Fast Shortest Path Distance Estimation in Large Networks
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well-studied problem, but exact algorithms do not scale to the huge graphs encountered on the web, in social networks, and in other applications.
In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks.
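For an undirected graph with landmark set $L$, the triangle inequality gives one standard way to combine the precomputed distances into bounds; the abstract does not state which combination the authors use, so this is shown as the general principle:

```latex
\max_{u \in L} \bigl\lvert d(s,u) - d(u,t) \bigr\rvert
\;\le\; d(s,t) \;\le\;
\min_{u \in L} \bigl( d(s,u) + d(u,t) \bigr)
```

Each bound costs $O(|L|)$ lookups per query, independent of graph size, which is what makes the runtime estimate fast.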
We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a memory budget for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real-world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature, which selects landmarks at random.
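A minimal sketch of the simplest strategy described above, assuming an unweighted, undirected graph stored as an adjacency dict and using node degree as the centrality measure: pick the top-degree nodes as landmarks, run one BFS per landmark offline, and answer queries with the upper bound $\min_u d(s,u) + d(u,t)$. Degree centrality and the function names are illustrative assumptions; the "central and far apart" variants are not shown.

```python
from collections import deque

def bfs_distances(adj, source):
    # Hop distances from `source` to every reachable node (unweighted graph).
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def select_landmarks_by_degree(adj, k):
    # Simplest heuristic: the k highest-degree nodes.
    return sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k]

def build_index(adj, landmarks):
    # Offline step: one BFS per landmark.
    return {l: bfs_distances(adj, l) for l in landmarks}

def estimate_distance(index, s, t):
    # Online step: upper bound min over landmarks of d(s, l) + d(l, t).
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best
```

On a star whose hub is the only landmark, queries between leaves are exact, but two adjacent non-hub nodes are overestimated when their shortest path avoids the hub; that gap is what better landmark placement aims to close.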
Finally, we study applications of our method to two problems arising naturally in large-scale networks, namely social search and community detection. Yahoo! Research (internship)
Scalable Nonlinear Embeddings for Semantic Category-based Image Retrieval
We propose a novel algorithm for the task of supervised discriminative
distance learning by nonlinearly embedding vectors into a low dimensional
Euclidean space. We work in the challenging setting where supervision is given as
constraints on similar and dissimilar pairs during training. The proposed method
is derived by an approximate kernelization of a linear Mahalanobis-like
distance metric learning algorithm and can also be seen as a kernel neural
network. The number of model parameters and test time evaluation complexity of
the proposed method are O(dD), where D is the dimensionality of the input
features and d is the dimension of the projection space. This is in contrast
to the usual kernelization methods, whose complexity scales linearly with the
number of training examples. We propose a stochastic
gradient-based learning algorithm, which makes the method scalable (w.r.t. the
number of training examples), while being nonlinear. We train the method with
up to half a million training pairs of 4096 dimensional CNN features. We give
empirical comparisons with relevant baselines on seven challenging datasets for
the task of low-dimensional, semantic category-based image retrieval. Comment: ICCV 2015 preprint
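The O(dD) parameter count follows from learning a d×D projection L and measuring distances as $\lVert L(x-y) \rVert^2$. The sketch below shows only that linear, Mahalanobis-like base trained by SGD on a pairwise hinge loss $\max(0, 1 - s\,(b - \lVert L(x-y)\rVert^2))$, with $s=+1$ for similar and $s=-1$ for dissimilar pairs; it does not implement the paper's approximate kernelization, and the threshold `b`, learning rate, and function names are assumptions.

```python
import numpy as np

def pair_sq_distance(L, x, y):
    # Squared distance in the projected d-dimensional space.
    z = L @ (x - y)
    return float(z @ z)

def sgd_metric(pairs, D, d, b=2.0, lr=0.001, epochs=10, seed=0):
    # pairs: list of (x, y, s) with s = +1 (similar) or -1 (dissimilar).
    # Hinge loss pushes similar pairs below b - 1 and dissimilar pairs above b + 1.
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=1.0 / np.sqrt(D), size=(d, D))  # d x D: O(dD) parameters
    for _ in range(epochs):
        for i in rng.permutation(len(pairs)):
            x, y, s = pairs[i]
            z = x - y
            Lz = L @ z
            delta = Lz @ Lz
            if 1 - s * (b - delta) > 0:
                # d(delta)/dL = 2 (Lz) z^T; d(loss)/d(delta) = s when the hinge is active.
                L -= lr * s * 2.0 * np.outer(Lz, z)
    return L
```

Each update touches only the d×D matrix, so the cost per pair is O(dD) and independent of the number of training pairs, matching the scalability argument in the abstract.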