
    Online Social Network Friends and Spatio-temporal Proximity of Their Geotagged Photos – A Case Study of Flickr Data

    This empirical study analyzes the relationship between online social network (OSN) friends and the spatio-temporal proximity of their geotagged photos, using Flickr data as a case study. First, it analyzes whether Flickr friends tend to post geotagged photos that are closer to each other in space and time than those of Flickr non-friends. Second, it investigates whether the number of geotagged photos posted by users is related to the distance and time difference between their geotagged photos. Third, it examines the spatial distributions of geotagged photos of Flickr friends within specific distance intervals to further understand the geographic meaning of Flickr users' geotagging activities. The findings can improve our understanding of the relationship between users' virtual friendships and their physical activities, and can support future research on location-based services, location-based OSN searches, and location-based online marketing.
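    The comparison of friends and non-friends rests on measuring how far apart two users' photos are in space and time. The sketch below shows one way to compute that, assuming a hypothetical `Photo` record (latitude, longitude, capture time), great-circle (haversine) distance, and a "closest pair of photos" summary per user pair; the abstract does not specify which distance measure or aggregation the study actually uses.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from itertools import product

# Mean Earth radius in km; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


@dataclass
class Photo:
    """Hypothetical record for one geotagged photo."""
    lat: float       # degrees
    lon: float       # degrees
    taken: datetime  # capture timestamp


def haversine_km(a: Photo, b: Photo) -> float:
    """Great-circle distance between two photos in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def hours_apart(a: Photo, b: Photo) -> float:
    """Absolute time difference between two photos in hours."""
    return abs((a.taken - b.taken).total_seconds()) / 3600.0


def closest_pair(photos_u: list[Photo], photos_v: list[Photo]) -> tuple[float, float]:
    """(km, hours) for the spatially closest pair of photos posted by two users."""
    a, b = min(product(photos_u, photos_v), key=lambda pair: haversine_km(*pair))
    return haversine_km(a, b), hours_apart(a, b)
```

    Computing `closest_pair` for every friend pair and for a sample of non-friend pairs yields two distributions of distances and time gaps that can then be compared.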

    Predicting human mobility through the assimilation of social media traces into mobility models

    Predicting human mobility flows at different spatial scales is challenging because of the heterogeneity of individual trajectories and the multi-scale nature of transportation networks. As vast amounts of digital traces of human behaviour become available, an opportunity arises to improve mobility models by integrating into them proxy data on mobility collected by a variety of digital platforms and location-aware services. Here we propose a hybrid model of human mobility that integrates a large-scale publicly available dataset from a popular photo-sharing system with the classical gravity model, under a stacked regression procedure. We validate the performance and generalizability of our approach using two ground-truth datasets on air travel and daily commuting in the United States: using two different cross-validation schemes, we show that the hybrid model affords enhanced mobility prediction at both spatial scales. Comment: 17 pages, 10 figures
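    The two ingredients named above, a classical gravity model and a stacked regression that combines it with a second predictor, can be sketched as follows. The gravity form T_ij = k · P_i^α · P_j^β / d_ij^γ is the textbook version; the default parameter values, the no-intercept linear meta-model, and the idea that the second predictor is a flow estimate derived from photo-sharing traces are assumptions for illustration, not the authors' exact procedure.

```python
def gravity_flow(pop_i: float, pop_j: float, dist_km: float,
                 k: float = 1.0, alpha: float = 1.0, beta: float = 1.0,
                 gamma: float = 2.0) -> float:
    """Classical gravity model: T_ij = k * P_i**alpha * P_j**beta / d_ij**gamma."""
    return k * pop_i ** alpha * pop_j ** beta / dist_km ** gamma


def fit_stack_weights(pred_a: list[float], pred_b: list[float],
                      observed: list[float]) -> tuple[float, float]:
    """Least-squares weights (w_a, w_b) for observed ~ w_a * pred_a + w_b * pred_b.

    Solves the 2x2 normal equations directly (no intercept term)."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, observed))
    sby = sum(b * y for b, y in zip(pred_b, observed))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


def stacked_predict(weights: tuple[float, float], a: float, b: float) -> float:
    """Meta-model prediction: weighted combination of the two base predictions."""
    return weights[0] * a + weights[1] * b
```

    In a proper stacking setup, the base predictions passed to `fit_stack_weights` come from held-out folds, so the meta-model learns how much to trust each base model on data it was not fitted to.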

    Level Playing Field for Million Scale Face Recognition

    Face recognition is often perceived as a solved problem; however, when tested at the million scale, different algorithms exhibit dramatic variation in accuracy. Are the algorithms very different? Is access to good, big training data their secret weapon? Where should face recognition improve? To address these questions, we created a benchmark, MF2, that requires all algorithms to be trained on the same data and tested at the million scale. MF2 is a public large-scale set with 672K identities and 4.7M photos, created with the goal of leveling the playing field for large-scale face recognition. We contrast our results with findings from two other large-scale benchmarks, MegaFace Challenge and MS-Celeb-1M, where groups were allowed to train on any private, public, big, or small set. Some key discoveries: 1) algorithms trained on MF2 achieved state-of-the-art results, comparable to those of algorithms trained on massive private sets; 2) some algorithms outperformed their own earlier results once trained on MF2; 3) invariance to aging suffers from low accuracy, as in MegaFace, indicating the need for larger age variations, possibly within identities, or for adjustments to algorithms in future testing.

    Folks in Folksonomies: Social Link Prediction from Shared Metadata

    Web 2.0 applications have attracted a considerable amount of attention because their open-ended nature allows users to create lightweight semantic scaffolding to organize and share content. To date, the interplay of the social and semantic components of social media has been only partially explored. Here we focus on Flickr and Last.fm, two social media systems in which we can relate the tagging activity of the users with an explicit representation of their social network. We show that a substantial level of local lexical and topical alignment is observable among users who lie close to each other in the social network. We introduce a null model that preserves user activity while removing local correlations, allowing us to disentangle the actual local alignment between users from statistical effects due to the assortative mixing of user activity and centrality in the social network. This analysis suggests that users with similar topical interests are more likely to be friends, and therefore semantic similarity measures among users based solely on their annotation metadata should be predictive of social links. We test this hypothesis on the Last.fm data set, confirming that the social network constructed from semantic similarity captures actual friendship more accurately than Last.fm's suggestions based on listening patterns. Comment: http://portal.acm.org/citation.cfm?doid=1718487.171852
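    The hypothesis that annotation metadata alone predicts social links can be illustrated with a simple similarity ranking. This sketch assumes cosine similarity over raw tag-frequency vectors and a hypothetical `annotations` mapping from user to tag list; the paper considers its own set of similarity measures, which may differ from this one.

```python
import math
from collections import Counter


def cosine_similarity(tags_u: list[str], tags_v: list[str]) -> float:
    """Cosine similarity between two users' tag-frequency vectors."""
    cu, cv = Counter(tags_u), Counter(tags_v)
    dot = sum(cu[t] * cv[t] for t in cu.keys() & cv.keys())
    norm = (math.sqrt(sum(c * c for c in cu.values()))
            * math.sqrt(sum(c * c for c in cv.values())))
    return dot / norm if norm else 0.0


def suggest_links(user: str, annotations: dict[str, list[str]], k: int = 3) -> list[str]:
    """Rank other users by tag similarity to `user`; the top k are predicted links."""
    scores = [(other, cosine_similarity(annotations[user], tags))
              for other, tags in annotations.items() if other != user]
    scores.sort(key=lambda s: s[1], reverse=True)
    return [other for other, _ in scores[:k]]
```

    Comparing the predicted links against the observed friendship network (for example, by precision at k) is one way to test whether semantic similarity captures actual friendship.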

    Node similarity as a basic principle behind connectivity in complex networks

    How are people linked in a highly connected society? Since a power-law (scale-free) node-degree distribution can be observed in many networks, a power law might be seen as a universal characteristic of networks. However, this study of communication in the Flickr online social network reveals that power-law node-degree distributions are restricted to sparsely connected networks. More densely connected networks, by contrast, show an increasing divergence from a power law. This work shows that this observation is consistent with the classic idea from the social sciences that similarity is the driving factor behind communication in social networks. The strong relation between communication strength and node similarity is confirmed by analyzing the Flickr network. It is also shown that node similarity as a network formation model can reproduce the characteristics of different network densities and hence can be used as a model for describing the topological transition from weakly to strongly connected societies. Comment: 6 pages in Journal of Data Mining & Digital Humanities (2015) jdmdh:3
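    To make "node similarity as a network formation model" concrete, the sketch below links two nodes whenever their similarity exceeds a threshold, so lowering the threshold produces a denser network whose degree distribution can then be inspected. The random binary interest vectors, the Jaccard similarity, and the threshold rule are all hypothetical choices for illustration; the abstract does not describe the paper's actual model, and this sketch makes no claim about reproducing its results.

```python
import random
from collections import Counter


def similarity_network(n: int, n_features: int, threshold: float,
                       p_interest: float = 0.2, seed: int = 0) -> set[tuple[int, int]]:
    """Hypothetical similarity model: link nodes whose interest sets have
    Jaccard similarity >= threshold. Lower thresholds give denser networks."""
    rng = random.Random(seed)
    interests = [{f for f in range(n_features) if rng.random() < p_interest}
                 for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            union = interests[i] | interests[j]
            if union and len(interests[i] & interests[j]) / len(union) >= threshold:
                edges.add((i, j))
    return edges


def degree_distribution(n: int, edges: set[tuple[int, int]]) -> Counter:
    """Map each degree value to the number of nodes with that degree."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return Counter(deg[v] for v in range(n))
```

    Because the same seed yields the same interests, every edge present at a high threshold is also present at a lower one, which makes it easy to compare degree distributions across densities.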

    Fast Shortest Path Distance Estimation in Large Networks

    We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well-studied problem, but exact algorithms do not scale to the huge graphs encountered on the web, in social networks, and in other applications. In this paper we focus on approximate methods for distance estimation, in particular landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a memory budget for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. We therefore develop and experimentally compare a number of simple methods that scale well to large graphs. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally on five different real-world graphs with millions of edges; for a given accuracy, they require up to 250 times less space than the current approach in the literature, which selects landmarks at random. Finally, we study applications of our method to two problems arising naturally in large-scale networks, namely social search and community detection. Yahoo! Research (internship
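    The offline/online split described above can be sketched for an unweighted, undirected graph as follows. Landmarks are chosen by highest degree (one simple notion of "central"), and the estimate is the triangle-inequality upper bound min over landmarks l of d(u, l) + d(l, v). The adjacency-dict representation and function names are assumptions of this sketch, not the paper's code.

```python
from collections import deque


def bfs_distances(graph: dict[int, list[int]], source: int) -> dict[int, int]:
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_index(graph: dict[int, list[int]], num_landmarks: int) -> dict[int, dict[int, int]]:
    """Offline step: pick the highest-degree nodes as landmarks and store their BFS distances."""
    landmarks = sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:num_landmarks]
    return {l: bfs_distances(graph, l) for l in landmarks}


def estimate_distance(index: dict[int, dict[int, int]], u: int, v: int) -> float:
    """Online step: upper bound min over landmarks of d(u, l) + d(l, v)."""
    best = float("inf")
    for dist in index.values():
        if u in dist and v in dist:
            best = min(best, dist[u] + dist[v])
    return best
```

    Each query costs O(k) lookups for k landmarks, independent of graph size. The estimate is exact whenever some landmark lies on a shortest path between u and v, and an overestimate otherwise, which is why landmark placement matters so much.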

    Scalable Nonlinear Embeddings for Semantic Category-based Image Retrieval

    We propose a novel algorithm for the task of supervised discriminative distance learning by nonlinearly embedding vectors into a low-dimensional Euclidean space. We work in the challenging setting where supervision is given as constraints on similar and dissimilar pairs during training. The proposed method is derived by an approximate kernelization of a linear Mahalanobis-like distance metric learning algorithm and can also be seen as a kernel neural network. The number of model parameters and the test-time evaluation complexity of the proposed method are O(dD), where D is the dimensionality of the input features and d is the dimension of the projection space; unlike usual kernelization methods, this complexity does not scale linearly with the number of training examples. We propose a stochastic-gradient-based learning algorithm that makes the method scalable with respect to the number of training examples while remaining nonlinear. We train the method with up to half a million training pairs of 4096-dimensional CNN features. We give empirical comparisons with relevant baselines on seven challenging datasets for the task of low-dimensional semantic category-based image retrieval. Comment: ICCV 2015 preprint
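    The O(dD) cost comes from applying a d × D projection W to each input. The sketch below shows only the linear, Mahalanobis-like starting point (squared distance ||W(x − y)||²) trained by SGD on a contrastive loss over similar and dissimilar pairs; it does not reproduce the paper's approximate kernelization. The loss form, the `margin` and `lr` parameters, and the list-of-lists matrix representation are assumptions of this sketch.

```python
def project(W: list[list[float]], x: list[float]) -> list[float]:
    """z = W x for W of shape d x D; costs O(dD)."""
    return [sum(w_k * x_k for w_k, x_k in zip(row, x)) for row in W]


def sq_dist(W: list[list[float]], x: list[float], y: list[float]) -> float:
    """Squared distance in the projected space: ||W (x - y)||^2."""
    z = project(W, [a - b for a, b in zip(x, y)])
    return sum(v * v for v in z)


def sgd_step(W: list[list[float]], x: list[float], y: list[float],
             similar: bool, lr: float = 0.01, margin: float = 1.0) -> None:
    """One in-place SGD step on a contrastive loss: pull similar pairs together,
    push dissimilar pairs apart until their squared distance reaches `margin`."""
    delta = [a - b for a, b in zip(x, y)]
    z = project(W, delta)
    dist2 = sum(v * v for v in z)
    if similar:
        sign = 1.0    # descend on dist2
    elif dist2 < margin:
        sign = -1.0   # hinge active: ascend on dist2
    else:
        return        # hinge inactive: no update
    # d(dist2)/dW[i][k] = 2 * z[i] * delta[k]
    for i, row in enumerate(W):
        for k in range(len(row)):
            row[k] -= lr * sign * 2.0 * z[i] * delta[k]
```

    Both the parameter count (d · D entries of W) and the per-query cost of `project` depend only on d and D, not on the number of training pairs, which is the scaling property the abstract emphasizes.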