860 research outputs found

    Latent Space Model for Multi-Modal Social Data

    With the emergence of social networking services, researchers enjoy the increasing availability of large-scale heterogeneous datasets capturing online user interactions and behaviors. Traditional analysis of techno-social systems data has focused mainly on describing either the dynamics of social interactions or the attributes and behaviors of the users. However, overwhelming empirical evidence suggests that the two dimensions affect one another, and therefore they should be jointly modeled and analyzed in a multi-modal framework. The benefits of such an approach include the ability to build better predictive models that leverage social network information as well as user behavioral signals. To this end, we propose the Constrained Latent Space Model (CLSM), a generalized framework that combines Mixed Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA), incorporating a constraint that forces the latent space to concurrently describe the multiple data modalities. We derive an efficient inference algorithm based on Variational Expectation Maximization that has a computational cost linear in the size of the network, thus making it feasible to analyze massive social datasets. We validate the proposed framework on two problems: prediction of social interactions from user attributes and behaviors, and behavior prediction exploiting network information. We perform experiments with a variety of multi-modal social systems, spanning location-based social networks (Gowalla), social media services (Instagram, Orkut), e-commerce and review sites (Amazon, Ciao), and citation networks (Cora). The results indicate significant improvement in prediction accuracy over state-of-the-art methods, and demonstrate the flexibility of the proposed approach for addressing a variety of learning problems commonly occurring with multi-modal social data. Comment: 12 pages, 7 figures, 2 tables

    Investigating social media spatiotemporal transferability for transport

    Social media increasingly provide data about the movement of people in cities, making them useful for understanding daily life in different geographies. Particularly useful for travel analysis are cases where social media users allow (voluntarily or not) their movement to be traced through the geotagged information attached to their communication with these online platforms. In this paper we use geotagged tweets from 10 cities in the European Union and the United States of America to extract spatiotemporal patterns, study differences and commonalities among these cities, and explore the nature of user location recurrence. The analysis shows that the distinction between residents and tourists is fundamental for the development of city-wide models. Identifying rates of repeated location visits (recurrence) can be used to define activity spaces. Differences and similarities across geographies emerge from this analysis in terms of local distributions but also in terms of the worldwide reach among the cities explored here. The comparison of the temporal signature between geotagged and non-geotagged tweets also shows similar temporal distributions that capture, in essence, the city rhythms of tweets and activity spaces.

    Detecting country of residence from social media data : a comparison of methods

    Identifying users' place of residence is an important step in many social media analysis workflows. Various techniques for detecting home locations from social media data have been proposed, but their reliability has rarely been validated using ground truth data. In this article, we compared commonly used spatial and spatio-temporal methods to determine social media users' country of residence. We applied diverse methods to a global data set of publicly shared geo-located Instagram posts from visitors to the Kruger National Park in South Africa. We evaluated the performance of each method using both individual-level expert assessment for a sample of users and aggregate-level official visitor statistics. Based on the individual-level assessment, a simple spatio-temporal approach performed best for detecting the country of residence. Results show why aggregate-level official statistics are not the best indicators for evaluating method performance. We also show how social media usage, such as the number of countries visited and posting activity over time, affects the performance of methods. In addition to a methodological contribution, this work contributes to the discussion about spatial and temporal biases in mobile big data. Peer reviewed