15 research outputs found

    Exact Age Prediction in Social Networks

    Predicting accurate demographic information about the users of information systems is a problem of interest in personalized search, ad targeting, and other related fields. Despite such broad applications, most existing work treats age prediction only as a classification problem, typically with just a few broad categories. Here, we consider the problem of exact age prediction in social networks as one of regression. Our proposed method learns social representations which capture community information for use as covariates. In our preliminary experiments on a large real-world social network, it can predict age within 4.15 years on average, strongly outperforming standard network regression techniques when labeled data is sparse.
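
The "within 4.15 years on average" figure reads like a mean absolute error, although the abstract does not name the metric. As a minimal sketch under that assumption, the snippet below computes the error for a handful of made-up ages; the function name and the toy numbers are illustrative and do not come from the paper.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute gap, in years, between true and predicted ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical toy data, not results from the paper.
true_ages = [23, 35, 41, 58]
regressed_ages = [25.0, 31.0, 44.0, 55.0]

mae = mean_absolute_error(true_ages, regressed_ages)
print(f"mean absolute error: {mae:.2f} years")
```

A classifier over broad age bands cannot be scored this way without first mapping each band to a representative age, which is one practical reason to treat the task as regression.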

    Learning Edge Representations via Low-Rank Asymmetric Projections

    We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information (from social networks, user-item graphs, knowledge bases, etc.) in many machine learning tasks. Unlike previous work, we (1) explicitly model an edge as a function of node embeddings, and we (2) propose a novel objective, the "graph likelihood", which contrasts information from sampled random walks with non-existent edges. Individually, both of these contributions improve the learned representations, especially when there are memory constraints on the total size of the embeddings. When combined, our contributions enable us to significantly improve the state-of-the-art by learning more concise representations that better preserve the graph structure. We evaluate our method on a variety of link-prediction tasks including social networks, collaboration networks, and protein interactions, showing that our proposed method learns representations with error reductions of up to 76% and 55% on directed and undirected graphs, respectively. In addition, we show that the representations learned by our method are quite space efficient, producing embeddings which have higher structure-preserving accuracy but are 10 times smaller.
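
The abstract does not give the functional form of the edge model. One way to picture "an edge as a function of node embeddings" with a low-rank asymmetric projection is to score a directed edge u -> v as the dot product of two different low-rank projections of the endpoint embeddings. The sketch below assumes that form; `edge_score`, `left`, and `right` are hypothetical names, and the numbers are toy values.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def edge_score(y_u, y_v, left, right):
    """Score the directed edge u -> v as (left @ y_u) . (right @ y_v).

    Assumed form: because `left` and `right` differ, swapping u and v
    generally changes the score, so edge direction is preserved.
    """
    return dot(matvec(left, y_u), matvec(right, y_v))


# Toy 2-dimensional embeddings and rank-1 projections (illustrative only).
y_a = [2.0, 0.0]
y_b = [0.0, 3.0]
left = [[1.0, 0.0]]   # projects the source node
right = [[0.0, 1.0]]  # projects the target node

print(edge_score(y_a, y_b, left, right))  # score for a -> b
print(edge_score(y_b, y_a, left, right))  # score for b -> a
```

Here the two directions get different scores from the same pair of embeddings, which a symmetric dot product of the embeddings alone could not produce.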

    Privacy-Preserving Graph Convolutional Networks for Text Classification

    Graph convolutional networks (GCNs) are a powerful architecture for representation learning on documents that naturally occur as graphs, e.g., citation or social networks. However, sensitive personal information, such as documents with people's profiles or relationships as edges, is prone to privacy leaks, as the trained model might reveal the original input. Although differential privacy (DP) offers a well-founded privacy-preserving framework, GCNs pose theoretical and practical challenges due to their training specifics. We address these challenges by adapting differentially-private gradient-based training to GCNs and conduct experiments using two optimizers on five NLP datasets in two languages. We propose a simple yet efficient method based on random graph splits that not only improves the baseline privacy bounds by a factor of 2.7 while retaining competitive F1 scores, but also provides strong privacy guarantees of epsilon = 1.0. We show that, under certain modeling choices, privacy-preserving GCNs reach up to 90% of the performance of their non-private variants, while formally guaranteeing strong privacy measures.
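
Differentially private gradient-based training is commonly implemented by clipping each example's gradient to a fixed L2 norm and adding Gaussian noise before averaging. The sketch below shows that generic clip-and-noise step only; it is not the paper's GCN adaptation or its graph-splitting method, it does not compute epsilon, and the names `max_norm` and `noise_multiplier` are illustrative.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is tied to the clipping norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [n / len(clipped) for n in noisy]


# Toy gradients for two examples (illustrative values).
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single example can move the update, and the noise scale is tied to that bound; turning the pair into a concrete epsilon requires a separate privacy accountant.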

    You are how you travel: A multi-task learning framework for Geodemographic inference using transit smart card data

    Geodemographics, which describes the characteristics of a population region by region, is of immense importance in urban studies, public policy-making, social research and business, among others. Such data, however, are difficult to collect from the public; collection is usually done via census, with a low update frequency. In urban areas, with the increasing prevalence of public transit equipped with automated fare payment systems, researchers can collect massive transit smart card (SC) data from a large population. The SC data record daily human activities at an individual level with high spatial and temporal resolution. They can reveal frequent activity areas (e.g., residential areas) and travel behaviours of passengers that are intimately intertwined with personal interests and characteristics. This provides new opportunities for geodemographic study. This paper seeks to develop a framework to infer travellers' demographics (such as age, income level and car ownership) and their residential areas for geodemographic mapping using SC data with a household survey. We first use a decision tree diagram to detect passengers' residential areas. We then represent each individual's spatio-temporal activity pattern derived from multi-week SC data as a 2D image. Leveraging this representation, a multi-task convolutional neural network (CNN) is employed to predict multiple demographics of individuals from the images. Combining the demographics and the locations of their residences, geodemographic information is further obtained. The methodology is applied to a large-scale SC dataset provided by Transport for London. Results provide new insights into the relationship between human activity patterns and demographics. To the best of our knowledge, this is the first attempt to infer geodemographics using SC data.
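
The abstract does not say how the 2D image is laid out. As one plausible illustration, the sketch below bins a passenger's tap timestamps into a 7 x 24 grid of counts (day of week by hour of day), which a CNN could then consume like a small grayscale image. The layout, the function name `activity_image`, and the sample timestamps are all assumptions.

```python
from datetime import datetime


def activity_image(tap_times):
    """Count taps per (day of week, hour) cell; rows are Monday..Sunday."""
    image = [[0] * 24 for _ in range(7)]
    for t in tap_times:
        image[t.weekday()][t.hour] += 1
    return image


# Hypothetical taps for one passenger across two weeks.
taps = [
    datetime(2024, 1, 1, 8, 15),   # Monday morning
    datetime(2024, 1, 8, 8, 40),   # following Monday morning
    datetime(2024, 1, 6, 14, 0),   # Saturday afternoon
]

image = activity_image(taps)
print(image[0][8], image[5][14])
```

Aggregating several weeks into one grid makes recurring habits, such as a regular weekday commute, show up as bright cells while one-off trips stay faint.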

    Observing and recommending from a social web with biases

    The research question this report addresses is: how, and to what extent, can those directly involved with the design, development and employment of a specific black box algorithm be certain that it is not unlawfully discriminating (directly and/or indirectly) against particular persons with protected characteristics (e.g. gender, race and ethnicity)? Comment: Technical Report, University of Southampton, March 201