
438 research outputs found

Modeling the Temporal Nature of Human Behavior for Demographics Prediction

Author: A Wesolowski
C Herrera-Yagüe
E Jahani
L Bengtsson
S Hochreiter
Y Bengio
Y LeCun
Y-A Montjoye de
YA Montjoye de
Publication date: 01/01/2017

Mobile phone metadata is increasingly used for humanitarian purposes in developing countries as traditional data is scarce. Basic demographic information is, however, often absent from mobile phone datasets, limiting the operational impact of the datasets. For these reasons, there has been a growing interest in predicting demographic information from mobile phone metadata. Previous work focused on creating increasingly advanced features to be modeled with standard machine learning algorithms. Here, we instead model the raw mobile phone metadata directly using deep learning, exploiting the temporal nature of the patterns in the data. From high-level assumptions we design a data representation and convolutional network architecture for modeling patterns within a week. We then examine three strategies for aggregating patterns across weeks and show that our method reaches state-of-the-art accuracy on both age and gender prediction using only the temporal modality in mobile metadata. Finally, we validate our method on low-activity users and evaluate the modeling assumptions. Comment: Accepted at ECML 2017. A previous version of this paper was titled 'Using Deep Learning to Predict Demographics from Mobile Phone Metadata' and was accepted at the ICLR 2016 worksho

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

The performance of modularity maximization in practical contexts

Author: Clauset Aaron
de Montjoye Yves-Alexandre
Good Benjamin H.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2010

Although widely used in practice, the behavior and accuracy of the popular module identification technique called modularity maximization is not well understood in practical contexts. Here, we present a broad characterization of its performance in such situations. First, we revisit and clarify the resolution limit phenomenon for modularity maximization. Second, we show that the modularity function Q exhibits extreme degeneracies: it typically admits an exponential number of distinct high-scoring solutions and typically lacks a clear global maximum. Third, we derive the limiting behavior of the maximum modularity Q_max for one model of infinitely modular networks, showing that it depends strongly both on the size of the network and on the number of modules it contains. Finally, using three real-world metabolic networks as examples, we show that the degenerate solutions can fundamentally disagree on many, but not all, partition properties such as the composition of the largest modules and the distribution of module sizes. These results imply that the output of any modularity maximization procedure should be interpreted cautiously in scientific contexts. They also explain why many heuristics are often successful at finding high-scoring partitions in practice and why different heuristics can disagree on the modular structure of the same network. We conclude by discussing avenues for mitigating some of these behaviors, such as combining information from many degenerate solutions or using generative models. Comment: 20 pages, 14 figures, 6 appendices; code available at http://www.santafe.edu/~aaronc/modularity

arXiv.org e-Print Archive

DIAL UCLouvain

Evolution of Privacy Loss in Wikipedia

Author: Almeida R.
de Montjoye Y.-A.
Gibbons A.
Ramachandran A.
Youyou W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/12/2015

The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g. Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney

When the signal is in the noise: Exploiting Diffix's Sticky Noise

Author: de Montjoye Yves-Alexandre
Gadotti Andrea
Houssiau Florimond
Livshits Benjamin
Rocher Luc
Publication date: 01/01/2019

Anonymized data is highly valuable to both businesses and researchers. A large body of research has, however, shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of "sticky noise", Diffix has been recently proposed as a novel query-based mechanism that alone satisfies the EU Article 29 Working Party's definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first attack, a differential attack, uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second attack, a cloning attack, uses dummy conditions that strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP's definition of anonymization. [...]

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

DIAL UCLouvain

Moirans – Église Saint-Pierre

Author: Badin de Montjoye Alain
Publication venue: ADLFI. Archéologie de la France - Informations
Publication date: 21/10/2013

Date of operation: 2007 (FP). In 2007, the excavation of the former church of Saint-Pierre in Moirans entered the second year of its multi-year authorization, valid for the period 2006–2008. With the excavated area extended to the whole of the north aisle of the nave, the chapel that extends it to the east, the apse, and the choir bay, the data produced constitute significant advances in knowledge of the site. The excavation of the burials of the modern period (17th c. - first quarter of the ..

OpenEdition

Moirans – Ancienne église Saint-Pierre

Author: Badin de Montjoye Alain
Publication venue: ADLFI. Archéologie de la France - Informations
Publication date: 30/07/2015

The excavation of the former church of Saint-Pierre in Moirans, authorized in 2011 for a period of three years, has enabled notable advances in understanding the different periods of occupation of the site, the construction phases of the medieval church, and the funerary deposits that it houses or that preceded it. To the three early medieval sarcophagi uncovered in the choir bay (sector VII) is added a fourth, trapezoidal and monolithic, made of tuff and oriented north-south, which has ..

OpenEdition

Montbonnot-Saint-Martin – Ancien prieuré de Saint-Martin-de-Miseré

Author: Badin de Montjoye Alain
Publication venue: ADLFI. Archéologie de la France - Informations
Publication date: 21/10/2013

Archaeological operation identifier: 229335. Date of operation: 2007 (SU). Urgent works to embank the Doux, a torrent with devastating seasonal floods, were undertaken by the municipality at the end of 2006, near the site of the former priory of Saint-Martin-de-Miserere. Founded in the last years of the 11th century by the bishop of Grenoble, Saint Hugh, and one of the centerpieces of the religious reform apparatus put in place by this prelate, the priory of ch..

OpenEdition

When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy

Author: A Wasef
C Dwork
H Shin
L Sweeney
N Sadeh
SL Warner
Y Koren
YA Montjoye De
Z Huo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/04/2019

In recent years, it has become easy to obtain location information quite precisely. However, the acquisition of such information carries risks such as individual identification and leakage of sensitive information, so it is necessary to protect the privacy of location information. For this purpose, people should know their location privacy preferences, that is, whether or not they can release location information at each place and time. However, it is not easy for each user to make such decisions, and it is troublesome to set the privacy preference each time. Therefore, we propose a method to recommend location privacy preferences for decision making. Compared with the existing method, our method can improve the accuracy of recommendation by using matrix factorization and preserve privacy strictly by local differential privacy, whereas the existing method does not achieve a formal privacy guarantee. In addition, we found the best granularity of a location privacy preference, that is, how to express the information in location privacy protection. To evaluate and verify the utility of our method, we integrated two existing datasets to create a dataset that is rich in terms of the number of users. From the results of the evaluation using this dataset, we confirmed that our method can predict location privacy preferences accurately and that it provides a suitable method to define the location privacy preference.

arXiv.org e-Print Archive

Crossref