Modeling the Temporal Nature of Human Behavior for Demographics Prediction
Mobile phone metadata is increasingly used for humanitarian purposes in
developing countries because traditional data is scarce. However, basic demographic
information is often absent from mobile phone datasets, limiting the
operational impact of the datasets. For these reasons, there has been a growing
interest in predicting demographic information from mobile phone metadata.
Previous work focused on creating increasingly advanced features to be modeled
with standard machine learning algorithms. Here, we instead model the raw mobile
phone metadata directly using deep learning, exploiting the temporal nature of
the patterns in the data. From high-level assumptions, we design a data
representation and convolutional network architecture for modeling patterns
within a week. We then examine three strategies for aggregating patterns across
weeks and show that our method reaches state-of-the-art accuracy on both age
and gender prediction using only the temporal modality in mobile metadata. We
finally validate our method on low-activity users and evaluate the modeling
assumptions.
Comment: Accepted at ECML 2017. A previous version of this paper was titled
'Using Deep Learning to Predict Demographics from Mobile Phone Metadata' and
was accepted at the ICLR 2016 workshop
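The abstract does not spell out the input format, so the following is only a sketch of one plausible weekly representation. It assumes each user's metadata is reduced to a list of event timestamps (calls or texts) and that each week becomes a 7-day by 24-hour grid of event counts that a convolutional network could consume. The function name `weekly_activity_matrices` and the single count channel are hypothetical, not the paper's actual representation.

```python
from datetime import datetime


def weekly_activity_matrices(timestamps):
    """Bin event timestamps into one 7x24 count matrix per ISO week.

    Hypothetical representation: rows are ISO weekdays (Monday=0 .. Sunday=6),
    columns are hours of the day, and each cell counts events in that slot.
    """
    weeks = {}
    for ts in timestamps:
        iso_year, iso_week, iso_weekday = ts.isocalendar()
        # Create an empty 7x24 grid the first time a week is seen.
        grid = weeks.setdefault((iso_year, iso_week), [[0] * 24 for _ in range(7)])
        grid[iso_weekday - 1][ts.hour] += 1
    return weeks


events = [
    datetime(2017, 1, 2, 9, 15),   # Monday, 09:xx
    datetime(2017, 1, 2, 9, 45),   # Monday, 09:xx
    datetime(2017, 1, 8, 23, 0),   # Sunday, 23:xx, same ISO week
    datetime(2017, 1, 1, 12, 0),   # Sunday, last day of ISO week 52 of 2016
]
weeks = weekly_activity_matrices(events)
print(sorted(weeks))           # [(2016, 52), (2017, 1)]
print(weeks[(2017, 1)][0][9])  # 2
```

Keeping each week as a separate grid leaves open how weeks are combined; the paper compares three aggregation strategies, which this sketch does not attempt to reproduce.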
The performance of modularity maximization in practical contexts
Although widely used in practice, the behavior and accuracy of the popular
module identification technique called modularity maximization is not well
understood in practical contexts. Here, we present a broad characterization of
its performance in such situations. First, we revisit and clarify the
resolution limit phenomenon for modularity maximization. Second, we show that
the modularity function Q exhibits extreme degeneracies: it typically admits an
exponential number of distinct high-scoring solutions and typically lacks a
clear global maximum. Third, we derive the limiting behavior of the maximum
modularity Q_max for one model of infinitely modular networks, showing that it
depends strongly both on the size of the network and on the number of modules
it contains. Finally, using three real-world metabolic networks as examples, we
show that the degenerate solutions can fundamentally disagree on many, but not
all, partition properties such as the composition of the largest modules and
the distribution of module sizes. These results imply that the output of any
modularity maximization procedure should be interpreted cautiously in
scientific contexts. They also explain why many heuristics are often successful
at finding high-scoring partitions in practice and why different heuristics can
disagree on the modular structure of the same network. We conclude by
discussing avenues for mitigating some of these behaviors, such as combining
information from many degenerate solutions or using generative models.
Comment: 20 pages, 14 figures, 6 appendices; code available at
http://www.santafe.edu/~aaronc/modularity
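For reference, modularity is conventionally defined (Newman-Girvan form) as Q = (1/2m) sum_ij [A_ij - k_i k_j / (2m)] delta(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the number of edges, and delta equals 1 when nodes i and j are in the same module. The sketch below evaluates that formula directly for a small simple undirected graph (no self-loops or duplicate edges assumed). It is not the code released by the authors, and the two-triangle toy graph is purely illustrative.

```python
from itertools import product


def modularity(edges, communities):
    """Modularity Q of a partition of a simple, undirected, unweighted graph."""
    # Symmetric adjacency as a set of ordered pairs, plus node degrees.
    adj = set()
    degree = {}
    for u, v in edges:
        adj.add((u, v))
        adj.add((v, u))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    label = {node: idx for idx, group in enumerate(communities) for node in group}
    q = 0.0
    # Sum A_ij - k_i k_j / (2m) over all ordered pairs in the same module.
    for i, j in product(degree, repeat=2):
        if label[i] != label[j]:
            continue
        a_ij = 1.0 if (i, j) in adj else 0.0
        q += a_ij - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [[0, 1, 2], [3, 4, 5]]))  # 5/14, about 0.357
print(modularity(edges, [[0, 1, 2, 3, 4, 5]]))    # approximately 0
```

On this toy graph, keeping each triangle as a module gives Q = 5/14, while lumping all nodes together gives Q = 0. On realistic networks, as the abstract notes, many quite different partitions can score almost as high as the best one, so comparing partitions by Q alone can be misleading.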
Evolution of Privacy Loss in Wikipedia
The cumulative effect of collective online participation has an important and
adverse impact on individual privacy. As an online system evolves over time,
new digital traces of individual behavior may uncover previously hidden
statistical links between an individual's past actions and her private traits.
To quantify this effect, we analyze the evolution of individual privacy loss by
studying the edit history of Wikipedia over 13 years, including more than
117,523 different users performing 188,805,088 edits. We trace each Wikipedia
contributor using apparently harmless features, such as the number of edits
performed on predefined broad categories in a given time period (e.g.
Mathematics, Culture or Nature). We show that even at this unspecific level of
behavior description, it is possible to use off-the-shelf machine learning
algorithms to uncover usually undisclosed personal traits, such as gender,
religion or education. We provide empirical evidence that the prediction
accuracy for almost all private traits consistently improves over time.
Surprisingly, the prediction performance for users who stopped editing after a
given time still improves. The activities performed by new users seem to have
contributed more to this effect than additional activities from existing (but
still active) users. Insights from this work should help users, system
designers, and policy makers understand and make long-term design choices in
online content creation systems.
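The feature construction described above can be sketched as follows. It assumes the edit history is available as (user, category, timestamp) triples with timestamps given as years. The category list and the function name are hypothetical, and the classifier training step that would consume these vectors is not shown.

```python
from collections import Counter

# Hypothetical subset of broad Wikipedia categories used as features.
CATEGORIES = ["Mathematics", "Culture", "Nature"]


def edit_count_features(edits, start, end, categories=CATEGORIES):
    """Per-user edit counts in each broad category for edits with start <= t < end."""
    counts = {}
    for user, category, t in edits:
        if start <= t < end and category in categories:
            counts.setdefault(user, Counter())[category] += 1
    # Fixed category order so every user gets a vector of the same length.
    return {user: [c[cat] for cat in categories] for user, c in counts.items()}


edits = [
    ("alice", "Mathematics", 2005),
    ("alice", "Mathematics", 2006),
    ("alice", "Nature", 2006),
    ("bob", "Culture", 2010),
]
print(edit_count_features(edits, 2005, 2008))  # {'alice': [2, 0, 1]}
```

Recomputing the vectors over growing windows (for example, moving `end` forward year by year) is one way to track how prediction accuracy changes as the edit history grows.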
When the signal is in the noise: Exploiting Diffix's Sticky Noise
Anonymized data is highly valuable to both businesses and researchers. However, a
large body of research has shown the strong limits of the
de-identification release-and-forget model, where data is anonymized and
shared. This has led to the development of privacy-preserving query-based
systems. Based on the idea of "sticky noise", Diffix has recently been proposed
as a novel query-based mechanism that, on its own, satisfies the EU Article 29
Working Party's definition of anonymization. According to its authors, Diffix adds less
noise to answers than solutions based on differential privacy while allowing
for an unlimited number of queries.
This paper presents a new class of noise-exploitation attacks, exploiting the
noise added by the system to infer private information about individuals in the
dataset. Our first attack, a differential attack, uses samples extracted from Diffix in a
likelihood ratio test to discriminate between two probability distributions. We
show that using this attack against a synthetic best-case dataset allows us to
infer private information with 89.4% accuracy using only 5 attributes. Our
second attack, a cloning attack, uses dummy conditions that strongly affect
the output of the query depending on the value of the private attribute. Using
this attack on four real-world datasets, we show that we can infer private
attributes of at least 93% of the users in the dataset with accuracy between
93.3% and 97.1%, issuing a median of 304 queries per user. We show how to
optimize this attack, targeting 55.4% of the users and achieving 91.7%
accuracy, using a maximum of only 32 queries per user.
Our attacks demonstrate that adding data-dependent noise, as done by Diffix,
is not sufficient to prevent inference of private attributes. We furthermore
argue that Diffix alone fails to satisfy Art. 29 WP's definition of
anonymization. [...
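A likelihood ratio test of the kind used in the differential attack compares how well two candidate distributions explain observed samples. The sketch below is a generic version under an assumption the abstract does not state: that each sample (for instance, the difference between two noisy query answers) is Gaussian with a known mean and standard deviation under each hypothesis. The hypotheses and numbers are illustrative, not Diffix's actual noise parameters.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def likelihood_ratio_test(samples, h0, h1):
    """Sum of log-likelihood ratios log p1(x) / p0(x); a positive sum favors h1.

    h0 and h1 are (mu, sigma) pairs describing the two candidate distributions.
    """
    llr = sum(gaussian_logpdf(x, *h1) - gaussian_logpdf(x, *h0) for x in samples)
    return llr > 0, llr


# Hypothetical hypotheses: same mean, but h1 has twice the noise spread.
h0, h1 = (0.0, 1.0), (0.0, 2.0)
print(likelihood_ratio_test([0.1, -0.2, 0.05], h0, h1)[0])  # False
print(likelihood_ratio_test([3.0, -3.5, 2.8], h0, h1)[0])   # True
```

Thresholding the log-ratio at 0 corresponds to equal prior weight on both hypotheses; an attacker trading precision for recall would move that threshold.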
Moirans – Saint-Pierre Church
Date of operation: 2007 (FP). In 2007, the excavation of the former church of Saint-Pierre in Moirans entered the second year of its multi-year authorization, which covers the period 2006–2008. With the excavation area extended to the entire north aisle of the nave, the chapel that extends it to the east, the apse and the choir bay, the data produced represent important advances in knowledge of the site. The excavation of burials from the modern period (17th c. - first quarter of the ..
Moirans – Former Church of Saint-Pierre
The excavation of the former church of Saint-Pierre in Moirans, authorized in 2011 for a three-year period, has yielded notable advances in understanding the site's different periods of occupation, the construction phases of the medieval church, and the burials it houses or that preceded it. The three early medieval sarcophagi uncovered in the choir bay (sector VII) are joined by a fourth, trapezoidal and monolithic, carved in tufa and oriented north-south, which has ..
Montbonnot-Saint-Martin – Former Priory of Saint-Martin-de-Miseré
Archaeological operation ID: 229335. Date of operation: 2007 (SU). Urgent works to embank the Doux, a torrent with devastating seasonal floods, were undertaken by the municipality at the end of 2006 near the site of the former priory of Saint-Martin-de-Miserere. Founded in the last years of the 11th century by the bishop of Grenoble, Saint-Hugues, and one of the key elements of the religious reform program put in place by this prelate, the priory of ch..
When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy
In recent years, it has become easy to obtain location information quite
precisely. However, acquiring such information carries risks such as
individual identification and leakage of sensitive information, so it is
necessary to protect the privacy of location information. For this purpose,
people should know their location privacy preferences, that is, whether or not
they can release location information at each place and time. However, such
decisions are not easy for each user to make, and setting a privacy preference
for every point in time is tedious. Therefore, we propose a method to recommend
location privacy preferences for decision making. Compared with the existing
method, our method can improve the accuracy of recommendation by using matrix
factorization and strictly preserves privacy through local differential privacy,
whereas the existing method does not achieve a formal privacy guarantee. In
addition, we identify the best granularity of a location privacy preference, that
is, how to express the information in location privacy protection. To evaluate
and verify the utility of our method, we integrated two existing datasets
to create a dataset that is richer in terms of the number of users. From the
results of the evaluation using this dataset, we confirmed that our method can
predict location privacy preferences accurately and that it provides a suitable
way to define the location privacy preference.
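The abstract does not name the local differential privacy mechanism, so the sketch below uses randomized response, a standard epsilon-LDP mechanism for a single bit. It is applied to a binary preference such as "release my location at this place and time" (1) or not (0): each user perturbs their own bit before sending it, and the server debiases the aggregate. The matrix factorization step is not shown, and the function names and the 30% example population are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Keep the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if rng.random() < p_keep else 1 - bit


def estimate_fraction(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from randomized reports.

    E[observed] = (1 - p) + pi * (2p - 1), so pi = (observed - (1 - p)) / (2p - 1).
    """
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(0)
true_bits = [1] * 30_000 + [0] * 70_000  # hypothetical: 30% willing to release
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
print(round(estimate_fraction(reports, epsilon=1.0), 2))  # close to 0.3
```

A larger epsilon keeps more reports truthful (weaker privacy, lower estimation error); the 2p - 1 denominator shows why the error grows as epsilon approaches 0.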