2,459 research outputs found
Of Wines and Reviews: Measuring and Modeling the Vivino Wine Social Network
This paper presents an analysis of social experiences around wine consumption
through the lens of Vivino, a social network for wine enthusiasts with over 26
million users worldwide. We compare users' perceptions of various wine types
and regional styles across both New and Old World wines, examining them across
price ranges, vintages, regions, varietals, and blends. Among other things, we
find that ratings provided by Vivino users are not biased by cost. We then
study how wine characteristics, language in wine reviews, and the distribution
of wine ratings can be combined to develop prediction models. More
specifically, we model user behavior to develop a regression model for
predicting wine ratings, and a classifier for determining user review
preferences.Comment: A preliminary version of this paper appears in the Proceedings of the
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM 2018). This is the full versio
Controlled Data Sharing for Collaborative Predictive Blacklisting
Although sharing data across organizations is often advocated as a promising
way to enhance cybersecurity, collaborative initiatives are rarely put into
practice owing to confidentiality, trust, and liability challenges. In this
paper, we investigate whether collaborative threat mitigation can be realized
via a controlled data sharing approach, whereby organizations make informed
decisions as to whether or not, and how much, to share. Using appropriate
cryptographic tools, entities can estimate the benefits of collaboration and
agree on what to share in a privacy-preserving way, without having to disclose
their datasets. We focus on collaborative predictive blacklisting, i.e.,
forecasting attack sources based on one's logs and those contributed by other
organizations. We study the impact of different sharing strategies by
experimenting on a real-world dataset of two billion suspicious IP addresses
collected from Dshield over two months. We find that controlled data sharing
yields up to 105% accuracy improvement on average, while also reducing the
false positive rate.Comment: A preliminary version of this paper appears in DIMVA 2015. This is
the full version. arXiv admin note: substantial text overlap with
arXiv:1403.212
Measuring Membership Privacy on Aggregate Location Time-Series
While location data is extremely valuable for various applications,
disclosing it prompts serious threats to individuals' privacy. To limit such
concerns, organizations often provide analysts with aggregate time-series that
indicate, e.g., how many people are in a location at a time interval, rather
than raw individual traces. In this paper, we perform a measurement study to
understand Membership Inference Attacks (MIAs) on aggregate location
time-series, where an adversary tries to infer whether a specific user
contributed to the aggregates.
We find that the volume of contributed data, as well as the regularity and
particularity of users' mobility patterns, play a crucial role in the attack's
success. We experiment with a wide range of defenses based on generalization,
hiding, and perturbation, and evaluate their ability to thwart the attack
vis-a-vis the utility loss they introduce for various mobility analytics tasks.
Our results show that some defenses fail across the board, while others work
for specific tasks on aggregate location time-series. For instance, suppressing
small counts can be used for ranking hotspots, data generalization for
forecasting traffic, hotspot discovery, and map inference, while sampling is
effective for location labeling and anomaly detection when the dataset is
sparse. Differentially private techniques provide reasonable accuracy only in
very specific settings, e.g., discovering hotspots and forecasting their
traffic, and more so when using weaker privacy notions like crowd-blending
privacy. Overall, our measurements show that there does not exist a unique
generic defense that can preserve the utility of the analytics for arbitrary
applications, and provide useful insights regarding the disclosure of sanitized
aggregate location time-series
On Collaborative Predictive Blacklisting
Collaborative predictive blacklisting (CPB) allows to forecast future attack
sources based on logs and alerts contributed by multiple organizations.
Unfortunately, however, research on CPB has only focused on increasing the
number of predicted attacks but has not considered the impact on false
positives and false negatives. Moreover, sharing alerts is often hindered by
confidentiality, trust, and liability issues, which motivates the need for
privacy-preserving approaches to the problem. In this paper, we present a
measurement study of state-of-the-art CPB techniques, aiming to shed light on
the actual impact of collaboration. To this end, we reproduce and measure two
systems: a non privacy-friendly one that uses a trusted coordinating party with
access to all alerts (Soldo et al., 2010) and a peer-to-peer one using
privacy-preserving data sharing (Freudiger et al., 2015). We show that, while
collaboration boosts the number of predicted attacks, it also yields high false
positives, ultimately leading to poor accuracy. This motivates us to present a
hybrid approach, using a semi-trusted central entity, aiming to increase
utility from collaboration while, at the same time, limiting information
disclosure and false positives. This leads to a better trade-off of true and
false positive rates, while at the same time addressing privacy concerns.Comment: A preliminary version of this paper appears in ACM SIGCOMM's Computer
Communication Review (Volume 48 Issue 5, October 2018). This is the full
versio
Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective
Rapid advances in human genomics are enabling researchers to gain a better
understanding of the role of the genome in our health and well-being,
stimulating hope for more effective and cost efficient healthcare. However,
this also prompts a number of security and privacy concerns stemming from the
distinctive characteristics of genomic data. To address them, a new research
community has emerged and produced a large number of publications and
initiatives.
In this paper, we rely on a structured methodology to contextualize and
provide a critical analysis of the current knowledge on privacy-enhancing
technologies used for testing, storing, and sharing genomic data, using a
representative sample of the work published in the past decade. We identify and
discuss limitations, technical challenges, and issues faced by the community,
focusing in particular on those that are inherently tied to the nature of the
problem and are harder for the community alone to address. Finally, we report
on the importance and difficulty of the identified challenges based on an
online survey of genome data privacy expertsComment: To appear in the Proceedings on Privacy Enhancing Technologies
(PoPETs), Vol. 2019, Issue
Privacy-Friendly Mobility Analytics using Aggregate Location Data
Location data can be extremely useful to study commuting patterns and
disruptions, as well as to predict real-time traffic volumes. At the same time,
however, the fine-grained collection of user locations raises serious privacy
concerns, as this can reveal sensitive information about the users, such as,
life style, political and religious inclinations, or even identities. In this
paper, we study the feasibility of crowd-sourced mobility analytics over
aggregate location information: users periodically report their location, using
a privacy-preserving aggregation protocol, so that the server can only recover
aggregates -- i.e., how many, but not which, users are in a region at a given
time. We experiment with real-world mobility datasets obtained from the
Transport For London authority and the San Francisco Cabs network, and present
a novel methodology based on time series modeling that is geared to forecast
traffic volumes in regions of interest and to detect mobility anomalies in
them. In the presence of anomalies, we also make enhanced traffic volume
predictions by feeding our model with additional information from correlated
regions. Finally, we present and evaluate a mobile app prototype, called
Mobility Data Donors (MDD), in terms of computation, communication, and energy
overhead, demonstrating the real-world deployability of our techniques.Comment: Published at ACM SIGSPATIAL 201
- …