Modeling, Predicting and Capturing Human Mobility
Realistic models of human mobility are critical for modern-day applications, specifically in the recommendation systems, resource planning and process optimization domains. Given the rapid proliferation of mobile devices equipped with Internet connectivity and GPS functionality, aggregating large volumes of individual geolocation data is now feasible. The thesis focuses on methodologies to facilitate data-driven mobility modeling by drawing parallels between the inherent nature of mobility trajectories, statistical physics and information theory. On the applied side, the thesis contributions lie in leveraging the formulated mobility models to construct prediction workflows from a privacy-by-design perspective. This enables end users to derive utility from location-based services while preserving their location privacy. Finally, the thesis presents several machine learning approaches for generating large-scale synthetic mobility datasets to facilitate experimental reproducibility.
DeepMotions: A Deep Learning System for Path Prediction Using Similar Motions
Trajectory prediction techniques play an important role in many location-based services such as mobile advertising, carpooling, taxi services, traffic management, and routing services. These techniques rely on the object's motion history to predict the future path(s). As a consequence, they fail when history is unavailable. History might be unavailable for several reasons: it might be inaccessible, the user might be recently registered with no preceding history, or previously logged data might be withheld for confidentiality and privacy. This paper presents a bi-directional recurrent deep-learning based prediction system, named DeepMotions, to predict the future path of a query object without any prior knowledge of the object's historical motions. The main idea of DeepMotions is to observe the moving objects in the vicinity that have motion patterns similar to those of the query object, and then use those similar objects to train the model and predict the query object's future steps. To compute similarity, we propose a similarity function that is based on the KNN algorithm. Extensive experiments conducted on real data sets confirm the efficient performance and the quality of prediction in DeepMotions, with up to 96% accuracy.
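The abstract says only that the similarity function is based on KNN, so the following is a minimal sketch of the general idea and not the authors' method. It assumes that trajectories are equal-length lists of (x, y) points and that similarity is the mean pointwise Euclidean distance; the names `trajectory_distance` and `k_most_similar` are hypothetical.

```python
import math


def trajectory_distance(a, b):
    """Mean Euclidean distance between corresponding points of two paths.

    Assumption: both paths are sampled at the same time steps.
    """
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def k_most_similar(query, candidates, k):
    """Return the ids of the k candidate paths closest to the query path."""
    ranked = sorted(
        candidates,
        key=lambda cid: trajectory_distance(query, candidates[cid]),
    )
    return ranked[:k]


# Recent motion of the query object (it has no stored history of its own).
query = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

# Recent motion of objects observed in the vicinity (illustrative data).
neighbours = {
    "a": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # parallel path, offset by 1
    "b": [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)],  # parallel path, offset by 5
    "c": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],  # perpendicular path
}

similar = k_most_similar(query, neighbours, k=2)
```

In a system of this kind, the histories of the selected neighbours would then stand in for the query object's missing history when training the predictor.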
The audio auditor: user-level membership inference in Internet of Things voice services
With the rapid development of deep learning techniques, the popularity of voice services implemented on various Internet of Things (IoT) devices is ever increasing. In this paper, we examine user-level membership inference in the problem space of voice services, by designing an audio auditor to verify whether a specific user had unwillingly contributed audio used to train an automatic speech recognition (ASR) model under strict black-box access. With user representation of the input audio data and their corresponding translated text, our trained auditor is effective in user-level audit. We also observe that an auditor trained on specific data generalizes well regardless of the ASR model architecture. We validate the auditor on ASR models trained with LSTM, RNNs, and GRU algorithms on two state-of-the-art pipelines, the hybrid ASR system and the end-to-end ASR system. Finally, we conduct a real-world trial of our auditor on iPhone Siri, achieving an overall accuracy exceeding 80%. We hope the methodology and findings of this paper can inform privacy advocates to overhaul IoT privacy.
Predicting Evacuation Decisions using Representations of Individuals' Pre-Disaster Web Search Behavior
Predicting the evacuation decisions of individuals before the disaster
strikes is crucial for planning first response strategies. In addition to the
studies on post-disaster analysis of evacuation behavior, there are various
works that attempt to predict the evacuation decisions beforehand. Most of
these predictive methods, however, require real time location data for
calibration, which are becoming much harder to obtain due to the rising privacy
concerns. Meanwhile, web search queries of anonymous users have been collected
by web companies. Although such data raise fewer privacy concerns, they have
been under-utilized for various applications. In this study, we investigate
whether web search data observed prior to the disaster can be used to predict
the evacuation decisions. More specifically, we utilize a "session-based query
encoder" that learns the representations of each user's web search behavior
prior to evacuation. Our proposed approach is empirically tested using web
search data collected from users affected by a major flood in Japan. Results
are validated using location data collected from mobile phones of the same set
of users as ground truth. We show that evacuation decisions can be accurately
predicted (84%) using only the users' pre-disaster web search data as input.
This study proposes an alternative method for evacuation prediction that does
not require highly sensitive location data, which can assist local governments
to prepare effective first response strategies.
Comment: Accepted in ACM KDD 201
Personalized Dialogue Generation with Diversified Traits
Endowing a dialogue system with particular personality traits is essential to
deliver more human-like conversations. However, due to the challenge of
embodying personality via language expression and the lack of large-scale
persona-labeled dialogue data, this research problem is still far from
well-studied. In this paper, we investigate the problem of incorporating
explicit personality traits in dialogue generation to deliver personalized
dialogues.
To this end, firstly, we construct PersonalDialog, a large-scale multi-turn
dialogue dataset containing various traits from a large number of speakers. The
dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers.
Each utterance is associated with a speaker who is marked with traits like Age,
Gender, Location, Interest Tags, etc. Several anonymization schemes are
designed to protect the privacy of each speaker. This large-scale dataset will
facilitate not only the study of personalized dialogue generation, but also
other research in sociolinguistics or social science.
Secondly, to study how personality traits can be captured and addressed in
dialogue generation, we propose persona-aware dialogue generation models within
the sequence to sequence learning framework. Explicit personality traits
(structured by key-value pairs) are embedded using a trait fusion module.
During the decoding process, two techniques, namely persona-aware attention and
persona-aware bias, are devised to capture and address trait-related
information. Experiments demonstrate that our model is able to address proper
traits in different contexts. Case studies also show interesting results for
this challenging research problem.
Comment: Please contact [zhengyinhe1 at 163 dot com] for the PersonalDialog
dataset
Where have you been? A Study of Privacy Risk for Point-of-Interest Recommendation
As location-based services (LBS) have grown in popularity, the collection of
human mobility data has become increasingly extensive to build machine learning
(ML) models offering enhanced convenience to LBS users. However, the
convenience comes with the risk of privacy leakage since this type of data
might contain sensitive information related to user identities, such as
home/work locations. Prior work focuses on protecting mobility data privacy
during transmission or prior to release, but lacks a privacy risk evaluation of
mobility data-based ML models. To better understand and quantify the privacy
leakage in mobility data-based ML models, we design a privacy attack suite
containing data extraction and membership inference attacks tailored for
point-of-interest (POI) recommendation models, one of the most widely used
mobility data-based ML models. These attacks in our attack suite assume
different adversary knowledge and aim to extract different types of sensitive
information from mobility data, providing a holistic privacy risk assessment
for POI recommendation models. Our experimental evaluation using two real-world
mobility datasets demonstrates that current POI recommendation models are
vulnerable to our attacks. We also present unique findings to understand what
types of mobility data are more susceptible to privacy attacks. Finally, we
evaluate defenses against these attacks and highlight future directions and
challenges.
Comment: 26 pages