122 research outputs found
Leveraging Client Processing for Location Privacy in Mobile Local Search
Usage of mobile services is growing rapidly. Most Internet-based services targeted for PC based browsers now have mobile counterparts. These mobile counterparts often are enhanced when they use user\u27s location as one of the inputs. Even some PC-based services such as point of interest Search, Mapping, Airline tickets, and software download mirrors now use user\u27s location in order to enhance their services. Location-based services are exactly these, that take the user\u27s location as an input and enhance the experience based on that. With increased use of these services comes the increased risk to location privacy. The location is considered an attribute that user\u27s hold as important to their privacy. Compromise of one\u27s location, in other words, loss of location privacy can have several detrimental effects on the user ranging from trivial annoyance to unreasonable persecution.
More and more companies in the Internet economy rely exclusively on the huge data sets they collect about users. The more detailed and accurate the data a company has about its users, the more valuable the company is considered. No wonder that these companies are often the same companies that offer these services for free. This gives them an opportunity to collect more accurate location information. Research community in the location privacy protection area had to reciprocate by modeling an adversary that could be the service provider itself. To further drive this point, we show that a well-equipped service provider can infer user\u27s location even if the location information is not directly available by using other information he collects about the user.
There is no dearth of proposals of several protocols and algorithms that protect location privacy. A lot of these earlier proposals require a trusted third party to play as an intermediary between the service provider and the user. These protocols use anonymization and/or obfuscation techniques to protect user\u27s identity and/or location. This requirement of trusted third parties comes with its own complications and risks and makes these proposals impractical in real life scenarios. Thus it is preferable that protocols do not require a trusted third party.
We look at existing proposals in the area of private information retrieval. We present a brief survey of several proposals in the literature and implement two representative algorithms. We run experiments using different sizes of databases to ascertain their practicability and performance features. We show that private information retrieval based protocols still have long ways to go before they become practical enough for local search applications.
We propose location privacy preserving mechanisms that take advantage of the processing power of modern mobile devices and provide configurable levels of location privacy. We propose these techniques both in the single query scenario and multiple query scenario. In single query scenario, the user issues a query to the server and obtains the answer. In the multiple query scenario, the user keeps sending queries as she moves about in the area of interest. We show that the multiple query scenario increases the accuracy of adversary\u27s determination of user\u27s location, and hence improvements are needed to cope with this situation. So, we propose an extension of the single query scenario that addresses this riskier multiple query scenario, still maintaining the practicability and acceptable performance when implemented on a modern mobile device. Later we propose a technique based on differential privacy that is inspired by differential privacy in statistical databases. All three mechanisms proposed by us are implemented in realistic hardware or simulators, run against simulated but real life data and their characteristics ascertained to show that they are practical and ready for adaptation.
This dissertation study the privacy issues for location-based services in mobile environment and proposes a set of new techniques that eliminate the need for a trusted third party by implementing efficient algorithms on modern mobile hardware
Indexing Uncertain Categorical Data over Distributed Environment
International audienceToday, a large amount of uncertain data is produced by several applications where the management systems of traditional databases incuding indexing methods are not suitable to handle such type of data. In this paper, we propose an inverted based index method for effciently searching uncertain categorical data over distributed environments. We adress two kinds of query over the distributed uncertain databases, one a distributed probabilis-tic thresholds query, where all results sastisfying the query with probablities that meet a probablistic threshold requirement are returned, and another a distributed top k-queries, where all results optimizing the transfer of the tuples and the time treatment are returned
Splinter: Practical Private Queries on Public Data
Many online services let users query public datasets such as maps, flight prices, or restaurant reviews. Unfortunately, the queries to these services reveal highly sensitive information that can compromise users’ privacy. This paper presents Splinter, a system that protects users’ queries on public data and scales to realistic applications. A user splits her query into multiple parts and sends each part to a different provider that holds a copy of the data. As long as any one of the providers is honest and does not collude with the others, the providers cannot determine the query. Splinter uses and extends a new cryptographic primitive called Function Secret Sharing (FSS) that makes it up to an order of magnitude more efficient than prior systems based on Private Information Retrieval and garbled circuits. We develop protocols extending FSS to new types of queries, such as MAX and TOPK queries. We also provide an optimized implementation of FSS using AES-NI instructions and multicores. Splinter achieves end-to-end latencies below 1.6 seconds for realistic workloads including a Yelp clone, flight search, and map routing
Multimodal Neural Databases
The rise in loosely-structured data available through text, images, and other
modalities has called for new ways of querying them. Multimedia Information
Retrieval has filled this gap and has witnessed exciting progress in recent
years. Tasks such as search and retrieval of extensive multimedia archives have
undergone massive performance improvements, driven to a large extent by recent
developments in multimodal deep learning. However, methods in this field remain
limited in the kinds of queries they support and, in particular, their
inability to answer database-like queries. For this reason, inspired by recent
work on neural databases, we propose a new framework, which we name Multimodal
Neural Databases (MMNDBs). MMNDBs can answer complex database-like queries that
involve reasoning over different input modalities, such as text and images, at
scale. In this paper, we present the first architecture able to fulfill this
set of requirements and test it with several baselines, showing the limitations
of currently available models. The results show the potential of these new
techniques to process unstructured data coming from different modalities,
paving the way for future research in the area. Code to replicate the
experiments will be released at
https://github.com/GiovanniTRA/MultimodalNeuralDatabase
Recommended from our members
Emerging Trustworthiness Issues in Distributed Learning Systems
A distributed learning system allocates learning processes onto several workstations to enable faster learning algorithms. Federated Learning (FL) is an increasingly popular type of distributed learning which allows mutually untrusted clients to collaboratively train a common machine learning model without sharing their private/proprietary training data with each other. In this dissertation, we aim to address emerging trustworthiness issues in distributed learning systems, particularly in the field of FL.
First, we tackle the issue of robustness in FL and demonstrate its susceptibility by presenting a comprehensive analysis of the various poisoning attacks and defensive aggregation rules proposed in the literature and connecting them under a common framework. To address this issue, we propose Federated Rank Learning (FRL) which reduces the space of client updates from a continuous space of float numbers in standard FL to a discrete space of integer values, limiting the adversary\u27s options for poisoning attacks.
Next, we address the privacy concerns in FL, including access privacy and data privacy. An adversarial server in FL gets information about the data distribution of a target client by monitoring either I) local updates that the target submits throughout the FL training or II) the access pattern of the target, which can be privacy sensitive in many real-world scenarios. To preserve access privacy, we design Heterogeneous Private Information Retrieval (HPIR), which allows clients to fetch their specific model parameters from untrusted servers without leaking any information. We believe that HPIR will enable new application scenarios for private distributed learning systems, as well as improve the usability of some of the known applications of PIR. To preserve data privacy, we show that local rankings leak less information about private training data. We conduct a comprehensive investigation on the privacy of rankings in FRL to measure data leakage compared to weight parameter updates in standard FL in presence of the state-of-the-art white-box membership inference attack.
Finally, we address the issue of fairness in FL where a single model cannot represent all clients equally due to heterogeneity in their data distributions. To alleviate this issue, we propose Equal and Equitable Federated Learning (E2FL). E2FL produces fair federated learning models by preserving both equity and equality among the participating clients based on learning on parameter rankings where multiple global models are learned so that each group of clients can benefit from their personalized model
Doctor of Philosophy
dissertationWe are living in an age where data are being generated faster than anyone has previously imagined across a broad application domain, including customer studies, social media, sensor networks, and the sciences, among many others. In some cases, data are generated in massive quantities as terabytes or petabytes. There have been numerous emerging challenges when dealing with massive data, including: (1) the explosion in size of data; (2) data have increasingly more complex structures and rich semantics, such as representing temporal data as a piecewise linear representation; (3) uncertain data are becoming a common occurrence for numerous applications, e.g., scientific measurements or observations such as meteorological measurements; (4) and data are becoming increasingly distributed, e.g., distributed data collected and integrated from distributed locations as well as data stored in a distributed file system within a cluster. Due to the massive nature of modern data, it is oftentimes infeasible for computers to efficiently manage and query them exactly. An attractive alternative is to use data summarization techniques to construct data summaries, where even efficiently constructing data summaries is a challenging task given the enormous size of data. The data summaries we focus on in this thesis include the histogram and ranking operator. Both data summaries enable us to summarize a massive dataset to a more succinct representation which can then be used to make queries orders of magnitude more efficient while still allowing approximation guarantees on query answers. Our study has focused on the critical task of designing efficient algorithms to summarize, query, and manage massive data
Differentially Private Location Privacy in Practice
With the wide adoption of handheld devices (e.g. smartphones, tablets) a
large number of location-based services (also called LBSs) have flourished
providing mobile users with real-time and contextual information on the move.
Accounting for the amount of location information they are given by users,
these services are able to track users wherever they go and to learn sensitive
information about them (e.g. their points of interest including home, work,
religious or political places regularly visited). A number of solutions have
been proposed in the past few years to protect users location information while
still allowing them to enjoy geo-located services. Among the most robust
solutions are those that apply the popular notion of differential privacy to
location privacy (e.g. Geo-Indistinguishability), promising strong theoretical
privacy guarantees with a bounded accuracy loss. While these theoretical
guarantees are attracting, it might be difficult for end users or practitioners
to assess their effectiveness in the wild. In this paper, we carry on a
practical study using real mobility traces coming from two different datasets,
to assess the ability of Geo-Indistinguishability to protect users' points of
interest (POIs). We show that a curious LBS collecting obfuscated location
information sent by mobile users is still able to infer most of the users POIs
with a reasonable both geographic and semantic precision. This precision
depends on the degree of obfuscation applied by Geo-Indistinguishability.
Nevertheless, the latter also has an impact on the overhead incurred on mobile
devices resulting in a privacy versus overhead trade-off. Finally, we show in
our study that POIs constitute a quasi-identifier for mobile users and that
obfuscating them using Geo-Indistinguishability is not sufficient as an
attacker is able to re-identify at least 63% of them despite a high degree of
obfuscation.Comment: In Proceedings of the Third Workshop on Mobile Security Technologies
(MoST) 2014 (http://arxiv.org/abs/1410.6674
- …