341 research outputs found
Reverse-Safe Data Structures for Text Indexing
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
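To make the z parameter concrete, here is a brute-force sketch (in Python, not from the paper) that, for a tiny text, counts how many same-length texts over a small alphabet yield exactly the same set of pattern-matching answers up to length d. The function names and alphabet are illustrative only; this exhaustive search is only feasible for very small n.

```python
from itertools import product

def substrings_up_to(text, d):
    """All substrings of length <= d (the answers a d-bounded index stores)."""
    return {text[i:j] for i in range(len(text))
            for j in range(i + 1, min(i + d, len(text)) + 1)}

def reverse_safety(text, d, alphabet="ab"):
    """Count same-length texts yielding the same answer set.

    A structure storing these answers is z-reverse-safe for this count z."""
    answers = substrings_up_to(text, d)
    return sum(1 for cand in product(alphabet, repeat=len(text))
               if substrings_up_to("".join(cand), d) == answers)

print(reverse_safety("abab", 1))  # 14 of the 16 length-4 texts share these answers
print(reverse_safety("abab", 2))  # only "abab" and "baba" remain
```

Raising d makes the index more useful but shrinks z, which is exactly the trade-off the paper's algorithm optimizes.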
Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy
Differential privacy is a promising privacy-preserving paradigm for
statistical query processing over sensitive data. It works by injecting random
noise into each query result, such that it is provably hard for the adversary
to infer the presence or absence of any individual record from the published
noisy results. The main objective in differentially private query processing is
to maximize the accuracy of the query results, while satisfying the privacy
guarantees. Previous work, notably \cite{LHR+10}, has suggested that with an
appropriate strategy, processing a batch of correlated queries as a whole
achieves considerably higher accuracy than answering them individually.
However, to our knowledge there is currently no practical solution to find such
a strategy for an arbitrary query batch; existing methods either return
strategies of poor quality (often worse than naive methods) or require
prohibitively expensive computations for even moderately large domains.
Motivated by this, we propose low-rank mechanism (LRM), the first practical
differentially private technique for answering batch linear queries with high
accuracy. LRM works for both exact (i.e., ε-) and approximate (i.e.,
(ε, δ)-) differential privacy definitions. We derive the
utility guarantees of LRM, and provide guidance on how to set the privacy
parameters given the user's utility expectation. Extensive experiments using
real data demonstrate that our proposed method consistently outperforms
state-of-the-art query processing solutions under differential privacy, by
large margins. Comment: ACM Transactions on Database Systems (ACM TODS). arXiv admin note:
text overlap with arXiv:1212.230
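As background for what LRM improves on, the following sketch shows the naive baseline the abstract alludes to: answering a batch of linear queries W @ x by adding Laplace noise calibrated to the workload's L1 sensitivity. The workload and histogram below are made-up examples, and LRM's low-rank factorization of W is not reproduced here.

```python
import numpy as np

def laplace_batch(W, x, eps, rng=None):
    """Answer the batch of linear queries W @ x under eps-differential privacy.

    Naive baseline: every query is perturbed independently with Laplace noise
    scaled to the L1 sensitivity of the whole workload W. LRM instead factors
    W to reduce the effective sensitivity."""
    rng = rng or np.random.default_rng(0)
    # L1 sensitivity: maximum column norm, i.e. max effect of one record.
    sensitivity = np.abs(W).sum(axis=0).max()
    noise = rng.laplace(scale=sensitivity / eps, size=W.shape[0])
    return W @ x + noise

W = np.array([[1.0, 1.0, 0.0],   # q1: count of the first two bins
              [0.0, 1.0, 1.0]])  # q2: count of the last two bins
x = np.array([10.0, 20.0, 5.0])  # hypothetical histogram
print(laplace_batch(W, x, eps=1.0))
```

Because correlated queries share sensitivity under this naive scheme, answering them through a better strategy matrix (as in \cite{LHR+10} and LRM) can cut the noise substantially.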
Adversarial Analysis of the Differentially-Private Federated Learning in Cyber-Physical Critical Infrastructures
Differential privacy (DP) is considered to be an effective
privacy-preservation method to secure the promising distributed machine
learning (ML) paradigm, federated learning (FL), from privacy attacks (e.g.,
membership inference attack). Nevertheless, while the DP mechanism greatly
alleviates privacy concerns, recent studies have shown that it can be exploited
to conduct security attacks (e.g., false data injection attacks). To address
such attacks on FL-based applications in critical infrastructures, in this
paper, we perform the first systematic study on the DP-exploited poisoning
attacks from an adversarial point of view. We demonstrate that the DP method,
despite providing a level of privacy guarantee, can effectively open a new
poisoning attack vector for the adversary. Our theoretical analysis and
empirical evaluation on a smart grid dataset show how FL performance
degrades (sub-optimal model generation) under selective model poisoning
attacks that exploit the differential noise. As a countermeasure, we
propose a reinforcement learning-based differential privacy level selection
(rDP) process. The rDP process utilizes the differential privacy parameters
(privacy loss, information leakage probability, etc.) and the losses to
intelligently generate an optimal privacy level for the nodes. The evaluation
shows that the accumulated reward and errors of the proposed technique converge
to an optimal privacy policy. Comment: 11 pages, 5 figures, 4 tables. This work has been submitted to IEEE
for possible publication. Copyright may be transferred without notice, after
which this version may no longer be accessible.
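To illustrate the tension rDP navigates, here is a minimal sketch of the standard clip-and-noise step in DP federated learning (not the paper's exact mechanism): a smaller ε means larger Laplace noise, and that noise is precisely the slack in which a poisoning adversary can hide malicious updates. All parameter values below are assumptions.

```python
import numpy as np

def dp_update(gradient, eps, clip=1.0, rng=None):
    """Clip a model update in L1 norm and add Laplace noise calibrated to eps.

    Smaller eps = stronger privacy but larger noise; the paper's observation is
    that an adversary can disguise poisoned updates inside that noise, and rDP
    selects eps per node to balance the two (the RL part is not sketched here)."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(gradient, ord=1)
    clipped = gradient * min(1.0, clip / norm)
    return clipped + rng.laplace(scale=clip / eps, size=gradient.shape)

g = np.array([0.5, -0.3, 0.2])   # hypothetical local update
for eps in (0.1, 1.0, 10.0):
    noisy = dp_update(g, eps)
    print(eps, np.linalg.norm(noisy - g))  # distortion shrinks as eps grows
```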
Ensemble Nonlinear Model Predictive Control for Residential Solar Battery Energy Management
In a dynamic distribution market environment, residential prosumers with solar power generation and battery energy storage devices can flexibly interact with the power grid via power exchange. Providing a schedule of this bidirectional power dispatch can facilitate the operational planning for the grid operator and bring additional benefits to the prosumers with some economic incentives. However, the major obstacle to achieving this win-win situation is the difficulty in 1) predicting the nonlinear behaviors of battery degradation under unknown operating conditions and 2) addressing the highly uncertain generation/load patterns, in a computationally viable way. This paper thus establishes a robust short-term dispatch framework for residential prosumers equipped with rooftop solar photovoltaic panels and household batteries. The objective is to achieve the minimum-cost operation under the dynamic distribution energy market environment with stipulated dispatch rules. A general nonlinear optimization problem is formulated, taking into consideration the operating costs due to electricity trading, battery degradation, and various operating constraints. The optimization problem is solved in real-time using a proposed ensemble nonlinear model predictive control-based economic dispatch strategy, where the uncertainty in the forecast has been addressed adequately albeit with limited local data. The effectiveness of the proposed algorithm has been validated using real-world prosumer datasets.
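As a toy illustration of the underlying dispatch problem (not the paper's ensemble NMPC), the sketch below picks a single-step battery power setpoint by exhaustive search over a cost that combines electricity trading with a linear degradation proxy; every parameter value is invented.

```python
def best_dispatch(load, solar, soc, price, capacity=10.0, p_max=3.0,
                  eta=0.95, degr_cost=0.02, steps=61):
    """Pick the one-step battery power minimizing trading + degradation cost.

    A stand-in for the paper's nonlinear MPC: single horizon step, linear
    degradation proxy, exhaustive search; all parameter values are made up."""
    best = None
    for i in range(steps):
        p = -p_max + 2 * p_max * i / (steps - 1)    # >0 charge, <0 discharge
        new_soc = soc + (eta * p if p >= 0 else p / eta)
        if not 0.0 <= new_soc <= capacity:
            continue                                 # state-of-charge bounds
        grid = load - solar + p                      # power drawn from grid
        cost = price * grid + degr_cost * abs(p)     # selling earns the price
        if best is None or cost < best[0]:
            best = (cost, p, new_soc)
    return best

print(best_dispatch(load=2.0, solar=1.0, soc=5.0, price=0.30))
```

A real MPC would repeat this optimization over a receding horizon with forecast scenarios for load and solar, which is where the paper's ensemble forecasting enters.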
k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data
Data Mining has wide applications in many areas such as banking, medicine,
scientific research and among government agencies. Classification is one of the
commonly used tasks in data mining applications. For the past decade, due to
the rise of various privacy issues, many theoretical and practical solutions to
the classification problem have been proposed under different security models.
However, with the recent popularity of cloud computing, users now have the
opportunity to outsource their data, in encrypted form, as well as the data
mining tasks to the cloud. Since the data on the cloud is in encrypted form,
existing privacy preserving classification techniques are not applicable. In
this paper, we focus on solving the classification problem over encrypted data.
In particular, we propose a secure k-NN classifier over encrypted data in the
cloud. The proposed k-NN protocol protects the confidentiality of the data,
user's input query, and data access patterns. To the best of our knowledge, our
work is the first to develop a secure k-NN classifier over encrypted data under
the semi-honest model. Also, we empirically analyze the efficiency of our
solution through various experiments. Comment: 29 pages, 2 figures, 3 tables. arXiv admin note: substantial text
overlap with arXiv:1307.482
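For reference, the plaintext functionality that the secure protocol computes looks like this; in the paper's setting, every distance comparison and the majority vote would be carried out over encrypted values, so the cloud learns neither the data, the query, nor the access patterns. The dataset here is made up.

```python
from collections import Counter

def knn_classify(points, labels, query, k=3):
    """Plain k-NN majority vote: the functionality the secure protocol
    realizes, here computed directly on plaintext for illustration."""
    # Rank training points by squared Euclidean distance to the query.
    order = sorted(range(len(points)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(points[i], query)))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

pts = [(1, 1), (2, 1), (8, 9), (9, 8), (1, 2)]
lbl = ["low", "low", "high", "high", "low"]
print(knn_classify(pts, lbl, query=(2, 2)))  # "low": nearest three are low
```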
PanCast: Listening to Bluetooth Beacons for Epidemic Risk Mitigation
During the ongoing COVID-19 pandemic, there have been burgeoning efforts to
develop and deploy smartphone apps to expedite contact tracing and risk
notification. Most of these apps track pairwise encounters between individuals
via Bluetooth and then use these tracked encounters to identify and notify
those who might have been in proximity of a contagious individual.
Unfortunately, these apps have not yet proven sufficiently effective, partly
owing to low adoption rates, but also due to the difficult tradeoff between
utility and privacy and the fact that, in COVID-19, most individuals do not
infect anyone but a few superspreaders infect many in superspreading events. In
this paper, we propose PanCast, a privacy-preserving and inclusive system for
epidemic risk assessment and notification that scales gracefully with adoption
rates, utilizes location and environmental information to increase utility
without tracking its users, and can be used to identify superspreading events.
To this end, rather than capturing pairwise encounters between smartphones, our
system utilizes Bluetooth encounters between beacons placed in strategic
locations where superspreading events are most likely to occur and inexpensive,
zero-maintenance, small devices that users can attach to their keyring. PanCast
allows healthy individuals to use the system in a purely passive "radio" mode,
and can assist and benefit from other digital and manual contact tracing
systems. Finally, PanCast can be gracefully dismantled at the end of the
pandemic, minimizing abuse from any malevolent government or entity.
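A minimal sketch of the matching step such a beacon-based design implies: a dongle's passively recorded beacon sightings are compared against later-broadcast risk entries for contagious visits. The record layout and threshold are assumptions; PanCast's actual protocol adds cryptographic protections and location/environment weighting on top of this step.

```python
def risk_exposure(device_log, risky_broadcasts, min_overlap=15):
    """Sum overlapping minutes between a device's beacon sightings and
    broadcast risk entries; notify if the total crosses a threshold.

    Records are (beacon_id, start_minute, end_minute); all hypothetical."""
    minutes = 0
    for beacon, start, end in device_log:
        for r_beacon, r_start, r_end in risky_broadcasts:
            if beacon == r_beacon:
                overlap = min(end, r_end) - max(start, r_start)
                minutes += max(0, overlap)
    return minutes >= min_overlap, minutes

log = [("cafe-12", 600, 640), ("tram-07", 700, 705)]
risky = [("cafe-12", 620, 680)]
print(risk_exposure(log, risky))  # 20 overlapping minutes -> notify
```

Because matching happens locally against broadcast risk data, a purely passive device never transmits anything, which is the "radio mode" the abstract describes.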
Checking global usage of resources handled with local policies
We present a methodology to reason about resource usage (acquisition, release, revision, and so on) and, in particular, to predict bad usage of resources. Keeping in mind the interplay between local and global information that occurs in application-resource interactions, we model resources as entities with local policies and we study global properties that govern overall interactions. Formally, our model is an extension of the π-calculus with primitives to manage resources. To predict possible bad usage of resources, we develop a Control Flow Analysis that computes a static over-approximation of process behaviour.
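To give the flavor of a static over-approximation of resource behaviour, here is a toy analysis over an invented miniature process syntax (not the paper's π-calculus extension): it collects every action a term may perform and flags resources that might be used after release. As in a real Control Flow Analysis, parallel and choice branches are merged, so the result over-approximates what any single run can do.

```python
# Terms: ("act", resource, action) | ("seq"|"par"|"choice", t1, t2).
# This syntax and the analysis are a crude sketch, not the paper's calculus.

def may_actions(term):
    """Flatten a term into the list of (resource, action) pairs it may do."""
    kind = term[0]
    if kind == "act":
        return [term[1:]]
    # seq/par/choice: over-approximate by collecting both branches.
    return may_actions(term[1]) + may_actions(term[2])

def may_use_after_release(term):
    """Report resources for which some use may follow some release."""
    seen_release, bad = set(), set()
    for res, act in may_actions(term):
        if act == "release":
            seen_release.add(res)
        elif act == "use" and res in seen_release:
            bad.add(res)
    return bad

p = ("seq", ("act", "file", "acquire"),
     ("par", ("act", "file", "release"),
             ("act", "file", "use")))
print(may_use_after_release(p))  # {'file'}: the use may race with the release
```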
Ekiden: A Platform for Confidentiality-Preserving, Trustworthy, and Performant Smart Contract Execution
Smart contracts are applications that execute on blockchains. Today they
manage billions of dollars in value and motivate visionary plans for pervasive
blockchain deployment. While smart contracts inherit the availability and other
security assurances of blockchains, however, they are impeded by blockchains'
lack of confidentiality and poor performance.
We present Ekiden, a system that addresses these critical gaps by combining
blockchains with Trusted Execution Environments (TEEs). Ekiden leverages a
novel architecture that separates consensus from execution, enabling efficient
TEE-backed confidentiality-preserving smart contracts and high scalability. Our
prototype (with Tendermint as the consensus layer) achieves, for example,
600x higher throughput and 400x lower latency at 1000x lower cost than the
Ethereum mainnet.
Another contribution of this paper is that we systematically identify and
treat the pitfalls arising from harmonizing TEEs and blockchains. Treated
separately, both TEEs and blockchains provide powerful guarantees, but
hybridized, they engender new attacks. For example, in naive designs,
privacy in TEE-backed contracts can be jeopardized by forgery of blocks, a
seemingly unrelated attack vector. We believe the insights learned from Ekiden
will prove to be of broad importance in hybridized TEE-blockchain systems.
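A structural sketch of the consensus/execution split (assumed names; a keyed hash stands in for Ekiden's attested, authenticated encryption): the "TEE" computes the new contract state off-chain, and the chain stores only a commitment to it, so consensus nodes never see plaintext state.

```python
import hashlib
import json

def execute_in_tee(state, tx, key=b"tee-sealed-key"):
    """Toy version of the split: run a transfer 'inside the TEE' and return the
    new state plus a keyed commitment. In real Ekiden, the TEE seals state with
    authenticated encryption and proves itself via remote attestation."""
    new_state = dict(state)
    new_state[tx["to"]] = new_state.get(tx["to"], 0) + tx["amount"]
    new_state[tx["from"]] = new_state.get(tx["from"], 0) - tx["amount"]
    blob = json.dumps(new_state, sort_keys=True).encode()
    commitment = hashlib.sha256(key + blob).hexdigest()
    return new_state, commitment   # the chain stores only the commitment

state = {"alice": 10, "bob": 0}
new_state, on_chain = execute_in_tee(state, {"from": "alice", "to": "bob", "amount": 3})
print(new_state, on_chain[:16])
```

The block-forgery pitfall the abstract mentions arises exactly at this boundary: the TEE must verify which chain state a commitment extends, otherwise an adversary can feed it forged blocks.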