29,806 research outputs found
Distributed Private Heavy Hitters
In this paper, we give efficient algorithms and lower bounds for solving the
heavy hitters problem while preserving differential privacy in the fully
distributed local model. In this model, there are n parties, each of which
possesses a single element from a universe of size N. The heavy hitters problem
is to find the identity of the most common element shared amongst the n
parties. In the local model, there is no trusted database administrator, and so
the algorithm must interact with each of the parties separately, using a
differentially private protocol. We give tight information-theoretic upper and
lower bounds on the accuracy to which this problem can be solved in the local
model (giving a separation between the local model and the more common
centralized model of privacy), as well as computationally efficient algorithms
even in the case where the data universe N may be exponentially large
The End of Slow Networks: It's Time for a Redesign
Next generation high-performance RDMA-capable networks will require a
fundamental rethinking of the design and architecture of modern distributed
DBMSs. These systems are commonly designed and optimized under the assumption
that the network is the bottleneck: the network is slow and "thin", and thus
needs to be avoided as much as possible. Yet this assumption no longer holds
true. With InfiniBand FDR 4x, the bandwidth available to transfer data across
network is in the same ballpark as the bandwidth of one memory channel, and it
increases even further with the most recent EDR standard. Moreover, with the
increasing advances of RDMA, the latency improves similarly fast. In this
paper, we first argue that the "old" distributed database design is not capable
of taking full advantage of the network. Second, we propose architectural
redesigns for OLTP, OLAP and advanced analytical frameworks to take better
advantage of the improved bandwidth, latency and RDMA capabilities. Finally,
for each of the workload categories, we show that remarkable performance
improvements can be achieved
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
Survey of Spectrum Sharing for Inter-Technology Coexistence
Increasing capacity demands in emerging wireless technologies are expected to
be met by network densification and spectrum bands open to multiple
technologies. These will, in turn, increase the level of interference and also
result in more complex inter-technology interactions, which will need to be
managed through spectrum sharing mechanisms. Consequently, novel spectrum
sharing mechanisms should be designed to allow spectrum access for multiple
technologies, while efficiently utilizing the spectrum resources overall.
Importantly, it is not trivial to design such efficient mechanisms, not only
due to technical aspects, but also due to regulatory and business model
constraints. In this survey we address spectrum sharing mechanisms for wireless
inter-technology coexistence by means of a technology circle that incorporates
in a unified, system-level view the technical and non-technical aspects. We
thus systematically explore the spectrum sharing design space consisting of
parameters at different layers. Using this framework, we present a literature
review on inter-technology coexistence with a focus on wireless technologies
with equal spectrum access rights, i.e. (i) primary/primary, (ii)
secondary/secondary, and (iii) technologies operating in a spectrum commons.
Moreover, we reflect on our literature review to identify possible spectrum
sharing design solutions and performance evaluation approaches useful for
future coexistence cases. Finally, we discuss spectrum sharing design
challenges and suggest future research directions
Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
Deep Learning has recently become hugely popular in machine learning,
providing significant improvements in classification accuracy in the presence
of highly-structured and large databases.
Researchers have also considered privacy implications of deep learning.
Models are typically trained in a centralized manner with all the data being
processed by the same training algorithm. If the data is a collection of users'
private data, including habits, personal pictures, geographical positions,
interests, and more, the centralized server will have access to sensitive
information that could potentially be mishandled. To tackle this problem,
collaborative deep learning models have recently been proposed where parties
locally train their deep learning structures and only share a subset of the
parameters in the attempt to keep their respective training sets private.
Parameters can also be obfuscated via differential privacy (DP) to make
information extraction even more challenging, as proposed by Shokri and
Shmatikov at CCS'15.
Unfortunately, we show that any privacy-preserving collaborative deep
learning is susceptible to a powerful attack that we devise in this paper. In
particular, we show that a distributed, federated, or decentralized deep
learning approach is fundamentally broken and does not protect the training
sets of honest participants. The attack we developed exploits the real-time
nature of the learning process that allows the adversary to train a Generative
Adversarial Network (GAN) that generates prototypical samples of the targeted
training set that was meant to be private (the samples generated by the GAN are
intended to come from the same distribution as the training data).
Interestingly, we show that record-level DP applied to the shared parameters of
the model, as suggested in previous work, is ineffective (i.e., record-level DP
is not designed to address our attack).Comment: ACM CCS'17, 16 pages, 18 figure
- …