Systematizing Decentralization and Privacy: Lessons from 15 Years of Research and Deployments
Decentralized systems are a subset of distributed systems where multiple
authorities control different components and no authority is fully trusted by
all. This implies that any component in a decentralized system is potentially
adversarial. We review fifteen years of research on decentralization and
privacy, and provide an overview of key systems, as well as key insights for
designers of future systems. We show that decentralized designs can enhance
privacy, integrity, and availability but also require careful trade-offs in
terms of system complexity, properties provided, and degree of
decentralization. These trade-offs need to be understood and navigated by
designers. We argue that a combination of insights from cryptography,
distributed systems, and mechanism design, aligned with the development of
adequate incentives, is necessary to build scalable and successful
privacy-preserving decentralized systems.
Verifiable Encodings for Secure Homomorphic Analytics
Homomorphic encryption, which enables the execution of arithmetic operations
directly on ciphertexts, is a promising solution for protecting privacy of
cloud-delegated computations on sensitive data. However, the correctness of the
computation result is not ensured. We propose two error detection encodings and
build authenticators that enable practical client-verification of cloud-based
homomorphic computations under different trade-offs and without compromising on
the features of the encryption algorithm. Our authenticators operate on top of
popular ring-learning-with-errors-based fully homomorphic encryption schemes
over the integers. We implement our solution in VERITAS, a ready-to-use system
for verification of outsourced computations executed over encrypted data. We
show that, contrary to prior work, VERITAS supports verification of any
homomorphic operation, and we demonstrate its practicality for various
applications, such as ride-hailing, genomic-data analysis, encrypted search,
and machine-learning training and inference.
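The error-detection-encoding idea behind such authenticators can be illustrated with a toy, plaintext-only sketch (this is not the VERITAS construction, and no encryption is involved; all names and the modulus are illustrative): the client carries a secret-keyed shadow of each value, the server applies the same operation to both components, and the client checks consistency. Only the additive case is shown.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus


def keygen():
    # secret scaling key, known only to the client
    return secrets.randbelow(P - 1) + 1


def encode(alpha, m):
    # encode m as (m, alpha*m): tampering with either component
    # breaks the invariant c2 == alpha * c1 (mod P)
    return (m % P, (alpha * m) % P)


def add(c, d):
    # the server adds encodings component-wise, mirroring a
    # homomorphic addition on ciphertexts
    return ((c[0] + d[0]) % P, (c[1] + d[1]) % P)


def verify(alpha, c):
    # the client checks the shadow component against the key
    return c[1] == (alpha * c[0]) % P


alpha = keygen()
s = add(encode(alpha, 12), encode(alpha, 30))
assert s[0] == 42 and verify(alpha, s)
tampered = ((s[0] + 1) % P, s[1])
assert not verify(alpha, tampered)
```

Multiplications would additionally require tracking the degree in alpha, which this additive-only sketch omits.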
Robust and Actively Secure Serverless Collaborative Learning
Collaborative machine learning (ML) is widely used to enable institutions to
learn better models from distributed data. While collaborative approaches to
learning intuitively protect user data, they remain vulnerable to either the
server, the clients, or both, deviating from the protocol. Indeed, because the
protocol is asymmetric, a malicious server can abuse its power to reconstruct
client data points. Conversely, malicious clients can corrupt learning with
malicious updates. Thus, both clients and servers require a guarantee when the
other cannot be trusted to fully cooperate. In this work, we propose a
peer-to-peer (P2P) learning scheme that is secure against malicious servers and
robust to malicious clients. Our core contribution is a generic framework that
transforms any (compatible) algorithm for robust aggregation of model updates
to the setting where servers and clients can act maliciously. Finally, we
demonstrate the computational efficiency of our approach even with
1-million-parameter models trained by hundreds of peers on standard datasets.
Comment: Accepted at NeurIPS 202
Evaluate and Guard the Wisdom of Crowds: Zero Knowledge Proofs for Crowdsourcing Truth Inference
Due to correctness and security risks in outsourced cloud computing, we
consider a new paradigm called crowdsourcing: distribute tasks, receive
answers, and aggregate the results from multiple entities. Through this
approach, we can aggregate the wisdom of the crowd to complete tasks, ensuring
the accuracy of task completion while reducing the risks posed by the malicious
acts of a single entity. However, the ensuing question is, how can we ensure
that the aggregator has done its work honestly and each contributor's work has
been evaluated fairly?
In this paper, we propose a new scheme that ensures that the aggregator has
honestly completed the aggregation and that each
data source is fairly evaluated. We combine the cryptographic primitive of
zero-knowledge proofs with a class of truth inference algorithms widely
studied in AI/ML scenarios. Under this scheme,
various complex outsourced tasks can be solved with efficiency and accuracy. To
build our scheme, a novel method to prove the precise computation of
floating-point numbers is proposed; it is nearly optimal, compatible with
existing argument systems, and may be of independent interest. Our work can
thus prove the process of aggregation and inference
without loss of precision. We fully implement and evaluate our ideas. Compared
with recent works, our scheme improves efficiency and is robust enough to be
widely applied.
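As a concrete example of the class of truth inference algorithms mentioned above, here is a minimal iterative weighted-voting sketch in Python (illustrative only; the data, names, and iteration count are made up, and the zero-knowledge proof layer is entirely omitted). The aggregator alternates between inferring task truths by weighted vote and re-estimating each worker's reliability.

```python
def truth_inference(answers, n_iters=10):
    # answers: dict mapping task -> {worker: label}
    workers = {w for votes in answers.values() for w in votes}
    weight = {w: 1.0 for w in workers}  # initial reliability
    truth = {}
    for _ in range(n_iters):
        # infer each task's truth by reliability-weighted vote
        for task, votes in answers.items():
            scores = {}
            for w, label in votes.items():
                scores[label] = scores.get(label, 0.0) + weight[w]
            truth[task] = max(scores, key=scores.get)
        # re-estimate reliability: agreement with the current truths
        for w in workers:
            hits = tot = 0
            for task, votes in answers.items():
                if w in votes:
                    tot += 1
                    hits += votes[w] == truth[task]
            weight[w] = hits / tot if tot else 0.0
    return truth, weight


answers = {
    "t1": {"a": "cat", "b": "cat", "c": "dog"},
    "t2": {"a": "dog", "b": "dog", "c": "dog"},
    "t3": {"a": "cat", "b": "dog", "c": "dog"},
}
truth, weight = truth_inference(answers)
assert truth == {"t1": "cat", "t2": "dog", "t3": "dog"}
```

A proof system would then attest that exactly this fixed-point computation was carried out, which is where precise floating-point arithmetic matters.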
Know your customer: balancing innovation and regulation for financial inclusion
Financial inclusion depends on providing adjusted services for citizens with
disclosed vulnerabilities. At the same time, the financial industry needs to
adhere to a strict regulatory framework, which is often in conflict with the
desire for inclusive, adaptive, and privacy-preserving services. In this
article we study how this tension impacts the deployment of privacy-sensitive
technologies aimed at financial inclusion. We conduct a qualitative study with
banking experts to understand their perspectives on service development for
financial inclusion. We build and demonstrate a prototype solution based on
open source decentralized identifiers and verifiable credentials software and
report on feedback from the banking experts on this system. The technology is
promising because it places the selective disclosure of vulnerabilities under
the full control of the individual. This supports GDPR requirements, but at the same
time, there is a clear tension between introducing these technologies and
fulfilling other regulatory requirements, particularly with respect to 'Know
Your Customer.' We consider the policy implications stemming from these
tensions and provide guidelines for the further design of related technologies.
Comment: Published in the Journal Data & Policy
Practical and Provably Secure Distributed Aggregation: Verifiable Additive Homomorphic Secret Sharing
Often clients (e.g., sensors, organizations) need to outsource joint computations based on some joint inputs to external untrusted servers. These computations often rely on the aggregation of data collected from multiple clients, and the clients want to guarantee that the results are correct; thus, an output that can be publicly verified is required. However, important security and privacy challenges arise, since clients may hold sensitive information. In this paper, we propose an approach, called verifiable additive homomorphic secret sharing (VAHSS), to achieve practical and provably secure aggregation of data, while allowing the clients to protect their secret data and providing public verifiability, i.e., everyone should be able to verify the correctness of the computed result. We propose three VAHSS constructions by combining an additive homomorphic secret sharing (HSS) scheme, for computing the sum of the clients' secret inputs, with three different methods for achieving public verifiability, namely: (i) homomorphic collision-resistant hash functions; (ii) linear homomorphic signatures; and (iii) a threshold RSA signature scheme. For all three constructions, we provide a detailed correctness, security, and verifiability analysis, together with detailed experimental evaluations. Our results demonstrate the efficiency of our proposed constructions, especially on the client side.
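The additive-secret-sharing core of such a scheme can be sketched in a few lines of Python (a hedged illustration only: the verifiability layer of homomorphic hashes or signatures is omitted, and the modulus, inputs, and server count are arbitrary choices). Each client splits its value into shares that sum to the value modulo Q; each server publishes the sum of the shares it received, and anyone can combine the published partial sums.

```python
import secrets

Q = 2**61 - 1  # illustrative modulus


def share(secret, n):
    # split secret into n additive shares mod Q; any n-1 shares
    # are uniformly random and reveal nothing about the secret
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


n_servers = 3
inputs = [5, 17, 20]  # each client's private value

# client i sends its j-th share to server j
all_shares = [share(x, n_servers) for x in inputs]

# each server sums the shares it received and publishes the result
partials = [sum(cs[j] for cs in all_shares) % Q for j in range(n_servers)]

# anyone can combine the published partial sums into the total
total = sum(partials) % Q
assert total == sum(inputs) % Q
```

In a verifiable variant, each server would additionally publish a proof (e.g., a linear homomorphic signature over its partial sum) so that the combined output can be checked publicly.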
Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties
Learning from data owned by several parties, as in federated learning, raises
challenges regarding the privacy guarantees provided to participants and the
correctness of the computation in the presence of malicious parties. We tackle
these challenges in the context of distributed averaging, an essential building
block of distributed and federated learning. Our first contribution is a novel
distributed differentially private protocol which naturally scales with the
number of parties. The key idea underlying our protocol is to exchange
correlated Gaussian noise along the edges of a network graph, complemented by
independent noise added by each party. We analyze the differential privacy
guarantees of our protocol and the impact of the graph topology, showing that
we can match the accuracy of the trusted curator model even when each party
communicates with only a logarithmic number of other parties chosen at random.
This is in contrast with protocols in the local model of privacy (with lower
accuracy) or based on secure aggregation (where all pairs of users need to
exchange messages). Our second contribution is to enable users to prove the
correctness of their computations without compromising the efficiency and
privacy guarantees of the protocol. Our construction relies on standard
cryptographic primitives like commitment schemes and zero-knowledge proofs.
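The key cancellation idea can be simulated in a few lines (a simplified sketch, not the paper's protocol: a ring graph stands in for the random graph, the noise scales are illustrative, and no actual message exchange or formal privacy accounting is modeled). Correlated noise on each edge cancels pairwise in the sum, so only the small independent noise survives in the average.

```python
import random

random.seed(0)
n = 6
values = [float(i) for i in range(n)]  # each party's private value
noisy = list(values)

# Pairwise-correlated Gaussian noise along the edges of a ring graph:
# for edge (i, j), party i adds +eta and party j adds -eta, so each
# pair cancels exactly in the global sum.
for i in range(n):
    j = (i + 1) % n
    eta = random.gauss(0, 5.0)
    noisy[i] += eta
    noisy[j] -= eta

# A small amount of independent noise per party; this is what remains
# in the average and drives the differential privacy guarantee.
noisy = [v + random.gauss(0, 0.01) for v in noisy]

avg = sum(noisy) / n
true_avg = sum(values) / n
# Despite large per-edge noise (sigma = 5), the average stays close.
assert abs(avg - true_avg) < 0.1
```

Each party's released value looks heavily perturbed on its own, yet the aggregate matches the accuracy one would get from a trusted curator adding only the independent noise.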
Privacy-Preserving Cloud-Assisted Data Analytics
Nowadays, industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our lives. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real-world applications. However, it is challenging for resource-limited clients to analyze their data efficiently when its scale is large. Additionally, data resources are increasingly distributed among different owners. Moreover, users' data may contain private information that needs to be protected.
Cloud computing has become more and more popular in both academia and industry. By pooling infrastructure and servers together, it can offer virtually unlimited resources easily accessible via the Internet. Cloud platforms can provide various services, including machine learning and data analytics.
The goal of this dissertation is to develop privacy-preserving cloud-assisted data analytics solutions to address the aforementioned challenges, leveraging the powerful and easy-to-access cloud. In particular, we propose the following systems.
To address the problem of limited computation power at the user side and the need for privacy protection in data analytics, we consider geometric programming (GP) in data analytics and design a secure, efficient, and verifiable outsourcing protocol for GP. Our protocol consists of a transform scheme that converts a GP to a DGP with computational indistinguishability, and an efficient scheme to solve the transformed DGP at the cloud side with result verification. Evaluation results show that the proposed secure outsourcing protocol achieves significant time savings for users.
To address the problem of limited data at individual users, we propose two distributed learning systems such that users can collaboratively train machine learning models without losing privacy. The first one is a differentially private framework to train logistic regression models with distributed data sources. We employ the relevance between input data features and the model output to significantly improve the learning accuracy. Moreover, we adopt an evaluation data set at the cloud side to suppress low-quality data sources and propose a differentially private mechanism to protect users' data-quality privacy. Experimental results show that the proposed framework can achieve high utility with low-quality data and a strong privacy guarantee.
The second one is an efficient privacy-preserving federated learning system that enables multiple edge users to collaboratively train their models without revealing their datasets. To reduce the communication overhead, we select well-aligned gradients of large enough magnitude for uploading, which leads to quick convergence. To minimize the noise added and improve model utility, each user adds only a small amount of noise to his selected gradients and encrypts the noisy gradients before uploading, and the cloud server only obtains the aggregate gradients, which contain enough noise to achieve differential privacy. Evaluation results show that the proposed system can achieve high accuracy, low communication overhead, and a strong privacy guarantee.
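A minimal sketch of the gradient selection and noising step described above (illustrative only: selection here is by magnitude alone, the "well-aligned" filtering and the encryption step are omitted, and all names and parameters are made up):

```python
import random

random.seed(1)


def select_and_noise(grad, k, sigma):
    # Keep only the k largest-magnitude coordinates to cut upload size.
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]),
                 reverse=True)[:k]
    # Perturb each selected coordinate with a little Gaussian noise; in
    # the described system the noisy gradients are then encrypted, and
    # only the aggregate, which accumulates enough total noise for
    # differential privacy, is ever revealed to the server.
    return {i: grad[i] + random.gauss(0, sigma) for i in idx}


grad = [0.02, -1.5, 0.9, 0.001, -0.3]
sparse = select_and_noise(grad, k=2, sigma=0.01)
assert set(sparse) == {1, 2}  # the two largest-magnitude coordinates
```

Uploading a sparse dict of (index, noisy value) pairs instead of the full vector is what reduces the communication cost.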
In future work, we plan to design a privacy-preserving data analytics system with fair exchange, which ensures payment fairness. We will also consider designing distributed learning systems with heterogeneous architectures.
Privacy Preserving Opinion Aggregation
There are numerous settings in which people's preferences are aggregated outside of formal elections, and where privacy and verification are important but the stringent authentication and coercion-resistance properties of government elections do not apply; a prime example is social media platforms. These systems are often iterative and have no trusted authority, in contrast to the centrally organised, single-shot elections on which most of the literature is focused. Moreover, they require a continuous flow of aggregation that takes place and becomes available even while input is still being collected from the participants, in contrast to fairness in classical elections, where partial results should never be revealed.
In this work, we explore opinion aggregation in a decentralised, iterative setting by proposing a novel protocol in which randomly-chosen participants take turns acting, in an incentive-driven manner, as decryption authorities. Our construction provides public verifiability, robust vote privacy, and liveness guarantees, while striving to minimise the resources each participant needs to contribute.
Lightweight Techniques for Private Heavy Hitters
This paper presents a new protocol for solving the private heavy-hitters
problem. In this problem, there are many clients and a small set of
data-collection servers. Each client holds a private bitstring. The servers
want to recover the set of all popular strings, without learning anything else
about any client's string. A web-browser vendor, for instance, can use our
protocol to figure out which homepages are popular, without learning any user's
homepage. We also consider the simpler private subset-histogram problem, in
which the servers want to count how many clients hold strings in a particular
set without revealing this set to the clients.
Our protocols use two data-collection servers and, in a protocol run, each
client sends only a single message to the servers. Our protocols protect
client privacy against arbitrary misbehavior by one of the servers and our
approach requires no public-key cryptography (except for secure channels), nor
general-purpose multiparty computation. Instead, we rely on incremental
distributed point functions, a new cryptographic tool that allows a client to
succinctly secret-share the labels on the nodes of an exponentially large
binary tree, provided that the tree has a single non-zero path. Along the way,
we develop new general tools for providing malicious security in applications
of distributed point functions.
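The simpler subset-histogram idea can be sketched with flat additive shares (a hedged illustration with a tiny, made-up domain: the paper's distributed point functions make each client's share logarithmic in the domain size, which this flat version deliberately does not attempt). Each client secret-shares a one-hot indicator of its string between the two servers; each share alone is uniformly random.

```python
import secrets

Q = 2**32  # illustrative modulus for the additive shares
DOMAIN = ["home.example", "news.example", "search.example"]


def share_onehot(string):
    # share a one-hot vector over the domain between two servers;
    # each share alone is uniform and reveals nothing about string
    vec = [1 if s == string else 0 for s in DOMAIN]
    share0 = [secrets.randbelow(Q) for _ in vec]
    share1 = [(v - r) % Q for v, r in zip(vec, share0)]
    return share0, share1


acc = [[0] * len(DOMAIN), [0] * len(DOMAIN)]  # per-server accumulators
clients = ["home.example", "news.example", "home.example", "home.example"]
for s in clients:
    s0, s1 = share_onehot(s)
    acc[0] = [(a + x) % Q for a, x in zip(acc[0], s0)]
    acc[1] = [(a + x) % Q for a, x in zip(acc[1], s1)]

# the servers publish their accumulators; combining them yields the
# histogram without either server learning any individual's string
hist = [(a + b) % Q for a, b in zip(acc[0], acc[1])]
assert hist == [3, 1, 0]
```

Incremental DPFs replace the length-|DOMAIN| shares with succinct keys, which is what lets the real protocol handle 256-bit strings.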
In an experimental evaluation with two servers on opposite sides of the U.S.,
the servers can find the 200 most popular strings among a set of 400,000
client-held 256-bit strings in 54 minutes. Our protocols are highly
parallelizable. We estimate that with 20 physical machines per logical server,
our protocols could compute heavy hitters over ten million clients in just over
one hour of computation.
Comment: To appear in IEEE Security & Privacy 202