Cloud-based Quadratic Optimization with Partially Homomorphic Encryption
The development of large-scale distributed control systems has led to the
outsourcing of costly computations to cloud-computing platforms, as well as to
concerns about privacy of the collected sensitive data. This paper develops a
cloud-based protocol for a quadratic optimization problem involving multiple
parties, each holding information it seeks to maintain private. The protocol is
based on the projected gradient ascent on the Lagrange dual problem and
exploits partially homomorphic encryption and secure multi-party computation
techniques. Using formal cryptographic definitions of indistinguishability, the
protocol is shown to achieve computational privacy, i.e., there is no
computationally efficient algorithm that any involved party can employ to
obtain private information beyond what can be inferred from the party's inputs
and outputs only. To reduce the communication complexity of the proposed
protocol, we introduce a variant that achieves this objective at the
expense of weaker privacy guarantees. We discuss in detail the computational
and communication complexity properties of both algorithms theoretically and
also through implementations. We conclude the paper with a discussion on
computational privacy and other notions of privacy, such as the non-unique
retrieval of private information from the protocol outputs.
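The core iteration described above, projected gradient ascent on the Lagrange dual of a quadratic program, can be sketched in plaintext as follows. This is a minimal sketch, not the paper's protocol: the function name and parameters are illustrative, and the actual protocol evaluates these updates under additively homomorphic encryption and secure multi-party computation, which is omitted here.

```python
import numpy as np

def dual_projected_gradient_ascent(Q, c, A, b, step=0.1, iters=500):
    """Solve  min 1/2 x^T Q x + c^T x  s.t.  A x <= b  (Q positive definite)
    by projected gradient ascent on the Lagrange dual.
    Plaintext sketch only; the paper runs these updates on encrypted data."""
    lam = np.zeros(A.shape[0])          # dual multipliers, one per constraint
    Q_inv = np.linalg.inv(Q)
    for _ in range(iters):
        # Primal minimizer of the Lagrangian for the current multipliers
        x = -Q_inv @ (c + A.T @ lam)
        # Gradient ascent on the dual, projected onto the nonnegative orthant
        lam = np.maximum(0.0, lam + step * (A @ x - b))
    return x, lam
```

For instance, minimizing 1/2 x^2 - x subject to x <= 0.5 yields x = 0.5 with an active multiplier; the projection max(0, ·) is what keeps inactive constraints from contributing.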
Differentially Private Vertical Federated Clustering
In many applications, multiple parties have private data regarding the same
set of users but on disjoint sets of attributes, and a server wants to leverage
the data to train a model. To enable model learning while protecting the
privacy of the data subjects, we need vertical federated learning (VFL)
techniques, where the data parties share only information for training the
model, instead of the private data. However, it is challenging to ensure that
the shared information maintains privacy while learning accurate models. To the
best of our knowledge, the algorithm proposed in this paper is the first
practical solution for differentially private vertical federated k-means
clustering, where the server can obtain a set of global centers with a provable
differential privacy guarantee. Our algorithm assumes an untrusted central
server that aggregates differentially private local centers and membership
encodings from local data parties. It builds a weighted grid as the synopsis of
the global dataset based on the received information. Final centers are
generated by running any k-means algorithm on the weighted grid. Our approach
for grid weight estimation uses a novel, lightweight, and differentially
private set intersection cardinality estimation algorithm based on the
Flajolet-Martin sketch. To improve the estimation accuracy in the setting with
more than two data parties, we further propose a refined version of the weight
estimation algorithm and a parameter tuning strategy that brings the final
k-means utility close to that of the central private setting. We provide
theoretical utility analysis and experimental evaluation results for the
cluster centers computed by our algorithm, and show that our approach
outperforms two baselines based on existing techniques, both theoretically and
empirically.
Distributed Training of Graph Convolutional Networks
The aim of this work is to develop a fully-distributed algorithmic framework
for training graph convolutional networks (GCNs). The proposed method is able
to exploit the meaningful relational structure of the input data, which are
collected by a set of agents that communicate over a sparse network topology.
After formulating the centralized GCN training problem, we first show how to
make inference in a distributed scenario where the underlying data graph is
split among different agents. Then, we propose a distributed gradient descent
procedure to solve the GCN training problem. The resulting model distributes
computation along three lines: during inference, during back-propagation, and
during optimization. Convergence to stationary solutions of the GCN training
problem is also established under mild conditions. Finally, we propose an
optimization criterion to design the communication topology between agents in
order to match with the graph describing data relationships. A wide set of
numerical results validate our proposal. To the best of our knowledge, this is
the first work combining graph convolutional neural networks with distributed
optimization.
Comment: Published in IEEE Transactions on Signal and Information Processing
over Networks.
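The key structural fact behind distributing GCN inference, that each agent can compute its own block of the propagation rule using only features of its in-neighbors, can be sketched as follows. This is a minimal sketch of the forward pass only (function names are illustrative); the paper's distributed back-propagation, optimization, and topology design are not shown.

```python
import numpy as np

def gcn_layer(A_hat, X, W):
    """Centralized GCN propagation: H = ReLU(A_hat X W)."""
    return np.maximum(0.0, A_hat @ X @ W)

def distributed_gcn_layer(A_hat, X, W, parts):
    """Each agent owns a block of nodes (a subset of rows of A_hat) and
    computes its local output from the features of its in-neighbors only;
    stacking the local blocks recovers the centralized result exactly."""
    blocks = []
    for rows in parts:
        # Nodes this agent must receive feature messages from
        nbrs = np.nonzero(A_hat[rows].sum(axis=0))[0]
        blocks.append(np.maximum(0.0, A_hat[np.ix_(rows, nbrs)] @ X[nbrs] @ W))
    return np.vstack(blocks)
```

Because A_hat inherits the sparsity of the data graph, each agent's message set `nbrs` stays small when the partition respects graph locality, which is the motivation for matching the communication topology to the data graph.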
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Vertical Federated Learning
Vertical Federated Learning (VFL) is a federated learning setting where
multiple parties with different features about the same set of users jointly
train machine learning models without exposing their raw data or model
parameters. Motivated by the rapid growth in VFL research and real-world
applications, we provide a comprehensive review of the concept and algorithms
of VFL, as well as current advances and challenges in various aspects,
including effectiveness, efficiency, and privacy. We provide an exhaustive
categorization for VFL settings and privacy-preserving protocols and
comprehensively analyze the privacy attacks and defense strategies for each
protocol. We then propose a unified framework, termed VFLow, which considers
the VFL problem under communication, computation, privacy, and effectiveness
constraints. Finally, we review the most recent advances in industrial
applications, highlighting open challenges and future directions for VFL.
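The vertical partitioning idea underlying this setting can be sketched for a linear scorer: each party applies its own weights to its own feature columns, and only partial scores leave the party. This is a schematic only (function names are illustrative); real VFL protocols additionally protect the exchanged partial scores, e.g. with homomorphic encryption or secure aggregation, which is omitted here.

```python
import numpy as np

def local_partial_score(X_local, w_local):
    """Run at each party: apply local weights to the party's own feature
    columns. Only this partial score, never the raw features, is shared."""
    return X_local @ w_local

def vfl_predict(partials, bias=0.0):
    """Run at the aggregator: sum the parties' partial scores into a logit
    per user and apply the sigmoid."""
    z = np.sum(partials, axis=0) + bias
    return 1.0 / (1.0 + np.exp(-z))
```

Because the model is linear, the sum of per-party partial scores equals the score of the full feature vector, so the vertically split computation is exact; nonlinear models require interactive protocols per layer.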