An Improved Private Mechanism for Small Databases
We study the problem of answering a workload of linear queries $\mathcal{Q}$,
on a database of size at most $n$ drawn from a universe $\mathcal{U}$,
under the constraint of (approximate) differential privacy.
Nikolov, Talwar, and Zhang~\cite{NTZ} proposed an efficient mechanism that, for
any given $\mathcal{Q}$ and $n$, answers the queries with average error that is
at most a factor polynomial in $\log |\mathcal{Q}|$ and $\log |\mathcal{U}|$
worse than the best possible. Here we improve on this guarantee and give a
mechanism whose competitiveness ratio is at most polynomial in $\log n$ and
$\log |\mathcal{U}|$, and has no dependence on $|\mathcal{Q}|$. Our mechanism
is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in
place of an ad-hoc noise distribution, we use a distribution which is in a
sense optimal for the projection mechanism, and analyze it using convex duality
and the restricted invertibility principle.
Comment: To appear in ICALP 2015, Track
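The projection mechanism referenced above can be sketched generically: perturb the query answers with Gaussian noise, then post-process by projecting onto the set of answer vectors achievable by a database of size at most $n$. The following pure-Python toy (illustrative dimensions, illustrative noise scale `sigma`, and a simple projected-gradient solver with an l1-rescaling heuristic rather than an exact Euclidean projection; none of these choices are the paper's optimal distribution or solver) shows the shape of the computation:

```python
import random

# Workload A: each row is one linear query over a universe of 3 items.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
x_true = [2.0, 1.0, 1.0]   # histogram of a database of size n = 4
n = sum(x_true)

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def noisy_answers(sigma=1.0, seed=0):
    """Answer every query, then add Gaussian noise for privacy."""
    rng = random.Random(seed)
    return [a + rng.gauss(0.0, sigma) for a in matvec(A, x_true)]

def project(y, steps=500, lr=0.05):
    """Find x >= 0 with sum(x) <= n minimizing ||A x - y||^2 (projected GD)."""
    x = [n / len(A[0])] * len(A[0])
    for _ in range(steps):
        r = [ai - yi for ai, yi in zip(matvec(A, x), y)]   # residual A x - y
        # gradient of the squared error: 2 A^T r
        g = [2 * sum(A[q][j] * r[q] for q in range(len(A)))
             for j in range(len(x))]
        x = [max(0.0, xj - lr * gj) for xj, gj in zip(x, g)]
        s = sum(x)
        if s > n:            # rescale into the l1 ball of radius n (heuristic)
            x = [xj * n / s for xj in x]
    return x

y = noisy_answers()
x_hat = project(y)
cleaned = matvec(A, x_hat)   # consistent answers from a feasible database
```

The projection step is pure post-processing, so it does not affect the privacy guarantee; it only reduces error by enforcing consistency with some size-$n$ database.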
FedDisco: Federated Learning with Discrepancy-Aware Collaboration
This work considers category distribution heterogeneity in federated
learning. This issue is due to biased labeling preferences at multiple clients
and is a typical setting of data heterogeneity. To alleviate this issue, most
previous works consider either regularizing local models or fine-tuning the
global model, while they ignore the adjustment of aggregation weights and
simply assign weights based on the dataset size. However, based on our
empirical observations and theoretical analysis, we find that the dataset size
is not optimal and the discrepancy between local and global category
distributions could be a beneficial and complementary indicator for determining
aggregation weights. We thus propose a novel aggregation method, Federated
Learning with Discrepancy-aware Collaboration (FedDisco), whose aggregation
weights not only involve both the dataset size and the discrepancy value, but
also contribute to a tighter theoretical upper bound of the optimization error.
FedDisco also promotes privacy preservation, communication and computation
efficiency, and modularity. Extensive experiments show that our FedDisco
outperforms several state-of-the-art methods and can be easily incorporated
with many existing methods to further enhance the performance. Our code will be
available at https://github.com/MediaBrain-SJTU/FedDisco.
Comment: Accepted by International Conference on Machine Learning (ICML 2023)
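As described, FedDisco builds aggregation weights from both dataset size and the local-to-global category-distribution discrepancy. A hedged sketch of such a rule (the combination `relu(size - a*d + b)`, the hyperparameters `a`, `b`, and the L2 discrepancy measure are illustrative assumptions here; see the linked repository for the actual formulation):

```python
import math

def discrepancy(local_counts, global_counts):
    """L2 distance between local and global category distributions."""
    ln, gn = sum(local_counts), sum(global_counts)
    return math.sqrt(sum((l / ln - g / gn) ** 2
                         for l, g in zip(local_counts, global_counts)))

def disco_weights(client_counts, a=0.5, b=0.1):
    """Aggregation weights combining dataset size and distribution discrepancy."""
    global_counts = [sum(c) for c in zip(*client_counts)]
    total = sum(sum(c) for c in client_counts)
    raw = []
    for counts in client_counts:
        size_term = sum(counts) / total             # relative dataset size
        d = discrepancy(counts, global_counts)      # category-distribution gap
        raw.append(max(0.0, size_term - a * d + b)) # down-weight skewed clients
    z = sum(raw)
    return [r / z for r in raw]

# Three equal-sized clients; client 1 has the most skewed label distribution.
weights = disco_weights([[50, 50], [90, 10], [60, 40]])
```

Under this rule, two clients of equal size no longer receive equal weight: the one whose label mix is closer to the global mix is weighted up.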
FACT: Federated Adversarial Cross Training
Federated Learning (FL) facilitates distributed model development to
aggregate multiple confidential data sources. The information transfer among
clients can be compromised by distributional differences, i.e., by non-i.i.d.
data. A particularly challenging scenario is the federated model adaptation to
a target client without access to annotated data. We propose Federated
Adversarial Cross Training (FACT), which uses the implicit domain differences
between source clients to identify domain shifts in the target domain. In each
round of FL, FACT cross-initializes a pair of source clients to generate
domain-specialized representations, which are then used as a direct adversary to learn
a domain invariant data representation. We empirically show that FACT
outperforms state-of-the-art federated, non-federated and source-free domain
adaptation models on three popular multi-source-single-target benchmarks, and
state-of-the-art Unsupervised Domain Adaptation (UDA) models on
single-source-single-target experiments. We further study FACT's behavior with
respect to communication restrictions and the number of participating clients.
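The adversarial signal in FACT resembles classifier-discrepancy adaptation: two source-specialized heads disagree on target samples wherever the shared representation is not domain invariant, and the feature extractor is then updated to shrink that disagreement. An illustrative pure-Python sketch of the discrepancy term only (the two "heads" are fixed toy linear classifiers here, not trained networks, and the adversarial update itself is omitted):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def head(features, weights):
    """A toy linear classifier head: class logits = W @ features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def target_discrepancy(batch, w1, w2):
    """Average L1 disagreement of two source-specialized heads on target data.
    Large values flag domain shift; the feature extractor would be trained
    to minimize this quantity (the adversarial step, omitted here)."""
    total = 0.0
    for x in batch:
        p = softmax(head(x, w1))
        q = softmax(head(x, w2))
        total += sum(abs(a - b) for a, b in zip(p, q)) / len(p)
    return total / len(batch)

# Heads "trained" on two different source clients (toy weights).
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.5], [0.2, 0.8]]
batch = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.3]]
d = target_discrepancy(batch, w1, w2)
```

Identical heads yield zero discrepancy, so the quantity isolates exactly the disagreement induced by the domain gap between the two source clients.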
Towards Data-centric Graph Machine Learning: Review and Outlook
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data: the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited availability and low quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.
Comment: 42 pages, 9 figures
A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts
Machine learning methods strive to acquire a robust model during training
that can generalize well to test samples, even under distribution shifts.
However, these methods often suffer from a performance drop due to unknown test
distributions. Test-time adaptation (TTA), an emerging paradigm, has the
potential to adapt a pre-trained model to unlabeled data during testing, before
making predictions. Recent progress in this paradigm highlights the significant
benefits of utilizing unlabeled data for training self-adapted models prior to
inference. In this survey, we divide TTA into several distinct categories,
namely, test-time (source-free) domain adaptation, test-time batch adaptation,
online test-time adaptation, and test-time prior adaptation. For each category,
we provide a comprehensive taxonomy of advanced algorithms, followed by a
discussion of different learning scenarios. Furthermore, we analyze relevant
applications of TTA and discuss open challenges and promising areas for future
research. A comprehensive list of TTA methods can be found at
\url{https://github.com/tim-learn/awesome-test-time-adaptation}.
Comment: Discussions, comments, and questions are all welcomed in
\url{https://github.com/tim-learn/awesome-test-time-adaptation}
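Among the surveyed categories, online test-time adaptation is often instantiated by entropy minimization on incoming test batches (Tent-style updates of a small set of parameters). A minimal sketch, assuming a binary logistic model whose bias is the only adapted parameter and using a numerical gradient for brevity (all of these modeling choices are illustrative, not any specific surveyed method):

```python
import math

def entropy_of_batch(bias, xs, w=1.0):
    """Mean prediction entropy of a logistic model p = sigmoid(w*x + bias)."""
    h = 0.0
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-(w * x + bias)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        h += -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return h / len(xs)

def adapt_bias(xs, bias=0.0, steps=50, lr=0.5, eps=1e-5):
    """Tent-style loop: each step reduces mean prediction entropy on the
    unlabeled test batch via (numerical) gradient descent on the bias."""
    for _ in range(steps):
        g = (entropy_of_batch(bias + eps, xs)
             - entropy_of_batch(bias - eps, xs)) / (2 * eps)
        bias -= lr * g
    return bias

# A shifted test batch whose inputs all sit near the decision boundary.
test_batch = [0.2, 0.4, 0.1, 0.3]
adapted = adapt_bias(test_batch)
```

No test labels are used: the adapted parameter is driven purely toward more confident predictions on the observed test distribution, which is the core idea the survey groups under online TTA.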
Source-free Domain Adaptive Human Pose Estimation
Human Pose Estimation (HPE) is widely used in various fields, including
motion analysis, healthcare, and virtual reality. However, the great expenses
of labeled real-world datasets present a significant challenge for HPE. To
overcome this, one approach is to train HPE models on synthetic datasets and
then perform domain adaptation (DA) on real-world data. Unfortunately, existing
DA methods for HPE neglect data privacy and security by using both source and
target data in the adaptation process. To this end, we propose a new task,
named source-free domain adaptive HPE, which aims to address the challenges of
cross-domain learning of HPE without access to source data during the
adaptation process. We further propose a novel framework that consists of three
models: source model, intermediate model, and target model, which explores the
task from both source-protect and target-relevant perspectives. The
source-protect module preserves source information more effectively while
resisting noise. The target-relevant module reduces the sparsity of spatial
representations by building a novel spatial probability space, on the basis of
which pose-specific contrastive learning and information maximization are
proposed. Comprehensive experiments on several domain adaptive
HPE benchmarks show that the proposed method outperforms existing approaches by
a considerable margin. The codes are available at
https://github.com/davidpengucf/SFDAHPE.
Comment: Accepted by ICCV 2023
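The information-maximization objective mentioned above is commonly written, in source-free adaptation generally (a SHOT-style formulation is assumed here as an illustration, not necessarily this paper's exact loss), as the mean per-sample prediction entropy minus the entropy of the batch-mean prediction, encouraging predictions that are individually confident yet diverse across the batch:

```python
import math

def entropy(p):
    """Shannon entropy of a probability vector (0 log 0 treated as 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def information_maximization_loss(probs):
    """IM loss = mean per-sample entropy - entropy of the batch-mean prediction.
    Minimizing it makes individual predictions confident while keeping the
    batch spread over diverse classes (avoiding collapse to one class)."""
    mean_ent = sum(entropy(p) for p in probs) / len(probs)
    mean_pred = [sum(col) / len(probs) for col in zip(*probs)]
    return mean_ent - entropy(mean_pred)

# Confident AND diverse predictions score lower than confident-but-collapsed ones.
confident_diverse = [[0.99, 0.01], [0.01, 0.99]]
collapsed = [[0.99, 0.01], [0.99, 0.01]]
l_diverse = information_maximization_loss(confident_diverse)
l_collapsed = information_maximization_loss(collapsed)
```

The diversity term (negative entropy of the mean prediction) is what prevents the trivial solution of assigning every target sample to a single class.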