Learning to Generate Image Embeddings with User-level Differential Privacy
Small on-device models have been successfully trained with user-level
differential privacy (DP) for next word prediction and image classification
tasks in the past. However, existing methods can fail when directly applied to
learn embedding models using supervised training data with a large class space.
To achieve user-level DP for large image-to-embedding feature extractors, we
propose DP-FedEmb, a variant of federated learning algorithms with per-user
sensitivity control and noise addition, to train from user-partitioned data
centralized in the datacenter. DP-FedEmb combines virtual clients, partial
aggregation, private local fine-tuning, and public pretraining to achieve
strong privacy utility trade-offs. We apply DP-FedEmb to train image embedding
models for faces, landmarks and natural species, and demonstrate its superior
utility under same privacy budget on benchmark datasets DigiFace, EMNIST, GLD
and iNaturalist. We further illustrate that it is possible to achieve strong
user-level DP guarantees while keeping the utility drop within 5%, when
millions of users can participate in training.
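The abstract above mentions per-user sensitivity control and noise addition. A minimal sketch of that generic step, clipping each user's model update and adding Gaussian noise to the average, is shown below; the function name, clip rule, and noise scaling are illustrative assumptions, not DP-FedEmb's actual algorithm:

```python
import numpy as np

def dp_aggregate(user_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Bound each user's contribution by L2 clipping, average the clipped
    updates, then add Gaussian noise calibrated to the clip norm
    (a standard DP-SGD/DP-FedAvg-style aggregation sketch)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for u in user_updates:
        norm = np.linalg.norm(u)
        # Scale the update down only if its norm exceeds clip_norm.
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    # Per-user sensitivity of the mean is clip_norm / n.
    std = noise_multiplier * clip_norm / len(user_updates)
    return mean + rng.normal(0.0, std, size=mean.shape)

# Two opposing user updates: after clipping they cancel exactly,
# so with noise_multiplier=0 the aggregate is zero.
updates = [np.ones(4) * 3.0, np.ones(4) * -3.0]
agg = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=0.0)
```

The privacy accounting (mapping `noise_multiplier` and participation rates to an (epsilon, delta) guarantee) is a separate step not shown here.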
ESPT: A Self-Supervised Episodic Spatial Pretext Task for Improving Few-Shot Learning
Self-supervised learning (SSL) techniques have recently been integrated into
the few-shot learning (FSL) framework and have shown promising results in
improving the few-shot image classification performance. However, existing SSL
approaches used in FSL typically seek the supervision signals from the global
embedding of every single image. Therefore, during the episodic training of
FSL, these methods cannot capture and fully utilize the local visual
information in image samples and the data structure information of the whole
episode, which are beneficial to FSL. To this end, we propose to augment the
few-shot learning objective with a novel self-supervised Episodic Spatial
Pretext Task (ESPT). Specifically, for each few-shot episode, we generate its
corresponding transformed episode by applying a random geometric transformation
to all the images in it. Based on these, our ESPT objective is defined as
maximizing the local spatial relationship consistency between the original
episode and the transformed one. With this definition, the ESPT-augmented FSL
objective promotes learning more transferable feature representations that
capture the local spatial features of different images and their
inter-relational structural information in each input episode, thus enabling
the model to generalize better to new categories with only a few samples.
Extensive experiments indicate that our ESPT method achieves new
state-of-the-art performance for few-shot image classification on three
mainstay benchmark datasets. The source code will be available at:
https://github.com/Whut-YiRong/ESPT.
Comment: Accepted by AAAI 202
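The ESPT objective above maximizes local spatial relationship consistency between an episode and its transformed copy. The following sketch shows one plausible reading of that idea: build a pairwise relation matrix over the spatial positions of each image's local feature map, and penalize the discrepancy between the two episodes' relation matrices. All function names and the exact loss form are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def spatial_relation(feats):
    """feats: (N, C, H, W) local feature maps for an episode.
    Returns (N, H*W, H*W) matrices of normalized pairwise relations
    between spatial positions."""
    n, c, h, w = feats.shape
    f = feats.reshape(n, c, h * w)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    # Cosine similarity between every pair of spatial positions.
    return np.einsum('ncp,ncq->npq', f, f)

def espt_loss(feats_orig, feats_trans):
    """Penalize inconsistency between the spatial relation structures of
    the original and geometrically transformed episodes."""
    r1 = spatial_relation(feats_orig)
    r2 = spatial_relation(feats_trans)
    return float(np.mean((r1 - r2) ** 2))

# Identical episodes give zero loss; a shifted copy does not.
x = np.random.default_rng(0).normal(size=(2, 8, 5, 5))
zero = espt_loss(x, x)
nonzero = espt_loss(x, np.roll(x, 1, axis=-1))
```

In practice the transformed episode would come from a random geometric transformation of the input images, with feature maps extracted by the shared backbone; this sketch only isolates the consistency term.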
Structure fusion based on graph convolutional networks for semi-supervised classification
Because multi-view data are diverse and complex, most existing graph
convolutional networks for semi-supervised classification focus on network
architecture construction or on preserving the salient graph structure, and
ignore the contribution of the complete graph structure to semi-supervised
classification. To mine a more complete distribution structure from multi-view
data while accounting for both specificity and commonality, we propose
structure fusion based on graph convolutional networks (SF-GCN) to improve
semi-supervised classification performance. SF-GCN not only retains the
specific characteristics of each view's data through spectral embedding, but
also captures the common structure of the multi-view data through a distance
metric between multi-graph structures. Assuming a linear relationship between
the multi-graph structures, we construct the optimization function of the
structure fusion model by balancing a specificity loss against a commonality
loss. By solving this function, we simultaneously obtain the fused spectral
embedding of the multi-view data and the fused structure, which serves as the
adjacency matrix input to graph convolutional networks for semi-supervised
classification. Experiments demonstrate that SF-GCN outperforms the state of
the art on three challenging citation-network datasets: Cora, Citeseer, and
Pubmed.
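The abstract assumes a linear relationship between multi-graph structures and feeds the fused structure into a GCN as an adjacency matrix. A minimal sketch of that fusion step, a weighted linear combination of per-view adjacencies followed by the standard symmetric GCN normalization, is below; the fusion weights here are fixed inputs, whereas the paper obtains them by optimizing specificity and commonality losses:

```python
import numpy as np

def fuse_structures(adjs, weights):
    """Linearly fuse per-view adjacency matrices (the assumed linear
    relationship), add self-loops, and apply the symmetric normalization
    D^{-1/2} (A + I) D^{-1/2} expected by a GCN layer."""
    A = sum(w * a for w, a in zip(weights, adjs))
    A = A + np.eye(A.shape[0])          # self-loops
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return d_inv_sqrt @ A @ d_inv_sqrt

# Two identical two-node views fuse back to the same graph;
# after normalization every entry of the 2x2 result is 0.5.
A_view = np.array([[0.0, 1.0], [1.0, 0.0]])
fused = fuse_structures([A_view, A_view], [0.5, 0.5])
```

The fused matrix would then replace the single-view adjacency in a GCN forward pass; the spectral-embedding and loss-balancing components of SF-GCN are not reproduced here.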