62,828 research outputs found

Energy Confused Adversarial Metric Learning for Zero-Shot Image Retrieval and Clustering

Author: Chen Binghui
Deng Weihong
Publication date: 21/01/2019

Deep metric learning has been widely applied in many computer vision tasks, and recently it has become especially attractive for zero-shot image retrieval and clustering (ZSRC), where a good embedding is required so that unseen classes can be distinguished well. Most existing works deem this 'good' embedding to be simply the discriminative one and thus race to devise powerful metric objectives or hard-sample mining strategies for learning discriminative embeddings. However, in this paper, we first emphasize that generalization ability is also a core ingredient of this 'good' embedding and largely affects metric performance in zero-shot settings. Then, we propose the Energy Confused Adversarial Metric Learning (ECAML) framework to explicitly optimize a robust metric. It is mainly achieved by introducing an Energy Confusion regularization term, which breaks away from the traditional metric learning idea of devising discriminative objectives and seeks to 'confuse' the learned model so as to encourage its generalization ability by reducing overfitting on the seen classes. We train this confusion term together with the conventional metric objective in an adversarial manner. Although it seems weird to 'confuse' the network, we show that our ECAML indeed serves as an efficient regularization technique for metric learning and is applicable to various conventional metric methods. This paper empirically and experimentally demonstrates the importance of learning embeddings with good generalization, achieving state-of-the-art performance on the popular CUB, CARS, Stanford Online Products and In-Shop datasets for ZSRC tasks. Code available at http://www.bhchen.cn/. Comment: AAAI 2019, Spotlight

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A note on entropy estimation

Author: Schürmann Thomas
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2015

We compare an entropy estimator $H_z$ recently discussed in [10] with two estimators $H_1$ and $H_2$ introduced in [6][7]. We prove the identity $H_z \equiv H_1$, which has not been taken into account in [10]. Then, we prove that the statistical bias of $H_1$ is less than the bias of the ordinary likelihood estimator of entropy. Finally, by numerical simulation we verify that for the most interesting regime of small sample estimation and large event spaces, the estimator $H_2$ has a significantly smaller statistical error than $H_z$. Comment: 7 pages, including 4 figures; two references added
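The abstract above uses the bias of the ordinary likelihood (plug-in) estimator as its baseline. The following is a minimal sketch of that baseline only, not of the estimators from [6][7] or [10], whose formulas are not given in this listing. It assumes a uniform distribution over a hypothetical number of states and measures how far the plug-in estimate falls below the true entropy when the sample is small relative to the event space. The function names, the state count, and the sample size are illustrative assumptions.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Ordinary likelihood (plug-in) entropy estimate in nats."""
    n = len(samples)
    counts = Counter(samples)
    # Sum -p * log(p) over the observed relative frequencies.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_plugin_estimate(num_states, sample_size, trials, seed=0):
    """Average the plug-in estimate over repeated draws from a uniform source."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        samples = [rng.randrange(num_states) for _ in range(sample_size)]
        total += plugin_entropy(samples)
    return total / trials


# Hypothetical small-sample regime: 50 draws from 100 equally likely states.
true_entropy = math.log(100)
estimate = mean_plugin_estimate(num_states=100, sample_size=50, trials=200)
bias = estimate - true_entropy
print(f"true H = {true_entropy:.3f}, mean estimate = {estimate:.3f}, bias = {bias:.3f}")
```

With 50 samples the plug-in estimate can never exceed log(50), so in this setup the measured bias is necessarily negative; bias-corrected estimators aim to shrink that gap.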

arXiv.org e-Print Archive

Juelich Shared Electronic Resources

Empirically Analyzing the Effect of Dataset Biases on Deep Face Recognition Systems

Author: Egger Bernhard
Gerig Thomas
Kortylewski Adam
Morel-Forster Andreas
Schneider Andreas
Vetter Thomas
Publication date: 01/01/2018

It is unknown what kind of biases modern in-the-wild face datasets have because of their lack of annotation. A direct consequence of this is that total recognition rates alone provide only limited insight into the generalization ability of Deep Convolutional Neural Networks (DCNNs). We propose to empirically study the effect of different types of dataset biases on the generalization ability of DCNNs. Using synthetically generated face images, we study the face recognition rate as a function of interpretable parameters such as face pose and light. The proposed method allows valuable details about the generalization performance of different DCNN architectures to be observed and compared. In our experiments, we find that: 1) Indeed, dataset bias has a significant influence on the generalization performance of DCNNs. 2) DCNNs can generalize surprisingly well to unseen illumination conditions and large sampling gaps in the pose variation. 3) Using the presented methodology we reveal that the VGG-16 architecture outperforms the AlexNet architecture at face recognition tasks because it can much better generalize to unseen face poses, although it has significantly more parameters. 4) We uncover a main limitation of current DCNN architectures, which is the difficulty of generalizing when different identities do not share the same pose variation. 5) We demonstrate that our findings on synthetic data also apply when learning from real-world data. Our face image generator is publicly available to enable the community to benchmark other DCNN architectures. Comment: Accepted to CVPR 2018 Workshop on Analysis and Modeling of Faces and Gestures (AMFG)
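The idea of reporting the recognition rate as a function of an interpretable parameter, rather than as a single total, can be sketched as a per-bin accuracy. This is a rough illustration only and is not the authors' evaluation code: the record format (yaw angle in degrees paired with a correctness flag), the 30-degree bin width, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_bin(records, bin_width=30):
    """Group (yaw_degrees, is_correct) pairs into yaw bins and return
    a dict mapping each bin's lower edge to the accuracy inside that bin."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for yaw, correct in records:
        # Floor division keeps negative angles in the right bin,
        # e.g. -45 falls into the bin starting at -60.
        start = int(yaw // bin_width) * bin_width
        totals[start] += 1
        hits[start] += int(correct)
    return {start: hits[start] / totals[start] for start in sorted(totals)}


# Hypothetical evaluation records: (yaw angle, prediction was correct).
records = [(-45, True), (-40, False), (10, True), (20, True), (75, False)]
for start, acc in accuracy_by_bin(records).items():
    print(f"yaw in [{start}, {start + 30}): accuracy {acc:.2f}")
```

A breakdown like this exposes pose ranges where a model fails even when the total recognition rate looks acceptable.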

arXiv.org e-Print Archive

Crossref

edoc

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Author: Gurevych Iryna
Moosavi Nafise Sadat
Utama Prasetya Ajie
Publication date: 01/01/2020

Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which makes them brittle against test cases outside the training distribution. Several recently proposed debiasing methods have been shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of a performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (e.g., 7pp gain on HANS dataset) while maintaining the original in-distribution accuracy. Comment: to appear at ACL 2020

arXiv.org e-Print Archive

TUbiblio

Crossref

White Rose Research Online

Survey propagation at finite temperature: application to a Sourlas code as a toy model

Author: B Wemmenhove
Bounkong S Mourik van J Saad D
Braunstein A
De Almeida J R L
Gallager R G
H J Kappen
Heskes T
Ihler A T
Kabashima Y
Migliorini G Saad D
Montanari A
Mooij J
Mooij J M Kappen H J
Mourik van J
Mézard M
Nishimori H
Pearl J
Pelizzola A
Pretti M
Sigal L
Stern D H
Sun J Li Y Kang S B Shum H-Y
Yanover C
Yedidia J S
Publication venue: 'IOP Publishing'
Publication date: 24/08/2005

In this paper we investigate a finite-temperature generalization of survey propagation, by applying it to the problem of finite-temperature decoding of a biased finite-connectivity Sourlas code for temperatures lower than the Nishimori temperature. We observe that the result is a shift of the location of the dynamical critical channel noise to larger values than the corresponding dynamical transition for belief propagation, as suggested recently by Migliorini and Saad for LDPC codes. We show how the finite-temperature 1-RSB SP gives accurate results in the regime where competing approaches fail to converge or fail to recover the retrieval state.

arXiv.org e-Print Archive

Crossref

Radboud Repository

Emergence of Invariance and Disentanglement in Deep Representations

Author: Achille Alessandro
Soatto Stefano
Publication date: 28/06/2018

Using established principles from Statistics and Information Theory, we show that invariance to nuisance factors in a deep neural network is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then decompose the cross-entropy loss used during training and highlight the presence of an inherent overfitting term. We propose regularizing the loss by bounding such a term in two equivalent ways: one with a Kullback-Leibler term, which relates to a PAC-Bayes perspective; the other using the information in the weights as a measure of complexity of a learned model, yielding a novel Information Bottleneck for the weights. Finally, we show that invariance and independence of the components of the representation learned by the network are bounded above and below by the information in the weights, and therefore are implicitly optimized during training. The theory enables us to quantify and predict sharp phase transitions between underfitting and overfitting of random labels when using our regularized loss, which we verify in experiments, and sheds light on the relation between the geometry of the loss function, invariance properties of the learned representation, and generalization error. Comment: Deep learning, neural network, representation, flat minima, information bottleneck, overfitting, generalization, sufficiency, minimality, sensitivity, information complexity, stochastic gradient descent, regularization, total correlation, PAC-Bayes

arXiv.org e-Print Archive

Crossref