Robust artificial neural networks and outlier detection. Technical report

Author: Andrei Kelarev
Cederman D
Gleb Beliakov
Huber PJ
John Yearwood
Makela MM
Mammadov MA
Masters T
Powell MJD
Press AH
Rousseeuw PJ
Rusiecki A
Sengupta S
Smola AJ
Publication venue: 'Informa UK Limited'
Publication date: 02/10/2011

Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares criterion, in which half of the data is treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods have become very well established in linear regression, but have only recently begun to be applied to nonlinear regression. In this work, we examine the problem of fitting artificial neural networks (ANNs) to contaminated data using the least trimmed squares criterion. We introduce a penalized least trimmed squares criterion which prevents unnecessary removal of valid data. Training of ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
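As a rough illustration of the least trimmed squares (LTS) criterion mentioned in the abstract above, the sketch below sums only the h smallest squared residuals and reports the remaining points as candidate outliers. The function names, the sample residuals and the choice of h are assumptions made for this example; the penalized variant introduced in the report is not reproduced here because its exact form is not given in the abstract.

```python
def lts_loss(residuals, h):
    """Least trimmed squares: sum of the h smallest squared residuals."""
    squared = sorted(r * r for r in residuals)
    return sum(squared[:h])


def flagged_outliers(residuals, h):
    """Indices of the points excluded from the LTS sum (the n - h largest squared residuals)."""
    order = sorted(range(len(residuals)), key=lambda i: residuals[i] ** 2)
    return sorted(order[h:])


# Hypothetical residuals from some fitted model: two gross outliers among six points.
residuals = [0.1, -0.2, 0.05, 8.0, -0.15, -12.0]
h = 4  # assumed number of points treated as trustworthy

print(lts_loss(residuals, h))
print(flagged_outliers(residuals, h))  # [3, 5]
```

Because the sort makes the loss piecewise in the model parameters, the objective is non-smooth, which is consistent with the abstract's use of derivative-free optimizers.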

arXiv.org e-Print Archive

Deakin Research Online

Crossref

Federation ResearchOnline

Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

Author: Gao Zilin
Li Peihua
Wang Qilong
Xie Jiangtao
Publication date: 01/04/2018

Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown that matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD) and suffer from inefficient training due to the limited support for EIG and SVD on GPUs. To address this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with a loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG- or SVD-based ones, since it involves only matrix multiplications, which are well suited to parallel implementation on GPUs. Moreover, the proposed network with the ResNet architecture can converge in far fewer epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to existing counterparts. By fine-tuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COV
Comment: Accepted to CVPR 201
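The three stages named in the abstract above (pre-normalization, coupled matrix iteration, post-compensation) can be sketched with a coupled Newton-Schulz iteration, which uses only matrix multiplications. This is a minimal forward-pass sketch in plain Python, assuming trace-based normalization and a symmetric positive definite input; the helper names and the iteration count are illustrative choices, and the authors' released code should be consulted for the actual layer, including its backward pass.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matrix_sqrt_newton_schulz(sigma, num_iters):
    """Approximate the square root of a symmetric positive definite matrix."""
    n = len(sigma)
    trace = sum(sigma[i][i] for i in range(n))

    # Pre-normalization: divide by the trace so the iteration converges.
    y = [[v / trace for v in row] for row in sigma]
    z = identity(n)

    # Coupled iteration: Y tends to A^(1/2) and Z to A^(-1/2), where A = sigma / trace.
    for _ in range(num_iters):
        zy = matmul(z, y)
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(n)]
             for i in range(n)]
        y, z = matmul(y, t), matmul(t, z)

    # Post-compensation: undo the trace scaling.
    scale = trace ** 0.5
    return [[scale * v for v in row] for row in y]


# Usage on a small covariance-like matrix (values chosen for illustration).
sigma = [[2.0, 1.0], [1.0, 2.0]]
root = matrix_sqrt_newton_schulz(sigma, num_iters=15)
print(matmul(root, root))  # should be close to sigma
```

A small, fixed number of iterations gives an approximate root; more iterations tighten the approximation at the cost of extra multiplications.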

arXiv.org e-Print Archive

Crossref

Doubly Stochastic Variational Inference for Deep Gaussian Processes

Author: Deisenroth Marc
Salimbeni Hugh
Publication date: 01/09/2017

Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
Comment: NIPS 201
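The abstract above does not spell out the two sources of randomness. A common reading, assumed here, is that the data term of the variational bound is estimated with (1) a random minibatch of the data and (2) a Monte Carlo sample propagated layer by layer with reparameterized Gaussian noise. The sketch below illustrates that estimator for scalar inputs. Each `(mean_fn, var_fn)` pair is a hypothetical stand-in for a layer's variational GP marginal, and the KL regularization term of the bound is omitted.

```python
import math
import random


def sample_through_layers(x, layers, rng):
    """Draw one sample of the final-layer output for a scalar input x."""
    f = x
    for mean_fn, var_fn in layers:
        eps = rng.gauss(0.0, 1.0)  # reparameterization noise
        f = mean_fn(f) + math.sqrt(var_fn(f)) * eps
    return f


def gaussian_log_lik(y, f, noise_var):
    """Log density of y under a Gaussian centered at f with variance noise_var."""
    return -0.5 * math.log(2.0 * math.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var


def doubly_stochastic_data_term(xs, ys, layers, batch_size, noise_var, rng):
    """Minibatch, single-sample estimate of the sum over data of E[log p(y | f)]."""
    n = len(xs)
    batch = rng.sample(range(n), batch_size)  # stochasticity 1: data subsampling
    total = sum(
        gaussian_log_lik(ys[i], sample_through_layers(xs[i], layers, rng), noise_var)
        for i in batch  # stochasticity 2: sampling through the layers
    )
    return (n / batch_size) * total  # rescale to the full data set


# Usage with two toy layers (hypothetical mean and variance functions).
layers = [
    (lambda f: 2.0 * f, lambda f: 0.1),
    (lambda f: f + 1.0, lambda f: 0.05),
]
rng = random.Random(0)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(doubly_stochastic_data_term(xs, ys, layers, batch_size=2, noise_var=1.0, rng=rng))
```

Because each sampled layer output feeds the next layer, the sample retains dependence between layers, which is the property the abstract emphasizes.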

arXiv.org e-Print Archive

UCL Discovery

Spiral - Imperial College Digital Repository