Recommended from our members
Characterising the area under the curve loss function landscape
Abstract
One of the most common metrics to evaluate neural network classifiers is the area under the receiver operating characteristic curve (AUC). However, optimisation of the AUC as the loss function during network training is not a standard procedure. Here we compare minimising the cross-entropy (CE) loss and optimising the AUC directly. In particular, we analyse the loss function landscape (LFL) of approximate AUC (appAUC) loss functions to discover the organisation of this solution space. We discuss various surrogates for AUC approximation and show their differences. We find that the characteristics of the appAUC landscape are significantly different from the CE landscape. The approximate AUC loss function improves testing AUC, and the appAUC landscape has substantially more minima, but these minima are less robust, with larger average Hessian eigenvalues. We provide a theoretical foundation to explain these results. To generalise our results, we lastly provide an overview of how the LFL can help to guide loss function analysis and selection.
Funding: EPSRC
Downing College, Cambridge
Interdisciplinary Institute for Artificial Intelligence at 3IA Côte d'Azur
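The abstract above mentions surrogates for AUC approximation. A common family of smooth surrogates (shown here as an illustrative sketch, not necessarily the appAUC definition used in the paper) replaces the non-differentiable step function over positive/negative score pairs with a sigmoid, giving a differentiable estimate of 1 − AUC; the temperature `tau` below is an assumed hyperparameter controlling the sharpness of the approximation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def app_auc_loss(scores, labels, tau=1.0):
    """Pairwise sigmoid surrogate for 1 - AUC (illustrative).

    The exact AUC counts, over all (positive, negative) pairs, how often
    the positive item is scored higher. Replacing that hard count with a
    sigmoid of the score difference yields a differentiable loss.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # One entry per (negative, positive) pair via broadcasting;
    # large positive margins (pos >> neg) drive the loss toward 0.
    diffs = neg[:, None] - pos[None, :]
    return sigmoid(diffs / tau).mean()
```

With perfectly separated scores the surrogate loss approaches 0, and with completely uninformative (identical) scores it equals 0.5, mirroring the behaviour of 1 − AUC for a random classifier.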
On the capacity and superposition of minima in neural network loss function landscapes
Abstract
Minima of the loss function landscape (LFL) of a neural network are locally optimal sets of weights that extract and process information from the input data to make outcome predictions. In underparameterised networks, the capacity of the weights may be insufficient to fit all the relevant information. We demonstrate that different local minima specialise in certain aspects of the learning problem, and process the input information differently. This effect can be exploited using a meta-network in which the predictive power from multiple minima of the LFL is combined to produce a better classifier. With this approach, we can increase the area under the receiver operating characteristic curve by around 20% for a complex learning problem. We propose a theoretical basis for combining minima and show how a meta-network can be trained to select the representative that is used for classification of a specific data item. Finally, we present an analysis of symmetry-equivalent solutions to machine learning problems, which provides a systematic means to improve the efficiency of this approach.
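The second abstract describes a meta-network that combines the predictions of multiple LFL minima, selecting a representative per data item. One minimal way to realise such a combination (a sketch under assumed inputs, not the paper's actual architecture) is a per-item softmax gate over the minima: the gate logits would come from a separately trained meta-network, which is left hypothetical here:

```python
import numpy as np

def combine_minima(pred_matrix, gate_logits):
    """Softmax-gated combination of per-minimum predictions (illustrative).

    pred_matrix: (n_items, n_minima) class-1 probabilities, one column
        per local minimum of the LFL.
    gate_logits: (n_items, n_minima) scores from a hypothetical
        meta-network deciding which minimum to trust for each item.
    Returns one combined probability per item.
    """
    # Numerically stable softmax across the minima axis
    w = np.exp(gate_logits - gate_logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Convex combination: each output lies between the per-item
    # min and max of the individual minima's predictions.
    return (w * pred_matrix).sum(axis=1)
```

As the gate logits become one-hot (large margins), this reduces to hard selection of a single minimum per item, which matches the selection behaviour the abstract describes.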