Finding the Optimal Network Depth in Classification Tasks
We develop a fast end-to-end method for training lightweight neural networks
using multiple classifier heads. By allowing the model to determine the
importance of each head and rewarding the choice of a single shallow
classifier, we are able to detect and remove unneeded components of the
network. This operation, which can be seen as finding the optimal depth of the
model, significantly reduces the number of parameters and accelerates inference
across different hardware processing units, which is not the case for many
standard pruning methods. We show the performance of our method on multiple
network architectures and datasets, analyze its optimization properties, and
conduct ablation studies.
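A minimal sketch of the kind of architecture the abstract describes, under our own assumptions (not the paper's exact formulation): a backbone with a classifier head after every block, learnable head-importance weights, and a penalty that rewards concentrating that importance on a single shallow head. The class and function names here are hypothetical.

```python
# Hypothetical sketch: multi-head network with learnable head importances and
# a penalty rewarding a single shallow classifier. Not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadDepthNet(nn.Module):
    def __init__(self, in_dim=784, hidden=256, num_blocks=4, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList()
        dim = in_dim
        for _ in range(num_blocks):
            self.blocks.append(nn.Sequential(nn.Linear(dim, hidden), nn.ReLU()))
            dim = hidden
        # one classifier head per block
        self.heads = nn.ModuleList(nn.Linear(hidden, num_classes)
                                   for _ in range(num_blocks))
        # learnable importance logits, one per head
        self.head_logits = nn.Parameter(torch.zeros(num_blocks))

    def forward(self, x):
        logits_per_head = []
        h = x
        for block, head in zip(self.blocks, self.heads):
            h = block(h)
            logits_per_head.append(head(h))
        weights = F.softmax(self.head_logits, dim=0)        # head importances
        mixed = sum(w * l for w, l in zip(weights, logits_per_head))
        return mixed, weights

def loss_fn(mixed_logits, weights, targets, depth_cost=0.01, entropy_cost=0.1):
    ce = F.cross_entropy(mixed_logits, targets)
    depths = torch.arange(1, weights.numel() + 1, dtype=weights.dtype)
    expected_depth = (weights * depths).sum()               # favour shallow heads
    entropy = -(weights * torch.log(weights + 1e-12)).sum() # favour a single head
    return ce + depth_cost * expected_depth + entropy_cost * entropy

# usage: after training, blocks deeper than the dominant head could be pruned
model = MultiHeadDepthNet()
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
mixed, w = model(x)
loss_fn(mixed, w, y).backward()
```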
Accurate Computation of the Log-Sum-Exp and Softmax Functions
Evaluating the log-sum-exp function or the softmax function is a key step in many modern data science algorithms, notably in inference and classification. Because of the exponentials that these functions contain, the evaluation is prone to overflow and underflow, especially in low precision arithmetic. Software implementations commonly use alternative formulas that avoid overflow and reduce the chance of harmful underflow, employing a shift or another rewriting. Although mathematically equivalent, these variants behave differently in floating-point arithmetic. We give rounding error analyses of different evaluation algorithms and interpret the error bounds using condition numbers for the functions. We conclude, based on the analysis and numerical experiments, that the shifted formulas are of similar accuracy to the unshifted ones and that the shifted softmax formula is typically more accurate than a division-free variant.
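For reference, a short NumPy sketch of the standard shifted rewritings the abstract refers to: the max-shifted log-sum-exp, the shifted softmax with an explicit division, and a division-free variant that exponentiates x_i - lse(x). Function names are ours, chosen for illustration.

```python
# Sketch of the shifted formulas; names are illustrative, not a library API.
import numpy as np

def logsumexp_shifted(x):
    # lse(x) = m + log(sum_j exp(x_j - m)), m = max(x): the exponentials cannot overflow
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax_shifted(x):
    # softmax_i = exp(x_i - m) / sum_j exp(x_j - m)
    m = np.max(x)
    e = np.exp(x - m)
    return e / np.sum(e)

def softmax_division_free(x):
    # division-free variant: softmax_i = exp(x_i - lse(x))
    return np.exp(x - logsumexp_shifted(x))

x = np.array([1000.0, 1001.0, 1002.0])   # naive exp(x) overflows in double precision
print(logsumexp_shifted(x))              # finite, about 1002.41
print(softmax_shifted(x))                # stable probabilities summing to 1
print(softmax_division_free(x))
```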