108,950 research outputs found
Channel-wise early stopping without a validation set via NNK polytope interpolation
State-of-the-art neural network architectures continue to scale in size and deliver impressive generalization results, although this comes at the expense of limited interpretability. In particular, a key challenge is to determine when to stop training the model, as this has a significant impact on generalization. Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on non-negative kernel regression (NNK) graphs with which we perform local polytope interpolation on low-dimensional channels. This method leads to instance-based interpretability of both the learned data representations and the relationship between channels. Motivated by our observations, we use CW-DeepNNK to propose a novel early stopping criterion that (i) does not require a validation set, (ii) is based on a task performance metric, and (iii) allows stopping to be reached at different points for each channel. Our experiments demonstrate that our proposed method has advantages as compared to the standard criterion based on validation set performance.Peer ReviewedPostprint (author's final draft
Adaptive Boosting for Domain Adaptation: Towards Robust Predictions in Scene Segmentation
Domain adaptation is to transfer the shared knowledge learned from the source
domain to a new environment, i.e., target domain. One common practice is to
train the model on both labeled source-domain data and unlabeled target-domain
data. Yet the learned models are usually biased due to the strong supervision
of the source domain. Most researchers adopt the early-stopping strategy to
prevent over-fitting, but when to stop training remains a challenging problem
since the lack of the target-domain validation set. In this paper, we propose
one efficient bootstrapping method, called Adaboost Student, explicitly
learning complementary models during training and liberating users from
empirical early stopping. Adaboost Student combines the deep model learning
with the conventional training strategy, i.e., adaptive boosting, and enables
interactions between learned models and the data sampler. We adopt one adaptive
data sampler to progressively facilitate learning on hard samples and aggregate
"weak" models to prevent over-fitting. Extensive experiments show that (1)
Without the need to worry about the stopping time, AdaBoost Student provides
one robust solution by efficient complementary model learning during training.
(2) AdaBoost Student is orthogonal to most domain adaptation methods, which can
be combined with existing approaches to further improve the state-of-the-art
performance. We have achieved competitive results on three widely-used scene
segmentation domain adaptation benchmarks.Comment: 10 pages, 7 tables, 5 figure
Improved neural network generalization using channel-wise NNK graph constructions
State-of-the-art neural network architectures continue to scale in size and deliver impressive results on unseen data points at the expense of poor interpretability. In the deep layers of these models we often encounter very high dimensional feature spaces, where constructing graphs from intermediate data representations can lead to the well-known curse of dimensionality. We propose a channel-wise graph construction method that works on lower dimensional subspaces and provides a new channel-based perspective that leads to better interpretability of the data and relationship between channels. In addition, we introduce a novel generalization estimate based on the proposed graph construction method with which we perform local polytope interpolation. We show its potential to replace the standard generalization estimate based on validation set performance to perform progressive channel-wise early stopping without requiring a validation set.Las arquitecturas de redes neuronales más avanzadas siguen aumentando en tamaño y ofreciendo resultados impresionantes en nuevos datos a costa de una escasa interpretabilidad. En las capas profundas de estos modelos nos encontramos a menudo con espacios de características de muy alta dimensión, en los que la construcción de grafos a partir de representaciones de datos intermedias puede llevar al conocido ''curse of dimensionality''. Proponemos un método de construcción de grafos por canal que trabaja en subespacios de menor dimensión y proporciona una nueva perspectiva basada en canales, que lleva a una mejor interpretabilidad de los datos y de la relación entre canales. Además, introducimos un nuevo estimador de generalización basado en el método de construcción de grafos propuesto con el que realizamos interpolación local en politopos. Mostramos su potencial para sustituir el estimador de generalización estándar basado en el rendimiento en un set de validación independiente para realizar ''early stopping'' progresivo por canales y sin necesidad de un set de validación.Les arquitectures de xarxes neuronals més avançades segueixen augmentant la seva mida i oferint resultats impressionants en noves dades a costa d'una escassa interpretabilitat. A les capes profundes d'aquests models ens trobem sovint amb espais de característiques de molt alta dimensió, en què la construcció de grafs a partir de representacions de dades intermèdies pot portar al conegut ''curse of dimensionality''. Proposem un mètode de construcció de grafs per canal que treballa en subespais de menor dimensió i proporciona una nova perspectiva basada en canals, que porta a una millor interpretabilitat de les dades i de la relació entre canals. A més, introduïm un nou estimador de generalització basat en el mètode de construcció de grafs proposat amb el qual realitzem interpolació local en polítops. Mostrem el seu potencial per substituir l'estimador de generalització estàndard basat en el rendiment en un set de validació independent per a realitzar ''early stopping'' progressiu per canals i sense necessitat d'un set de validació
SMRD: SURE-based Robust MRI Reconstruction with Diffusion Models
Diffusion models have recently gained popularity for accelerated MRI
reconstruction due to their high sample quality. They can effectively serve as
rich data priors while incorporating the forward model flexibly at inference
time, and they have been shown to be more robust than unrolled methods under
distribution shifts. However, diffusion models require careful tuning of
inference hyperparameters on a validation set and are still sensitive to
distribution shifts during testing. To address these challenges, we introduce
SURE-based MRI Reconstruction with Diffusion models (SMRD), a method that
performs test-time hyperparameter tuning to enhance robustness during testing.
SMRD uses Stein's Unbiased Risk Estimator (SURE) to estimate the mean squared
error of the reconstruction during testing. SURE is then used to automatically
tune the inference hyperparameters and to set an early stopping criterion
without the need for validation tuning. To the best of our knowledge, SMRD is
the first to incorporate SURE into the sampling stage of diffusion models for
automatic hyperparameter selection. SMRD outperforms diffusion model baselines
on various measurement noise levels, acceleration factors, and anatomies,
achieving a PSNR improvement of up to 6 dB under measurement noise. The code is
publicly available at https://github.com/NVlabs/SMRD .Comment: MICCAI 202
A study of early stopping, ensembling, and patchworking for cascade correlation neural networks
The constructive topology of the cascade correlation algorithm makes it a popular choice for many researchers wishing to utilize neural networks. However, for multimodal problems, the mean squared error of the approximation increases significantly as the number of modes increases. The components of this error will comprise both bias and variance and we provide formulae for estimating these values from mean squared errors alone. We achieve a near threefold reduction in the overall error by using early stopping and ensembling. Also described is a new subdivision technique that we call patchworking. Patchworking, when used in combination with early stopping and ensembling, can achieve an order of magnitude improvement in the error. Also presented is an approach for validating the quality of a neural network’s training, without the explicit use of a testing dataset
- …