Magnitude Invariant Parametrizations Improve Hypernetwork Learning
Hypernetworks, neural networks that predict the parameters of another neural
network, are powerful models that have been successfully used in diverse
applications from image generation to multi-task learning. Unfortunately,
existing hypernetworks are often challenging to train. Training typically
converges far more slowly than for non-hypernetwork models, and the rate of
convergence can be very sensitive to hyperparameter choices. In this work, we
identify a fundamental and previously unidentified problem that contributes to
the challenge of training hypernetworks: a magnitude proportionality between
the inputs and outputs of the hypernetwork. We demonstrate both analytically
and empirically that this can lead to unstable optimization, thereby slowing
down convergence, and sometimes even preventing any learning. We present a
simple solution to this problem using a revised hypernetwork formulation that
we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed
solution on several hypernetwork tasks, where it consistently stabilizes
training and achieves faster convergence. Furthermore, we perform a
comprehensive ablation study including choices of activation function,
normalization strategies, input dimensionality, and hypernetwork architecture;
and find that MIP improves training in all scenarios. We provide easy-to-use
code that can turn existing networks into MIP-based hypernetworks.
Comment: Source code at https://github.com/JJGO/hyperligh
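The magnitude proportionality described above can be seen in a toy example: for a linear hypernetwork, scaling the conditioning input scales the predicted weights, and hence the target network's output, by the same factor. This is an illustrative sketch (all names and shapes are hypothetical), not the paper's MIP formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a linear hypernetwork maps a conditioning input e to the
# weights of a one-layer target network applied to a fixed input x.
H = rng.normal(size=(5, 4))     # hypernetwork weights (illustrative)
x = rng.normal(size=4)          # input to the target network
e = rng.normal(size=4)          # conditioning input to the hypernetwork

def target_output(e):
    w = H @ e                   # hypernetwork predicts target weights
    return w[:4] @ x + w[4]     # target net: w . x + b

# The output is directly proportional to the magnitude of e:
out1 = target_output(e)
out2 = target_output(3.0 * e)
assert np.isclose(out2, 3.0 * out1)   # scaling e by 3 scales the output by 3

# Normalizing e to unit norm removes this proportionality. This is only a
# crude stand-in for the idea; the paper's MIP parametrization differs.
unit = lambda v: v / np.linalg.norm(v)
assert np.isclose(target_output(unit(e)), target_output(unit(3.0 * e)))
```

Because the effective scale of the predicted weights tracks the input magnitude, gradient magnitudes vary with the conditioning input too, which is one intuition for the unstable optimization the paper analyzes.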
Scale-Space Hypernetworks for Efficient Biomedical Imaging
Convolutional Neural Networks (CNNs) are the predominant model used for a
variety of medical image analysis tasks. At inference time, these models are
computationally intensive, especially with volumetric data. In principle, it is
possible to trade accuracy for computational efficiency by manipulating the
rescaling factor in the downsample and upsample layers of CNN architectures.
However, properly exploring the accuracy-efficiency trade-off is prohibitively
expensive with existing models. To address this, we introduce Scale-Space
HyperNetworks (SSHN), a method that learns a spectrum of CNNs with varying
internal rescaling factors. A single SSHN characterizes an entire Pareto
accuracy-efficiency curve of models that match, and occasionally surpass, the
outcomes of training many separate networks with fixed rescaling factors. We
demonstrate the proposed approach in several medical image analysis
applications, comparing SSHN against strategies with both fixed and dynamic
rescaling factors. We find that SSHN consistently provides a better
accuracy-efficiency trade-off at a fraction of the training cost. Trained SSHNs
enable the user to quickly choose a rescaling factor that appropriately
balances accuracy and computational efficiency for their particular needs at
inference.
Comment: Code available at https://github.com/JJGO/scale-space-hypernetwork
Learning the Effect of Registration Hyperparameters with HyperMorph
We introduce HyperMorph, a framework that facilitates efficient
hyperparameter tuning in learning-based deformable image registration.
Classical registration algorithms perform an iterative pair-wise optimization
to compute a deformation field that aligns two images. Recent learning-based
approaches leverage large image datasets to learn a function that rapidly
estimates a deformation for a given image pair. In both strategies, the
accuracy of the resulting spatial correspondences is strongly influenced by the
choice of certain hyperparameter values. However, an effective hyperparameter
search consumes substantial time and human effort as it often involves training
multiple models for different fixed hyperparameter values and may lead to
suboptimal registration. We propose an amortized hyperparameter learning
strategy to alleviate this burden by learning the impact of hyperparameters on
deformation fields. We design a meta network, or hypernetwork, that predicts
the parameters of a registration network for input hyperparameters, thereby
yielding a single model that generates the optimal deformation field
corresponding to given hyperparameter values. This strategy enables fast,
high-resolution hyperparameter search at test-time, reducing the inefficiency
of traditional approaches while increasing flexibility. We also demonstrate
additional benefits of HyperMorph, including enhanced robustness to model
initialization and the ability to rapidly identify optimal hyperparameter
values specific to a dataset, image contrast, task, or even anatomical region,
all without the need to retrain models. We make our code publicly available at
http://hypermorph.voxelmorph.net.
Comment: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) at https://www.melba-journal.or
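The amortization pattern above can be sketched with a toy hypernetwork that maps a hyperparameter value to the parameters of a (here trivially small) task network. Shapes and names are illustrative and not the HyperMorph architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypernetwork weights (illustrative): embed the scalar hyperparameter,
# then map the embedding to the task network's parameter vector.
W1 = rng.normal(size=(16, 1))
W2 = rng.normal(size=(10, 16)) * 0.1   # 10 = number of task-net parameters

def hypernetwork(lam):
    h = np.tanh(W1 @ np.array([[lam]]))   # embed the hyperparameter
    return (W2 @ h).ravel()               # predicted task-net parameters

# At test time, sweeping the hyperparameter costs one forward pass per
# value -- no retraining of a separate model per hyperparameter setting.
params_per_lambda = {lam: hypernetwork(lam) for lam in (0.1, 0.5, 1.0)}
```

In the actual method, the predicted parameters belong to a registration network and the hyperparameter is, e.g., a regularization weight; training samples hyperparameter values so a single model covers the whole range.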
End-to-end Kernel Learning via Generative Random Fourier Features
Random Fourier features (RFFs) provide a promising way for kernel learning in
a spectral case. Current RFFs-based kernel learning methods usually work in a
two-stage way. In the first-stage process, learning the optimal feature map is
often formulated as a target alignment problem, which aims to align the learned
kernel with the pre-defined target kernel (usually the ideal kernel). In the
second-stage process, a linear learner is conducted with respect to the mapped
random features. Nevertheless, the pre-defined kernel in target alignment is
not necessarily optimal for the generalization of the linear learner. Instead,
in this paper, we consider a one-stage process that incorporates the kernel
learning and linear learner into a unifying framework. To be specific, a
generative network via RFFs is devised to implicitly learn the kernel, followed
by a linear classifier parameterized as a fully-connected layer. Then the
generative network and the classifier are jointly trained by solving the
empirical risk minimization (ERM) problem to reach a one-stage solution. This
end-to-end scheme naturally allows deeper features, in correspondence to a
multi-layer structure, and shows superior generalization performance over the
classical two-stage, RFFs-based methods in real-world classification tasks.
Moreover, inspired by the randomized resampling mechanism of the proposed
method, its enhanced adversarial robustness is investigated and experimentally
verified.
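The two-stage baselines discussed above start from the standard random Fourier feature approximation. A minimal sketch of that approximation for the Gaussian (RBF) kernel, illustrative rather than the paper's generative network:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 5, 20000, 1.0   # input dim, number of features, bandwidth

x = rng.normal(size=d)
y = rng.normal(size=d)

# Random Fourier features for k(x, y) = exp(-||x - y||^2 / (2 sigma^2)):
# frequencies drawn from the kernel's spectral density, phases uniform.
W = rng.normal(scale=1.0 / sigma, size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)
z = lambda v: np.sqrt(2.0 / D) * np.cos(W @ v + b)

# The inner product of feature maps approximates the kernel value,
# with error shrinking as D grows.
approx = z(x) @ z(y)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))
```

The paper's one-stage scheme replaces the fixed random draw of `W` with a learned generative network trained jointly with the downstream linear classifier.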
Meta-Learning in Neural Networks: A Survey
The field of meta-learning, or learning-to-learn, has seen a dramatic rise in
interest in recent years. Contrary to conventional approaches to AI where tasks
are solved from scratch using a fixed learning algorithm, meta-learning aims to
improve the learning algorithm itself, given the experience of multiple
learning episodes. This paradigm provides an opportunity to tackle many
conventional challenges of deep learning, including data and computation
bottlenecks, as well as generalization. This survey describes the contemporary
meta-learning landscape. We first discuss definitions of meta-learning and
position it with respect to related fields, such as transfer learning and
hyperparameter optimization. We then propose a new taxonomy that provides a
more comprehensive breakdown of the space of meta-learning methods today. We
survey promising applications and successes of meta-learning such as few-shot
learning and reinforcement learning. Finally, we discuss outstanding challenges
and promising areas for future research.
Autogenerative Networks
Artificial intelligence powered by deep neural networks has seen tremendous improvements in the last decade, achieving superhuman performance on a diverse range of tasks. Many worry that it may one day develop the ability to recursively self-improve, leading to an intelligence explosion known as the Singularity. Autogenerative networks, or neural networks that generate neural networks, are one major plausible pathway towards realizing this possibility. The objective of this thesis is to study various challenges and applications of small-scale autogenerative networks in domains such as artificial life, reinforcement learning, neural network initialization and optimization, gradient-based meta-learning, and logical networks. Chapters 2 and 3 describe novel mechanisms for generating neural network weights and embeddings. Chapters 4 and 5 identify problems and propose solutions to fix optimization difficulties in differentiable mechanisms of neural network generation known as Hypernetworks. Chapters 6 and 7 study implicit models of network generation, such as backpropagating through gradient descent itself and integrating discrete solvers into continuous functions. Together, the chapters in this thesis contribute novel proposals for non-differentiable neural network generation mechanisms, significant improvements to existing differentiable network generation mechanisms, and an assimilation of different learning paradigms in autogenerative networks.
Uncertainty in Deep Learning with Implicit Neural Networks
The ability to extract uncertainties from predictions is crucial for the adoption of deep learning systems to safety-critical applications. Uncertainty estimates can be used as a failure signal, which is necessary for automating complex tasks where safety is a concern. Furthermore, current deep learning systems do not provide uncertainty estimates, and instead can assign high probability to incorrect predictions. To mitigate this problem of overconfidence, this dissertation proposes three approaches that leverage the uncertainty within a distribution of models. Specifically, we consider the epistemic uncertainty given by an approximation to the posterior over model parameters. Prior work approximates this posterior by utilizing analytically known distributions, which are inflexible and result in underestimation of the uncertainty. Instead, we propose to use implicit distributions, which are computationally efficient to sample from, and are flexible enough to parameterize a wide range of distributions. The contributions of this thesis show that implicit models enable better uncertainty estimates than prior work, and can be used for open-category prediction, adversarial example detection, and exploration in reinforcement learning.
We begin by showing that implicit generative models with feature-space regularization can be used in the open-category setting to detect input distribution shift, while retaining accuracy on training data. Next, we refine our approach by explicitly encouraging diversity within samples with particle-based variational inference. The uncertainty given by these diverse models is used for exploration in reinforcement learning. We show that
in the model-based setting we can leverage uncertainty as a novelty signal, compelling exploration to poorly understood areas of the environment. Third, we turn to the fundamental problem of approximate Bayesian inference. We develop a framework for generative particle-based variational inference that allows for efficient sampling, places no restrictions on the approximate posterior, and improves our ability to estimate epistemic uncertainty.
Methods for Detection and Recovery of Out-of-Distribution Examples
Deep neural networks currently comprise the backbone of many applications where safety is a critical concern, for example autonomous driving and medical diagnostics. Unfortunately, these systems currently fail to detect out-of-distribution (OOD) inputs and can be prone to making dangerous errors when exposed to them. In addition, these same systems are vulnerable to maliciously altered inputs called adversarial examples. In response to these problems, we present two methods that handle out-of-distribution inputs and resist adversarial examples, respectively.
To detect OOD inputs, we introduce HyperGAN: a generative adversarial network which learns to generate all the parameters of a deep neural network. HyperGAN first transforms low-dimensional noise into a latent space, which can be sampled from to obtain diverse, performant sets of parameters for a target architecture. By sampling many sets of parameters, we form a diverse ensemble which provides a better estimate of uncertainty than standard ensembles. We show that HyperGAN can reliably detect OOD inputs as well as adversarial examples.
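The ensemble-uncertainty idea can be sketched as follows: sample many parameter sets and use disagreement across their predictions as an uncertainty signal. Here random linear classifiers stand in for HyperGAN's sampled parameters; in a real setting the ensemble members would be trained, so this only illustrates the mechanism, not the detection performance:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 sampled parameter sets for a 4-feature, 3-class linear classifier
# (random stand-ins for hypernetwork/GAN-generated parameters).
ensemble = [rng.normal(size=(3, 4)) for _ in range(50)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predictive_variance(x):
    # Stack each member's class probabilities, then measure disagreement.
    probs = np.stack([softmax(W @ x) for W in ensemble])  # shape (50, 3)
    return probs.var(axis=0).mean()  # higher variance -> less certain input

v = predictive_variance(rng.normal(size=4))
```

With trained members, in-distribution inputs tend to yield low variance (members agree) while OOD or adversarial inputs yield high variance, which is the signal thresholded for detection.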
We also present a method for recovering clean images from adversarial examples. BFNet uses a differentiable bilateral filter as a preprocessor to a neural network. The bilateral filter projects inputs back to the space of natural images, and in doing so it removes the adversarial perturbation. We show that BFNet is an effective defense in multiple attack settings, and is able to provide additional robustness when combined with other defenses.
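The preprocessing idea can be conveyed with a brute-force bilateral filter: each output pixel is a weighted average of its neighborhood, where weights combine spatial closeness with intensity similarity, so noise is smoothed while edges survive. This is an illustrative stand-in, not BFNet's differentiable implementation:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a 2-D grayscale image in [0, 1]."""
    H, W = img.shape
    out = np.zeros_like(img)
    # Spatial weights depend only on pixel offsets, so precompute them.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    pad = np.pad(img, radius, mode="edge")
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range weights: nearby intensities count, dissimilar ones don't.
            range_w = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            w = spatial * range_w
            out[i, j] = (w * patch).sum() / w.sum()
    return out

# Small-amplitude noise (a crude stand-in for an adversarial perturbation)
# is attenuated by the filter.
noisy = 0.5 + 0.02 * np.random.default_rng(0).normal(size=(16, 16))
smoothed = bilateral_filter(noisy)
```

Because every operation here is smooth in the input, the same computation can be expressed in an autodiff framework, which is what makes the filter usable as an in-network preprocessor.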
Advancing and Leveraging Tractable Likelihood Models
The past decade has seen a remarkable improvement in a variety of machine learning applications thanks to numerous advances in deep neural networks (DNN). These models are now the de facto standard in fields ranging from image/speech recognition to driverless cars and have begun to permeate aspects of modern science and everyday life. The deep learning revolution has also resulted in highly effective generative models such as score matching models, diffusion models, VAEs, GANs, and tractable likelihood models. These models are best known for their ability to create novel samples of impressive quality but are usually limited to highly structured data modalities. Expanding the capabilities and applications of likelihood models beyond conventional data formats and generative applications can increase functionality, interpretability, and intuition compared to conventional methods. This dissertation addresses shortcomings in likelihood models over less structured data and explores methods to exploit a learned density as part of a larger application. We begin by advancing the performance of likelihood models outside the standard, ordered data regime by developing methods that are applicable to sets, e.g., point clouds. Many data sources contain instances that are a collection of unordered points, such as points on the surface of scans from human organs, sets of images from a web page, or LiDAR observations commonly used in driverless cars or (hyper-spectral) aerial surveys. We then explore several applications of density models. First, we consider a generative process over neural networks themselves and show that training over ensembles of sampled models can lead to improved robustness to adversarial attacks.
Next, we demonstrate how to use the transformative portion of a normalizing flow as a feature extractor in conjunction with a downstream task to estimate expectations over model performance in local and global regions. Finally, we propose a learnable, continuous parameterization of mixture models directly on the input space to improve model interpretability while simultaneously allowing for arbitrary marginalization or conditioning without the need to train new models or develop complex masking mechanisms.