Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning and has already outpaced
the best human-designed architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries.
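The survey's split between a search space and a search algorithm can be made concrete with random search, a common NAS baseline. The following is a minimal Python sketch over a tiny hypothetical search space. `SEARCH_SPACE`, `toy_proxy`, and its scoring formula are illustrative assumptions. A real NAS run would train and validate each sampled architecture, or use a speedup technique, instead of calling a closed-form proxy.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical search space: an architecture is one choice per key.
SEARCH_SPACE: Dict[str, List] = {
    "depth": [2, 4, 8],
    "width": [16, 32, 64],
    "op": ["conv3x3", "conv5x5", "maxpool"],
}

Arch = Dict[str, object]

def sample_architecture(rng: random.Random) -> Arch:
    """Draw one architecture uniformly at random from the search space."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

def random_search(evaluate: Callable[[Arch], float], n_trials: int,
                  seed: int = 0) -> Tuple[Optional[Arch], float]:
    """Random-search baseline: sample n_trials architectures and keep the best."""
    rng = random.Random(seed)
    best_arch: Optional[Arch] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

def toy_proxy(arch: Arch) -> float:
    # Hypothetical stand-in for "train, then measure validation accuracy".
    # It is NOT a real performance estimate.
    penalty = 0.5 if arch["op"] == "maxpool" else 0.0
    return arch["depth"] * 0.1 + arch["width"] * 0.01 - penalty
```

Calling `random_search(toy_proxy, n_trials=200)` returns the best sampled architecture and its proxy score. Swapping `toy_proxy` for a cheaper estimator is where the survey's speedup techniques would plug in.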
A Framework for Controllable Pareto Front Learning with Completed Scalarization Functions and its Applications
Pareto Front Learning (PFL) was recently introduced as an efficient method
for approximating the entire Pareto front, the set of all optimal solutions to
a Multi-Objective Optimization (MOO) problem. In previous work, the mapping
between a preference vector and a Pareto optimal solution remains ambiguous,
which makes its results difficult to control. This study demonstrates the convergence and completion
aspects of solving MOO with pseudoconvex scalarization functions and combines
them with a hypernetwork to offer a comprehensive framework for PFL,
called Controllable Pareto Front Learning. Extensive experiments demonstrate
that our approach is highly accurate and significantly less computationally
expensive than prior methods in terms of inference time.
Comment: Under review at Neural Networks Journal
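The idea of mapping a preference vector to a Pareto optimal solution can be shown on a toy bi-objective problem. This sketch deliberately simplifies the method. It uses plain linear (weighted-sum) scalarization, not the paper's completed or pseudoconvex scalarization functions. It also solves each preference separately by gradient descent, whereas PFL trains a hypernetwork to output the solution directly. `objectives`, `solve_for_preference`, and `sample_front` are hypothetical names.

```python
from typing import List, Tuple

def objectives(x: float) -> Tuple[float, float]:
    """Toy bi-objective problem: f1 is minimized at x = 0, f2 at x = 2."""
    return x ** 2, (x - 2.0) ** 2

def solve_for_preference(r: Tuple[float, float], lr: float = 0.1,
                         steps: int = 500) -> float:
    """Minimize r1*f1(x) + r2*f2(x) by gradient descent (linear scalarization)."""
    total = r[0] + r[1]
    r1, r2 = r[0] / total, r[1] / total  # normalize so r lies on the simplex
    x = 0.0
    for _ in range(steps):
        grad = 2.0 * r1 * x + 2.0 * r2 * (x - 2.0)  # derivative of scalarized loss
        x -= lr * grad
    return x

def sample_front(n: int) -> List[Tuple[float, float]]:
    """Trace the front by sweeping preferences (t, 1 - t) for t in [0, 1]."""
    points = []
    for i in range(n):
        t = i / (n - 1)
        points.append(objectives(solve_for_preference((t, 1.0 - t))))
    return points
```

For this convex toy problem, the map has the closed form x* = 2 r2 (with r normalized). For example, `solve_for_preference((0.25, 0.75))` converges to about 1.5. A hypernetwork in PFL would be trained to reproduce such a preference-to-solution map in a single forward pass.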
Generalizing Supervised Deep Learning MRI Reconstruction to Multiple and Unseen Contrasts using Meta-Learning Hypernetworks
Meta-learning has recently emerged as a data-efficient learning technique
for various medical imaging tasks and has helped advance contemporary deep
learning models. Furthermore, meta-learning improves knowledge
generalization across imaging tasks by learning both shared and discriminative
weights for various configurations of imaging tasks. However, existing
meta-learning models attempt to learn a single set of weight initializations for
a neural network, which might be restrictive for multimodal data. This work aims
to develop a multimodal meta-learning model for image reconstruction, which
augments meta-learning with evolutionary capabilities to encompass diverse
acquisition settings of multimodal data. Our proposed model, called KM-MAML
(Kernel Modulation-based Multimodal Meta-Learning), has hypernetworks that
evolve to generate mode-specific weights. These weights provide the
mode-specific inductive bias for multiple modes by re-calibrating each kernel
of the base network for image reconstruction via a low-rank kernel modulation
operation. We incorporate gradient-based meta-learning (GBML) in the contextual
space to update the weights of the hypernetworks for different modes. The
hypernetworks and the reconstruction network in the GBML setting provide
discriminative mode-specific features and low-level image features,
respectively. Experiments on multi-contrast MRI reconstruction show that our
model (i) exhibits superior reconstruction performance over joint training,
other meta-learning methods, and context-specific MRI reconstruction methods,
and (ii) adapts better, with improvement margins of 0.5 dB in
PSNR and 0.01 in SSIM. In addition, a representation analysis with U-Net shows that
kernel modulation infuses 80% of mode-specific representation changes in the
high-resolution layers. Our source code is available at
https://github.com/sriprabhar/KM-MAML/.
Comment: Accepted for publication in Elsevier Applied Soft Computing Journal,
36 pages, 18 figures
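The abstract does not spell out the low-rank kernel modulation operation, so this Python sketch assumes one plausible form. A hypernetwork maps a mode (contrast) embedding to two vectors, u and v. The base kernel W is then rescaled element-wise by their rank-1 outer product u v^T. W stands in for one layer's kernel with spatial dimensions collapsed (out channels by in channels). The linear `hypernet`, the rank-1 choice, and all names are assumptions for illustration, not the KM-MAML implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def hypernet(mode_embedding: List[float], A: Matrix,
             B: Matrix) -> Tuple[List[float], List[float]]:
    """Hypothetical linear hypernetwork: u = A e (per output channel),
    v = B e (per input channel)."""
    u = [sum(a * e for a, e in zip(row, mode_embedding)) for row in A]
    v = [sum(b * e for b, e in zip(row, mode_embedding)) for row in B]
    return u, v

def modulate_kernel(W: Matrix, u: List[float], v: List[float]) -> Matrix:
    """Rescale W element-wise by the rank-1 matrix u v^T (assumed modulation form)."""
    return [[W[i][j] * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
```

In this design, the hypernetwork only has to produce out + in numbers per mode rather than a full out-by-in kernel. That cost saving is the usual motivation for a low-rank parameterization.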
Automated deep learning architecture design using differentiable architecture search (DARTS)
2019 Fall. Includes bibliographical references. Creating neural networks by hand is a slow, trial-and-error process. Designing new architectures similar to GoogLeNet or FractalNets, which use repeated tree-based structures, is likely to be inefficient and sub-optimal because of the large number of possible ways to compose such structures. Recently, neural architecture search algorithms have automated the process of architecture design and have often attained state-of-the-art performance on the CIFAR-10, ImageNet, and Penn Treebank datasets. Although search time has been reduced from tens of thousands of GPU hours to tens of GPU hours, most search algorithms rely on additional controllers and hypernetworks to generate architecture encodings or predict weights for sampled architectures. These controllers and hypernetworks may themselves need a well-chosen structure when deployed on a new task or dataset. Because that structure is designed by hand, the architecture search problem is not fully solved. Differentiable Architecture Search (DARTS) avoids this problem by using gradient descent. In this work, the DARTS algorithm is studied under various conditions and search hyperparameters. DARTS is applied to CIFAR-10 to check the reproducibility of the original results. It is also tested in a new setting, the CheXpert dataset, to discover new architectures, which are compared to a baseline DenseNet121 model. The architectures found using DARTS achieve better performance on the validation set than the baseline model.
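What makes DARTS differentiable is a continuous relaxation. Each edge computes a softmax-weighted mixture of all candidate operations, where the weights come from learnable architecture parameters alpha. After search, the discrete architecture is recovered by keeping the strongest operation. This Python sketch shows only that relaxation and the discretization step, on scalar inputs. The candidate operations are hypothetical stand-ins for convolutions, pooling, and skip connections. The bilevel gradient updates are omitted: alpha is updated on validation loss and the network weights on training loss.

```python
import math
from typing import Callable, Dict, List

# Hypothetical candidate operations on a scalar; in DARTS these would be
# convolutions, pooling, skip connections, etc. acting on feature maps.
CANDIDATE_OPS: Dict[str, Callable[[float], float]] = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "negate": lambda x: -x,
}

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x: float, alphas: List[float]) -> float:
    """Continuous relaxation: softmax(alpha)-weighted sum over all candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def derive_op(alphas: List[float]) -> str:
    """Discretization after search: keep the op with the largest alpha."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]
```

With equal alphas, `mixed_op(3.0, [0.0, 0.0, 0.0])` averages the three ops and gives 2.0. As one alpha grows, the mixture approaches that single operation, which is what `derive_op` then selects.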
Generating Behaviorally Diverse Policies with Latent Diffusion Models
Recent progress in Quality Diversity Reinforcement Learning (QD-RL) has
enabled learning a collection of behaviorally diverse, high performing
policies. However, these methods typically involve storing thousands of
policies, which results in high space complexity and poor scaling to additional
behaviors. Condensing the archive into a single model while retaining the
performance and coverage of the original collection of policies has proved
challenging. In this work, we propose using diffusion models to distill the
archive into a single generative model over policy parameters. We show that our
method achieves a compression ratio of 13x while recovering 98% of the original
rewards and 89% of the original coverage. Further, the conditioning mechanism
of diffusion models allows for flexibly selecting and sequencing behaviors,
including using language. Project website:
https://sites.google.com/view/policydiffusion/hom
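This sketch shows the forward (noising) process of a standard DDPM-style diffusion model, applied to a flattened vector of policy parameters. A denoising network, conditioned on the timestep and a behavior descriptor, would then be trained to predict the added noise. The linear beta schedule, its endpoints, and the function names are assumptions. The paper's title indicates diffusion in a latent space, so in the described method the noised vector would presumably be a latent code rather than raw policy weights.

```python
import math
import random
from typing import List, Tuple

def linear_beta_schedule(T: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_t increasing linearly over T >= 2 steps (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noise_params(theta0: List[float], t: int, abar: List[float],
                 rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample x_t ~ N(sqrt(abar_t) * theta0, (1 - abar_t) I); return x_t and the noise."""
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in theta0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(theta0, eps)]
    return xt, eps
```

The returned `eps` is the regression target for the denoiser. At sampling time, the learned denoiser runs this process in reverse to generate a new parameter vector for the requested behavior.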