Order-Preserving GFlowNets
Generative Flow Networks (GFlowNets) have been introduced as a method to
sample a diverse set of candidates with probabilities proportional to a given
reward. However, GFlowNets can only be used with a predefined scalar reward,
which can be either computationally expensive or not directly accessible, in
the case of multi-objective optimization (MOO) tasks for example. Moreover, to
prioritize identifying high-reward candidates, the conventional practice is to
raise the reward to a higher exponent, the optimal choice of which may vary
across different environments. To address these issues, we propose
Order-Preserving GFlowNets (OP-GFNs), which sample with probabilities in
proportion to a learned reward function that is consistent with a provided
(partial) order on the candidates, thus eliminating the need for an explicit
formulation of the reward function. We theoretically prove that the training
process of OP-GFNs gradually sparsifies the learned reward landscape in
single-objective maximization tasks. The sparsification concentrates on
candidates of a higher hierarchy in the ordering, ensuring exploration at the
beginning and exploitation towards the end of the training. We demonstrate
OP-GFNs' state-of-the-art performance in single-objective maximization (totally
ordered) and multi-objective Pareto front approximation (partially ordered)
tasks, including synthetic datasets, molecule generation, and neural
architecture search.
Comment: ICLR 202
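The key idea of learning a reward consistent with a given (partial) order can be illustrated with a pairwise ranking loss. The sketch below is an assumption on my part (a Bradley-Terry-style objective over ordered candidate pairs), not the exact OP-GFN training loss; the function names and data are illustrative only.

```python
import numpy as np

def order_preserving_loss(log_r, pairs):
    """Pairwise ranking loss on learned log-rewards (illustrative).

    log_r : array of learned log-rewards, one per candidate.
    pairs : list of (i, j) index pairs where candidate i is ranked
            strictly above candidate j in the provided order.
    Returns the mean negative log-probability that the learned
    reward agrees with each ordered pair (Bradley-Terry style).
    """
    losses = []
    for i, j in pairs:
        # P(i ranked above j) = sigmoid(log_r[i] - log_r[j])
        diff = log_r[i] - log_r[j]
        losses.append(np.log1p(np.exp(-diff)))  # -log sigmoid(diff)
    return float(np.mean(losses))

# A reward that respects the order (0 > 1 > 2) gives a low loss;
# a reward that violates the order gives a high loss.
good = order_preserving_loss(np.array([3.0, 1.0, -1.0]), [(0, 1), (1, 2)])
bad = order_preserving_loss(np.array([-1.0, 1.0, 3.0]), [(0, 1), (1, 2)])
```

Minimizing such a loss only constrains the reward up to monotone transformations of the order, which matches the abstract's point that no explicit reward formulation is needed.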
Least-squares based layerwise pruning of Deep Neural Networks
Deep neural networks (DNNs) are currently the most powerful models in machine learning, successfully solving many tasks such as image and speech recognition, semantic segmentation, and data generation. Due to the inherently high computational complexity of DNNs, pruning methods were applied early on to reduce their computational cost and accelerate inference. Pruning methods remove (prune) parameters from a trained DNN without significantly degrading its performance. The resulting models can be evaluated at high speed even on weak compute platforms. In recent years, pruning has been used not only after training but also as a component of modern DNN training algorithms. For example, many memory-efficient training algorithms and architecture search methods apply pruning during training to remove unimportant parameters from the DNN. The problem is that many modern pruning methods rely on regularized, supervised training procedures and are therefore themselves computationally expensive. Such pruning methods cannot easily be embedded into other training algorithms. There is thus growing interest in pruning methods that are both fast and accurate. In this work, we study layerwise least-squares (LS) pruning, a framework for the structured pruning of DNNs. We show that LS pruning is a fast yet accurate method for DNN reduction that can be used for zero-shot or unsupervised network reduction. In experiments, we compare LS pruning with other fast reduction methods, such as magnitude-based pruning and LS factorization. Furthermore, we compare LS pruning with supervised pruning methods.
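The layerwise least-squares idea can be sketched in a few lines: after removing units from a layer's input, refit the remaining weights in closed form so that the pruned layer best reconstructs the original layer output. This is a minimal sketch under my own assumptions (a plain linear layer, NumPy's `lstsq`), not the paper's full framework.

```python
import numpy as np

def ls_prune_layer(X, W, keep):
    """Layerwise least-squares pruning sketch (illustrative names).

    X    : (n, d) activations entering a linear layer.
    W    : (d, m) layer weights; the original output is X @ W.
    keep : indices of the d input units retained after pruning.
    Returns W_new of shape (len(keep), m), chosen by least squares
    so that X[:, keep] @ W_new best reconstructs X @ W.
    """
    Y = X @ W                      # original layer output (target)
    Xk = X[:, keep]                # activations of the kept units
    # Closed-form least-squares refit of the reduced layer
    W_new, *_ = np.linalg.lstsq(Xk, Y, rcond=None)
    return W_new

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
W = rng.normal(size=(8, 4))
keep = [0, 2, 3, 5, 6, 7]          # prune input units 1 and 4
W_new = ls_prune_layer(X, W, keep)
err = np.linalg.norm(X[:, keep] @ W_new - X @ W) / np.linalg.norm(X @ W)
```

Because the refit uses only activations, no labels are required, which is consistent with the abstract's claim that LS pruning supports unsupervised or zero-shot network reduction.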
DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation
Computing gradients of an expectation with respect to the distributional
parameters of a discrete distribution is a problem arising in many fields of
science and engineering. Typically, this problem is tackled using Reinforce,
which frames the problem of gradient estimation as a Monte Carlo simulation.
Unfortunately, the Reinforce estimator is especially sensitive to discrepancies
between the true probability distribution and the drawn samples, a common issue
in low sampling regimes that results in inaccurate gradient estimates. In this
paper, we introduce DBsurf, a reinforce-based estimator for discrete
distributions that uses a novel sampling procedure to reduce the discrepancy
between the samples and the actual distribution. To assess the performance of
our estimator, we subject it to a diverse set of tasks. Among existing
estimators, DBsurf attains the lowest variance in a least squares problem
commonly used in the literature for benchmarking. Furthermore, DBsurf achieves
the best results for training variational auto-encoders (VAE) across different
datasets and sampling setups. Finally, we apply DBsurf to build a simple and
efficient Neural Architecture Search (NAS) algorithm with state-of-the-art
performance.
Comment: 22 pages, 7 figures
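For context on the baseline that DBsurf improves upon, the plain Reinforce (score-function) estimator for a categorical distribution can be sketched as follows. This is the standard textbook estimator, not DBsurf's novel sampling procedure; the function names are my own.

```python
import numpy as np

def reinforce_grad(logits, f, n_samples, rng):
    """Plain Reinforce estimate of d E_p[f(x)] / d logits,
    where p = softmax(logits) over a discrete support (sketch)."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    xs = rng.choice(len(p), size=n_samples, p=p)
    grad = np.zeros_like(p)
    for x in xs:
        # grad of log p(x) w.r.t. logits = onehot(x) - p
        score = -p.copy()
        score[x] += 1.0
        grad += f(x) * score
    return grad / n_samples

rng = np.random.default_rng(0)
logits = np.zeros(3)                     # uniform distribution
f = lambda x: float(x)                   # toy objective: f(x) = x
g = reinforce_grad(logits, f, 20000, rng)
# Exact gradient: p_k * (f(k) - E[f]); here E[f] = 1
p = np.ones(3) / 3.0
exact = p * (np.arange(3) - 1.0)
```

The estimate is unbiased but noisy: whenever the drawn samples deviate from the true distribution (common with few samples), the estimate degrades, which is exactly the discrepancy DBsurf's sampling procedure targets.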
A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models
In recent years, the rapid evolution of computer vision has seen the
emergence of various foundation models, each tailored to specific data types
and tasks. In this study, we explore the adaptation of these models for
few-shot semantic segmentation. Specifically, we conduct a comprehensive
comparative analysis of four prominent foundation models: DINO V2, Segment
Anything, CLIP, and Masked AutoEncoders, as well as a straightforward ResNet50
pre-trained on the COCO dataset. We also include five adaptation methods, ranging
from linear probing to fine-tuning. Our findings show that DINO V2 outperforms
other models by a large margin, across various datasets and adaptation methods.
On the other hand, the adaptation methods yield little difference in the
obtained results, suggesting that simple linear probing can compete with
advanced, more computationally intensive alternatives.
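Linear probing, the simplest adaptation method mentioned above, fits only a linear classifier on top of frozen features. A minimal sketch, assuming a ridge-regression probe to one-hot targets and synthetic stand-in features (the real benchmark would use features extracted by the foundation models):

```python
import numpy as np

def linear_probe(features, labels, n_classes, reg=1e-3):
    """Linear probing sketch: fit a linear classifier on frozen
    features via ridge regression to one-hot targets (illustrative,
    not the paper's exact protocol)."""
    n, d = features.shape
    Y = np.eye(n_classes)[labels]           # one-hot targets
    A = features.T @ features + reg * np.eye(d)
    W = np.linalg.solve(A, features.T @ Y)  # (d, n_classes)
    return W

def predict(W, features):
    return (features @ W).argmax(axis=1)

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs as stand-in "frozen features"
X0 = rng.normal(loc=-2.0, size=(50, 16))
X1 = rng.normal(loc=+2.0, size=(50, 16))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
W = linear_probe(X, y, n_classes=2)
acc = float((predict(W, X) == y).mean())
```

The backbone never receives gradients, so the probe's cost is a single small linear solve, which is why it is dramatically cheaper than fine-tuning.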
Iteratively Training Look-Up Tables for Network Quantization
Operating deep neural networks (DNNs) on devices with limited resources
requires the reduction of their memory as well as computational footprint.
Popular reduction methods are network quantization or pruning, which either
reduce the word length of the network parameters or remove weights from the
network if they are not needed. In this article we discuss a general framework
for network reduction which we call `Look-Up Table Quantization` (LUT-Q). For
each layer, we learn a value dictionary and an assignment matrix to represent
the network weights. We propose a special solver which combines gradient
descent and a one-step k-means update to learn both the value dictionaries and
assignment matrices iteratively. This method is very flexible: by constraining
the value dictionary, many different reduction problems such as non-uniform
network quantization, training of multiplierless networks, network pruning or
simultaneous quantization and pruning can be implemented without changing the
solver. This flexibility of the LUT-Q method allows us to use the same method
to train networks for different hardware capabilities.
Comment: Copyright 2019 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.
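The k-means half of the LUT-Q solver can be sketched compactly: assign each weight to its nearest dictionary value, then refresh each dictionary entry as the mean of its assigned weights. The sketch below is illustrative and omits the gradient-descent step on the weights that the article's solver interleaves with this update.

```python
import numpy as np

def lutq_step(w, d):
    """One illustrative LUT-Q-style update (sketch): nearest-value
    assignment followed by a single k-means refresh of the value
    dictionary. The gradient step on w is omitted."""
    # Assignment matrix (here a vector of indices): nearest value
    a = np.abs(w[:, None] - d[None, :]).argmin(axis=1)
    # One-step k-means update of the value dictionary
    d_new = d.copy()
    for k in range(len(d)):
        mask = a == k
        if mask.any():
            d_new[k] = w[mask].mean()
    return a, d_new

rng = np.random.default_rng(0)
w = rng.normal(size=1000)               # flattened layer weights
d = np.array([-1.0, 0.0, 1.0])          # value dictionary (K = 3)
for _ in range(10):
    a, d = lutq_step(w, d)
w_q = d[a]                              # quantized weights
err = float(np.mean((w - w_q) ** 2))
```

Constraining `d` recovers the special cases the abstract lists: powers of two give multiplierless networks, and forcing one entry to zero implements pruning within the same solver.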
A Statistical Model for Predicting Generalization in Few-Shot Classification
The estimation of the generalization error of classifiers often relies on a
validation set. Such a set is hardly available in few-shot learning scenarios,
an often overlooked shortcoming in the field. In these scenarios, it is common
to rely on features extracted from pre-trained neural networks combined with
distance-based classifiers such as nearest class mean. In this work, we
introduce a Gaussian model of the feature distribution. By estimating the
parameters of this model, we are able to predict the generalization error on
new classification tasks with few samples. We observe that accurate distance
estimates between class-conditional densities are the key to accurate estimates
of the generalization performance. Therefore, we propose an unbiased estimator
for these distances and integrate it in our numerical analysis. We show that
our approach outperforms alternatives such as the leave-one-out
cross-validation strategy in few-shot settings.
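The connection between class-conditional distances and generalization can be illustrated in the simplest case the Gaussian model covers: two spherical Gaussian classes with shared standard deviation, classified by nearest class mean. Under that assumption the expected accuracy has a closed form, Phi(||mu0 - mu1|| / (2 sigma)). This is a toy instance of the idea, using my own assumed setup rather than the paper's estimator.

```python
import numpy as np
from math import erf, sqrt

def ncm_predict(X, means):
    """Nearest class mean classifier on feature vectors."""
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return d.argmin(axis=1)

def gaussian_accuracy(mu0, mu1, sigma):
    """Predicted NCM accuracy for two spherical Gaussian classes
    with shared std sigma: Phi(||mu0 - mu1|| / (2 * sigma))."""
    dist = np.linalg.norm(mu0 - mu1)
    z = dist / (2.0 * sigma)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF

rng = np.random.default_rng(0)
mu0, mu1, sigma = np.zeros(8), np.full(8, 0.5), 1.0
X0 = mu0 + sigma * rng.normal(size=(2000, 8))
X1 = mu1 + sigma * rng.normal(size=(2000, 8))
X = np.vstack([X0, X1])
y = np.array([0] * 2000 + [1] * 2000)
pred = ncm_predict(X, np.stack([mu0, mu1]))
empirical = float((pred == y).mean())
predicted = gaussian_accuracy(mu0, mu1, sigma)
```

With the true means plugged in, the empirical accuracy matches the closed-form prediction; in the few-shot setting the means and distances must themselves be estimated from a handful of samples, which is where the paper's unbiased distance estimator matters.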