BROOD: Bilevel and Robust Optimization and Outlier Detection for Efficient Tuning of High-Energy Physics Event Generators
The parameters in Monte Carlo (MC) event generators are tuned on experimental measurements by evaluating the goodness of fit between the data and the MC predictions. The relative importance of each measurement is adjusted manually in an often time-consuming, iterative process to meet different experimental needs. In this work, we introduce several optimization formulations and algorithms with new decision criteria for streamlining and automating this process. These algorithms are designed for two formulations: bilevel optimization and robust optimization. Both formulations are applied to the datasets used in the ATLAS A14 tune and to dedicated hadronization datasets generated with the SHERPA generator. The corresponding tuned generator parameters are compared using three metrics. We compare the quality of our automatic tunes to the published ATLAS A14 tune. Moreover, we analyze the impact of a pre-processing step that excludes data that cannot be described by the physics models used in the MC event generators.
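To make the bilevel structure concrete, here is a minimal, self-contained sketch of the idea: an outer level selects per-measurement weights and an inner level tunes the generator parameters against the weighted goodness of fit. The linear response surrogate, the candidate weightings, and the outer scoring rule are illustrative assumptions, not the BROOD algorithm itself.

```python
# Minimal sketch of a bilevel generator tune (illustrative, not BROOD):
# outer level picks per-measurement weights w, inner level tunes parameters p.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_meas, n_par = 5, 3
A = rng.normal(size=(n_meas, n_par))       # hypothetical linear MC response surrogate
data = rng.normal(size=n_meas)             # "experimental" measurements
sigma = np.full(n_meas, 0.5)               # measurement uncertainties

def chi2(p, w):
    """Weighted goodness of fit between the surrogate prediction A @ p and data."""
    r = (A @ p - data) / sigma
    return np.sum(w * r**2)

def inner_tune(w):
    """Inner level: tune generator parameters for fixed measurement weights."""
    return minimize(chi2, x0=np.zeros(n_par), args=(w,)).x

# Outer level: score a few candidate weightings with an unweighted chi2
# (a stand-in decision criterion) and keep the best one.
candidates = [np.ones(n_meas),
              rng.uniform(0.1, 1.0, n_meas),
              rng.uniform(0.1, 1.0, n_meas)]
best_w = min(candidates, key=lambda w: chi2(inner_tune(w), np.ones(n_meas)))
print("selected weights:", best_w)
print("tuned parameters:", inner_tune(best_w))
```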
LambdaOpt: Learn to Regularize Recommender Models in Finer Levels
Recommendation models mainly deal with categorical variables, such as
user/item ID and attributes. Besides the high-cardinality issue, the
interactions among such categorical variables are usually long-tailed, with the
head made up of highly frequent values and a long tail of rare ones. This
phenomenon results in the data sparsity issue, making it essential to
regularize the models to ensure generalization. The common practice is to
employ grid search to manually tune regularization hyperparameters based on the
validation data. However, it requires non-trivial effort and large computational
resources to search the whole candidate space; even so, it may not lead to the
optimal choice, especially when different parameters should have different
regularization strengths. In this paper, we propose a hyperparameter
optimization method, LambdaOpt, which automatically and adaptively enforces
regularization during training. Specifically, it updates the regularization
coefficients based on the performance on validation data. With LambdaOpt, the
notorious tuning of regularization hyperparameters can be avoided; more
importantly, it allows fine-grained regularization (i.e., each parameter can
have an individualized regularization coefficient), leading to models that
generalize better. We show how to employ LambdaOpt on matrix factorization, a
classical model that is representative of a large family of recommender models.
Extensive experiments on two public benchmarks demonstrate the superiority of
our method in boosting the performance of top-K recommendation.
Comment: Accepted by KDD 201
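As an illustration of the per-parameter regularization idea, the toy sketch below updates individual regularization coefficients of a matrix factorization model from a one-step look-ahead on validation data. It is a hedged approximation of the general mechanism, not the authors' implementation; the synthetic data and step sizes are assumptions.

```python
# Toy per-parameter regularization on matrix factorization (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 20, 30, 4
R = rng.normal(size=(n_users, n_items))                 # toy ratings
train_mask = rng.random(R.shape) < 0.6                  # observed training entries
val_mask = (~train_mask) & (rng.random(R.shape) < 0.5)  # held-out validation entries

U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lam_u = np.full_like(U, 0.01)                           # per-parameter reg. coefficients
lam_v = np.full_like(V, 0.01)
eta, alpha = 0.05, 0.01                                 # step sizes: parameters / lambdas

def mse_grad(U, V, mask):
    """Gradients and value of the masked reconstruction error."""
    err = (U @ V.T - R) * mask
    return err @ V / mask.sum(), err.T @ U / mask.sum(), (err**2).sum() / mask.sum()

for step in range(200):
    gU, gV, _ = mse_grad(U, V, train_mask)
    # One-step look-ahead with the current per-parameter regularization.
    U1 = U - eta * (gU + lam_u * U)
    V1 = V - eta * (gV + lam_v * V)
    # Hypergradient of the validation loss w.r.t. each lambda:
    # dL_val/dlam = dL_val/dtheta1 * dtheta1/dlam = grad_val(theta1) * (-eta * theta).
    gU1, gV1, _ = mse_grad(U1, V1, val_mask)
    lam_u = np.clip(lam_u - alpha * (-eta * U * gU1), 0.0, None)
    lam_v = np.clip(lam_v - alpha * (-eta * V * gV1), 0.0, None)
    U, V = U1, V1                                        # commit the parameter update

print("final validation MSE:", mse_grad(U, V, val_mask)[2])
```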
Automatic Data Augmentation Learning using Bilevel Optimization for Histopathological Images
Training a deep learning model to classify histopathological images is
challenging, because of the color and shape variability of the cells and
tissues, and the reduced amount of available data, which does not allow proper
learning of those variations. Variations can come from the image acquisition
process, for example, due to different cell staining protocols or tissue
deformation. To tackle this challenge, Data Augmentation (DA) can be used
during training to generate additional samples by applying transformations to
existing ones, to help the model become invariant to those color and shape
transformations. The problem with DA is that it is not only dataset-specific
but also requires domain knowledge, which is not always available. Without
this knowledge, selecting the right transformations can only be done using
heuristics or through a computationally demanding search. To address this, we
propose an automatic DA learning method. In this method, the DA parameters,
i.e., the transformation parameters needed to improve the model training, are
considered learnable and are learned automatically and efficiently through a
bilevel optimization approach with truncated backpropagation. We
validated the method on six different datasets. Experimental results show that
our model can learn color and affine transformations that are more helpful to
train an image classifier than predefined DA transformations, which are also
more expensive, as they need to be selected before training by a grid search
on a validation set. We also show that, similarly to a model trained with
RandAugment, our model has only a few method-specific hyperparameters to
tune, yet performs better. This makes our model a good solution for
learning the best DA parameters, especially in the context of histopathological
images, where defining potentially useful transformations heuristically is not
trivial.
Comment: arXiv admin note: text overlap with arXiv:2006.1469
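A minimal sketch of the truncated bilevel pattern the abstract describes is shown below: augmentation parameters are treated as outer variables, a classifier is trained for a few unrolled inner steps on augmented data, and the validation loss is backpropagated through the unroll to the augmentation parameters. The linear classifier and the brightness-style augmentation are simplifying assumptions, not the paper's setup.

```python
# Learning DA parameters with a truncated, unrolled bilevel loop (illustrative).
import torch

torch.manual_seed(0)
n, d, c = 256, 32, 3
x_train, y_train = torch.randn(n, d), torch.randint(0, c, (n,))
x_val, y_val = torch.randn(n, d), torch.randint(0, c, (n,))

# Outer (DA) parameters: a learnable multiplicative and additive jitter.
da_scale = torch.zeros(1, requires_grad=True)
da_shift = torch.zeros(1, requires_grad=True)
outer_opt = torch.optim.Adam([da_scale, da_shift], lr=1e-2)

def augment(x):
    return x * (1.0 + da_scale) + da_shift

for outer_step in range(50):
    # Inner level: train classifier weights for K steps on augmented data,
    # keeping the unrolled graph so gradients can reach the DA parameters.
    W = torch.zeros(d, c, requires_grad=True)
    b = torch.zeros(c, requires_grad=True)
    lr_inner, K = 0.1, 5
    for _ in range(K):
        logits = augment(x_train) @ W + b
        loss = torch.nn.functional.cross_entropy(logits, y_train)
        gW, gb = torch.autograd.grad(loss, (W, b), create_graph=True)
        W, b = W - lr_inner * gW, b - lr_inner * gb
    # Outer level: validation loss on clean data, backprop through the unroll.
    val_loss = torch.nn.functional.cross_entropy(x_val @ W + b, y_val)
    outer_opt.zero_grad()
    val_loss.backward()
    outer_opt.step()

print("learned DA scale/shift:", da_scale.item(), da_shift.item())
```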
A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method
Hyperparameter optimization in machine learning is often achieved using naive
techniques that only lead to an approximate set of hyperparameters. Although
techniques such as Bayesian optimization perform an intelligent search on a
given domain of hyperparameters, they do not guarantee an optimal solution. A
major drawback of most of these approaches is an exponential growth of their
search domain with the number of hyperparameters, increasing the computational cost
and making the approaches slow. The hyperparameter optimization problem is
inherently a bilevel optimization task, and some studies have attempted bilevel
solution methodologies for solving this problem. However, these studies assume
a unique set of model weights that minimize the training loss, which is
generally violated by deep learning architectures. This paper discusses a
gradient-based bilevel method addressing these drawbacks for solving the
hyperparameter optimization problem. The proposed method can handle continuous
hyperparameters; in our experiments, we choose the regularization
hyperparameter. The method guarantees convergence to the set of optimal
hyperparameters, a result we prove theoretically. The idea is based on
approximating the lower-level optimal value function using Gaussian process
regression. As a result, the bilevel problem is reduced to a single level
constrained optimization task that is solved using the augmented Lagrangian
method. We have performed an extensive computational study on the MNIST and
CIFAR-10 datasets on multi-layer perceptron and LeNet architectures that
confirms the efficiency of the proposed method. A comparative study against
grid search, random search, Bayesian optimization, and the Hyperband method on
various hyperparameter problems shows that the proposed algorithm converges
with less computation and leads to models that generalize better on the
test set.
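The two main ingredients, a Gaussian-process surrogate of the lower-level optimal value function and an augmented-Lagrangian solve of the reduced single-level problem, can be sketched on a toy ridge-regression example as follows. This is a simplified illustration under stated assumptions, not the paper's implementation.

```python
# (i) Fit a GP surrogate of V(lam) = min_w L_train(w, lam); (ii) solve the
# reduced single-level constrained problem with a basic augmented-Lagrangian loop.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(40, 5)), rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
y_tr = X_tr @ w_true + 0.3 * rng.normal(size=40)
y_val = X_val @ w_true + 0.3 * rng.normal(size=40)

L_tr = lambda w, lam: np.mean((X_tr @ w - y_tr) ** 2) + lam * np.sum(w ** 2)
L_val = lambda w: np.mean((X_val @ w - y_val) ** 2)

def inner_solution(lam):
    # Closed-form ridge solution of the lower-level problem.
    n = len(y_tr)
    return np.linalg.solve(X_tr.T @ X_tr / n + lam * np.eye(5), X_tr.T @ y_tr / n)

# (i) GP surrogate of the value function from a handful of sampled lambdas.
lams = np.linspace(1e-3, 1.0, 8)
V = np.array([L_tr(inner_solution(l), l) for l in lams])
gp = GaussianProcessRegressor().fit(lams.reshape(-1, 1), V)
V_hat = lambda lam: gp.predict(np.array([[lam]]))[0]

# (ii) Augmented Lagrangian on the reduced single-level problem:
#      min_{w, lam} L_val(w)   s.t.   L_tr(w, lam) - V_hat(lam) <= 0.
mu, rho, z = 0.0, 10.0, np.zeros(6)
for _ in range(10):
    def AL(z):
        w, lam = z[:5], abs(z[5])
        g = max(L_tr(w, lam) - V_hat(lam), 0.0)      # constraint violation
        return L_val(w) + mu * g + 0.5 * rho * g ** 2
    z = minimize(AL, z).x
    mu += rho * max(L_tr(z[:5], abs(z[5])) - V_hat(abs(z[5])), 0.0)

print("selected regularization strength:", abs(z[5]))
```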
CPMLHO: Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization
Hyperparameter optimization of neural networks can be expressed as a
bilevel optimization problem: the bilevel formulation is used to automatically
update the hyperparameters, and the hypergradient is approximated via the
best-response function. Finding the best-response function is very
time-consuming. In this paper, we propose CPMLHO, a new hyperparameter
optimization method using the cutting plane method and a mixed-level
objective function. The cutting plane is added to the inner level to
constrain the space of the response function. To obtain a more accurate
hypergradient, the mixed-level objective can flexibly adjust the loss function
by combining the losses on the training set and the validation set. Compared
to existing methods, the experimental results show that our method can
automatically update the hyperparameters during training and can find
better hyperparameters with higher accuracy and faster convergence.
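As a rough illustration of a mixed-level objective, the sketch below blends training and validation losses in the inner update and takes a one-step hypergradient of the regularization coefficient. The cutting-plane constraint of CPMLHO is not reproduced here, and all names and constants are assumptions.

```python
# Mixed-level inner objective with a one-step hypergradient (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(50, 8)), rng.normal(size=(50, 8))
w_true = rng.normal(size=8)
y_tr = X_tr @ w_true + 0.2 * rng.normal(size=50)
y_val = X_val @ w_true + 0.2 * rng.normal(size=50)

w, lam, beta = np.zeros(8), 0.1, 0.7     # beta blends train/val losses in the inner step
eta, alpha = 0.05, 0.01                  # inner / hyper step sizes

def grads(w, X, y):
    """Gradient of the mean squared error."""
    return 2 * X.T @ (X @ w - y) / len(y)

for step in range(300):
    # Inner step on the mixed-level objective beta*L_tr + (1-beta)*L_val + lam*||w||^2.
    g_mixed = (beta * grads(w, X_tr, y_tr)
               + (1 - beta) * grads(w, X_val, y_val)
               + 2 * lam * w)
    w_new = w - eta * g_mixed
    # One-step hypergradient of the validation loss w.r.t. lam:
    # dL_val(w_new)/dlam = grad_val(w_new) . dw_new/dlam = grad_val(w_new) . (-2*eta*w).
    hyper_g = grads(w_new, X_val, y_val) @ (-2 * eta * w)
    lam = max(lam - alpha * hyper_g, 0.0)
    w = w_new

print("learned regularization coefficient:", lam)
```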
BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization
Evolutionary reinforcement learning (ERL) algorithms have recently attracted
attention for tackling complex reinforcement learning (RL) problems due to their
high parallelism, but they are prone to insufficient exploration or model collapse
without careful tuning of hyperparameters (aka meta-parameters). In this paper,
we propose a general meta ERL framework via bilevel optimization (BiERL) to
jointly update hyperparameters in parallel with training the ERL model within a
single agent, which relieves the need for prior domain knowledge or a costly
optimization procedure before model deployment. We design an elegant meta-level
architecture that embeds the inner level's evolving experience into an
informative population representation, and we introduce a simple and feasible
evaluation of the meta-level fitness function to facilitate learning
efficiency. We perform extensive experiments in MuJoCo and Box2D tasks to
verify that, as a general framework, BiERL outperforms various baselines and
consistently improves the learning performance for a diversity of ERL
algorithms.
Comment: Published as a conference paper at the European Conference on Artificial
Intelligence (ECAI) 202
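The bilevel pattern described above can be sketched, very loosely, as an inner evolution strategy over policy parameters whose mutation scale is adapted by a meta level from the population's fitness. The toy objective and the simple meta update below are assumptions for illustration, not BiERL's architecture.

```python
# Inner evolution strategy with a meta level adapting the mutation scale (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
dim = 10
fitness = lambda theta: -np.sum((theta - 1.0) ** 2)   # toy stand-in for an RL return

theta = np.zeros(dim)               # inner-level policy parameters
sigma = 0.5                         # meta-level hyperparameter: mutation scale
meta_candidates = [0.5, 1.0, 2.0]   # multiplicative perturbations of sigma

for generation in range(100):
    # Meta level: evaluate a few candidate mutation scales on this generation's
    # populations and keep the one whose offspring achieve the best fitness.
    best_sigma, best_theta, best_f = sigma, theta, -np.inf
    for m in meta_candidates:
        s = sigma * m
        pop = theta + s * rng.normal(size=(16, dim))     # inner-level population
        f = np.array([fitness(p) for p in pop])
        elite = pop[np.argsort(f)[-4:]].mean(axis=0)     # inner-level ES update (elite mean)
        if f.max() > best_f:
            best_sigma, best_theta, best_f = s, elite, f.max()
    sigma, theta = best_sigma, best_theta

print("final fitness:", fitness(theta), "adapted sigma:", sigma)
```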