Which Surrogate Works for Empirical Performance Modelling? A Case Study with Differential Evolution
It is not uncommon that meta-heuristic algorithms contain some intrinsic
parameters, the optimal configuration of which is crucial for achieving their
peak performance. However, evaluating the effectiveness of a configuration is
expensive, as it involves many costly runs of the target algorithm. Perhaps
surprisingly, it is possible to build a cheap-to-evaluate surrogate that models
the algorithm's empirical performance as a function of its parameters. Such
surrogates constitute an important building block for understanding algorithm
performance, algorithm portfolio/selection, and automatic algorithm
configuration. In principle, many off-the-shelf machine learning techniques can
be used to build surrogates. In this paper, we take the differential evolution
(DE) as the baseline algorithm for proof-of-concept study. Regression models
are trained to model the DE's empirical performance given a parameter
configuration. In particular, we evaluate and compare four popular regression
algorithms in terms of both how well they predict the empirical performance
for a particular parameter configuration and how well they approximate the
parameter-versus-performance landscape.
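Such a surrogate can be sketched in a few lines. The example below trains an off-the-shelf regressor (a random forest; the four regressors compared in the paper are not named in this abstract) on hypothetical DE configurations of scale factor F, crossover rate CR, and population size NP, with a synthetic performance response standing in for the mean best fitness that would, in practice, come from costly runs of DE itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: each row is a DE configuration
# (scale factor F, crossover rate CR, population size NP). The
# response is synthetic; in practice it would be the measured
# performance averaged over repeated DE runs.
configs = rng.uniform([0.1, 0.0, 10], [1.0, 1.0, 100], size=(200, 3))
perf = (configs[:, 0] - 0.5) ** 2 + (configs[:, 1] - 0.9) ** 2 \
       + 0.001 * configs[:, 2] + rng.normal(0, 0.01, 200)

# Fit a cheap-to-evaluate surrogate of the performance landscape.
surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(configs, perf)

# Query the surrogate instead of running DE itself.
candidate = np.array([[0.5, 0.9, 50]])
predicted = surrogate.predict(candidate)
```

Once trained, the surrogate answers "how good is this configuration?" in microseconds, which is what makes it usable inside algorithm configuration or selection loops.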
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Global covariance pooling in convolutional neural networks has achieved
impressive improvement over the classical first-order pooling. Recent works
have shown matrix square root normalization plays a central role in achieving
state-of-the-art performance. However, existing methods depend heavily on
eigendecomposition (EIG) or singular value decomposition (SVD), suffering from
inefficient training due to limited support of EIG and SVD on GPU. Towards
addressing this problem, we propose an iterative matrix square root
normalization method for fast end-to-end training of global covariance pooling
networks. At the core of our method is a meta-layer designed with loop-embedded
directed graph structure. The meta-layer consists of three consecutive
nonlinear structured layers, which perform pre-normalization, coupled matrix
iteration and post-compensation, respectively. Our method is much faster than
EIG- or SVD-based ones, since it involves only matrix multiplications, which
are well suited to parallel implementation on GPU. Moreover, the proposed
network with ResNet architecture converges in far fewer epochs, further
accelerating network training. On large-scale ImageNet, we achieve performance
competitive with or superior to existing counterparts. By finetuning our models
pre-trained on ImageNet, we
establish state-of-the-art results on three challenging fine-grained
benchmarks. The source code and network models will be available at
http://www.peihuali.org/iSQRT-COV. Comment: Accepted to CVPR 2018.
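The three-stage meta-layer described above can be sketched numerically. The structure below (trace pre-normalization, coupled Newton-Schulz iteration, trace post-compensation) follows the abstract's description; the NumPy phrasing, matrix size, and iteration count are illustrative assumptions:

```python
import numpy as np

def isqrt_cov_sketch(sigma, num_iter=10):
    """Approximate matrix square root of an SPD matrix via the coupled
    Newton-Schulz iteration: pre-normalization, coupled matrix
    iteration, post-compensation."""
    n = sigma.shape[0]
    I = np.eye(n)
    trace = np.trace(sigma)
    A = sigma / trace                 # pre-normalization: ensures convergence
    Y, Z = A, I
    for _ in range(num_iter):         # only matrix multiplies: GPU-friendly
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z
    return np.sqrt(trace) * Y         # post-compensation restores the scale

# Sanity check on a random symmetric positive-definite matrix
rng = np.random.default_rng(0)
B = rng.normal(size=(8, 8))
sigma = B @ B.T + 8 * np.eye(8)
S = isqrt_cov_sketch(sigma)
err = np.linalg.norm(S @ S - sigma) / np.linalg.norm(sigma)
```

Because every step is a matrix multiplication, the same loop differentiates cleanly for end-to-end training, which is the point of replacing EIG/SVD.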
Nonlinear Friction-Induced Vibration of a Slider-Belt System
A mass–spring–damper slider excited into vibration in a plane by a moving rigid belt through friction is a major paradigm of friction-induced vibration. This paradigm has two aspects that can be improved: (1) the contact stiffness at the slider–belt interface is often assumed to be linear, and (2) this contact is usually assumed to be maintained during vibration (even when the vibration becomes unbounded under certain conditions). In this paper, a cubic contact spring is included; loss of contact (separation) at the slider–belt interface is allowed and, importantly, reattachment of the slider to the belt after separation is also considered. These two features make a more realistic model of friction-induced vibration and are shown to lead to very rich dynamic behavior even though a simple Coulomb friction law is used. Both complex eigenvalue analysis of the linearized system and transient analysis of the full nonlinear system are conducted. Eigenvalue analysis indicates that the nonlinear system can become unstable at increasing levels of the preload and the nonlinear stiffness, even if the corresponding linear part of the system is stable; at high enough levels, however, these become stabilizing factors. Transient analysis shows that separation and reattachment can indeed happen. Vibration can grow with the preload and the vertical nonlinear stiffness when separation is considered, while this trend differs when separation is ignored. Finally, it is found that under certain conditions the vibration magnitudes of the model with separation are greater than those of the corresponding model without separation. Thus, ignoring separation is unsafe.
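A minimal transient-analysis sketch of such a slider-belt model, with a cubic contact spring and contact-loss handling, might look like the following. All parameter values, the single-mass two-direction simplification, and the contact-force form are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

# Illustrative parameters for a slider pressed onto a moving belt.
m, c, k = 1.0, 0.4, 100.0            # mass, damping, structural stiffness
k1, k3 = 500.0, 1.0e6                # linear and cubic contact stiffness
mu, v_belt, n0 = 0.4, 0.5, 10.0      # friction coefficient, belt speed, preload

def contact_force(y):
    """Normal force at the slider-belt interface (y positive = lift-off).
    Clamped at zero: once the slider separates, the belt cannot pull it."""
    return max(n0 - k1 * y - k3 * y ** 3, 0.0)

def derivs(state):
    x, vx, y, vy = state
    n = contact_force(y)
    # Simple Coulomb law: friction follows the sign of the relative speed
    friction = mu * n * np.sign(v_belt - vx)
    ax = (friction - c * vx - k * x) / m       # tangential (belt) direction
    ay = ((n - n0) - c * vy - k * y) / m       # normal direction
    return np.array([vx, ax, vy, ay])

# Fixed-step RK4 transient analysis from a small vertical perturbation
state, dt = np.array([0.0, 0.0, 0.0, 0.1]), 1e-4
history = []
for _ in range(20000):
    s1 = derivs(state)
    s2 = derivs(state + 0.5 * dt * s1)
    s3 = derivs(state + 0.5 * dt * s2)
    s4 = derivs(state + dt * s3)
    state = state + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
    history.append(state.copy())
history = np.array(history)
# Separation episodes are simply the steps where the clamped force is zero
separated = (n0 - k1 * history[:, 2] - k3 * history[:, 2] ** 3) <= 0
```

Clamping the contact force at zero, rather than letting it go negative, is what allows separation and subsequent reattachment to emerge naturally from the integration.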
BiLO-CPDP: Bi-Level Programming for Automated Model Discovery in Cross-Project Defect Prediction
Cross-Project Defect Prediction (CPDP), which borrows data from similar
projects by combining a transfer learner with a classifier, has emerged as a
promising way to predict software defects when the available data about the
target project is insufficient. However, developing such a model is challenging
because it is difficult to determine the right combination of transfer learner
and classifier along with their optimal hyper-parameter settings. In this
paper, we propose a tool, dubbed BiLO-CPDP, which is the first of its kind to
formulate automated CPDP model discovery from the perspective of bi-level
programming. In particular, the bi-level programming carries out the
optimization at two nested levels in a hierarchical manner. Specifically, the
upper-level optimization routine is designed to search for the right
combination of transfer learner and classifier, while the nested lower-level
optimization routine aims to optimize the corresponding hyper-parameter
settings. To evaluate BiLO-CPDP, we conduct experiments on 20 projects to
compare it with a total of 21 existing CPDP techniques, along with its
single-level optimization variant and Auto-Sklearn, a state-of-the-art
automated machine learning tool. Empirical results show that BiLO-CPDP
achieves better prediction performance than all 21 existing CPDP techniques
on 70% of the projects, while being overwhelmingly superior to Auto-Sklearn
and its single-level optimization variant in all cases. Furthermore, the
unique bi-level formulation in BiLO-CPDP also permits allocating more budget
to the upper level, which significantly boosts the performance.
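The nested structure can be illustrated with a toy bi-level search: an upper level enumerating discrete model choices and a lower level tuning each choice's hyper-parameters. The model families, search spaces, and synthetic data below are stand-ins for illustration, not BiLO-CPDP's actual search space:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for defect-prediction data; real CPDP would use
# source-project data plus a transfer learner.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Upper level: discrete choice of model family, a stand-in for the
# transfer-learner/classifier combination searched by the tool.
upper_choices = {
    "logreg": (lambda **kw: LogisticRegression(max_iter=500, **kw),
               {"C": (1e-3, 1e2)}),
    "forest": (lambda **kw: RandomForestClassifier(**kw),
               {"n_estimators": (10, 200)}),
}

def lower_level(make_model, space, budget=10):
    """Nested lower-level routine: log-uniform random search over the
    chosen model's hyper-parameter space, returning the best CV score."""
    best = -np.inf
    for _ in range(budget):
        params = {}
        for name, (lo, hi) in space.items():
            val = np.exp(rng.uniform(np.log(lo), np.log(hi)))
            params[name] = int(val) if name == "n_estimators" else val
        score = cross_val_score(make_model(**params), X, y, cv=3).mean()
        best = max(best, score)
    return best

# Upper-level loop: keep the choice whose tuned (lower-level) score is best.
results = {name: lower_level(mk, space)
           for name, (mk, space) in upper_choices.items()}
best_combo = max(results, key=results.get)
```

The hierarchy matters for budgeting: each upper-level candidate is judged only after its own lower-level tuning, so shifting budget toward the upper level explores more combinations, which is the effect the abstract reports.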
Normalization Enhances Generalization in Visual Reinforcement Learning
Recent advances in visual reinforcement learning (RL) have led to impressive
success in handling complex tasks. However, these methods have demonstrated
limited generalization capability to visual disturbances, which poses a
significant challenge for their real-world application and adaptability. Though
normalization techniques have demonstrated huge success in supervised and
unsupervised learning, their applications in visual RL are still scarce. In
this paper, we explore the potential benefits of integrating normalization into
visual RL methods with respect to generalization performance. We find that,
perhaps surprisingly, incorporating suitable normalization techniques is
sufficient to enhance the generalization capabilities, without any additional
special design. We utilize the combination of two normalization techniques,
CrossNorm and SelfNorm, for generalizable visual RL. Extensive experiments are
conducted on DMControl Generalization Benchmark and CARLA to validate the
effectiveness of our method. We show that our method significantly improves
generalization capability while only marginally affecting sample efficiency. In
particular, when integrated with DrQ-v2, our method enhances the test
performance of DrQ-v2 on CARLA across various scenarios, from 14% of the
training performance to 97%.
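Of the two techniques combined above, CrossNorm (due to Tang et al.) exchanges channel-wise statistics between pairs of feature maps, broadening the styles seen during training. A minimal NumPy sketch with illustrative shapes (the paper applies it inside the CNN, together with SelfNorm, which is not shown here):

```python
import numpy as np

def crossnorm(x_a, x_b, eps=1e-5):
    """CrossNorm sketch: swap channel-wise mean/std between two feature
    maps of shape (C, H, W), a feature-level style augmentation."""
    mu_a = x_a.mean(axis=(1, 2), keepdims=True)
    sd_a = x_a.std(axis=(1, 2), keepdims=True) + eps
    mu_b = x_b.mean(axis=(1, 2), keepdims=True)
    sd_b = x_b.std(axis=(1, 2), keepdims=True) + eps
    # Normalize each map with its own statistics, then re-style it
    # with the other map's statistics.
    return ((x_a - mu_a) / sd_a * sd_b + mu_b,
            (x_b - mu_b) / sd_b * sd_a + mu_a)

rng = np.random.default_rng(0)
fa = rng.normal(0, 1, (4, 8, 8))   # feature map with one "style"
fb = rng.normal(3, 2, (4, 8, 8))   # feature map with another "style"
ga, gb = crossnorm(fa, fb)
```

After the swap, each output carries the other input's channel statistics while keeping its own spatial content, which is what makes the augmentation content-preserving.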