Investigating benchmark correlations when comparing algorithms with parameter tuning.
Benchmarks are important for comparing the performance of optimisation algorithms, but we can select instances that present our algorithm favourably, and dismiss those on which our algorithm under-performs. Also related is the automated design of algorithms, which uses problem instances (benchmarks) to train an algorithm: careful choice of instances is needed for the algorithm to generalise. We sweep parameter settings of differential evolution applied to the BBOB benchmarks. Several benchmark functions are highly correlated. This may lead to the false conclusion that an algorithm performs well in general, when it performs poorly on a few key instances. These correlations vary with the number of evaluations.
Investigating benchmark correlations when comparing algorithms with parameter tuning: detailed experiments and results.
Benchmarks are important to demonstrate the utility of optimisation algorithms, but there is controversy about the practice of benchmarking; we could select instances that present our algorithm favourably, and dismiss those on which our algorithm underperforms. Several papers highlight the pitfalls concerned with benchmarking, some of which concern the context of the automated design of algorithms, where we use a set of problem instances (benchmarks) to train our algorithm. As with machine learning, if the training set does not reflect the test set, the algorithm will not generalize. This raises some open questions concerning the use of test instances to automatically design algorithms. We use differential evolution and sweep the parameter settings to investigate the practice of benchmarking using the BBOB benchmarks. We make three key findings. Firstly, several benchmark functions are highly correlated. This may lead to the false conclusion that an algorithm performs well in general, when it performs poorly on a few key instances, possibly introducing unwanted bias to a resulting automatically designed algorithm. Secondly, the number of evaluations can have a large effect on the conclusion. Finally, a systematic sweep of the parameters shows how performance varies with time across the space of algorithm configurations. The datasets, including all computed features, the evolved policies and their performances, and the visualisations for all feature sets are available from the University of Stirling Data Repository (http://hdl.handle.net/11667/109).
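The correlation analysis described in this abstract can be illustrated with a small sketch. It is not the authors' code: the three test functions are simplified, unshifted stand-ins for BBOB-style problems, the DE variant (rand/1/bin), the parameter grid, and all function names (`differential_evolution`, `sweep`, `spearman_matrix`) are assumptions chosen for illustration. Each row of the result holds one (F, CR) configuration, each column one function, and the rank correlation between columns indicates how similarly two functions order the configurations.

```python
import numpy as np

# Simplified stand-ins for benchmark functions (assumption: no shifts or rotations).
def sphere(x):
    return float(np.sum(x ** 2))

def ellipsoid(x):
    weights = 10.0 ** np.linspace(0, 2, x.size)
    return float(np.sum(weights * x ** 2))

def rastrigin(x):
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

def differential_evolution(f, dim, F, CR, evals, rng, pop_size=10):
    """Basic DE/rand/1/bin; returns the best value found within `evals` evaluations."""
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    fit = np.array([f(ind) for ind in pop])
    used = pop_size
    while used < evals:
        for i in range(pop_size):
            if used >= evals:
                break
            choices = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(choices, size=3, replace=False)]
            mutant = a + F * (b - c)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True  # at least one gene comes from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            used += 1
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    return float(fit.min())

def sweep(functions, F_values, CR_values, dim=5, evals=300, seed=0):
    """Run every (F, CR) configuration on every function; rows are configurations."""
    rng = np.random.default_rng(seed)
    rows = []
    for F in F_values:
        for CR in CR_values:
            rows.append([differential_evolution(f, dim, F, CR, evals, rng)
                         for f in functions])
    return np.array(rows)

def ranks(values):
    """Ordinal ranks (ties broken by position)."""
    order = np.argsort(values)
    r = np.empty(len(values))
    r[order] = np.arange(len(values))
    return r

def spearman_matrix(results):
    """Rank correlation between functions (columns) across configurations (rows)."""
    ranked = np.column_stack([ranks(col) for col in results.T])
    return np.corrcoef(ranked, rowvar=False)

if __name__ == "__main__":
    results = sweep([sphere, ellipsoid, rastrigin],
                    F_values=[0.2, 0.5, 0.8], CR_values=[0.1, 0.5, 0.9])
    print(np.round(spearman_matrix(results), 2))
```

Repeating the sweep with different values of `evals` gives a rough way to see whether the correlations change with the evaluation budget, which is the effect the abstract reports.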
A hybrid swarm-based algorithm for single-objective optimization problems involving high-cost analyses
In many technical fields, single-objective optimization procedures in
continuous domains involve expensive numerical simulations. In this context, an
improvement of the Artificial Bee Colony (ABC) algorithm, called the Artificial
super-Bee enhanced Colony (AsBeC), is presented. AsBeC is designed to provide
fast convergence speed, high solution accuracy and robust performance over a
wide range of problems. It implements enhancements of the ABC structure and
hybridizations with interpolation strategies. The latter are inspired by the
quadratic trust region approach for local investigation and by an efficient
global optimizer for separable problems. Each modification and their combined
effects are studied with appropriate metrics on a numerical benchmark, which is
also used for comparing AsBeC with some effective ABC variants and other
derivative-free algorithms. In addition, the presented algorithm is validated
on two recent benchmarks adopted for competitions in international conferences.
Results show remarkable competitiveness and robustness for AsBeC. Comment: 19 pages, 4 figures, Springer Swarm Intelligence
Informative Path Planning for Active Field Mapping under Localization Uncertainty
Information gathering algorithms play a key role in unlocking the potential
of robots for efficient data collection in a wide range of applications.
However, most existing strategies neglect the fundamental problem of the robot
pose uncertainty, which is an implicit requirement for creating robust,
high-quality maps. To address this issue, we introduce an informative planning
framework for active mapping that explicitly accounts for the pose uncertainty
in both the mapping and planning tasks. Our strategy exploits a Gaussian
Process (GP) model to capture a target environmental field given the
uncertainty on its inputs. For planning, we formulate a new utility function
that couples the localization and field mapping objectives in GP-based mapping
scenarios in a principled way, without relying on any manually tuned
parameters. Extensive simulations show that our approach outperforms existing
strategies, with reductions in mean pose uncertainty and map error. We also
present a proof of concept in an indoor temperature mapping scenario. Comment: 8 pages, 7 figures, submission (revised) to Robotics & Automation
Letters (and IEEE International Conference on Robotics and Automation)
Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made.
A Holistic Assessment of the Reliability of Machine Learning Systems
As machine learning (ML) systems increasingly permeate high-stakes settings
such as healthcare, transportation, military, and national security, concerns
regarding their reliability have emerged. Despite notable progress, the
performance of these systems can significantly diminish due to adversarial
attacks or environmental changes, leading to overconfident predictions,
failures to detect input faults, and an inability to generalize in unexpected
scenarios. This paper proposes a holistic assessment methodology for the
reliability of ML systems. Our framework evaluates five key properties:
in-distribution accuracy, distribution-shift robustness, adversarial
robustness, calibration, and out-of-distribution detection. A reliability score
is also introduced and used to assess the overall system reliability. To
provide insights into the performance of different algorithmic approaches, we
identify and categorize state-of-the-art techniques, then evaluate a selection
on real-world tasks using our proposed reliability metrics and reliability
score. Our analysis of over 500 models reveals that designing for one metric
does not necessarily constrain others but certain algorithmic techniques can
improve reliability across multiple metrics simultaneously. This study
contributes to a more comprehensive understanding of ML reliability and
provides a roadmap for future research and development.
Finding network communities using modularity density
Many real-world complex networks exhibit a community structure, in which the modules correspond to actual functional units. Identifying these communities is a key challenge for scientists. A common approach is to search for the network partition that maximizes a quality function. Here, we present a detailed analysis of a recently proposed function, namely modularity density. We show that it does not incur the drawbacks suffered by traditional modularity, and that it can identify networks without ground-truth community structure, deriving its analytical dependence on link density in generic random graphs. In addition, we show that modularity density allows an easy comparison between networks of different sizes, and we also present some limitations that methods based on modularity density may suffer from. Finally, we introduce an efficient, quadratic community detection algorithm based on modularity density maximization, validating its accuracy against theoretical predictions and on a set of benchmark networks.
Futility Analysis in the Cross-Validation of Machine Learning Models
Many machine learning models have important structural tuning parameters that
cannot be directly estimated from the data. The common tactic for setting these
parameters is to use resampling methods, such as cross-validation or the
bootstrap, to evaluate a candidate set of values and choose the best based on
some pre-defined criterion. Unfortunately, this process can be time consuming.
However, the model tuning process can be streamlined by adaptively resampling
candidate values so that settings that are clearly sub-optimal can be
discarded. The notion of futility analysis is introduced in this context. An
example is shown that illustrates how adaptive resampling can be used to reduce
training time. Simulation studies are used to understand how the potential
speed-up is affected by parallel processing techniques. Comment: 22 pages, 5 figures
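A minimal sketch of the adaptive resampling idea in this abstract follows. The abstract does not state the futility rule, so the rule used here is a stand-in: a candidate is dropped once its paired deficit against the current best exceeds `z` standard errors. The function name `adaptive_resampling`, the `score` callback, and the parameters `min_resamples` and `z` are all hypothetical.

```python
import math
import statistics

def adaptive_resampling(candidates, score, n_resamples=20, min_resamples=5, z=2.0):
    """Evaluate candidates resample by resample, discarding clearly sub-optimal ones.

    score(candidate, resample_index) returns a performance value (higher is better).
    Returns (best_candidate, scores), where scores maps each candidate to the list
    of values collected before it was discarded or the resamples ran out.
    """
    scores = {c: [] for c in candidates}
    active = list(candidates)
    for r in range(n_resamples):
        # Only candidates that are still active consume a model fit on this resample.
        for c in active:
            scores[c].append(score(c, r))
        if r + 1 < min_resamples or len(active) == 1:
            continue
        best = max(active, key=lambda c: statistics.mean(scores[c]))
        survivors = []
        for c in active:
            # Paired differences: same resamples for the best and the candidate.
            diffs = [b - s for b, s in zip(scores[best], scores[c])]
            mean_diff = statistics.mean(diffs)
            std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
            # Assumed futility rule: keep unless the deficit exceeds z standard errors.
            if c == best or mean_diff <= z * std_err:
                survivors.append(c)
        active = survivors
    best = max(active, key=lambda c: statistics.mean(scores[c]))
    return best, scores
```

The saving comes from the discarded candidates: once a setting is removed from `active`, no further models are fitted for it, so the total number of fits falls below `len(candidates) * n_resamples`.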