Meta-learning for data summarization based on instance selection method
The purpose of instance selection is to identify which instances (examples, patterns) in a large dataset should be selected as representatives of the entire dataset, without significant loss of information. When a machine learning method is applied to the reduced dataset, the accuracy of the model should not be significantly worse than if the same method were applied to the entire dataset. The reducibility of any dataset, and hence the success of instance selection methods, clearly depends on the characteristics of the dataset as well as the machine learning method. This paper adopts a meta-learning approach, via an empirical study of 112 classification datasets from the UCI Repository [1], to explore the relationship between data characteristics, machine learning methods, and the success of instance selection methods.
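To make the notion of instance selection concrete, below is a minimal sketch of one classic method, the Condensed Nearest Neighbour rule: keep only the instances a 1-NN classifier needs to still label the rest of the data correctly. The toy two-blob dataset and the single-rule setup are illustrative assumptions, not the specific methods benchmarked in the paper.

```python
# Condensed Nearest Neighbour (CNN) instance selection, a minimal sketch.
import numpy as np

def condensed_nn(X, y, rng=np.random.default_rng(0)):
    """Return indices of a reduced set that 1-NN-classifies X consistently."""
    n = len(X)
    store = [int(rng.integers(n))]           # seed the store with one instance
    changed = True
    while changed:                           # repeat until a full clean pass
        changed = False
        for i in rng.permutation(n):
            # 1-NN prediction using only the instances kept so far
            d = np.linalg.norm(X[store] - X[i], axis=1)
            if y[store[int(np.argmin(d))]] != y[i]:
                store.append(int(i))         # misclassified -> must keep it
                changed = True
    return np.array(sorted(set(store)))

# Toy usage: two Gaussian blobs; most interior points become redundant.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
kept = condensed_nn(X, y)
print(f"kept {len(kept)} of {len(X)} instances")
```

The reduction achieved here depends heavily on class overlap, which is exactly the kind of data characteristic the paper's meta-learning study relates to instance selection success.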
Towards objective measures of algorithm performance across instance space
This paper tackles the difficult but important task of objective algorithm performance assessment for optimization. Rather than reporting average performance of algorithms across a set of chosen instances, which may bias conclusions, we propose a methodology to enable the strengths and weaknesses of different optimization algorithms to be compared across a broader instance space. The results reported in a recent Computers and Operations Research paper comparing the performance of graph coloring heuristics are revisited with this new methodology to demonstrate (i) how pockets of the instance space can be found where algorithm performance varies significantly from the average performance of an algorithm; (ii) how the properties of the instances can be used to predict algorithm performance on previously unseen instances with high accuracy; and (iii) how the relative strengths and weaknesses of each algorithm can be visualized and measured objectively.
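Point (ii) rests on a simple recipe that can be sketched directly: train a model that maps instance feature vectors to the winning algorithm, then query it on unseen instances. The synthetic features and the two mock "algorithms" below are assumptions for illustration; the paper's own study uses graph coloring heuristics and real instance features.

```python
# A minimal sketch of predicting the better algorithm from instance features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))                 # instance feature vectors
# Mock ground truth: algorithm A wins where a nonlinear feature interaction holds.
best = (X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=500) > 0.25).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, best, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy predicting the winner: {model.score(X_te, y_te):.2f}")
```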
Characterising harmful data sources when constructing multi-fidelity surrogate models
Surrogate modelling techniques have seen growing attention in recent years when applied to both modelling and optimisation of industrial design problems. These techniques are highly relevant when assessing the performance of a particular design carries a high cost, as the overall cost can be mitigated by constructing a model to be queried in lieu of the available high-cost source. The construction of these models can sometimes employ other sources of information which are both cheaper and less accurate. The existence of these sources, however, poses the question of which sources should be used when constructing a model. Recent studies have attempted to characterise harmful data sources to guide practitioners in choosing when to ignore a certain source. These studies have done so in a synthetic setting, characterising sources using a large amount of data that is not available in practice. Some of these studies have also been shown to potentially suffer from bias in the benchmarks used in the analysis. In this study, we present a characterisation of harmful low-fidelity sources using only the limited data available to train a surrogate model. We employ recently developed benchmark filtering techniques to conduct a bias-free assessment, providing objectively varied benchmark suites of different sizes for future research. Analysing one of these benchmark suites with the technique known as Instance Space Analysis, we provide an intuitive visualisation of when a low-fidelity source should be used and use this analysis to provide guidelines that can be applied in an industrial setting.
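The multi-fidelity setting the abstract describes can be illustrated with a minimal two-fidelity surrogate built by additive correction: fit a model to the plentiful cheap source, then model its discrepancy from the scarce expensive source. The test functions f_lo and f_hi and the sample sizes are assumptions; whether the low-fidelity source helps or harms depends on how well f_lo tracks f_hi, which is precisely what the paper characterises.

```python
# A minimal sketch of a two-fidelity surrogate via additive correction.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

f_hi = lambda x: np.sin(8 * x) * x              # expensive truth (assumed)
f_lo = lambda x: np.sin(8 * x) * x + 0.3 * x    # cheap, biased source (assumed)

X_lo = np.linspace(0, 1, 40)[:, None]           # plentiful cheap data
X_hi = np.linspace(0, 1, 6)[:, None]            # scarce expensive data

gp_lo = GaussianProcessRegressor().fit(X_lo, f_lo(X_lo).ravel())
resid = f_hi(X_hi).ravel() - gp_lo.predict(X_hi)   # discrepancy at hi points
gp_delta = GaussianProcessRegressor().fit(X_hi, resid)

X_test = np.linspace(0, 1, 200)[:, None]
pred = gp_lo.predict(X_test) + gp_delta.predict(X_test)
err = np.max(np.abs(pred - f_hi(X_test).ravel()))
print(f"max error of corrected surrogate: {err:.3f}")
```

If f_lo were badly misleading rather than mildly biased, the correction step would be starved by the six high-fidelity samples; deciding when to ignore the cheap source from such limited data is the question the paper addresses.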
Instance Space Analysis of Search-Based Software Testing
Search-based software testing (SBST) is now a mature area, with numerous techniques developed to tackle the challenging task of software testing. SBST techniques have shown promising results and have been successfully applied in industry to automatically generate test cases for large and complex software systems. Their effectiveness, however, is problem-dependent. In this paper, we revisit the problem of objective performance evaluation of SBST techniques in light of recent methodological advances, in the form of Instance Space Analysis (ISA), enabling the strengths and weaknesses of SBST techniques to be visualized and assessed across the broadest possible space of problem instances (software classes) from common benchmark datasets. We identify features of SBST problems that explain why a particular instance is hard for an SBST technique, reveal areas of hard and easy problems in the instance space of existing benchmark datasets, and identify the strengths and weaknesses of state-of-the-art SBST techniques. In addition, we examine the diversity and quality of the common benchmark datasets used in experimental evaluations.
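The ISA view described here amounts to projecting instance feature vectors to 2D so that regions of hard and easy instances become visible. The sketch below uses PCA as a stand-in for ISA's tailored projection, and the features and hardness labels are synthetic assumptions; in the paper, instances are software classes and hardness comes from the measured coverage of each SBST technique.

```python
# A minimal sketch of an instance-space-style 2D view (PCA stands in for ISA).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 6))                   # per-instance features
hard = (feats[:, 0] + feats[:, 1] > 1).astype(int)  # mock hardness label

z = PCA(n_components=2).fit_transform(feats)        # 2D instance space
# Summarise where the hard instances sit in the projected space.
print("hard-instance centroid:", z[hard == 1].mean(axis=0).round(2))
print("easy-instance centroid:", z[hard == 0].mean(axis=0).round(2))
```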
Adaptive fusion of gait and face for human identification in video
Most work on multi-biometric fusion is based on static fusion rules, which cannot respond to changes in the environment and the individual users. This paper proposes adaptive multi-biometric fusion, which dynamically adjusts the fusion rules to suit the real-time external conditions. As a typical example, the adaptive fusion of gait and face in video is studied. Two factors that may affect the relationship between gait and face in the fusion are considered, i.e., the view angle and the subject-to-camera distance. Together they determine the way gait and face are fused at an arbitrary time. Experimental results show that the adaptive fusion performs significantly better not only than single biometric traits, but also than widely adopted static fusion rules, including SUM, PRODUCT, MIN, and MAX.
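The contrast between static and adaptive score fusion can be sketched in a few lines. The weighting function below is an illustrative assumption, not the paper's learned rule: face scores are trusted more up close and near-frontal, gait scores more at a distance or from a side view.

```python
# A minimal sketch of static vs. condition-adaptive score fusion.
import numpy as np

def static_rules(gait, face):
    return {"SUM": gait + face, "PRODUCT": gait * face,
            "MIN": min(gait, face), "MAX": max(gait, face)}

def adaptive_fuse(gait, face, distance_m, view_deg):
    # Assumed heuristic: face weight decays with distance and off-frontal view.
    w_face = np.exp(-distance_m / 5.0) * np.cos(np.radians(view_deg)) ** 2
    w_face = float(np.clip(w_face, 0.0, 1.0))
    return w_face * face + (1.0 - w_face) * gait

# Far away and side-on: the gait score dominates the fused score.
print(adaptive_fuse(gait=0.8, face=0.3, distance_m=10, view_deg=80))
# Close and frontal: the face score dominates.
print(adaptive_fuse(gait=0.4, face=0.9, distance_m=1, view_deg=5))
```

A static rule such as SUM would give both conditions the same weighting, which is exactly the limitation the adaptive scheme targets.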
Automatic age estimation based on facial aging patterns
While recognition of most facial variations, such as identity, expression, and gender, has been extensively studied, automatic age estimation has rarely been explored. In contrast to other facial variations, aging variation presents several unique characteristics which make age estimation a challenging task. This paper proposes an automatic age estimation method named AGES (AGing pattErn Subspace). The basic idea is to model the aging pattern, which is defined as the sequence of a particular individual's face images sorted in time order, by constructing a representative subspace. The proper aging pattern for a previously unseen face image is determined by the projection in the subspace that can reconstruct the face image with minimum reconstruction error, while the position of the face image in that aging pattern will then indicate its age. In the experiments, AGES and its variants are compared with the limited existing age estimation methods (WAS and AAS) and some well-established classification methods (kNN, BP, C4.5, and SVM). Moreover, a comparison with human perception ability on age is conducted. It is interesting to note that the performance of AGES is not only significantly better than that of all the other algorithms, but also comparable to that of human observers.
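The subspace-reconstruction idea behind AGES can be sketched in simplified form: learn a linear subspace of whole aging patterns (one person's per-age feature vectors concatenated in time order), then estimate an unseen face's age as the pattern position whose subspace reconstruction fits it best. The feature vectors, dimensions, and the synthetic data generator below are illustrative assumptions standing in for real face features.

```python
# A minimal sketch of age estimation by aging-pattern subspace reconstruction.
import numpy as np

rng = np.random.default_rng(0)
n_ages, d = 5, 8                        # pattern positions (ages) x feature dim

# Synthetic training patterns: low-dim identity variation + shared age offsets.
age_effect = rng.normal(size=(n_ages, d))
id_basis = rng.normal(size=(4, d))      # identity variation lives in 4 dims
ids = rng.normal(size=(200, 4)) @ id_basis
patterns = (ids[:, None, :] + age_effect).reshape(200, n_ages * d)

mu = patterns.mean(axis=0)
# Top-k subspace of aging patterns via SVD (k < d so slots stay informative).
_, _, Vt = np.linalg.svd(patterns - mu, full_matrices=False)
V = Vt[:4]                              # (k, n_ages*d) subspace basis

def estimate_age(face):
    """Try the face at every age slot; return the slot that reconstructs best."""
    errs = []
    for a in range(n_ages):
        idx = slice(a * d, (a + 1) * d)
        # Fit subspace coefficients using only the observed slot's dimensions.
        c, *_ = np.linalg.lstsq(V[:, idx].T, face - mu[idx], rcond=None)
        errs.append(np.linalg.norm(V[:, idx].T @ c - (face - mu[idx])))
    return int(np.argmin(errs))

# A new person observed at age slot 3 should be placed near slot 3.
test_face = rng.normal(size=4) @ id_basis + age_effect[3]
print("estimated age slot:", estimate_age(test_face))
```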