Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm
This paper introduces ICET, a new algorithm for cost-sensitive
classification. ICET uses a genetic algorithm to evolve a population of biases
for a decision tree induction algorithm. The fitness function of the genetic
algorithm is the average cost of classification when using the decision tree,
including both the costs of tests (features, measurements) and the costs of
classification errors. ICET is compared here with three other algorithms for
cost-sensitive classification (EG2, CS-ID3, and IDX) and also with C4.5,
which classifies without regard to cost. The five algorithms are evaluated
empirically on five real-world medical datasets. Three sets of experiments are
performed. The first set examines the baseline performance of the five
algorithms on the five datasets and establishes that ICET performs
significantly better than its competitors. The second set tests the robustness
of ICET under a variety of conditions and shows that ICET maintains its
advantage. The third set looks at ICET's search in bias space and discovers a
way to improve the search.
Comment: See http://www.jair.org/ for any accompanying file
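The fitness measure described above (average cost per case, counting both test costs and misclassification costs) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not ICET's implementation: the tree representation, the hypothetical feature names and costs, and the rule that a test is charged at most once per case are all assumed here. The genetic algorithm and the bias-parameterized tree inducer are not shown.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    test: str          # feature/measurement that must be paid for to branch here
    threshold: float   # go left if the measured value <= threshold
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify_with_cost(tree: Tree, case: Mapping[str, float],
                       test_costs: Mapping[str, float]) -> Tuple[str, float]:
    """Walk the tree for one case; return (predicted label, test cost paid)."""
    paid = set()
    cost = 0.0
    node = tree
    while isinstance(node, Node):
        if node.test not in paid:  # assumption: a repeated test is not charged twice
            cost += test_costs[node.test]
            paid.add(node.test)
        node = node.left if case[node.test] <= node.threshold else node.right
    return node.label, cost


def average_cost(tree: Tree, cases: List[Mapping[str, float]], labels: List[str],
                 test_costs: Mapping[str, float],
                 error_costs: Mapping[Tuple[str, str], float]) -> float:
    """Mean over cases of (test costs + misclassification cost).

    error_costs is keyed by (predicted, actual)."""
    total = 0.0
    for case, actual in zip(cases, labels):
        predicted, test_cost = classify_with_cost(tree, case, test_costs)
        total += test_cost + error_costs[(predicted, actual)]
    return total / len(cases)


# Hypothetical toy example (made-up features and costs, not the paper's datasets).
tree = Node("glucose", 120.0, Leaf("healthy"),
            Node("insulin", 80.0, Leaf("healthy"), Leaf("sick")))
test_costs = {"glucose": 1.0, "insulin": 10.0}
error_costs = {("healthy", "healthy"): 0.0, ("sick", "sick"): 0.0,
               ("healthy", "sick"): 50.0, ("sick", "healthy"): 5.0}
cases = [{"glucose": 100, "insulin": 50},
         {"glucose": 150, "insulin": 90},
         {"glucose": 150, "insulin": 60}]
labels = ["healthy", "sick", "sick"]
print(average_cost(tree, cases, labels, test_costs, error_costs))  # 73 / 3, about 24.33
```

Because this quantity is a cost, a search that uses it as fitness would try to minimize it; the third case shows how a cheap-looking tree can still be expensive once an error cost is incurred.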
A Classification of Hyper-heuristic Approaches
The current state of the art in hyper-heuristic research comprises a set of approaches that share the common goal of automating the design and adaptation of heuristic methods to solve hard computational search problems. The main goal is to produce more generally applicable search methodologies. In this chapter we present an overview of previous categorisations of hyper-heuristics and provide a unified classification and definition which captures the work that is being undertaken in this field. We distinguish between two main hyper-heuristic categories: heuristic selection and heuristic generation. Some representative examples of each category are discussed in detail. Our goal is to both clarify the main features of existing techniques and to suggest new directions for hyper-heuristic research.
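To make the "heuristic selection" category concrete, here is a minimal selection hyper-heuristic. It is an illustrative assumption, not an approach taken from the chapter: the toy OneMax problem, the three bit-string low-level heuristics, the score-based roulette-wheel selection, and the "accept if not worse" rule are all choices made for this sketch. The hyper-heuristic operates in heuristic space: it learns which operator to apply, while only the operators modify the solution.

```python
import random
from typing import Callable, List, Sequence, Tuple

Solution = List[int]
Heuristic = Callable[[Solution, random.Random], Solution]


# Three hypothetical low-level heuristics on bit strings (each returns a new list).
def flip_one(s: Solution, rng: random.Random) -> Solution:
    t = s[:]
    t[rng.randrange(len(t))] ^= 1
    return t


def flip_two(s: Solution, rng: random.Random) -> Solution:
    t = s[:]
    for i in rng.sample(range(len(t)), 2):
        t[i] ^= 1
    return t


def swap_two(s: Solution, rng: random.Random) -> Solution:
    t = s[:]
    i, j = rng.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t


def selection_hyper_heuristic(initial: Solution,
                              objective: Callable[[Solution], float],
                              heuristics: Sequence[Heuristic],
                              iterations: int = 500,
                              seed: int = 0) -> Tuple[Solution, float, List[float]]:
    """Maximize `objective` by repeatedly choosing which low-level heuristic to apply."""
    rng = random.Random(seed)
    scores = [1.0] * len(heuristics)          # learned preference per heuristic
    current, current_val = initial[:], objective(initial)
    for _ in range(iterations):
        # Heuristic selection: roulette wheel proportional to each heuristic's score.
        k = rng.choices(range(len(heuristics)), weights=scores)[0]
        candidate = heuristics[k](current, rng)
        value = objective(candidate)
        if value > current_val:
            scores[k] += 1.0                   # reward heuristics that improve
        else:
            scores[k] = max(0.1, scores[k] - 0.1)  # mild penalty, never drop to zero
        if value >= current_val:               # move acceptance: "not worse"
            current, current_val = candidate, value
    return current, current_val, scores


# Toy problem (OneMax): maximize the number of 1 bits.
best, best_val, final_scores = selection_hyper_heuristic(
    [0] * 20, sum, [flip_one, flip_two, swap_two])
print(best_val, [round(s, 1) for s in final_scores])
```

Heuristic generation, the other category, would instead construct new heuristics (for example by evolving them), which this sketch does not attempt.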
A guided Monte Carlo method for optimization problems
We introduce a new Monte Carlo method by incorporating a guided distribution
function into the conventional Monte Carlo method. In this way, the efficiency of
Monte Carlo methods is drastically improved. To further speed up the algorithm,
we include two more ingredients into the algorithm. First, we freeze the
sub-patterns that have high probability of appearance during the search for
optimal solution, resulting in a reduction of the phase space of the problem.
Second, we perform the simulation at a temperature which is within the optimal
temperature range of the optimization search in our algorithm. We use this
algorithm to search for the optimal path of the traveling salesman problem and
the ground state energy of the spin glass model and demonstrate that its
performance is comparable with more elaborate and heuristic methods.
Comment: 4 pages, ReVTe
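For reference, a conventional fixed-temperature Metropolis Monte Carlo search for the traveling salesman problem, which we assume is the kind of baseline the paper starts from, can be sketched as below. It does not reproduce the paper's guided distribution function or its freezing of high-probability sub-patterns, since the abstract does not specify them; the segment-reversal move, the `metropolis_tsp` name, and the temperature value are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def metropolis_tsp(pts: Sequence[Point], temperature: float,
                   steps: int = 20000, seed: int = 0) -> Tuple[List[int], float]:
    """Fixed-temperature Metropolis Monte Carlo with segment-reversal moves."""
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    length = tour_length(tour, pts)
    best, best_len = tour[:], length
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))
        # Proposed move: reverse the segment tour[i..j] (a 2-opt style move).
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(cand, pts)  # O(n) recompute keeps the sketch simple
        delta = cand_len - length
        # Metropolis rule: always accept downhill, accept uphill with exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            tour, length = cand, cand_len
            if length < best_len:
                best, best_len = tour[:], length
    return best, best_len


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(metropolis_tsp(square, temperature=0.1, steps=2000))
```

Here the temperature is simply a parameter; the paper's second speed-up corresponds to choosing it from within an optimal range for the search.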
Hyper-heuristic decision tree induction
A hyper-heuristic is any algorithm that searches or operates in the space of
heuristics as opposed to the space of solutions. Hyper-heuristics are
increasingly used in function and combinatorial optimization. Rather than
attempting to solve a problem using a fixed heuristic, a hyper-heuristic
approach attempts to find a combination of heuristics that solves the problem
(and that, in turn, may be directly suitable for a class of problem instances).
Hyper-heuristics have been little explored in data mining. This work presents
novel hyper-heuristic approaches to data mining that search a space of
attribute selection criteria for a decision tree building algorithm. The search is
conducted by a genetic algorithm. The result of the hyper-heuristic search in
this case is a strategy for selecting attributes while building decision trees.
Most hyper-heuristics work by trying to adapt the heuristic to the state of
the problem being solved. Our hyper-heuristic is no different. It employs a
strategy for adapting the heuristic used to build decision tree nodes
according to some set of features of the training set it is working on. We
introduce, explore and evaluate five different ways in which this problem
state can be represented for a hyper-heuristic that operates within a decision tree
building algorithm. In each case, the hyper-heuristic is guided by a rule
set that maps features of the data set about to be split by the decision tree
building algorithm to the heuristic used for splitting that data set.
We also explore and evaluate three different sets of low-level heuristics that
could be employed by such a hyper-heuristic.
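The idea of a rule set that maps features of the data at a node to a splitting heuristic can be sketched as follows. Everything concrete here is an assumption for illustration: the three low-level heuristics (information gain, Gini gain, gain ratio), the problem-state features in the hypothetical `node_features`, and the "first matching threshold rule wins" format are not taken from the thesis, which compares five state representations and three heuristic sets of its own. In the thesis the rule set is evolved by a genetic algorithm; here it is written by hand.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Labels = Sequence[str]
SplitHeuristic = Callable[[Labels, List[Labels]], float]


def entropy(labels: Labels) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Labels) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def info_gain(labels: Labels, parts: List[Labels]) -> float:
    return entropy(labels) - sum(len(p) / len(labels) * entropy(p) for p in parts)


def gini_gain(labels: Labels, parts: List[Labels]) -> float:
    return gini(labels) - sum(len(p) / len(labels) * gini(p) for p in parts)


def gain_ratio(labels: Labels, parts: List[Labels]) -> float:
    n = len(labels)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts if p)
    return info_gain(labels, parts) / split_info if split_info > 0 else 0.0


# A hypothetical set of low-level heuristics (attribute selection criteria).
LOW_LEVEL: Dict[str, SplitHeuristic] = {
    "info_gain": info_gain, "gini_gain": gini_gain, "gain_ratio": gain_ratio}

# Rule (feature, threshold, heuristic): fires if node_features(...)[feature] <= threshold.
Rule = Tuple[str, float, str]


def node_features(labels: Labels) -> Dict[str, float]:
    """Hypothetical problem-state description of the data about to be split."""
    return {"n_instances": len(labels), "class_entropy": entropy(labels),
            "n_classes": len(set(labels))}


def choose_heuristic(rules: Sequence[Rule], default: str, labels: Labels) -> SplitHeuristic:
    feats = node_features(labels)
    for feature, threshold, name in rules:  # first matching rule wins
        if feats[feature] <= threshold:
            return LOW_LEVEL[name]
    return LOW_LEVEL[default]


def best_attribute(rows: Sequence[Mapping[str, object]], labels: Labels,
                   attributes: Sequence[str], rules: Sequence[Rule],
                   default: str = "info_gain") -> str:
    """Pick the split attribute using whichever heuristic the rule set selects."""
    heuristic = choose_heuristic(rules, default, labels)

    def score(attr: str) -> float:
        groups: Dict[object, List[str]] = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        return heuristic(labels, list(groups.values()))

    return max(attributes, key=score)


rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = ["x", "x", "y", "y"]
rules = [("n_instances", 10, "gini_gain")]  # small node -> use Gini gain
print(best_attribute(rows, labels, ["a", "b"], rules))  # prints: a
```

Keeping the low-level heuristics behind one common signature is what lets the rule set swap them freely at each node.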
This work also makes a distinction between specialist hyper-heuristics and
generalist hyper-heuristics. The main difference between these two hyper-heuristics
is the number of training sets used by the hyper-heuristic genetic
algorithm. Specialist hyper-heuristics are created using a single data set from
a particular domain for evolving the hyper-heuristic rule set. Such algorithms
are expected to outperform standard algorithms on the kind of data set used
by the hyper-heuristic genetic algorithm. Generalist hyper-heuristics are
trained on multiple data sets from different domains and are expected to
deliver a robust and competitive performance over these data sets when
compared to standard algorithms.
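The distinction between specialist and generalist hyper-heuristics comes down to how many data sets the genetic algorithm's fitness function sees. The sketch below illustrates only that point. It is a hypothetical, generic elitist GA over real-valued genomes, with a toy scoring function standing in for building and testing a decision tree; the genome encoding, operators, parameters, and the names `evolve` and `toy_score` are assumptions, not the thesis's design.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Genome = List[float]


def aggregate_fitness(genome: Genome, datasets: Sequence[object],
                      evaluate: Callable[[Genome, object], float]) -> float:
    """Specialist: one data set. Generalist: mean score over several data sets."""
    return mean(evaluate(genome, d) for d in datasets)


def evolve(datasets: Sequence[object], evaluate: Callable[[Genome, object], float],
           genome_len: int = 2, pop_size: int = 20, generations: int = 40,
           seed: int = 0) -> Tuple[Genome, List[float]]:
    """Tiny elitist GA; returns the best genome and best fitness per generation."""
    rng = random.Random(seed)

    def fit(g: Genome) -> float:
        return aggregate_fitness(g, datasets, evaluate)

    pop = [[rng.uniform(-5, 5) for _ in range(genome_len)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        history.append(fit(pop[0]))
        elite = pop[: pop_size // 4]  # elitism: the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation of every gene.
            children.append([rng.choice(pair) + rng.gauss(0, 0.3) for pair in zip(a, b)])
        pop = elite + children
    pop.sort(key=fit, reverse=True)
    history.append(fit(pop[0]))
    return pop[0], history


# Stand-in for "train and test a tree on data set d": each toy "data set" is just a
# target vector, and a genome scores higher the closer it is to that target.
def toy_score(genome: Genome, target: object) -> float:
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


specialist, spec_hist = evolve([[0.0, 0.0]], toy_score)
generalist, gen_hist = evolve([[0.0, 0.0], [2.0, 2.0]], toy_score)
print(specialist, generalist)
```

With the two toy targets no genome can average better than -2, which loosely mirrors the trade-off a generalist accepts: it aims for robust performance across data sets rather than peak performance on any single one.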
We evaluate both approaches for each kind of hyper-heuristic presented in
this thesis. We use both real data sets as well as synthetic data sets. Our
results suggest that none of the hyper-heuristics presented in this work are
suited for specialization – in most cases, the hyper-heuristic’s performance on
the data set it was specialized for was not significantly better than that of
the best performing standard algorithm. On the other hand, the generalist
hyper-heuristics delivered results that were very competitive with the best
standard methods. In some cases we even achieved a significantly better
overall performance than all of the standard methods.
- …