thesis

Static and dynamic overproduction and selection of classifier ensembles with genetic algorithms

Abstract

The overproduce-and-choose strategy is a static classifier ensemble selection approach divided into two phases: overproduction and selection. This thesis focuses on the selection phase, which is the main challenge in the overproduce-and-choose strategy. When this phase is implemented as an optimization process, the two major issues are the search criterion and the search algorithm. In this thesis, we concentrate on optimization processes conducted using genetic algorithms guided by both single- and multi-objective functions. We first focus on finding the best search criterion. Various search criteria are investigated, such as diversity, the error rate and ensemble size. Error rate and diversity measures are directly compared in the single-objective optimization approach. In the multi-objective optimization approach, diversity measures are combined with the error rate and with ensemble size, in pairs of objective functions. Experimental results are presented and discussed. Thereafter, we show that besides focusing on the characteristics of the decision profiles of ensemble members, the control of overfitting at the selection phase of the overproduce-and-choose strategy must also be taken into account. We show how overfitting can be detected at the selection phase and present three strategies to control it. These strategies are tailored to the classifier ensemble selection problem and compared. This comparison shows that a global validation strategy should be applied to control overfitting in optimization processes involving a classifier ensemble selection task. Furthermore, this study has helped us establish that this global validation strategy can be used as a tool to measure the relationship between diversity and classification performance when diversity measures are employed as single-objective functions. Finally, the main contribution of this thesis is a proposed dynamic overproduce-and-choose strategy.
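To make the selection phase concrete, here is a minimal sketch of single-objective ensemble selection with a genetic algorithm, using the majority-vote error rate as the search criterion, plus a simplified form of global validation: every individual in every generation is scored on a separate validation set, and the best validated ensemble seen so far is archived. The operators (binary tournament, one-point crossover, bit-flip mutation), the names `majority_vote_error` and `ga_select`, and all parameter values are illustrative assumptions, not the thesis's exact configuration.

```python
import random

def majority_vote_error(mask, preds, labels):
    """Error rate of the majority vote of the classifiers selected by mask.

    preds[i][j] is the label predicted by pool classifier i on sample j.
    An empty ensemble is assumed to have error 1.0.
    """
    chosen = [p for p, m in zip(preds, mask) if m]
    if not chosen:
        return 1.0
    errors = 0
    for j, y in enumerate(labels):
        votes = {}
        for p in chosen:
            votes[p[j]] = votes.get(p[j], 0) + 1
        winner = max(votes, key=votes.get)  # ties go to the first label seen
        errors += winner != y
    return errors / len(labels)

def ga_select(preds_opt, y_opt, preds_val, y_val,
              pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Hypothetical GA: optimize on (preds_opt, y_opt), archive on validation."""
    rng = random.Random(seed)
    n = len(preds_opt)
    # Each individual is a bit mask over the pool of classifiers.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best, best_val = None, float("inf")

    def tournament(pop, fitness):
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[a] if fitness[a] <= fitness[b] else pop[b]

    for _ in range(generations):
        # Search criterion: error rate on the optimization set.
        fitness = [majority_vote_error(ind, preds_opt, y_opt) for ind in pop]
        # Global validation (simplified): validate every individual of the
        # current generation and keep the best one ever seen in an archive.
        for ind in pop:
            v = majority_vote_error(ind, preds_val, y_val)
            if v < best_val:
                best, best_val = ind[:], v
        # Breed the next generation.
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fitness), tournament(pop, fitness)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return best, best_val
```

Returning the archived validation winner rather than the final population's fittest member is the point of the validation step: the ensemble that fits the optimization set best may overfit it.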
While the static overproduce-and-choose strategy has traditionally focused on finding the most accurate subset of classifiers during the selection phase and using it to predict the class of all test samples, our dynamic overproduce-and-choose strategy selects the most confident subset of classifiers to label each test sample individually. Our method combines optimization and dynamic selection in a two-level selection phase. The optimization level generates a population of highly accurate classifier ensembles, while the dynamic selection level applies confidence measures to select the ensemble with the highest degree of confidence in the current decision. Three different confidence measures are presented and compared. Our method outperforms classical static and dynamic selection strategies.
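The dynamic selection level can be sketched as follows, assuming the optimization level has already produced a population of ensembles (represented here as tuples of pool-classifier indices). For each test sample, every candidate ensemble is scored by a vote-margin confidence, and the most confident one labels the sample. The margin is used only as an illustrative confidence measure; the abstract does not name its three measures, and `vote_margin` and `dynamic_predict` are hypothetical names.

```python
from collections import Counter

def vote_margin(ensemble, sample_preds):
    """Margin confidence: (top vote count - runner-up count) / ensemble size.

    ensemble is a non-empty tuple of classifier indices; sample_preds[i] is the
    label pool classifier i assigns to the current test sample.
    """
    votes = Counter(sample_preds[i] for i in ensemble)
    counts = [c for _, c in votes.most_common(2)]
    second = counts[1] if len(counts) > 1 else 0
    return (counts[0] - second) / len(ensemble)

def dynamic_predict(population, sample_preds):
    """Pick the most confident ensemble for this sample and return its vote."""
    best = max(population, key=lambda ens: vote_margin(ens, sample_preds))
    label, _ = Counter(sample_preds[i] for i in best).most_common(1)[0]
    return label, best
```

For example, with `sample_preds = ['a', 'a', 'b', 'b', 'a']` and the population `[(0, 2), (0, 1, 4), (2, 3, 4)]`, the margins are 0, 1 and 1/3, so `(0, 1, 4)` is chosen and the sample is labeled `'a'`. Because the choice is repeated per sample, different test samples may be labeled by different ensembles, which is what distinguishes this from the static strategy.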
