8 research outputs found

    Feature Selection using Tabu Search with Learning Memory: Learning Tabu Search

    Feature selection in classification can be modeled as a combinatorial optimization problem. One of the main particularities of this problem is the large amount of time that may be needed to evaluate the quality of a subset of features. In this paper, we propose to solve this problem with a tabu search algorithm integrating a learning mechanism. To do so, we adapt to the feature selection problem a learning tabu search algorithm originally designed for a railway network problem in which the evaluation of a solution is time-consuming. Experiments are conducted and show the benefit of using a learning mechanism to solve hard instances from the literature.
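The core loop of a tabu search over feature subsets can be sketched as below. This is a generic sketch assuming a binary subset encoding, single-bit-flip moves, a short-term tabu list, and an aspiration criterion; it does not include the learning memory that is the paper's contribution, and all function and parameter names are illustrative.

```python
import random

def tabu_search_feature_selection(evaluate, n_features, n_iters=50,
                                  tabu_tenure=5, seed=0):
    """Minimal tabu search over feature subsets encoded as bit vectors.

    `evaluate` scores a subset (higher is better); a move flips one bit.
    Recently flipped features are tabu, except when the move would improve
    on the best subset found so far (aspiration criterion).
    """
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_features)]
    best, best_score = current[:], evaluate(current)
    tabu = {}  # feature index -> last iteration at which it is still tabu
    for it in range(n_iters):
        candidates = []
        for j in range(n_features):
            neigh = current[:]
            neigh[j] ^= 1  # flip feature j in or out of the subset
            score = evaluate(neigh)
            if tabu.get(j, -1) < it or score > best_score:
                candidates.append((score, j, neigh))
        if not candidates:
            continue
        score, j, current = max(candidates)  # best admissible neighbour
        tabu[j] = it + tabu_tenure           # forbid re-flipping j for a while
        if score > best_score:
            best, best_score = current[:], score
    return best, best_score
```

In the paper's setting `evaluate` would train and test a classifier on the candidate subset, which is exactly the expensive step the learning memory is designed to reduce.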

    Fusing diverse monitoring algorithms for robust change detection

    Full text link

    Weight-Selected Attribute Bagging for Credit Scoring

    Assessment of credit risk is of great importance in financial risk management. In this paper, we propose an improved attribute bagging method, weight-selected attribute bagging (WSAB), to evaluate credit risk. Weights of attributes are first computed using an attribute evaluation method such as the linear support vector machine (LSVM) or principal component analysis (PCA). Subsets of attributes are then constructed according to these weights: for each attribute subset, attributes with larger weights have a larger probability of being selected into the subset. Next, training samples and test samples are projected onto each attribute subset. A scoring model is then constructed from each set of newly produced training samples. Finally, all scoring models vote on the test instances. An individual model that only uses selected attributes will be more accurate because some redundant and uninformative attributes are eliminated. Besides, selecting attributes by probability also guarantees the diversity of the scoring models. Experimental results on two credit benchmark databases show that the proposed method, WSAB, is outstanding in both prediction accuracy and stability, as compared to analogous methods.
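The weight-proportional subset construction described above can be sketched as a weighted draw without replacement. This is an illustrative sketch, not the paper's exact procedure; in WSAB the weights would come from an attribute evaluator such as an LSVM or PCA, whereas here they are simply passed in.

```python
import random

def weighted_attribute_subsets(weights, subset_size, n_subsets, seed=0):
    """Draw attribute subsets where higher-weighted attributes are more
    likely to be selected, as in weight-selected attribute bagging."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        pool = list(range(len(weights)))      # remaining attribute indices
        w = [weights[i] for i in pool]        # their weights
        chosen = []
        for _ in range(subset_size):
            # weighted draw, then remove the drawn attribute from the pool
            i = rng.choices(range(len(pool)), weights=w)[0]
            chosen.append(pool.pop(i))
            w.pop(i)
        subsets.append(sorted(chosen))
    return subsets
```

Each subset would then be used to project the training data and fit one scoring model, with the models' votes combined at prediction time.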

    Designing multiple classifier combinations: a survey

    Classification accuracy can be improved through a multiple classifier approach. It has been proven that multiple classifier combinations can achieve better classification accuracy than a single classifier. There are two main problems in designing a multiple classifier combination: determining the classifier ensemble and constructing the combiner. This paper reviews approaches to constructing the classifier ensemble and the combiner. For each approach, methods are reviewed and their advantages and disadvantages highlighted. A random strategy and majority voting are the most commonly used to construct the ensemble and the combiner, respectively. The results presented in this review are expected to serve as a road map for designing multiple classifier combinations.
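Majority (plurality) voting, named above as the most common combiner, reduces to counting the base classifiers' label predictions; a minimal sketch, with ties broken toward the classifier listed first:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the label predictions of several classifiers by plurality
    vote. Counter preserves first-seen order, so on a tie the label
    predicted by the earliest classifier in the list wins."""
    return Counter(predictions).most_common(1)[0][0]
```

A weighted variant would add each classifier's weight to its label's tally instead of 1.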

    Design and Investigation of a Multi-Agent-Based XCS Learning Classifier System with Distributed Rules

    This thesis introduces and investigates a new kind of rule-based evolutionary online learning system. It addresses the problem of distributing the knowledge of a Learning Classifier System, which is represented by a population of classifiers. The result is an XCS-derived Learning Classifier System, 'XCS with Distributed Rules' (XCS-DR), that introduces independent, interacting agents to distribute the system's acquired knowledge evenly. The agents act collaboratively to solve the problem instances at hand. XCS-DR's design and architecture are explained, and its classification performance is evaluated and scrutinized in detail. While not reaching the optimal performance of the original XCS, XCS-DR still yields satisfactory classification results; in the simple case of applying only one agent, the system performs as accurately as XCS.

    Ensemble-based Supervised Learning for Predicting Diabetes Onset

    The research presented in this thesis aims to address the issue of undiagnosed diabetes cases. The current state of knowledge is that one in seventy people in the United Kingdom is living with undiagnosed diabetes, and only one in a hundred people could identify the main signs of diabetes. Some of the tools available for predicting diabetes are too simplistic and/or rely on superficial data for inference. On the positive side, the National Health Service (NHS) is improving data recording in this domain by offering a health check to adults aged 40-70. Data from such a programme could be utilised to mitigate the issue of superficial data, and also to develop a predictive tool that facilitates a change from the current reactive care to one that is proactive. This thesis presents a tool based on a machine learning ensemble for predicting diabetes onset. Ensembles often perform better than a single classifier, and accuracy and diversity have been highlighted as the two vital requirements for constructing good ensemble classifiers. Experiments in this thesis explore the relationship between the diversity of heterogeneous ensemble classifiers and the accuracy of predictions through feature subset selection, in order to predict diabetes onset. Data from a national health check programme (similar to the NHS health check) was used. The aim is to predict diabetes onset better than other similar studies in the literature. For the experiments, predictions from five base classifiers (Sequential Minimal Optimisation (SMO), Radial Basis Function (RBF), Naïve Bayes (NB), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and the C4.5 decision tree), performing the same task, are exploited in all possible combinations to construct 26 ensemble models. The training data feature space was searched to select the best feature subset for each classifier.
Selected subsets are used to train the classifiers, and their predictions are combined using the k-Nearest Neighbours algorithm as a meta-classifier. Results are analysed using four performance metrics (accuracy, sensitivity, specificity and AUC) to determine (i) whether ensembles always perform better than a single classifier; and (ii) the impact of diversity (from heterogeneous classifiers) and accuracy (through feature subset selection) on ensemble performance. At the base classification level, RBF produced better results than the other four classifiers, with 78% accuracy, 82% sensitivity, 73% specificity and 85% AUC. A comparative study shows that the RBF model is more accurate than 9 ensembles, more sensitive than 13 ensembles, more specific than 9 ensembles, and produced a better AUC than 25 ensembles. This means that ensembles do not always perform better than their constituent classifiers. Of those ensembles that performed better than RBF, the combination of C4.5, RIPPER and NB produced the best results, with 83% accuracy, 87% sensitivity, 79% specificity and 86% AUC. Compared to the RBF model, this is a 5.37% accuracy improvement, which is significant (p = 0.0332). The experiments show how data from medical health examinations can be utilised to address the issue of undiagnosed cases of diabetes. Models constructed with such data would facilitate the much desired shift from reactive to proactive care for individuals at high risk of diabetes. From the machine learning viewpoint, it was established that ensembles constructed from diverse and accurate base learners have the potential to produce a significant improvement in accuracy compared to their individual constituent classifiers. In addition, the ensemble presented in this thesis is at least 1% and at most 23% more accurate than similar research studies found in the literature. This validates the superiority of the method implemented.
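The stacking step described above, where base-classifier predictions are combined by a k-NN meta-classifier, can be sketched as follows. The meta-feature vectors, Hamming distance, and k value are illustrative assumptions, not the thesis's exact configuration; each row of meta-features holds the predictions of the base classifiers for one training sample.

```python
from collections import Counter

def knn_meta_predict(meta_train, meta_labels, meta_query, k=3):
    """k-NN meta-classifier over base-classifier outputs (stacking sketch).

    meta_train  : list of prediction vectors, one per training sample
    meta_labels : true label of each training sample
    meta_query  : prediction vector of the base classifiers for a new sample
    """
    # Hamming distance between prediction vectors; ties in distance are
    # broken by label order via tuple comparison.
    dists = sorted(
        (sum(a != b for a, b in zip(row, meta_query)), label)
        for row, label in zip(meta_train, meta_labels)
    )
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]
```

In the thesis each entry of a prediction vector would come from one of the five base classifiers (SMO, RBF, NB, RIPPER, C4.5) trained on its own selected feature subset.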

    Classifier combination: feature construction and increased diversity

    Multiclassifiers are currently an active research area within Pattern Recognition. This thesis presents three multiclassifier methods: "Cascades for nominal data", "Disturbing Neighbors" and "Random Feature Weights". Cascades allow classifiers that require numerical inputs to improve their results by taking, as additional inputs, the probability estimates of another classifier that can work with nominal data. "Disturbing Neighbors" augments the training set of each base classifier with the output of an NN classifier; the NN classifier of each base classifier is obtained at random. "Random Feature Weights" is a method that uses trees as base classifiers and modifies their merit function with a random weight. The thesis also contributes new diagrams for visualizing the diversity of the base classifiers: Kappa-Error Movement Diagrams and Relative Kappa-Error Movement Diagrams.
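The "Random Feature Weights" idea of perturbing a tree's split merit with a per-attribute random weight can be sketched as below. The weight formula `u ** p` (with `u` uniform on [0, 1] and `p` controlling how aggressive the randomisation is) is an assumption for illustration, not necessarily the thesis's exact formula.

```python
import random

def rfw_merits(gains, p=2.0, seed=0):
    """Random Feature Weights sketch: scale each attribute's split merit
    (e.g. its information gain) by an independent random weight in [0, 1],
    so that different trees in the ensemble favour different attributes."""
    rng = random.Random(seed)
    return [g * rng.random() ** p for g in gains]
```

A tree-building routine would call this once per tree and split on the attribute with the highest weighted merit, which injects diversity into the ensemble without resampling the data.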