2,731 research outputs found

    One-Class-at-a-Time Removal Sequence Planning Method for Multiclass Classification Problems

    Get PDF
    Using dynamic programming, this work develops a one-class-at-a-time removal sequence planning method to decompose a multiclass classification problem into a series of two-class problems. Compared with previous decomposition methods, the approach has the following distinct features. First, under the one-class-at-a-time framework, the approach guarantees the optimality of the decomposition. Second, for a K-class problem, the number of binary classifiers required by the method is only K-1. Third, to achieve higher classification accuracy, the approach can easily be adapted to form a committee machine. A drawback of the approach is that its computational burden increases rapidly with the number of classes. To resolve this difficulty, a partial decomposition technique is introduced that reduces the computational cost by generating a suboptimal solution. Experimental results demonstrate that the proposed approach consistently outperforms two conventional decomposition methods

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    Full text link
    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.Comment: 14 pages, 5 figures, submitted for review to JML

    DATA MINING AND IMAGE CLASSIFICATION USING GENETIC PROGRAMMING

    Get PDF
    Genetic programming (GP), a capable machine learning and search method, motivated by Darwinian-evolution, is an evolutionary learning algorithm which automatically evolves computer programs in the form of trees to solve problems. This thesis studies the application of GP for data mining and image processing. Knowledge discovery and data mining have been widely used in business, healthcare, and scientific fields. In data mining, classification is supervised learning that identifies new patterns and maps the data to predefined targets. A GP based classifier is developed in order to perform these mappings. GP has been investigated in a series of studies to classify data; however, there are certain aspects which have not formerly been studied. We propose an optimized GP classifier based on a combination of pruning subtrees and a new fitness function. An orthogonal least squares algorithm is also applied in the training phase to create a robust GP classifier. The proposed GP classifier is validated by 10-fold cross validation. Three areas were studied in this thesis. The first investigation resulted in an optimized genetic-programming-based classifier that directly solves multi-class classification problems. Instead of defining static thresholds as boundaries to differentiate between multiple labels, our work presents a method of classification where a GP system learns the relationships among experiential data and models them mathematically during the evolutionary process. Our approach has been assessed on six multiclass datasets. The second investigation was to develop a GP classifier to segment and detect brain tumors on magnetic resonance imaging (MRI) images. The findings indicated the high accuracy of brain tumor classification provided by our GP classifier. The results confirm the strong ability of the developed technique for complicated image classification problems. The third was to develop a hybrid system for multiclass imbalanced data classification using GP and SMOTE which was tested on satellite images. The finding showed that the proposed approach improves both training and test results when the SMOTE technique is incorporated. We compared our approach in terms of speed with previous GP algorithms as well. The analyzed results illustrate that the developed classifier produces a productive and rapid method for classification tasks that outperforms the previous methods for more challenging multiclass classification problems. We tested the approaches presented in this thesis on publicly available datasets, and images. The findings were statistically tested to conclude the robustness of the developed approaches

    Automated design of genetic programming of classification algorithms.

    Get PDF
    Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg.Over the past decades, there has been an increase in the use of evolutionary algorithms (EAs) for data mining and knowledge discovery in a wide range of application domains. Data classification, a real-world application problem is one of the areas EAs have been widely applied. Data classification has been extensively researched resulting in the development of a number of EA based classification algorithms. Genetic programming (GP) in particular has been shown to be one of the most effective EAs at inducing classifiers. It is widely accepted that the effectiveness of a parameterised algorithm like GP depends on its configuration. Currently, the design of GP classification algorithms is predominantly performed manually. Manual design follows an iterative trial and error approach which has been shown to be a menial, non-trivial time-consuming task that has a number of vulnerabilities. The research presented in this thesis is part of a large-scale initiative by the machine learning community to automate the design of machine learning techniques. The study investigates the hypothesis that automating the design of GP classification algorithms for data classification can still lead to the induction of effective classifiers. This research proposes using two evolutionary algorithms,namely,ageneticalgorithm(GA)andgrammaticalevolution(GE)toautomatethe design of GP classification algorithms. The proof-by-demonstration research methodology is used in the study to achieve the set out objectives. To that end two systems namely, a genetic algorithm system and a grammatical evolution system were implemented for automating the design of GP classification algorithms. The classification performance of the automated designed GP classifiers, i.e., GA designed GP classifiers and GE designed GP classifiers were compared to manually designed GP classifiers on real-world binary class and multiclass classification problems. The evaluation was performed on multiple domain problems obtained from the UCI machine learning repository and on two specific domains, cybersecurity and financial forecasting. The automated designed classifiers were found to outperform the manually designed GP classifiers on all the problems considered in this study. GP classifiers evolved by GE were found to be suitable for classifying binary classification problems while those evolved by a GA were found to be suitable for multiclass classification problems. Furthermore, the automated design time was found to be less than manual design time. Fitness landscape analysis of the design spaces searched by a GA and GE were carried out on all the class of problems considered in this study. Grammatical evolution found the search to be smoother on binary classification problems while the GA found multiclass problems to be less rugged than binary class problems

    Studying elements ofgenetic programming for multiclass classification

    Get PDF
    Tese de mestrado, Engenharia Informática (Interação e Conhecimento) Universidade de Lisboa, Faculdade de Ciências, 2018Although Genetic Programming (GP) has been very successful in both symbolic regression and binary classification by solving many difficult problems from various domains, it requires improvements in multiclass classification, which due to the high complexity of this kind of problems, requires specialized classifiers. In this project, we explored a multiclass classification GP-based algorithm, the M3GP [4]. The individuals in standard GP only have one node at their root. This means that their output space is in R. Unlike standard GP, M3GP allows each individual to have n nodes at its root. This variation changes the output space to Rn, allowing them to construct clusters of samples and use a cluster-based classification. Although M3GP is capable of creating interpretable models while having competitive results with state-of-the-art classifiers, such as Random Forests and Neural Networks, it has downsides. The focus of this project is to improve the algorithm by exploring two components, the fitness function, and the genetic operators’ selection method. The original fitness function was accuracy-based. Since using this kind of functions does not allow a smooth evolution of the output space, we tried to improve the algorithm by exploring two distance-based fitness functions as an attempt to separate the clusters while bringing the samples closer to their respective centroids. Until now, the genetic operators in M3GP were selected with a fixed probability. Since some operators have a better effect on the fitness at different stages of the evolution, the fixed probabilities allow operators to be selected at the wrong stages of the evolution, slowing down the learning process. In this project, we try to evolve the probability the genetic operators have of being chosen over the generations. On a later stage, we proposed a new crossover genetic operator that uses three individuals for the M3GP algorithm. The results obtained show significantly better results in the training set in half the datasets, while improving the test accuracy in two datasets

    Computational models and approaches for lung cancer diagnosis

    Full text link
    The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to developed novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results
    corecore