
    Evolutionary lazy learning for Naive Bayes classification

    © 2016 IEEE. Most improvements to Naive Bayes (NB) share a common yet important flaw: they split the modeling of the classifier into two separate stages, preprocessing (e.g., feature selection and data expansion) and building the NB classifier. The first stage does not take NB's objective function into consideration, so classification performance cannot be guaranteed. Motivated by this, and aiming to improve the accuracy of NB classification, we present a new learning algorithm, Evolutionary Local Instance Weighted Naive Bayes (ELWNB), which extends NB for classification. ELWNB seamlessly combines local NB, instance-weighted dataset extension, and evolutionary algorithms. Experiments on 20 UCI benchmark datasets demonstrate that ELWNB significantly outperforms NB and several other improved NB algorithms.
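    For intuition, a minimal sketch (not the authors' implementation) of the locally instance-weighted idea behind ELWNB: training instances are weighted by their similarity to the test instance before a Naive Bayes model is fit. The Gaussian kernel, its bandwidth h, and the use of GaussianNB are illustrative assumptions; in ELWNB the corresponding parameters would be tuned by the evolutionary search.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def locally_weighted_nb_predict(X_train, y_train, x_test, h=1.0):
        # Gaussian kernel: training instances near the test point receive
        # larger weights; the bandwidth h (assumed here) controls locality.
        dists = np.linalg.norm(X_train - x_test, axis=1)
        weights = np.exp(-(dists ** 2) / (2 * h ** 2))
        # Fit NB on the weighted data and classify the single test point.
        model = GaussianNB()
        model.fit(X_train, y_train, sample_weight=weights)
        return model.predict(x_test.reshape(1, -1))[0]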

    Evolutionary Algorithms for Hyperparameter Search in Machine Learning

    Machine learning algorithms usually have a number of hyperparameters, and the values chosen for them may have a significant impact on performance. In practice, for most learning algorithms the hyperparameter values are determined empirically, typically by search. Existing approaches for automating this search mainly fall into the following categories: manual search, greedy search, random search, Bayesian model-based optimization, and evolutionary algorithm-based search. All of these approaches have drawbacks: manual and random search are undirected, greedy search is very inefficient, Bayesian model-based optimization is complicated and performs poorly with large numbers of hyperparameters, and classic evolutionary search can be very slow and risks falling into local optima. In this thesis we introduce three improved evolutionary algorithms for finding high-performing hyperparameter values for different learning algorithms. The first, EWLNB, combines Naive Bayes and lazy instance-weighted learning. The second, EMLNB, extends this approach to multi-label classification. Finally, we develop similar methods into an algorithm, SEODP, for optimizing the hyperparameters of deep networks, and report its usefulness on a real-world application of machine learning for philanthropy.
    EWLNB is a differential evolutionary algorithm that adapts to different datasets without human intervention by searching for the best model hyperparameters based on the characteristics of the dataset to which it is applied. To validate EWLNB, we first use it to optimize two key parameters of a locally-weighted Naive Bayes model. Experimental evaluation on 56 of the benchmark UCI machine learning datasets demonstrates that EWLNB significantly outperforms Naive Bayes, as well as several other improved versions of the Naive Bayes algorithm, in both classification accuracy and class probability estimation. We then extend EWLNB in the form of the Evolutionary Multi-label Lazy Naive Bayes (EMLNB) algorithm to enable hyperparameter search for multi-label classification problems.
    Lastly, we revise the above algorithms to propose SEODP, a method for optimizing deep learning (DL) architectures and hyperparameters. SEODP uses a semi-evolutionary, semi-random approach to search for hyperparameter values and is designed to evolve a solution automatically across different datasets. SEODP is much faster than other methods and can adaptively determine different deep network architectures automatically. Experimental results show that SEODP is much more effective than manual search, and that it achieves optimal performance using only approximately 2% of the running time of greedy search. We also apply SEODP to a real-world social-behavioral dataset from a charity organization for a philanthropy application; this dataset contains comprehensive real-time attributes on potential indicators for donor candidates. The results show that SEODP is a promising approach for optimizing deep network (DN) architectures over different types of datasets, including a real-world dataset.
    In summary, the results in this thesis indicate that our methods address the main drawback of evolutionary algorithms, their convergence time, and show experimentally that evolutionary algorithms can achieve good results in optimizing the hyperparameters of a range of different machine learning algorithms.
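    As a concrete illustration of evolutionary hyperparameter search, here is a minimal sketch using SciPy's differential evolution to tune two hyperparameters by cross-validated accuracy. The SVM and the iris data are placeholder choices for illustration, not the models or datasets used in the thesis.

    from scipy.optimize import differential_evolution
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    def objective(params):
        # Decode the candidate: hyperparameters are searched on a log scale.
        log_c, log_gamma = params
        model = SVC(C=10 ** log_c, gamma=10 ** log_gamma)
        # Negate accuracy because differential_evolution minimises.
        return -cross_val_score(model, X, y, cv=5).mean()

    result = differential_evolution(objective, bounds=[(-3, 3), (-4, 1)],
                                    maxiter=20, seed=0)
    print("best log10(C), log10(gamma):", result.x, "cv accuracy:", -result.fun)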

    Local feature weighting in nearest prototype classification

    The distance metric is the cornerstone of nearest neighbor (NN)-based methods and, therefore, of nearest prototype (NP) algorithms, because they classify based on the similarity of the data. When the data are characterized by a set of features that may contribute to the classification task at different levels, feature weighting or selection is required, sometimes in a local sense. However, local weighting is typically restricted to NN approaches. In this paper, we introduce local feature weighting (LFW) in NP classification. LFW provides each prototype with its own weight vector, in contrast to the typical global weighting methods found in the NP literature, where all prototypes share the same one. Giving each prototype its own weight vector has a novel effect on the borders of the generated Voronoi regions: they become nonlinear. We have integrated LFW with a previously developed evolutionary nearest prototype classifier (ENPC). Experiments on both artificial and real datasets demonstrate that the resulting algorithm, which we call LFW in nearest prototype classification (LFW-NPC), avoids overfitting the training data in domains where features may contribute differently to the classification task in different areas of the feature space. This generalization capability is also reflected in automatically obtaining an accurate and reduced set of prototypes.
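    A minimal sketch of the LFW idea: each prototype carries its own feature-weight vector, so the distance metric varies locally. How the prototypes and weights are evolved (the ENPC integration) is omitted here; they are taken as given.

    import numpy as np

    def lfw_npc_classify(x, prototypes, weights, labels):
        # prototypes, weights: (n_prototypes, n_features); labels: (n_prototypes,).
        # Each prototype measures distance with its own weight vector.
        d = np.sum(weights * (prototypes - x) ** 2, axis=1)
        return labels[np.argmin(d)]

    Because each prototype uses its own weighted distance, the borders between regions are quadratic surfaces rather than the linear hyperplanes of standard Voronoi partitions, which is the nonlinearity the abstract refers to.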

    Data Mining Using Relational Database Management Systems

    Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database back end, we propose a general approach that provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as a back end to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka’s standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves much higher scalability than Weka, while providing the same output and maintaining good computation time.
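    WekaDB itself extends Weka in Java; the following Python sketch only illustrates the pattern the abstract describes: hiding secondary storage behind the same iteration interface an in-memory dataset would expose, so algorithms need not know that rows are fetched from a database on demand. The class name and table layout are assumptions of this sketch.

    import sqlite3

    class DBDataset:
        """Streams rows from a database instead of holding them in memory."""

        def __init__(self, path, table):
            self.conn = sqlite3.connect(path)
            self.table = table  # assumed to be a trusted table name

        def __len__(self):
            query = f"SELECT COUNT(*) FROM {self.table}"
            return self.conn.execute(query).fetchone()[0]

        def __iter__(self):
            # Rows are fetched lazily; only one row is in memory at a time.
            yield from self.conn.execute(f"SELECT * FROM {self.table}")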

    An Optimisation-Driven Prediction Method for Automated Diagnosis and Prognosis

    This article presents a novel hybrid classification paradigm for medical diagnosis and prognosis prediction. The core mechanism of the proposed method relies on a centroid classification algorithm whose logic is exploited to formulate the classification task as a real-valued optimisation problem. A novel metaheuristic combining the algorithmic structure of Swarm Intelligence optimisers with the probabilistic search models of Estimation of Distribution Algorithms is designed to optimise this problem, thus leading to high-accuracy predictions. The method is tested over 11 medical datasets and compared against 14 selected classification algorithms. Results show that the proposed approach is competitive with the state of the art and superior to it on several occasions.
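    A minimal sketch of the core mechanism: one centroid per class is encoded as a flat real vector, and training error is minimised over it. The simple perturbation loop below is a generic stand-in for the paper's hybrid swarm/EDA metaheuristic, which is not reproduced here.

    import numpy as np

    def centroid_error(theta, X, y, n_classes):
        # Decode the flat vector into one centroid per class (class i is
        # assumed to be labelled i, an assumption of this sketch).
        centroids = theta.reshape(n_classes, X.shape[1])
        preds = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
        return np.mean(preds != y)

    def optimise_centroids(X, y, n_classes, iters=500, seed=0):
        rng = np.random.default_rng(seed)
        theta = rng.normal(size=n_classes * X.shape[1])
        best = centroid_error(theta, X, y, n_classes)
        for _ in range(iters):
            # Random perturbation stands in for the hybrid metaheuristic.
            cand = theta + rng.normal(scale=0.1, size=theta.shape)
            err = centroid_error(cand, X, y, n_classes)
            if err <= best:
                theta, best = cand, err
        return theta.reshape(n_classes, X.shape[1]), best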

    Classification accuracy performance of Naïve Bayesian (NB), Bayesian Networks (BN), Lazy Learning of Bayesian Rules (LBR) and Instance-Based Learner (IB1) - a comparative study

    In recent years the use of personalization in service-provisioning applications has become very popular. However, effective personalization cannot be achieved without accurate user profiles. A number of classification algorithms have been used to classify user-related information to create accurate user profiles. In this study four different classification algorithms, naïve Bayesian (NB), Bayesian networks (BN), lazy learning of Bayesian rules (LBR), and the instance-based learner (IB1), are compared using a set of user profile data. According to our simulation results, the NB and IB1 classifiers have the highest classification accuracy with the lowest error rate. The obtained simulation results have also been evaluated against existing work on support vector machines (SVMs), decision trees (DTs), and neural networks (NNs).
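    A comparison of this kind can be sketched in a few lines with scikit-learn. Of the four algorithms in the study, only NB and IB1 (equivalent to 1-nearest-neighbour) have direct scikit-learn counterparts, and the iris data below is a placeholder for the authors' user-profile dataset.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    classifiers = [("NB", GaussianNB()),
                   ("IB1", KNeighborsClassifier(n_neighbors=1))]
    for name, clf in classifiers:
        # 10-fold cross-validated accuracy, a common way to compare classifiers.
        scores = cross_val_score(clf, X, y, cv=10)
        print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")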