
    SEIR immune strategy for instance-weighted naive Bayes classification

    © Springer International Publishing Switzerland 2015. Naive Bayes (NB) has been widely applied in many classification tasks. However, in real-world applications, the pronounced advantage of NB is often challenged by insufficient training samples: with a limited number of training instances, high variance may occur, and the class distribution estimated by an NB classifier becomes inaccurate. To handle this issue, in this paper we propose a SEIR (Susceptible, Exposed, Infectious and Recovered) immune-strategy-based instance weighting algorithm for naive Bayes classification, namely SWNB. The immune instance weighting allows the SWNB algorithm to adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. Experiments and comparisons on 20 benchmark datasets demonstrate that the proposed SWNB algorithm outperforms existing state-of-the-art instance-weighted NB algorithms and other related computational intelligence methods.
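The abstract does not spell out how instance weights enter the model, so the following is only a generic sketch of instance-weighted naive Bayes (not the paper's SEIR dynamics): each training instance contributes its weight, rather than a count of 1, to the class and conditional frequency tables. All names here are hypothetical.

```python
from collections import defaultdict
import math

def train_weighted_nb(X, y, w, alpha=1.0):
    """Instance-weighted naive Bayes with Laplace smoothing over
    categorical features. X: list of feature tuples, y: labels,
    w: per-instance weights (w_i = 1 for all i recovers standard NB)."""
    classes = sorted(set(y))
    n_feats = len(X[0])
    feat_vals = [sorted({xi[j] for xi in X}) for j in range(n_feats)]
    class_w = defaultdict(float)   # weighted class counts
    cond_w = defaultdict(float)    # (class, feature, value) -> weighted count
    for xi, yi, wi in zip(X, y, w):
        class_w[yi] += wi
        for j, v in enumerate(xi):
            cond_w[(yi, j, v)] += wi
    total = sum(class_w.values())

    def predict(x):
        def log_post(c):
            lp = math.log((class_w[c] + alpha) / (total + alpha * len(classes)))
            for j, v in enumerate(x):
                num = cond_w[(c, j, v)] + alpha
                den = class_w[c] + alpha * len(feat_vals[j])
                lp += math.log(num / den)
            return lp
        return max(classes, key=log_post)

    return predict
```

Because the weights only rescale counts, training remains a single pass over the data, preserving NB's efficiency.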

    Evolutionary lazy learning for Naive Bayes classification

    © 2016 IEEE. Most improvements to Naive Bayes (NB) share a common yet important flaw: they split the modeling of the classifier into two separate stages, a preprocessing stage (e.g., feature selection and data expansion) and a stage that builds the NB classifier. The first stage does not take the NB objective function into consideration, so classification performance cannot be guaranteed. Motivated by these facts, and aiming to improve NB's classification accuracy, we present a new learning algorithm called Evolutionary Local Instance Weighted Naive Bayes (ELWNB), which extends NB for classification. ELWNB seamlessly combines local NB, instance-weighted dataset extension, and evolutionary algorithms. Experiments on 20 UCI benchmark datasets demonstrate that ELWNB significantly outperforms NB and several other improved NB algorithms.

    Self-adaptive attribute weighting for Naive Bayes classification

    © 2014 Elsevier Ltd. All rights reserved. Naive Bayes (NB) is a popular machine learning tool for classification, due to its simplicity, high computational efficiency, and good classification accuracy, especially for high-dimensional data such as text. In reality, the pronounced advantage of NB is often challenged by the strong conditional independence assumption between attributes, which may deteriorate classification performance. Accordingly, numerous efforts have been made to improve NB through approaches such as structure extension, attribute selection, attribute weighting, instance weighting, and local learning. In this paper, we propose a new Artificial Immune System (AIS) based self-adaptive attribute weighting method for Naive Bayes classification. The proposed method, namely AISWNB, uses immunity theory from Artificial Immune Systems to search for optimal attribute weight values, where the self-adjusted weights alleviate the conditional independence assumption and help calculate the conditional probabilities accurately. One noticeable advantage of AISWNB is that its unique immune-system-based evolutionary computation process, including initialization, clone, selection, and mutation, ensures that AISWNB can adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. As a result, AISWNB obtains good attribute weight values during the learning process. Experiments and comparisons on 36 machine learning benchmark data sets and six image classification data sets demonstrate that AISWNB significantly outperforms its peers in classification accuracy, class probability estimation, and class ranking performance.
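The standard scoring rule behind attribute-weighted NB (the model family AISWNB searches over; the AIS search itself is not shown here) raises each conditional probability to the power of its attribute weight, i.e. scores log P(c) + Σ_j w_j log P(x_j | c). A minimal sketch, with hypothetical names:

```python
import math

def aw_nb_log_score(log_prior_c, log_cond_c, x, w):
    """Attribute-weighted naive Bayes class score:
        log P(c) + sum_j w_j * log P(x_j | c).
    w_j = 1 for every attribute recovers standard NB; w_j < 1 discounts
    attributes whose independence assumption is badly violated, and
    w_j = 0 removes attribute j entirely (attribute selection)."""
    return log_prior_c + sum(wj * log_cond_c[j][xj]
                             for j, (wj, xj) in enumerate(zip(w, x)))
```

A search procedure (AIS-based in the paper) then only needs to propose weight vectors and evaluate this score on held-out data.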

    SODE: Self-Adaptive One-Dependence Estimators for classification

    © 2015 Elsevier Ltd. SuperParent-One-Dependence Estimators (SPODEs) are a family of semi-naive Bayesian classifiers that relax the attribute independence assumption of Naive Bayes (NB) to allow each attribute to depend on a common single attribute (the superparent). SPODEs can effectively handle data with attribute dependencies while still inheriting NB's key advantages, such as computational efficiency and robustness on high-dimensional data. In practice, determining an optimal superparent for a SPODE is difficult. One common approach is to use weighted combinations of multiple SPODEs, each having a different superparent with a properly assigned weight value (i.e., a weight is assigned to each attribute). In this paper, we propose a self-adaptive SPODE ensemble, namely SODE, which uses immunity theory from artificial immune systems to automatically and self-adaptively select the weight of each individual SPODE. SODE needs to know neither the importance of individual SPODEs nor the relevance among them, and can flexibly and efficiently search for optimal weight values for each SPODE during the learning process. Extensive experiments and comparisons on 56 benchmark data sets, with validation on image and text classification, demonstrate that SODE outperforms state-of-the-art weighted SPODE algorithms and is suitable for a wide range of learning tasks. The results also confirm that SODE provides an appropriate balance between runtime efficiency and accuracy.
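The combination step the abstract describes can be sketched as follows: each SPODE produces its own class-posterior estimates, and the ensemble scores each class by the weighted sum of those estimates. The weight search itself (the paper's immune-inspired contribution) is omitted; names are hypothetical.

```python
def weighted_spode_predict(x, spodes, weights, classes):
    """Weighted SPODE ensemble prediction. Each spode_k(x) returns a
    dict of class-posterior estimates P_k(c | x) for that member's
    superparent; the ensemble score is sum_k w_k * P_k(c | x), and the
    prediction is the argmax over classes."""
    score = {c: sum(wk * pk(x).get(c, 0.0)
                    for wk, pk in zip(weights, spodes))
             for c in classes}
    return max(score, key=score.get)
```

With uniform weights this reduces to the classic AODE-style average; SODE's point is that learned, non-uniform weights can do better.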

    Optimizing the Naive Bayes and Decision Tree methods to determine study programs for prospective new students with an unsupervised discretization approach

    Higher education institutions aim to produce quality human resources capable of facing increasingly fierce job competition. The recruitment and admission process must therefore follow procedures that help direct prospective new students toward the study program they will take. Criteria already used in the admission process include national exam scores, report-card grades, school test scores, and the new-student admission test, as well as admission through the achievement and Bidikmisi scholarship pathways. Improving how these factors are handled supports the proper transfer of knowledge to students. The purpose of this study is to obtain a classification for assigning prospective new students to study programs by optimizing the Naive Bayes and Decision Tree methods with an unsupervised discretization approach, as an effort to improve the internal quality assurance system, particularly the standards of the new-student admission process for determining study programs at Politeknik Harapan Bersama Tegal. In this admission process, planning, implementation, evaluation, and monitoring have been carried out as part of implementing the Internal Quality Assurance System (SPMI). The data used are the admission results of prospective new students from all study programs for the 2019/2020 academic year, covering the administrative completeness of applicants' requirements and their admission test scores. Training and testing on these data are carried out with RapidMiner 9 to obtain the classification.
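The abstract names the preprocessing step but not its mechanics. A common unsupervised discretization, comparable to the binning operators available in RapidMiner, is equal-width binning: the numeric range is split into k equal intervals with no reference to class labels. A minimal sketch:

```python
def equal_width_bins(values, k):
    """Unsupervised (equal-width) discretization: split [min, max] into
    k equal intervals and map each value to its bin index 0..k-1.
    No class labels are consulted, hence 'unsupervised'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), k - 1) for v in values]
```

Discretizing numeric attributes this way lets multinomial Naive Bayes treat test scores as categorical evidence instead of assuming a Gaussian distribution.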

    Comparison of SVM and some older classification algorithms in text classification tasks

    Document classification has already been widely studied. Some studies have compared feature selection techniques or feature space transformations, whereas others have compared the performance of different algorithms. Recently, following the rising interest in the Support Vector Machine, various studies showed that SVM outperforms other classification algorithms. So should we simply stop bothering with other classification algorithms and always opt for SVM? We decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important point is to compare optimized versions of these algorithms, which is what we have done. Our results show that all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales well with the number of documents, which is not the case for SVM. Naive Bayes also achieved good performance.
    IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining. Red de Universidades con Carreras en Informática (RedUNCI).
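The "suitable preprocessing" that keeps kNN competitive on text typically means TF-IDF weighting with cosine similarity rather than raw term counts with Euclidean distance. A self-contained sketch (the paper's exact pipeline is not specified here; names are illustrative):

```python
import math
from collections import Counter

def tfidf(docs):
    """Turn tokenized documents into sparse TF-IDF dict vectors."""
    df = Counter(t for d in docs for t in set(d))   # document frequency
    n = len(docs)
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_vecs, labels, query_vec, k=3):
    """Majority vote over the k most cosine-similar training vectors."""
    ranked = sorted(zip((cosine(query_vec, v) for v in train_vecs), labels),
                    reverse=True)
    return Counter(lab for _, lab in ranked[:k]).most_common(1)[0][0]
```

Because prediction only needs similarities to training documents, this scales linearly with the collection size, which is the scaling behavior the abstract credits kNN with.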

    Evolutionary Algorithms for Hyperparameter Search in Machine Learning

    Machine learning algorithms usually have a number of hyperparameters, and the choice of values for these hyperparameters may have a significant impact on an algorithm's performance. In practice, for most learning algorithms the hyperparameter values are determined empirically, typically by search. Approaches to hyperparameter search mainly fall into the following categories: manual search, greedy search, random search, Bayesian model-based optimization, and evolutionary algorithm-based search. However, all these approaches have drawbacks: manual and random search are undirected, greedy search is very inefficient, Bayesian model-based optimization is complicated and performs poorly with large numbers of hyperparameters, and classic evolutionary search can be very slow and risks falling into local optima. In this thesis we introduce three improved evolutionary algorithms that search for high-performing hyperparameter values for different learning algorithms. The first, named EWLNB, combines Naive Bayes and lazy instance-weighted learning. The second, EMLNB, extends this approach to multi-label classification. Finally, we further develop similar methods in an algorithm, named SEODP, for optimizing the hyperparameters of deep networks, and report its usefulness on a real-world application of machine learning for philanthropy. EWLNB is a differential evolutionary algorithm that automatically adapts to different datasets without human intervention by searching for the best hyperparameters for the models based on the characteristics of the datasets to which it is applied. To validate EWLNB, we first use it to optimize two key parameters of a locally-weighted Naive Bayes model.
    Experimental evaluation of this approach on 56 benchmark UCI machine learning datasets demonstrates that EWLNB significantly outperforms Naive Bayes, as well as several other improved versions of Naive Bayes, in both classification accuracy and class probability estimation. We then extend the EWLNB approach into the Evolutionary Multi-label Lazy Naive Bayes (EMLNB) algorithm to enable hyperparameter search for multi-label classification problems. Lastly, we revise the above algorithms to propose a method, SEODP, for optimizing deep learning (DL) architectures and hyperparameters. SEODP uses a semi-evolutionary, semi-random approach to search for hyperparameter values, designed to evolve a solution automatically over different datasets. SEODP is much faster than other methods and can adaptively determine different deep network architectures automatically. Experimental results show that, compared with manual search, SEODP is much more effective, and compared with grid search, SEODP can achieve optimal performance using only approximately 2% of its running time. We also apply SEODP to a real-world social-behavioral dataset from a charity organization for a philanthropy application. This dataset contains comprehensive real-time attributes of potential indicators for candidates to be donors. The results show that SEODP is a promising approach for optimizing deep network (DN) architectures over different types of datasets, including a real-world dataset. In summary, the results in this thesis indicate that our methods address the main drawback of evolutionary algorithms, namely convergence time, and show experimentally that evolutionary algorithms can achieve good results in optimizing hyperparameters for a range of machine learning algorithms.
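The evolutionary search loop common to the approaches above can be reduced to a clone-mutate-select skeleton. The sketch below is a toy stand-in for the thesis's EWLNB/SEODP procedures (whose operators are far richer); `fitness` would be cross-validated accuracy in a real use, and all names are hypothetical.

```python
import random

def evolve_hyperparams(fitness, space, pop_size=6, gens=30, sigma=0.1, seed=0):
    """Minimal elitist evolutionary hyperparameter search.
    space: list of (low, high) bounds, one per hyperparameter.
    Each generation keeps the fitter half and adds mutated clones of it,
    so the best fitness found never decreases."""
    rng = random.Random(seed)
    dim = len(space)

    def clip(v, j):
        lo, hi = space[j]
        return min(max(v, lo), hi)

    pop = [[rng.uniform(*space[j]) for j in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        clones = [[clip(v + rng.gauss(0, sigma * (space[j][1] - space[j][0])), j)
                   for j, v in enumerate(ind)] for ind in elite]
        pop = elite + clones
    return max(pop, key=fitness)
```

The convergence-time drawback the thesis targets shows up here directly: every generation costs `pop_size` fitness evaluations, each of which means training a model, which is why reducing the number of evaluations matters.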