8 research outputs found

    A new model for large dataset dimensionality reduction based on teaching learning-based optimization and logistic regression

    Breast cancer (BC) is one of the human diseases with a high annual mortality rate, and among all forms of cancer it is the most common cause of death among women globally. Data mining and classification methods are effective ways of classifying such data; they are particularly useful in the medical field because medical datasets contain irrelevant and redundant attributes that are not needed to obtain an accurate disease diagnosis. Teaching learning-based optimization (TLBO) is a new metaheuristic that has been successfully applied to several intractable optimization problems in recent years. This paper presents the use of a multi-objective TLBO algorithm to select feature subsets for automatic BC diagnosis, with logistic regression (LR) deployed for the classification task. The results show that the proposed method produced better classification accuracy on the BC dataset (classified into malignant and benign), indicating that the proposed TLBO is an efficient feature optimization technique for supporting data-based decision-making systems.
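    The abstract does not include an implementation, but the general idea of wrapping a population-based optimizer around a logistic regression classifier to score candidate feature subsets can be sketched as follows. This is a minimal, simplified illustration only: scikit-learn, its built-in breast cancer dataset, the population size, and the single-objective cross-validated accuracy criterion are assumptions standing in for the authors' exact multi-objective TLBO setup.

```python
# Minimal sketch: TLBO-style wrapper feature selection with logistic regression.
# Continuous "positions" are thresholded at 0.5 to obtain binary feature masks.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
n_features, pop_size, n_iters = X.shape[1], 20, 30

def fitness(position):
    mask = position > 0.5
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=5000)
    # Wrapper criterion: cross-validated accuracy on the selected features only.
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((pop_size, n_features))
scores = np.array([fitness(p) for p in pop])

for _ in range(n_iters):
    # Teacher phase: move everyone toward the best individual (the "teacher").
    teacher = pop[scores.argmax()]
    mean = pop.mean(axis=0)
    tf = rng.integers(1, 3)                       # teaching factor in {1, 2}
    candidate = np.clip(pop + rng.random((pop_size, n_features)) * (teacher - tf * mean), 0, 1)
    cand_scores = np.array([fitness(p) for p in candidate])
    improved = cand_scores > scores
    pop[improved], scores[improved] = candidate[improved], cand_scores[improved]

    # Learner phase: each learner moves toward a better random peer.
    for i in range(pop_size):
        j = rng.integers(pop_size)
        direction = pop[j] - pop[i] if scores[j] > scores[i] else pop[i] - pop[j]
        cand = np.clip(pop[i] + rng.random(n_features) * direction, 0, 1)
        s = fitness(cand)
        if s > scores[i]:
            pop[i], scores[i] = cand, s

best = pop[scores.argmax()] > 0.5
print(f"selected {best.sum()} of {n_features} features, CV accuracy {scores.max():.3f}")
```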

    Adversarial Samples on Android Malware Detection Systems for IoT Systems

    Many IoT (Internet of Things) systems run Android or Android-like systems. With the continuous development of machine learning algorithms, learning-based Android malware detection systems for IoT devices have gradually increased. However, these learning-based detection models are often vulnerable to adversarial samples, so an automated testing framework is needed to help such detection systems perform security analysis. Current methods of generating adversarial samples mostly require the training parameters of the models, and most are aimed at image data. To solve this problem, we propose a Testing framework for Learning-based Android Malware Detection systems (TLAMD) for IoT devices. The key challenge is how to construct a suitable fitness function that generates an effective adversarial sample without affecting the features of the application. By introducing genetic algorithms and some technical improvements, our test framework can generate adversarial samples for IoT Android applications with a success rate of nearly 100% and can perform black-box testing on the system.
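    The framework itself is not reproduced here, but the core idea (using a genetic algorithm to perturb an app's binary feature vector so that a black-box detector's malware score drops, while keeping the perturbation small) can be roughly sketched as follows. The stand-in random-forest detector, the feature encoding, and the fitness weights are illustrative assumptions, not TLAMD's actual components.

```python
# Rough sketch: GA search for adversarial perturbations against a black-box detector.
# Features are binary (e.g., requested permissions / API calls); we only ADD features,
# the usual way to avoid breaking the app's original functionality.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_features = 200

# Stand-in "black-box" detector trained on synthetic data; only query access is needed.
X = rng.integers(0, 2, size=(2000, n_features))
y = (X[:, :10].sum(axis=1) > 5).astype(int)           # toy rule defining "malicious"
detector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

malware = X[y == 1][0]                                 # sample to disguise

def fitness(mask):
    """Lower is better: detector's malware probability plus a cost per added feature."""
    perturbed = np.maximum(malware, mask)              # additions only
    p_malware = detector.predict_proba(perturbed[None, :])[0, 1]
    return p_malware + 0.002 * mask.sum()

pop_size, n_gens = 40, 60
pop = (rng.random((pop_size, n_features)) < 0.02).astype(int)

for _ in range(n_gens):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[: pop_size // 2]]  # truncation selection
    # Uniform crossover + bit-flip mutation to produce the next generation.
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(n_features) < 0.5, a, b)
        child ^= (rng.random(n_features) < 0.01).astype(int)
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmin([fitness(m) for m in pop])]
adv = np.maximum(malware, best)
print("detector score before:", detector.predict_proba(malware[None, :])[0, 1],
      "after:", detector.predict_proba(adv[None, :])[0, 1])
```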

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of its creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem that seeks a minimal subset of relevant features maximizing classification accuracy while reducing computation. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a metaheuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
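    As an illustration of the divide-and-conquer idea the article surveys, the sketch below decomposes a high-dimensional feature space into groups, evolves a small feature sub-mask per group, and scores each candidate together with the current best sub-masks of the other groups (the "context vector"). The Naïve Bayes evaluator, the static grouping, and the simple hill-climbing update are simplifying assumptions; a real system would parallelize the evaluations, for example as MapReduce jobs.

```python
# Sketch of cooperative co-evolutionary feature selection (divide-and-conquer):
# the feature space is split into groups, each group evolves its own sub-mask,
# and candidates are scored jointly with the best sub-masks of the other groups.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, n_features=500, n_informative=15,
                           random_state=1)
n_groups = 10
groups = np.array_split(np.arange(X.shape[1]), n_groups)   # static decomposition

def score(full_mask):
    if not full_mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, full_mask], y, cv=3).mean()

# Context vector: the current best sub-mask for every group.
context = [rng.random(len(g)) < 0.1 for g in groups]

for _ in range(20):                      # co-evolution rounds
    for gi, idx in enumerate(groups):
        # Mutate this group's sub-mask while holding the other groups fixed.
        candidate = context[gi] ^ (rng.random(len(idx)) < 0.05)
        trial = list(context)
        trial[gi] = candidate
        if score(np.concatenate(trial)) > score(np.concatenate(context)):
            context[gi] = candidate      # keep the improvement

best_mask = np.concatenate(context)
print(f"selected {best_mask.sum()} of {X.shape[1]} features, "
      f"CV accuracy {score(best_mask):.3f}")
```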

    Information gain directed genetic algorithm wrapper feature selection for credit rating

    Financial credit scoring is one of the most crucial processes in the finance industry, used to assess the creditworthiness of individuals and enterprises. Various statistics-based machine learning techniques have been employed for this task, but the "curse of dimensionality" remains a significant challenge. Some research has been carried out on feature selection (FS) using a genetic algorithm as a wrapper to improve the performance of credit scoring models. However, the challenge lies in finding an overall best method for credit scoring problems and in improving the time-consuming feature selection process. In this study, the credit scoring problem is investigated through feature selection to improve classification performance. This work proposes a novel approach to feature selection in credit scoring applications, called the Information Gain Directed Feature Selection (IGDFS) algorithm, which ranks features by information gain and propagates the top m features through a GA wrapper (GAW) using three classical machine learning algorithms, KNN, Naïve Bayes, and Support Vector Machine (SVM), for credit scoring. The first stage of information-gain-guided feature selection helps reduce the computational complexity of the GA wrapper, and the information gain of the features selected by IGDFS indicates their importance to decision making.
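    The two-stage structure of IGDFS (an information-gain ranking followed by a GA wrapper) can be sketched as follows. The mutual-information ranking, the choice of m = 20, the KNN evaluator, and the simple GA loop are illustrative assumptions standing in for the paper's exact configuration and datasets.

```python
# Sketch of a two-stage wrapper: rank features by information gain, keep the top m,
# then run a GA wrapper over that reduced set with a classical classifier (KNN here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X, y = make_classification(n_samples=500, n_features=60, n_informative=10,
                           random_state=2)

# Stage 1: information-gain (mutual information) ranking, keep the top m features.
m = 20
top_m = np.argsort(mutual_info_classif(X, y, random_state=2))[::-1][:m]
X_top = X[:, top_m]

# Stage 2: GA wrapper over the reduced feature set.
def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X_top[:, mask], y, cv=3).mean()

pop = rng.random((30, m)) < 0.5
for _ in range(25):
    scores = np.array([fitness(ind) for ind in pop])
    elite = pop[np.argsort(scores)[::-1][:10]]               # keep the best third
    children = []
    while len(children) < len(pop) - len(elite):
        a, b = elite[rng.integers(len(elite), size=2)]
        child = np.where(rng.random(m) < 0.5, a, b)           # uniform crossover
        child ^= rng.random(m) < 0.05                         # bit-flip mutation
        children.append(child)
    pop = np.vstack([elite, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features (indices into the original set):", top_m[best])
```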

    A Parallel Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Machine

    The extensive applications of support vector machines (SVMs) require an efficient method of constructing an SVM classifier with high classification ability. The performance of an SVM crucially depends on whether the optimal feature subset and SVM parameters can be obtained efficiently. In this paper, a coarse-grained parallel genetic algorithm (CGPGA) is used to simultaneously optimize the feature subset and the parameters of the SVM. The distributed topology and migration policy of the CGPGA help find the optimal feature subset and parameters in significantly less time and increase the quality of the solution found. In addition, a new fitness function, which combines the classification accuracy obtained from the bootstrap method, the number of chosen features, and the number of support vectors, is proposed to guide the CGPGA search toward an optimal generalization error. Experimental results on 12 benchmark datasets show that the proposed approach outperforms the genetic algorithm (GA) based method and the grid search method in terms of classification accuracy, number of chosen features, number of support vectors, and running time.
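    The fitness function described combines bootstrap accuracy, the number of chosen features, and the number of support vectors. A serial sketch of decoding and scoring one chromosome (a binary feature mask plus two genes encoding C and gamma) under such a criterion is shown below; the weights, the out-of-bag bootstrap evaluation, and the decoding ranges are assumptions, and a real coarse-grained parallel GA would evaluate several sub-populations in parallel with periodic migration.

```python
# Sketch: decoding and scoring one CGPGA-style chromosome for SVM model selection.
# Chromosome = binary feature mask + two genes encoding log2(C) and log2(gamma).
# Fitness rewards bootstrap accuracy and penalizes features and support vectors.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.utils import resample

rng = np.random.default_rng(3)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def decode(chromosome):
    mask = chromosome[:n_features] > 0.5
    C = 2.0 ** (chromosome[n_features] * 16 - 8)             # C in [2^-8, 2^8]
    gamma = 2.0 ** (chromosome[n_features + 1] * 16 - 12)    # gamma in [2^-12, 2^4]
    return mask, C, gamma

def fitness(chromosome, w_acc=0.8, w_feat=0.1, w_sv=0.1, n_boot=5):
    mask, C, gamma = decode(chromosome)
    if not mask.any():
        return 0.0
    accs, sv_fracs = [], []
    for b in range(n_boot):
        # Bootstrap sample for training; out-of-bag samples for testing.
        idx = resample(np.arange(len(y)), random_state=b)
        oob = np.setdiff1d(np.arange(len(y)), idx)
        clf = SVC(C=C, gamma=gamma).fit(X[idx][:, mask], y[idx])
        accs.append(clf.score(X[oob][:, mask], y[oob]))
        sv_fracs.append(clf.n_support_.sum() / len(idx))
    return (w_acc * np.mean(accs)
            + w_feat * (1 - mask.sum() / n_features)          # fewer features is better
            + w_sv * (1 - np.mean(sv_fracs)))                 # fewer support vectors is better

chromosome = rng.random(n_features + 2)
print("fitness of a random chromosome:", round(fitness(chromosome), 3))
```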

    Support Vector Machine Parameter Optimization Using a Genetic Algorithm for Microarray Data Classification

    Support Vector Machine (SVM) is a machine learning method for classifying data that has been used successfully to solve problems in various fields. The risk minimization principle it employs can produce SVM models with good generalization capability. The difficulty with the SVM method lies in determining the optimal SVM hyperparameters, even though setting the parameter values appropriately improves the accuracy of SVM classification. This study uses a Genetic Algorithm (GA) to optimize the SVM hyperparameters. GA optimization of the SVM is compared with Grid Search optimization to build SVM models for classifying two microarray datasets, the Colon Cancer dataset and the Leukemia dataset. From the analysis, the GA-SVM method yields better classification performance than the Grid Search SVM for the Colon data. For the Leukemia data, the GA-SVM method achieves the same classification performance as the Grid Search SVM method, namely 100% for each classification performance measure.
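    A minimal sketch of the comparison described, a GA searching over the SVM parameters C and gamma versus a plain grid search, is given below. The search ranges, the GA settings, and the use of a generic high-dimensional synthetic dataset in place of the Colon/Leukemia microarray data are assumptions for illustration only.

```python
# Sketch: GA vs. grid search for tuning SVM hyperparameters C and gamma.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, GridSearchCV

rng = np.random.default_rng(4)
X, y = make_classification(n_samples=100, n_features=2000, n_informative=20,
                           random_state=4)   # microarray-like: few samples, many features

def cv_accuracy(log2_c, log2_gamma):
    clf = SVC(C=2.0 ** log2_c, gamma=2.0 ** log2_gamma)
    return cross_val_score(clf, X, y, cv=5).mean()

# --- GA over (log2 C, log2 gamma) in [-5, 15] x [-15, 3] ---
lo, hi = np.array([-5.0, -15.0]), np.array([15.0, 3.0])
pop = rng.uniform(lo, hi, size=(20, 2))
for _ in range(15):
    scores = np.array([cv_accuracy(*ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]                 # keep the best half
    children = parents[rng.integers(10, size=(10, 2)), [0, 1]]   # gene-wise crossover
    children += rng.normal(scale=0.5, size=children.shape)       # Gaussian mutation
    pop = np.vstack([parents, np.clip(children, lo, hi)])
best = pop[np.argmax([cv_accuracy(*ind) for ind in pop])]
print("GA-SVM   :", best, cv_accuracy(*best))

# --- Grid search over the same ranges ---
grid = GridSearchCV(SVC(), {"C": 2.0 ** np.arange(-5, 16, 2.0),
                            "gamma": 2.0 ** np.arange(-15, 4, 2.0)}, cv=5).fit(X, y)
print("Grid-SVM :", grid.best_params_, grid.best_score_)
```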