693 research outputs found

    An Evolutionary Optimization Algorithm for Automated Classical Machine Learning

    Machine learning is an evolving branch of computational algorithms that allow computers to learn from experience, make predictions, and solve problems without being explicitly programmed. However, building a useful machine learning model is a challenging process that requires human expertise to carry out several tasks and ensure that machine learning's primary objective, determining the best and most predictive model, is achieved. These tasks include pre-processing, feature selection, and model selection. Many machine learning models developed by experts are designed manually, by trial and error. In other words, even experts need time and resources to create good predictive models. The idea of automated machine learning (AutoML) is to automate the machine learning pipeline and so relieve the burden of substantial development costs and manual processes. The algorithms used in these systems have different hyper-parameters, and different input datasets have different features. In both cases, the final performance of the model is closely tied to the selected configuration of features and hyper-parameters, which is why feature selection and hyper-parameter tuning are considered crucial tasks in AutoML. The computationally expensive nature of tuning hyper-parameters and optimally selecting features leaves significant research gaps in the AutoML field. This dissertation explores how to select features and tune the hyper-parameters of conventional machine learning algorithms efficiently and automatically. To address these challenges, novel algorithms for hyper-parameter tuning and feature selection are proposed. The hyper-parameter tuning algorithm aims to provide the optimal set of hyper-parameters for three conventional machine learning models (Random Forest, XGBoost, and Support Vector Machine) to obtain the best performance scores. The feature selection algorithm, in turn, looks for the optimal subset of features to achieve the highest performance. Afterward, a hybrid framework is designed for both hyper-parameter tuning and feature selection. The proposed framework can discover a close-to-optimal configuration of features and hyper-parameters. It includes the following components: (1) an automatic feature selection component based on artificial bee colony algorithms and machine learning training, and (2) an automatic hyper-parameter tuning component based on artificial bee colony algorithms and machine learning training for faster training and convergence of the learning models. The whole framework has been evaluated using four real-world datasets from different applications. This framework is an attempt to alleviate the challenges of hyper-parameter tuning and feature selection by using efficient algorithms. However, distributed processing, distributed learning, parallel computing, and other big data solutions are not taken into consideration in this framework

    Metaheuristic design of feedforward neural networks: a review of two decades of research

    Over the past two decades, feedforward neural network (FNN) optimization has been a key interest among researchers and practitioners in multiple disciplines. FNN optimization is often viewed from various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted these different viewpoints mainly to improve the FNN's generalization ability. Gradient-descent algorithms such as backpropagation have been widely applied to optimize FNNs. Their success is evident from the FNN's application to numerous real-world problems. However, due to the limitations of gradient-based optimization methods, metaheuristic algorithms, including evolutionary algorithms, swarm intelligence, etc., are still being widely explored by researchers aiming to obtain a generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies, including conventional and metaheuristic approaches. It also tries to connect various research directions that have emerged from FNN optimization practice, such as evolving neural networks (NN), cooperative coevolution NN, complex-valued NN, deep learning, extreme learning machines, quantum NN, etc. Additionally, it presents interesting research challenges for future work to cope with the present information processing era

    Integrated bio-search approaches with multi-objective algorithms for optimization and classification problem

    Optimal selection of features is crucial yet very difficult to achieve, particularly for the task of classification. This is because traditional feature selection methods operate independently and generate collections of irrelevant features, which in turn degrades classification accuracy. The goal of this paper is to leverage the potential of bio-inspired search algorithms, together with a wrapper, in optimizing multi-objective algorithms, namely ENORA and NSGA-II, to generate an optimal set of features. The main step is to identify the ideal combination of ENORA and NSGA-II with suitable bio-search algorithms, for which multiple subset generation has been implemented. The next step is to validate the optimum feature set by conducting a subset evaluation. Eight (8) comparison datasets of various sizes were deliberately selected for testing. Results show that the ideal combination of the multi-objective algorithms ENORA and NSGA-II with the selected bio-inspired search algorithm is promising for achieving a better solution (i.e., the best features with higher classification accuracy) on the selected datasets. This finding implies that bio-inspired wrapper/filter algorithms can boost the efficiency of ENORA and NSGA-II for the task of selecting features and classifying
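
Multi-objective feature selection of the kind this abstract describes typically trades classification accuracy against subset size and keeps only the non-dominated (Pareto-optimal) subsets. The following is a minimal sketch of that dominance filter only, not the ENORA or NSGA-II implementation used in the paper; the function names and the candidate subsets with their accuracy values are hypothetical illustration data.

```python
from typing import FrozenSet, List, Tuple

# A candidate is (feature subset, classification accuracy).
Candidate = Tuple[FrozenSet[int], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is at least as good as `b` on both objectives
    (higher accuracy, fewer features) and strictly better on one."""
    acc_a, size_a = a[1], len(a[0])
    acc_b, size_b = b[1], len(b[0])
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical subsets and accuracies, made up for illustration.
candidates: List[Candidate] = [
    (frozenset({0, 1, 2, 3}), 0.91),
    (frozenset({0, 2}), 0.91),
    (frozenset({1}), 0.80),
    (frozenset({1, 3}), 0.78),
    (frozenset({0, 1, 2}), 0.93),
]

front = pareto_front(candidates)
for subset, accuracy in front:
    print(sorted(subset), accuracy)
```

In a full wrapper approach, each accuracy would come from training and validating a classifier on the corresponding subset, and the search algorithm would generate new subsets from the current front.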

    A framework for feature selection through boosting

    As dimensions of datasets in predictive modelling continue to grow, feature selection becomes increasingly practical. Datasets with complex feature interactions and high levels of redundancy still present a challenge to existing feature selection methods. We propose a novel framework for feature selection that relies on boosting, or sample re-weighting, to select sets of informative features in classification problems. The method uses as its basis the feature rankings derived from fast and scalable tree-boosting models, such as XGBoost. We compare the proposed method to standard feature selection algorithms on 9 benchmark datasets. We show that the proposed approach reaches higher accuracies with fewer features on most of the tested datasets, and that the selected features have lower redundancy

    Applications of Nature-Inspired Algorithms for Dimension Reduction: Enabling Efficient Data Analytics

    In [1], we explored the theoretical aspects of feature selection and evolutionary algorithms. In this chapter, we focus on optimization algorithms for enhancing the data analytics process, i.e., we explore applications of nature-inspired algorithms in data science. Feature selection optimization is a hybrid approach that combines feature selection techniques with evolutionary algorithms to optimize the selected features. Prior works solve this problem iteratively to converge to an optimal feature subset. Feature selection optimization is not specific to any domain. Data scientists mainly attempt to find advanced ways to analyze data with high computational efficiency and low time complexity, leading to efficient data analytics. As the volume of generated, measured, and sensed data from various sources increases, the effort required to analyze, manipulate, and illustrate that data grows exponentially. Because of large-scale data sets, the curse of dimensionality (CoD) is one of the NP-hard problems in data science. Hence, several efforts have focused on leveraging evolutionary algorithms (EAs) to address the complex issues in large-scale data analytics problems. Dimension reduction, together with EAs, lends itself to solving CoD and to solving complex problems efficiently in terms of time complexity. In this chapter, we first provide a brief overview of previous studies that focused on solving CoD using a feature extraction optimization process. We then discuss practical examples of research studies that have successfully tackled application domains such as image processing, sentiment analysis, network traffic and anomaly analysis, credit score analysis, and other benchmark function and data set analyses

    Oversampling technique in student performance classification from engineering course

    The first year of an engineering degree is important for proper academic planning, since all first-year subjects are essential as an engineering foundation. Student performance prediction helps academics improve student performance, and students can check their performance themselves. If they are aware that their performance is low, they can take steps to improve it. This research focused on combining oversampling of minority-class data with various kinds of classifier models. The oversampling techniques were SMOTE, Borderline-SMOTE, SVMSMOTE, and ADASYN, and four classifiers were applied: MLP, gradient boosting, AdaBoost, and random forest. The results showed that Borderline-SMOTE gave the best result for minority-class prediction with several classifiers
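
The oversampling techniques named in this abstract share one core idea: new minority-class samples are synthesized by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below illustrates that basic SMOTE-style step only, using the standard library; it is not the paper's implementation, and Borderline-SMOTE, SVMSMOTE, and ADASYN differ in how they choose which samples to interpolate from. The function name, parameters, and data points are hypothetical.

```python
import random
from typing import List


def smote_like_oversample(minority: List[List[float]], n_new: int,
                          k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create `n_new` synthetic points, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority-class samples, made up for illustration.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies rather than duplicating existing rows.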

    Predicting Arrhythmia Based on Machine Learning Using Improved Harris Hawk Algorithm

    Arrhythmia is widely recognized as a prominent and lethal ailment on a global scale, resulting in a significant number of fatalities annually. The timely identification of this ailment is crucial for preserving individuals' lives. Machine learning (ML), a branch of artificial intelligence (AI), has emerged as a highly efficient and cost-effective method for illness detection. The objective of this work is to develop an ML model capable of accurately predicting heart illness using the Arrhythmia disease dataset, with the purpose of achieving optimal performance. The performance of the model is greatly influenced by the choice of machine learning method and of the dataset features used for training. To mitigate the overfitting caused by the high dimensionality of the features in the Arrhythmia dataset, the dataset was reduced to a lower-dimensional subspace via the improved Harris hawk optimization algorithm (iHHO). The Harris hawk algorithm exhibits a rapid convergence rate and a notable degree of adaptability in identifying optimal features. The performance of the models created with the feature-selected dataset using various machine learning techniques was evaluated and compared. In this work, a total of seven classifiers (SVM, GB, GNB, RF, LR, DT, and KNN) are used to classify the data produced by the iHHO algorithm. The results clearly show improvements of 3%, 4%, 4%, 9%, 8%, 3%, and 9% with the classifiers KNN, RF, GB, SVM, LR, DT, and GNB, respectively

    Software Reliability Prediction using Correlation Constrained Multi-Objective Evolutionary Optimization Algorithm

    Software reliability frameworks are extremely effective for estimating the probability of software failure over time. Numerous approaches for predicting software dependability have been presented, but none has proven effective. Predicting the number of software faults throughout the research and testing phases is a serious problem. There are several kinds of software metrics, such as object-oriented design metrics, public and private attributes, methods, previous bug metrics, and software change metrics. Many researchers have identified these metrics and used them to predict software reliability, but none has identified the relations among them or explored which metrics are optimal. Therefore, this paper proposes a correlation-constrained multi-objective evolutionary optimization algorithm (CCMOEO) for software reliability prediction. CCMOEO is an effective optimization approach for estimating the variables of popular reliability growth models. To obtain the highest classification effectiveness, the suggested CCMOEO approach overcomes modeling uncertainties by integrating various metrics with multiple objective functions. The hypothesized models were formulated using evaluation results on five distinct datasets. The prediction was evaluated on seven different machine learning algorithms, i.e., linear support vector machine (LSVM), radial support vector machine (RSVM), decision tree, random forest, gradient boosting, k-nearest neighbor, and linear regression. The result analysis shows that random forest achieved better performance

    A modified mayfly-SVM approach for early detection of type 2 diabetes mellitus

    Diabetes mellitus is a chronic disease that severely affects many people worldwide. Early diagnosis of this disease is of paramount importance, as physicians and patients can then work towards prevention and mitigation of future complications. Hence, there is a need to develop a system that diagnoses type 2 diabetes mellitus (T2DM) at an early stage. Recently, a large number of studies have emerged with prediction models to diagnose T2DM. Notably, however, the published literature lacks multi-class studies. Therefore, the primary objective of this study is the development of a multi-class predictive model that takes advantage of routinely available clinical data to diagnose T2DM using machine learning algorithms. In this work, a modified mayfly-support vector machine is implemented to detect the prediabetic stage accurately. To assess the effectiveness of the proposed model, a comparative study was undertaken against T2DM prediction models developed by other researchers over the last five years. The proposed model was validated on data collected from local hospitals and on the benchmark PIMA dataset available in the UCI repository. The study reveals that the modified Mayfly-SVM has a considerable edge over metaheuristic optimization algorithms in local as well as global searching capabilities and attained a maximum test accuracy of 94.5% on PIMA