
    Applications of Nature-Inspired Algorithms for Dimension Reduction: Enabling Efficient Data Analytics

    In [1], we explored the theoretical aspects of feature selection and evolutionary algorithms. In this chapter, we focus on optimization algorithms for enhancing the data analytics process, i.e., we explore applications of nature-inspired algorithms in data science. Feature selection optimization is a domain-independent hybrid approach that couples feature selection techniques with an evolutionary search that refines the selected features; prior works solve this problem iteratively to converge to an optimal feature subset. Data scientists seek to analyze data with high computational efficiency and low time complexity, leading to efficient data analytics. However, as the volume of generated/measured/sensed data from various sources increases, the cost of analyzing, manipulating and illustrating that data grows rapidly. With such large-scale data sets, the curse of dimensionality (CoD) arises, and the underlying feature subset selection problem is NP-hard. Hence, several efforts have focused on leveraging evolutionary algorithms (EAs) to address complex issues in large-scale data analytics. Dimension reduction, together with EAs, lends itself to solving CoD and tackling complex problems efficiently in terms of time complexity. In this chapter, we first provide a brief overview of previous studies that address CoD through feature extraction optimization. We then discuss practical examples of research studies that have successfully tackled application domains such as image processing, sentiment analysis, network traffic and anomaly analysis, credit score analysis, and other benchmark functions and data sets.
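    The loop below is a minimal, generic sketch of the wrapper-style feature selection optimization this abstract describes: a simple genetic algorithm searches over binary feature masks and scores each mask with a classifier. The dataset (scikit-learn's breast cancer data), the k-NN classifier and all GA settings are illustrative assumptions, not the chapter's actual configuration.

```python
# Generic GA wrapper for feature selection (illustrative sketch only).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Cross-validated accuracy of k-NN on the selected columns;
    # an empty subset gets the worst possible score.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(),
                           X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, n_features))        # random binary population
for generation in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]             # keep the fitter half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, n_features)                # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.random(n_features) < 0.02] ^= 1        # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print("selected features:", int(best.sum()), " accuracy: %.3f" % fitness(best))
```

    Any of the nature-inspired algorithms surveyed in the chapter can replace the crossover/mutation step; only the binary mask encoding and the fitness evaluation are essential to the wrapper approach.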

    An improved moth flame optimization algorithm based on rough sets for tomato diseases detection

    Plant disease is one of the major bottlenecks in agricultural production and has adverse effects on the economy of any country. Automatic detection of such diseases could minimize these effects. Feature selection is a common pre-processing step in automatic disease detection systems: it detects and eliminates noisy, irrelevant, and redundant data and can therefore improve detection performance. In this paper, an improved moth-flame approach to automatically detect tomato diseases is proposed. The moth-flame fitness function depends on the rough-set dependency degree and takes into consideration the number of selected features. The proposed algorithm combines the exploration power of moth-flame optimization with the high performance of rough sets for the feature selection task, seeking the set of features that maximizes the classification accuracy, which was evaluated using a support vector machine (SVM). The performance of the MFORSFS algorithm was evaluated on many benchmark datasets from the UCI machine learning repository and compared with feature selection approaches based on Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) with rough sets. The proposed algorithm was then applied to a real-life problem, detecting tomato diseases (powdery mildew and early blight): a real dataset of tomato diseases was manually built, and a tomato disease detection approach was proposed and evaluated on this dataset. The experimental results showed that the proposed algorithm is efficient in terms of recall, precision, accuracy and F-score, as well as feature size reduction and execution time.
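    As a rough illustration of the fitness described above, the sketch below computes the rough-set dependency degree of a candidate feature subset and combines it with a penalty on subset size. The weighting alpha, the toy decision table and the function names are assumptions for illustration; the paper's exact fitness and its SVM evaluation stage are not reproduced here.

```python
# Rough-set dependency degree of a feature subset plus a size penalty (sketch).
import numpy as np

def dependency_degree(X, y, subset):
    # gamma_B(D): fraction of objects whose values on the attributes in
    # `subset` determine the decision unambiguously (the positive region).
    if len(subset) == 0:
        return 0.0
    eq_classes = {}
    for i, key in enumerate(map(tuple, X[:, subset])):
        eq_classes.setdefault(key, []).append(i)
    pos = sum(len(idx) for idx in eq_classes.values()
              if len({y[j] for j in idx}) == 1)
    return pos / len(y)

def fitness(subset, X, y, n_attrs, alpha=0.9):
    # Reward dependency degree, lightly penalise subset size (alpha is assumed).
    return alpha * dependency_degree(X, y, subset) \
        + (1 - alpha) * (1 - len(subset) / n_attrs)

# Toy discrete decision table: 4 condition attributes, binary decision.
X = np.array([[0, 1, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 1]])
y = np.array([0, 0, 1, 1, 0])
print(fitness([0], X, y, n_attrs=4))      # a single consistent attribute
print(fitness([0, 2], X, y, n_attrs=4))   # a larger subset scores slightly lower
```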

    Crow search algorithm with time varying flight length strategies for feature selection

    Feature selection (FS) is an efficient technique used to remove irrelevant, redundant and noisy attributes from high-dimensional datasets while increasing the efficacy of machine learning classification. The Crow Search Algorithm (CSA) is a simple and efficient metaheuristic that has been used to overcome several FS issues. The flight length (fl) parameter in CSA governs the crows' search ability. In CSA, fl is set to a fixed value; as a result, the algorithm is prone to becoming trapped in local minima. This article addresses this issue by introducing five time-varying fl strategies into CSA for feature selection: linearly decreasing flight length, sigmoid decreasing flight length, chaotic decreasing flight length, simulated annealing decreasing flight length, and logarithmically decreasing flight length. The performance of the proposed approaches is assessed on 13 standard UCI datasets. The simulation results show that the suggested feature selection approaches outperform the original CSA, with the chaotic-CSA variant beating both the original CSA and the other four proposed approaches on the FS task.
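    The functions below sketch plausible forms of the five time-varying flight-length strategies named above (linear, sigmoid, chaotic, simulated-annealing-style and logarithmic decay). The bounds fl_max/fl_min, the logistic map used for the chaotic variant and the cooling rate are assumed values, not the article's exact settings.

```python
# Time-varying flight-length (fl) schedules for a CSA-style search (sketch).
import math

T = 100  # total number of iterations (illustrative)

def fl_linear(t, fl_max=2.0, fl_min=0.5):
    return fl_max - (fl_max - fl_min) * t / T

def fl_sigmoid(t, fl_max=2.0, fl_min=0.5, k=10.0):
    return fl_min + (fl_max - fl_min) / (1.0 + math.exp(k * (t / T - 0.5)))

def fl_logarithmic(t, fl_max=2.0, fl_min=0.5):
    return fl_max - (fl_max - fl_min) * math.log(1 + t) / math.log(1 + T)

def fl_annealing(t, fl_max=2.0, cooling=0.97):
    return fl_max * cooling ** t                  # SA-style exponential decay

def fl_chaotic(t, fl_max=2.0, fl_min=0.5, x0=0.7):
    x = x0
    for _ in range(t + 1):                        # logistic map drives fluctuation
        x = 4.0 * x * (1.0 - x)
    return fl_min + (fl_max - fl_min) * x * (1 - t / T)

for t in (0, 50, 99):
    print(f"t={t:3d}  linear={fl_linear(t):.2f}  sigmoid={fl_sigmoid(t):.2f}  "
          f"log={fl_logarithmic(t):.2f}  sa={fl_annealing(t):.2f}  "
          f"chaotic={fl_chaotic(t):.2f}")
```

    A large fl early on favours exploration, while a small fl near the end favours exploitation around the best positions found, which is why the decreasing schedules help the algorithm escape local minima.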

    A review of different optimization methods based on rough set theory

    In recent years, various articles have been published on the optimization technique known as Rough Set Theory (RST) and its diverse uses and applications. This work reviews articles published between 2010 and 2016 on optimization methods that use RST, fuzzy set theory (Fuzzy Sets, FS) and soft set theory (Soft Sets, SS). The review considered the techniques used, where they have been implemented, and the trends indicating where the methodology will be applied in future research and publications, with emphasis on optimizing different kinds of searches, improving the quality of results, and reducing attributes or response times. The search was carried out in scientific databases related to RST, FS and SS, yielding 58 base articles, which were classified and grouped according to the technique used. RST is found to be a widely used methodology across different areas and processes, confirming that it is a useful technique for applications such as decision making, data mining and prediction, among others. It was also found that the topic is attracting attention in a variety of research efforts, and that RST combined with algorithms based on the behaviour of nature (bio/meta-inspired algorithms) is becoming a strong trend, opening up alternative lines of research in optimization. From the information collected, a comparative analysis of the use of the different techniques that interact with RST was established. The capacity of the theory and its versatility to combine with different techniques, and thus be applied and implemented in diverse optimization processes, is also highlighted, as will be seen throughout this document.

    Evolutionary Computation, Optimization and Learning Algorithms for Data Science

    A large number of engineering, science and computational problems have yet to be solved in a computationally efficient way. One of the emerging challenges is how evolving technologies grow towards autonomy and intelligent decision making. This leads to the collection of large amounts of data from various sensing and measurement technologies, e.g., cameras, smart phones, health sensors, smart electricity meters, and environment sensors. Hence, it is imperative to develop efficient algorithms for the generation, analysis, classification, and illustration of data. Meanwhile, data is structured purposefully through different representations, such as large-scale networks and graphs. We focus on data science as a crucial area, specifically on the curse of dimensionality (CoD), which is due to the large amount of generated/sensed/collected data. This motivates researchers to think about optimization and to apply nature-inspired algorithms, such as evolutionary algorithms (EAs), to solve optimization problems. Although these algorithms appear non-deterministic, they are robust enough to reach an optimal solution. Researchers typically adopt evolutionary algorithms only when a problem leaves conventional methods stuck in a local optimum rather than reaching the global optimum. In this chapter, we first develop a clear and formal definition of the CoD problem, next we focus on feature extraction techniques and categories, and then we provide a general overview of meta-heuristic algorithms, their terminology, and the desirable properties of evolutionary algorithms.
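    The short experiment below gives a concrete feel for the curse of dimensionality mentioned above: as the dimension of uniformly random data grows, pairwise distances concentrate, so the contrast between the nearest and farthest points collapses and distance-based analysis degrades. The sample size and the dimensions tried are arbitrary choices for the demonstration, not values from the chapter.

```python
# Numerical illustration of distance concentration in high dimensions.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
for d in (2, 10, 100, 1000):
    X = rng.random((200, d))                  # 200 uniform random points in [0,1]^d
    dist = pdist(X)                           # all pairwise Euclidean distances
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:5d}  relative distance contrast = {contrast:.3f}")
```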

    Binary Multi-Verse Optimization (BMVO) Approaches for Feature Selection

    Multi-Verse Optimization (MVO) is one of the newest meta-heuristic optimization algorithms; it imitates the multiverse theory in physics and models the interaction among various universes. In problem domains like feature selection, the solutions are constrained to the binary values 0 and 1. With this in mind, binary versions of the MVO algorithm are proposed in this paper with two prime aims: first, to remove redundant and irrelevant features from the dataset, and second, to achieve better classification accuracy. The proposed binary versions use transformation (transfer) functions to map the continuous MVO algorithm to its binary counterparts. For the experiments, 21 diverse datasets were used to compare the Binary MVO (BMVO) with binary versions of existing metaheuristic algorithms. The proposed BMVO approaches outperformed the compared algorithms in terms of the number of features selected and the accuracy of the classification process.
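    The snippet below sketches the transfer-function idea used to binarise a continuous optimiser such as MVO: each continuous position value is squashed into [0, 1] and thresholded stochastically into a 0/1 feature flag. The S-shaped and V-shaped functions shown are standard choices in the binary-metaheuristic literature; the specific functions used in the BMVO paper may differ.

```python
# Binarising continuous metaheuristic positions via transfer functions (sketch).
import numpy as np

rng = np.random.default_rng(42)

def s_shaped(x):
    return 1.0 / (1.0 + np.exp(-x))            # sigmoid (S-shaped) transfer

def v_shaped(x):
    return np.abs(np.tanh(x))                  # V-shaped transfer

def binarise(position, transfer=s_shaped):
    prob = transfer(position)                  # probability of selecting a feature
    return (rng.random(position.shape) < prob).astype(int)

continuous_position = rng.normal(0, 2, size=10)   # e.g. one "universe" in MVO
print(binarise(continuous_position))              # binary feature-selection mask
print(binarise(continuous_position, v_shaped))
```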

    A novel approach for estimation of above-ground biomass of sugar beet based on wavelength selection and optimized support vector machine

    Timely diagnosis of sugar beet above-ground biomass (AGB) is critical for yield prediction and optimal precision crop management. This study established an optimal quantitative prediction model of sugar beet AGB using hyperspectral data. Three experimental campaigns in 2014, 2015 and 2018 were conducted to collect ground-based hyperspectral data at three different growth stages, across different sites, cultivars and nitrogen (N) application rates. A competitive adaptive reweighted sampling (CARS) algorithm was applied to select the wavelengths most sensitive to AGB. A novel modified differential evolution grey wolf optimization algorithm (MDE-GWO) was then developed, introducing a differential evolution algorithm (DE) and a dynamic non-linear convergence factor into the grey wolf optimization algorithm (GWO), to optimize the parameters C and gamma of a support vector machine (SVM) model for the prediction of AGB. The prediction performance of SVM models under the three optimization methods (GWO, DE-GWO and MDE-GWO) was examined for both the CARS-selected wavelengths and the whole spectral data. Results showed that CARS yielded a large wavelength reduction of 97.4% for the rapid growth stage of the leaf cluster, 97.2% for the sugar growth stage and 97.4% for the sugar accumulation stage. Models built after CARS wavelength selection were more accurate than models developed using the entire spectral data. The best prediction accuracy was achieved after the MDE-GWO optimization of the SVM model parameters, independent of growth stage, year, site and cultivar. The best coefficient of determination (R²), root mean square error (RMSE) and residual prediction deviation (RPD) ranged, respectively, from 0.74 to 0.80, 46.17 to 65.68 g/m² and 1.42 to 1.97 for the rapid growth stage of the leaf cluster; 0.78 to 0.80, 30.16 to 37.03 g/m² and 1.69 to 2.03 for the sugar growth stage; and 0.69 to 0.74, 40.17 to 104.08 g/m² and 1.61 to 1.95 for the sugar accumulation stage. It can be concluded that the proposed methodology can be implemented for the prediction of sugar beet AGB using proximal hyperspectral sensors under a wide range of environmental conditions.
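    The sketch below shows how a plain grey wolf optimiser can tune the SVM parameters C and gamma on a regression task scored by cross-validated R². It deliberately omits the paper's differential-evolution step and dynamic non-linear convergence factor, and the synthetic data, search ranges and population settings are assumptions made for brevity.

```python
# Plain GWO search over log10(C) and log10(gamma) for an SVR model (sketch).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X, y = make_regression(n_samples=150, n_features=20, noise=10.0, random_state=3)

low, high = np.array([-2.0, -4.0]), np.array([3.0, 1.0])   # log10 bounds for C, gamma

def score(pos):
    C, gamma = 10.0 ** pos
    return cross_val_score(SVR(C=C, gamma=gamma), X, y, cv=3, scoring="r2").mean()

wolves = rng.uniform(low, high, size=(8, 2))
for t in range(20):
    a = 2.0 * (1 - t / 20)                                   # linear convergence factor
    fit = np.array([score(w) for w in wolves])
    alpha, beta, delta = wolves[np.argsort(fit)[::-1][:3]]   # three leading wolves
    for i, w in enumerate(wolves):
        new = np.zeros(2)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(2), rng.random(2)
            A, C_coef = 2 * a * r1 - a, 2 * r2
            new += leader - A * np.abs(C_coef * leader - w)  # encircling step
        wolves[i] = np.clip(new / 3.0, low, high)

best = wolves[np.argmax([score(w) for w in wolves])]
C, gamma = 10.0 ** best
print(f"best C={C:.3g}, gamma={gamma:.3g}, CV R^2={score(best):.3f}")
```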

    Enhanced grey wolf optimisation algorithm for feature selection in anomaly detection

    Anomaly detection deals with the identification of items that do not conform to the expected pattern of other items present in a dataset. The performance of the mechanisms used to perform anomaly detection depends heavily on the group of features used. Not all features in a dataset should be used in the classification process, since some features may degrade classifier performance. Feature selection (FS) is a mechanism that reduces the dimensionality of high-dimensional datasets by deleting irrelevant features. The Modified Binary Grey Wolf Optimiser (MBGWO) is a modern metaheuristic algorithm that has successfully been used for FS in anomaly detection; however, it has several issues in finding a good-quality solution. This study therefore proposes an enhanced binary grey wolf optimiser (EBGWO) algorithm for FS in anomaly detection to overcome these issues. The first modification enhances the initial population of the MBGWO using a heuristic based on the Ant Colony Optimisation algorithm. The second modification develops a new position update mechanism using the Bat Algorithm movement. The third modification improves the control parameter of the MBGWO algorithm using indicators from the search process to refine the solution. The EBGWO algorithm was evaluated on NSL-KDD and six benchmark datasets from the University of California, Irvine (UCI) repository against ten benchmark metaheuristic algorithms. Experimental results on the NSL-KDD dataset show that EBGWO is superior to the other benchmark optimisation algorithms in terms of the number of selected features and classification accuracy. Moreover, experiments on the six UCI datasets showed that EBGWO is superior to the benchmark algorithms in classification accuracy and second best in the number of selected features. The proposed EBGWO algorithm can be used for FS in anomaly detection tasks involving datasets of any size from various application domains.
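    The function below sketches a fitness of the kind commonly minimised by binary GWO variants for feature selection, trading classification error against the fraction of features kept, which matches the two criteria reported above. The 0.99 weight, the wine dataset and the k-NN classifier are illustrative assumptions, not the EBGWO paper's exact configuration.

```python
# Typical wrapper fitness for binary metaheuristic feature selection (sketch).
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

def fs_fitness(mask, w=0.99):
    # Lower is better: weighted sum of classification error and subset ratio.
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0                                       # worst value: nothing selected
    err = 1.0 - cross_val_score(KNeighborsClassifier(),
                                X[:, mask], y, cv=5).mean()
    return w * err + (1 - w) * mask.sum() / mask.size

rng = np.random.default_rng(7)
print(fs_fitness(np.ones(X.shape[1])))                   # all features kept
print(fs_fitness(rng.random(X.shape[1]) > 0.5))          # a random subset
```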