4,272 research outputs found

    A novel penalty-based wrapper objective function for feature selection in big data using cooperative co-evolution

    Get PDF
    The rapid progress of modern technologies generates a massive amount of high-throughput data, called Big Data, which provides opportunities to find new insights using machine learning (ML) algorithms. Big Data consist of many features (also called attributes); however, not all these are necessary or relevant, and they may degrade the performance of ML algorithms. Feature selection (FS) is an essential preprocessing step to reduce the dimensionality of a dataset. Evolutionary algorithms (EAs) are widely used search algorithms for FS. Using classification accuracy as the objective function for FS, EAs, such as the cooperative co-evolutionary algorithm (CCEA), achieve higher accuracy, even with a higher number of features. Feature selection has two purposes: reducing the number of features to decrease computations and improving classification accuracy, which are contradictory but can be achieved using a single objective function. For this very purpose, this paper proposes a penalty-based wrapper objective function. This function can be used to evaluate the FS process using CCEA, hence called Cooperative Co-Evolutionary Algorithm-Based Feature Selection (CCEAFS). An experiment was performed using six widely used classifiers on six different datasets from the UCI ML repository with FS and without FS. The experimental results indicate that the proposed objective function is efficient at reducing the number of features in the final feature subset without significantly reducing classification accuracy. Based on different performance measures, in most cases, naïve Bayes outperforms other classifiers when using CCEAFS

    Optimal Fuzzy Model Construction with Statistical Information using Genetic Algorithm

    Full text link
    Fuzzy rule based models have a capability to approximate any continuous function to any degree of accuracy on a compact domain. The majority of FLC design process relies on heuristic knowledge of experience operators. In order to make the design process automatic we present a genetic approach to learn fuzzy rules as well as membership function parameters. Moreover, several statistical information criteria such as the Akaike information criterion (AIC), the Bhansali-Downham information criterion (BDIC), and the Schwarz-Rissanen information criterion (SRIC) are used to construct optimal fuzzy models by reducing fuzzy rules. A genetic scheme is used to design Takagi-Sugeno-Kang (TSK) model for identification of the antecedent rule parameters and the identification of the consequent parameters. Computer simulations are presented confirming the performance of the constructed fuzzy logic controller

    Solving the G-problems in less than 500 iterations: Improved efficient constrained optimization by surrogate modeling and adaptive parameter control

    Get PDF
    Constrained optimization of high-dimensional numerical problems plays an important role in many scientific and industrial applications. Function evaluations in many industrial applications are severely limited and no analytical information about objective function and constraint functions is available. For such expensive black-box optimization tasks, the constraint optimization algorithm COBRA was proposed, making use of RBF surrogate modeling for both the objective and the constraint functions. COBRA has shown remarkable success in solving reliably complex benchmark problems in less than 500 function evaluations. Unfortunately, COBRA requires careful adjustment of parameters in order to do so. In this work we present a new self-adjusting algorithm SACOBRA, which is based on COBRA and capable to achieve high-quality results with very few function evaluations and no parameter tuning. It is shown with the help of performance profiles on a set of benchmark problems (G-problems, MOPTA08) that SACOBRA consistently outperforms any COBRA algorithm with fixed parameter setting. We analyze the importance of the several new elements in SACOBRA and find that each element of SACOBRA plays a role to boost up the overall optimization performance. We discuss the reasons behind and get in this way a better understanding of high-quality RBF surrogate modeling

    TEDA: A Targeted Estimation of Distribution Algorithm

    Get PDF
    This thesis discusses the development and performance of a novel evolutionary algorithm, the Targeted Estimation of Distribution Algorithm (TEDA). TEDA takes the concept of targeting, an idea that has previously been shown to be effective as part of a Genetic Algorithm (GA) called Fitness Directed Crossover (FDC), and introduces it into a novel hybrid algorithm that transitions from a GA to an Estimation of Distribution Algorithm (EDA). Targeting is a process for solving optimisation problems where there is a concept of control points, genes that can be said to be active, and where the total number of control points found within a solution is as important as where they are located. When generating a new solution an algorithm that uses targeting must first of all choose the number of control points to set in the new solution before choosing which to set. The hybrid approach is designed to take advantage of the ability of EDAs to exploit patterns within the population to effectively locate the global optimum while avoiding the tendency of EDAs to prematurely converge. This is achieved by initially using a GA to effectively explore the search space before transitioning into an EDA as the population converges on the region of the global optimum. As targeting places an extra restriction on the solutions produced by specifying their size, combining it with the hybrid approach allows TEDA to produce solutions that are of an optimal size and of a higher quality than would be found using a GA alone without risking a loss of diversity. TEDA is tested on three different problem domains. These are optimal control of cancer chemotherapy, network routing and Feature Subset Selection (FSS). Of these problems, TEDA showed consistent advantage over standard EAs in the routing problem and demonstrated that it is able to find good solutions faster than untargeted EAs and non evolutionary approaches at the FSS problem. It did not demonstrate any advantage over other approaches when applied to chemotherapy. The FSS domain demonstrated that in large and noisy problems TEDA’s targeting derived ability to reduce the size of the search space significantly increased the speed with which good solutions could be found. The routing domain demonstrated that, where the ideal number of control points is deceptive, both targeting and the exploitative capabilities of an EDA are needed, making TEDA a more effective approach than both untargeted approaches and FDC. Additionally, in none of the problems was TEDA seen to perform significantly worse than any alternative approaches

    Variational Autoencoder Based Estimation Of Distribution Algorithms And Applications To Individual Based Ecosystem Modeling Using EcoSim

    Get PDF
    Individual based modeling provides a bottom up approach wherein interactions give rise to high-level phenomena in patterns equivalent to those found in nature. This method generates an immense amount of data through artificial simulation and can be made tractable by machine learning where multidimensional data is optimized and transformed. Using individual based modeling platform known as EcoSim, we modeled the abilities of elitist sexual selection and communication of fear. Data received from these experiments was reduced in dimension through use of a novel algorithm proposed by us: Variational Autoencoder based Estimation of Distribution Algorithms with Population Queue and Adaptive Variance Scaling (VAE-EDA-Q AVS). We constructed a novel Estimation of Distribution Algorithm (EDA) by extending generative models known as variational autoencoders (VAE). VAE-EDA-Q, proposed by us, smooths the data generation process using an iteratively updated queue (Q) of populations. Adaptive Variance Scaling (AVS) dynamically updates the variance at which models are sampled based on fitness. The combination of VAE-EDA-Q with AVS demonstrates high computational efficiency and requires few fitness evaluations. We extended VAE-EDA-Q AVS to act as a feature reducing wrapper method in conjunction with C4.5 Decision trees to reduce the dimensionality of data. The relationship between sexual selection, random selection, and speciation is a contested topic. Supporting evidence suggests sexual selection to drive speciation. Opposing evidence contends either a negative or absence of correlation to exist. We utilized EcoSim to model elitist and random mate selection. Our results demonstrated a significantly lower speciation rate, a significantly lower extinction rate, and a significantly higher turnover rate for sexual selection groups. Species diversification was found to display no significant difference. The relationship between communication and foraging behavior similarly features opposing hypotheses in claim of both increases and decreases of foraging behavior in response to alarm communication. Through modeling with EcoSim, we found alarm communication to decrease foraging activity in most cases, yet gradually increase foraging activity in some other cases. Furthermore, we found both outcomes resulting from alarm communication to increase fitness as compared to non-communication
    corecore