A novel penalty-based wrapper objective function for feature selection in big data using cooperative co-evolution
The rapid progress of modern technologies generates a massive amount of high-throughput data, called Big Data, which provides opportunities to find new insights using machine learning (ML) algorithms. Big Data consist of many features (also called attributes); however, not all these are necessary or relevant, and they may degrade the performance of ML algorithms. Feature selection (FS) is an essential preprocessing step to reduce the dimensionality of a dataset. Evolutionary algorithms (EAs) are widely used search algorithms for FS. Using classification accuracy as the objective function for FS, EAs, such as the cooperative co-evolutionary algorithm (CCEA), achieve higher accuracy, even with a higher number of features. Feature selection has two purposes: reducing the number of features to decrease computations and improving classification accuracy, which are contradictory but can be achieved using a single objective function. For this very purpose, this paper proposes a penalty-based wrapper objective function. This function can be used to evaluate the FS process using CCEA, hence called Cooperative Co-Evolutionary Algorithm-Based Feature Selection (CCEAFS). An experiment was performed using six widely used classifiers on six different datasets from the UCI ML repository with FS and without FS. The experimental results indicate that the proposed objective function is efficient at reducing the number of features in the final feature subset without significantly reducing classification accuracy. Based on different performance measures, in most cases, naïve Bayes outperforms other classifiers when using CCEAFS
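The abstract above does not give the form of the penalty-based objective, so the following is only a minimal sketch of the general idea: a single score that rewards classification accuracy and subtracts a penalty proportional to the fraction of features kept. The function name `penalized_fitness`, the linear penalty, and the default `penalty_weight` are illustrative assumptions, not the paper's definition.

```python
def penalized_fitness(accuracy, n_selected, n_total, penalty_weight=0.1):
    """Hypothetical single-objective score for a feature subset.

    accuracy       -- classification accuracy of a wrapper classifier, in [0, 1]
    n_selected     -- number of features in the candidate subset
    n_total        -- number of features in the full dataset
    penalty_weight -- assumed trade-off parameter (not taken from the paper)
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie between 0 and n_total")
    # Larger subsets are penalized in proportion to the share of features kept.
    return accuracy - penalty_weight * (n_selected / n_total)


# Two subsets with equal accuracy: the smaller one receives the higher score.
small = penalized_fitness(accuracy=0.90, n_selected=3, n_total=10)
large = penalized_fitness(accuracy=0.90, n_selected=8, n_total=10)
```

With a score of this shape, an evolutionary search maximizing it would prefer a smaller subset whenever the accuracy lost is less than the penalty saved.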
Optimal Fuzzy Model Construction with Statistical Information using Genetic Algorithm
Fuzzy rule-based models can approximate any continuous function to any degree
of accuracy on a compact domain. The majority of fuzzy logic controller (FLC)
design processes rely on the heuristic knowledge of experienced operators. To
make the design process automatic, we present a genetic approach that learns
fuzzy rules as well as membership function parameters. Moreover, several
statistical information criteria, such as the Akaike information criterion
(AIC), the Bhansali-Downham information criterion (BDIC), and the
Schwarz-Rissanen information criterion (SRIC), are used to construct optimal
fuzzy models by reducing the number of fuzzy rules. A genetic scheme is used to
design a Takagi-Sugeno-Kang (TSK) model, identifying both the antecedent rule
parameters and the consequent parameters. Computer simulations are presented
that confirm the performance of the constructed fuzzy logic controller.
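As a rough illustration of how an information criterion can trade fit against rule count, the sketch below scores candidate models with the common least-squares form of the AIC, n ln(RSS/n) + 2k. The abstract does not state which form the authors use, and the candidate figures, the parameter counts per rule, and the helper names are made up for the example. BDIC and SRIC are not shown.

```python
import math


def aic(rss, n_samples, n_params):
    """AIC for a least-squares fit with Gaussian errors: n*ln(RSS/n) + 2k."""
    if rss <= 0 or n_samples <= 0:
        raise ValueError("rss and n_samples must be positive")
    return n_samples * math.log(rss / n_samples) + 2 * n_params


def select_rule_count(candidates, n_samples):
    """Return the rule count of the candidate with the lowest AIC.

    candidates -- iterable of (n_rules, n_params, rss) tuples
    """
    best = min(candidates, key=lambda c: aic(c[2], n_samples, c[1]))
    return best[0]


# Illustrative (invented) candidates: more rules lower the residual error,
# but each rule adds parameters that the criterion charges for.
candidates = [
    (3, 9, 12.0),
    (5, 15, 9.0),
    (8, 24, 8.8),
]
best_rules = select_rule_count(candidates, n_samples=100)
```

The selection picks the model whose gain in fit is not outweighed by the extra parameters, which is the sense in which such criteria can be used to prune rules.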
Solving the G-problems in less than 500 iterations: Improved efficient constrained optimization by surrogate modeling and adaptive parameter control
Constrained optimization of high-dimensional numerical problems plays an
important role in many scientific and industrial applications. Function
evaluations in many industrial applications are severely limited, and no
analytical information about the objective function and constraint functions is
available. For such expensive black-box optimization tasks, the constrained
optimization algorithm COBRA was proposed, making use of RBF surrogate modeling
for both the objective and the constraint functions. COBRA has shown remarkable
success in reliably solving complex benchmark problems in fewer than 500
function evaluations. Unfortunately, COBRA requires careful adjustment of
parameters in order to do so.
In this work we present a new self-adjusting algorithm, SACOBRA, which is
based on COBRA and capable of achieving high-quality results with very few
function evaluations and no parameter tuning. It is shown with the help of
performance profiles on a set of benchmark problems (G-problems, MOPTA08) that
SACOBRA consistently outperforms any COBRA algorithm with a fixed parameter
setting. We analyze the importance of the several new elements in SACOBRA and
find that each element plays a role in boosting the overall optimization
performance. We discuss the reasons behind this and thereby gain a better
understanding of high-quality RBF surrogate modeling.
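To make the notion of an RBF surrogate concrete, here is a small sketch that fits an interpolating radial basis function model to a handful of evaluated points and then predicts at new points without calling the expensive function again. The cubic kernel with a linear polynomial tail is an assumption chosen for the example, as are the helper names `fit_rbf` and `predict_rbf`. This is not the COBRA or SACOBRA implementation.

```python
import numpy as np


def fit_rbf(X, y):
    """Fit a cubic RBF interpolant with a linear tail to points X (n x d)."""
    n, d = X.shape
    # Pairwise Euclidean distances between the sample points.
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = r ** 3
    # Linear polynomial tail [1, x_1, ..., x_d] keeps the system solvable.
    P = np.hstack([np.ones((n, 1)), X])
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([y, np.zeros(d + 1)])
    coef = np.linalg.solve(A, rhs)
    return X, coef[:n], coef[n:]


def predict_rbf(model, Xq):
    """Evaluate the fitted surrogate at query points Xq (m x d)."""
    X, w, c = model
    r = np.linalg.norm(Xq[:, None, :] - X[None, :, :], axis=2)
    return (r ** 3) @ w + c[0] + Xq @ c[1:]


# Stand-in for an expensive black-box function (assumed for illustration).
def expensive(X):
    return np.sum(X ** 2, axis=1)


X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                    [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
model = fit_rbf(X_train, expensive(X_train))
estimate = predict_rbf(model, np.array([[0.4, 0.6]]))
```

In a surrogate-assisted loop, one such model would be built for the objective and one per constraint, and the optimizer would search on the cheap models before spending a real function evaluation.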
TEDA: A Targeted Estimation of Distribution Algorithm
This thesis discusses the development and performance of a novel evolutionary algorithm, the Targeted Estimation of Distribution Algorithm (TEDA). TEDA takes the concept of targeting, an idea that has previously been shown to be effective as part of a Genetic Algorithm (GA) called Fitness Directed Crossover (FDC), and introduces it into a novel hybrid algorithm that transitions from a GA to an Estimation of Distribution Algorithm (EDA).
Targeting is a process for solving optimisation problems where there is a concept of control points, genes that can be said to be active, and where the total number of control points found within a solution is as important as where they are located. When generating a new solution an algorithm that uses targeting must first of all choose the number of control points to set in the new solution before choosing which to set.
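The two-step idea described above (first decide how many control points to set, then decide which ones) can be sketched as follows. The specific rules used here, drawing the target count uniformly between the parents' counts and preferring genes that are active in both parents, are assumptions for illustration and are not claimed to be the rules used by FDC or TEDA.

```python
import random


def targeted_child(parent_a, parent_b, rng):
    """Build a binary child by choosing its size first, then its genes.

    parent_a, parent_b -- equal-length lists of 0/1 (1 = active control point)
    rng                -- a random.Random instance
    """
    n_a, n_b = sum(parent_a), sum(parent_b)
    # Step 1: choose the number of control points (assumed rule: a uniform
    # draw between the two parents' counts).
    target = rng.randint(min(n_a, n_b), max(n_a, n_b))

    # Step 2: choose which genes to set, preferring those active in both
    # parents, then those active in exactly one parent.
    pairs = list(zip(parent_a, parent_b))
    both = [i for i, (a, b) in enumerate(pairs) if a and b]
    either = [i for i, (a, b) in enumerate(pairs) if a != b]
    rng.shuffle(both)
    rng.shuffle(either)

    child = [0] * len(parent_a)
    for i in (both + either)[:target]:
        child[i] = 1
    return child


rng = random.Random(0)
parent_a = [1, 1, 0, 0, 1, 0]
parent_b = [1, 0, 1, 0, 0, 0]
child = targeted_child(parent_a, parent_b, rng)
```

Because the size is fixed before any gene is chosen, the child's number of control points always lies between those of its parents, whichever genes end up selected.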
The hybrid approach is designed to take advantage of the ability of EDAs to exploit patterns within the population to effectively locate the global optimum while avoiding the tendency of EDAs to prematurely converge. This is achieved by initially using a GA to effectively explore the search space before transitioning into an EDA as the population converges on the region of the global optimum. As targeting places an extra restriction on the solutions produced by specifying their size, combining it with the hybrid approach allows TEDA to produce solutions that are of an optimal size and of a higher quality than would be found using a GA alone without risking a loss of diversity.
TEDA is tested on three different problem domains. These are optimal control of cancer chemotherapy, network routing and Feature Subset Selection (FSS). Of these problems, TEDA showed a consistent advantage over standard EAs in the routing problem and demonstrated that it is able to find good solutions faster than untargeted EAs and non-evolutionary approaches on the FSS problem. It did not demonstrate any advantage over other approaches when applied to chemotherapy.
The FSS domain demonstrated that in large and noisy problems TEDA’s targeting-derived ability to reduce the size of the search space significantly increased the speed with which good solutions could be found. The routing domain demonstrated that, where the ideal number of control points is deceptive, both targeting and the exploitative capabilities of an EDA are needed, making TEDA a more effective approach than both untargeted approaches and FDC. Additionally, in none of the problems was TEDA seen to perform significantly worse than any alternative approach.
Variational Autoencoder Based Estimation Of Distribution Algorithms And Applications To Individual Based Ecosystem Modeling Using EcoSim
Individual-based modeling provides a bottom-up approach wherein interactions give rise to high-level phenomena in patterns equivalent to those found in nature. This method generates an immense amount of data through artificial simulation and can be made tractable by machine learning where multidimensional data is optimized and transformed. Using an individual-based modeling platform known as EcoSim, we modeled the abilities of elitist sexual selection and communication of fear. Data received from these experiments was reduced in dimension through use of a novel algorithm proposed by us: Variational Autoencoder based Estimation of Distribution Algorithms with Population Queue and Adaptive Variance Scaling (VAE-EDA-Q AVS). We constructed a novel Estimation of Distribution Algorithm (EDA) by extending generative models known as variational autoencoders (VAE). VAE-EDA-Q, proposed by us, smooths the data generation process using an iteratively updated queue (Q) of populations. Adaptive Variance Scaling (AVS) dynamically updates the variance at which models are sampled based on fitness. The combination of VAE-EDA-Q with AVS demonstrates high computational efficiency and requires few fitness evaluations. We extended VAE-EDA-Q AVS to act as a feature-reducing wrapper method in conjunction with C4.5 decision trees to reduce the dimensionality of data. The relationship between sexual selection, random selection, and speciation is a contested topic. Supporting evidence suggests that sexual selection drives speciation. Opposing evidence contends that the correlation is either negative or absent. We utilized EcoSim to model elitist and random mate selection. Our results demonstrated a significantly lower speciation rate, a significantly lower extinction rate, and a significantly higher turnover rate for sexual selection groups. Species diversification was found to display no significant difference.
The relationship between communication and foraging behavior similarly features opposing hypotheses, which claim both increases and decreases in foraging behavior in response to alarm communication. Through modeling with EcoSim, we found alarm communication to decrease foraging activity in most cases, yet gradually increase foraging activity in some other cases. Furthermore, we found both outcomes resulting from alarm communication to increase fitness as compared to non-communication.