47 research outputs found

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains. It is described using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features. Many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
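The divide-and-conquer step mentioned in this abstract can be illustrated with a minimal sketch. It assumes random grouping of feature indices, which is only one common decomposition strategy and is not necessarily the one the article surveys. The names `decompose` and `assemble` are hypothetical, and the sub-solutions are placeholders rather than evolved masks.

```python
import random


def decompose(n_features, n_groups, seed=0):
    """Randomly partition feature indices into n_groups disjoint sub-problems."""
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    # Every n_groups-th shuffled index goes to the same group.
    return [sorted(indices[g::n_groups]) for g in range(n_groups)]


def assemble(groups, partial_masks, n_features):
    """Merge one 0/1 mask per group into a single full-length feature mask."""
    full = [0] * n_features
    for group, mask in zip(groups, partial_masks):
        for index, bit in zip(group, mask):
            full[index] = bit
    return full


groups = decompose(n_features=10, n_groups=3)
# Placeholder sub-solutions: in a real run each sub-population would evolve its own mask.
partial_masks = [[1] * len(g) for g in groups]
full_mask = assemble(groups, partial_masks, n_features=10)
```

Because the groups are disjoint, each one could in principle be evaluated by a separate worker, which is the property a MapReduce-style parallelization would rely on.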

    Hybrid ACO and SVM algorithm for pattern classification

    Ant Colony Optimization (ACO) is a metaheuristic algorithm that can be used to solve a variety of combinatorial optimization problems. A new direction for ACO is to optimize continuous and mixed (discrete and continuous) variables. Support Vector Machine (SVM) is a pattern classification approach that originated from statistical approaches. However, SVM suffers from two main problems: feature subset selection and parameter tuning. Most approaches to tuning SVM parameters discretize the continuous values of the parameters, which negatively affects classification performance. This study presents four algorithms for tuning the SVM parameters and selecting the feature subset, which improved SVM classification accuracy with a smaller feature subset. This is achieved by performing the SVM parameter tuning and feature subset selection processes simultaneously. Hybridization algorithms between ACO and SVM techniques were proposed. The first two algorithms, ACOR-SVM and IACOR-SVM, tune the SVM parameters, while the second two algorithms, ACOMV-R-SVM and IACOMV-R-SVM, tune the SVM parameters and select the feature subset simultaneously. Ten benchmark datasets from the University of California, Irvine, were used in the experiments to validate the performance of the proposed algorithms. Experimental results obtained from the proposed algorithms are better than those of other approaches in terms of classification accuracy and size of the feature subset. The average classification accuracies for the ACOR-SVM, IACOR-SVM, ACOMV-R and IACOMV-R algorithms are 94.73%, 95.86%, 97.37% and 98.1% respectively. The average size of the feature subset is eight for the ACOR-SVM and IACOR-SVM algorithms and four for the ACOMV-R and IACOMV-R algorithms. This study contributes to a new direction for ACO that can deal with continuous and mixed-variable problems.

    Aco-based feature selection algorithm for classification

    A dataset with a small number of records but a large number of attributes exhibits a phenomenon called the “curse of dimensionality”. The classification of this type of dataset requires Feature Selection (FS) methods for the extraction of useful information. The modified graph clustering ant colony optimisation (MGCACO) algorithm is an effective FS method that was developed based on grouping the highly correlated features. However, the MGCACO algorithm has three main drawbacks in producing a feature subset because of its clustering method, parameter sensitivity, and the final subset determination. An enhanced graph clustering ant colony optimisation (EGCACO) algorithm is proposed to solve the three MGCACO algorithm problems. The proposed improvement includes: (i) an ACO feature clustering method to obtain clusters of highly correlated features; (ii) an adaptive selection technique for subset construction from the clusters of features; and (iii) a genetic-based method for producing the final subset of features. The ACO feature clustering method utilises mechanisms such as intensification and diversification for local and global optimisation to provide highly correlated features. The adaptive technique for ant selection enables the parameter to change adaptively based on feedback from the search space. The genetic method determines the final subset automatically, based on crossover and subset quality calculation. The performance of the proposed algorithm was evaluated on 18 benchmark datasets from the University of California, Irvine (UCI) repository and nine deoxyribonucleic acid (DNA) microarray datasets against 15 benchmark metaheuristic algorithms. 
The experimental results of the EGCACO algorithm on the UCI datasets are superior to other benchmark optimisation algorithms in terms of the number of selected features for 16 out of the 18 UCI datasets (88.89%), and the best in eight (44.44%) of the datasets for classification accuracy. Further, experiments on the nine DNA microarray datasets showed that the EGCACO algorithm is superior to the benchmark algorithms in terms of classification accuracy (first rank) for seven datasets (77.78%) and demonstrates the lowest number of selected features in six datasets (66.67%). The proposed EGCACO algorithm can be utilised for FS in DNA microarray classification tasks that involve large dataset sizes in various application domains.

    Evolutionary Computation 2020

    Intelligent optimization is based on the mechanism of computational intelligence to refine a suitable feature model, design an effective optimization algorithm, and then obtain an optimal or satisfactory solution to a complex problem. Intelligent algorithms are key tools to ensure global optimization quality, fast optimization efficiency and robust optimization performance. Intelligent optimization algorithms have been studied by many researchers, leading to improvements in the performance of algorithms such as the evolutionary algorithm, whale optimization algorithm, differential evolution algorithm, and particle swarm optimization. Studies in this arena have also resulted in breakthroughs in solving complex problems including the green shop scheduling problem, the severe nonlinear problem in one-dimensional geodesic electromagnetic inversion, the error and bug finding problem in software, the 0-1 knapsack problem, the traveling salesman problem, and the logistics distribution center siting problem. The editors are confident that this book can open a new avenue for further improvement and discoveries in the area of intelligent algorithms. The book is a valuable resource for researchers interested in understanding the principles and design of intelligent algorithms.

    Information gain directed genetic algorithm wrapper feature selection for credit rating

    Financial credit scoring is one of the most crucial processes in the finance industry, used to assess the credit-worthiness of individuals and enterprises. Various statistics-based machine learning techniques have been employed for this task. The “curse of dimensionality” is still a significant challenge in machine learning techniques. Some research has been carried out on Feature Selection (FS) using a genetic algorithm as a wrapper to improve the performance of credit scoring models. However, the challenge lies in finding an overall best method in credit scoring problems and improving the time-consuming process of feature selection. In this study, the credit scoring problem is investigated through feature selection to improve classification performance. This work proposes a novel approach to feature selection in credit scoring applications, called the Information Gain Directed Feature Selection (IGDFS) algorithm, which ranks features by information gain and then propagates the top m features through the GA wrapper (GAW) algorithm, using three classical machine learning algorithms (KNN, Naïve Bayes and Support Vector Machine (SVM)) for credit scoring. The first stage of information gain guided feature selection can help reduce the computing complexity of the GA wrapper, and the information gain of features selected with IGDFS can indicate their importance to decision making.
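The first stage described in this abstract, ranking features by information gain and keeping the top m, can be sketched as follows. This is a minimal illustration for discrete features only, not the authors' IGDFS implementation. The function names and the toy data are made up for the example, and the GA wrapper stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction in `labels` after splitting on a discrete feature column."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_m_features(rows, labels, m):
    """Return the indices of the m highest-gain features, plus all gains."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:m], gains


# Toy, made-up data: feature 0 determines the label, feature 1 is noise,
# feature 2 is constant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
labels = ["good", "good", "bad", "bad"]
selected, gains = top_m_features(rows, labels, m=2)
```

In a two-stage scheme of this kind, only the `selected` indices would be passed on to the wrapper search, which is how the ranking stage shrinks the search space.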

    Computational Optimizations for Machine Learning

    The present book contains the 10 articles finally accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various types of machine learning classes, such as supervised, unsupervised and reinforcement learning, deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference, deep adversarial networks and more. It is hoped that the book will be interesting and useful to those developing mathematical algorithms and applications in the domain of artificial intelligence and machine learning as well as for those having the appropriate mathematical background and willing to become familiar with recent advances of machine learning computational optimization mathematics, which has nowadays permeated into almost all sectors of human life and activity

    Algorithms and Software for Biological MP Modeling by Statistical and Optimization Techniques

    Biological systems are groups of biological entities (e.g., molecules and organisms) that interact to produce specific dynamics. These systems are usually characterized by high complexity, since they involve a large number of components with many interconnections. Understanding the mechanisms of biological systems and predicting their behavior in normal and pathological conditions is a crucial challenge in systems biology, a research area on the border between biology, medicine, mathematics and computer science. In this thesis, metabolic P systems, also called MP systems, have been employed as a discrete modeling framework for the analysis of biological system dynamics. They are a deterministic class of P systems employing rewriting rules to represent chemical reactions and "flux regulation functions" to tune the reactivity of each reaction according to the amount of substances present in the system. After an excursus on the literature about some conventional (e.g., differential equations, Gillespie's stochastic models) and unconventional (e.g., P systems and metabolic P systems) modeling frameworks, the results of my research are presented. They concern three research topics: i) equivalences between MP systems and hybrid functional Petri nets, ii) statistical and optimization perspectives in the generation of MP models from experimental data, iii) development of the virtual laboratory MetaPlab, a Java software based on MP systems. The equivalence between MP systems and hybrid functional Petri nets is proved by two theorems and some in silico experiments for the case study of the lac operon gene regulatory mechanism and glycolytic pathway. The second topic concerns new approaches to the synthesis of flux regulation functions. Stepwise linear regression and neural networks are employed as function approximators, and classical and evolutionary optimization algorithms (e.g., backpropagation, genetic algorithms, particle swarm optimization, memetic algorithms) as learning techniques. A complete pipeline for data analysis is also presented, which addresses the entire process of flux regulation function synthesis, from data preparation to feature selection, model generation and statistical validation. The proposed methodologies have been successfully tested by means of in silico experiments on the mitotic oscillator in early amphibian embryos and on non-photochemical quenching (NPQ). The last research topic is more applied, and pertains to the design and development of a Java plugin architecture and several plugins that automate many tasks related to MP modeling, such as dynamics computation, flux discovery, and regulation function synthesis.

    Quantitative Structure-Property Relationship Modeling & Computer-Aided Molecular Design: Improvements & Applications

    The objective of this work was to develop an integrated capability to design molecules with desired properties. An automated robust genetic algorithm (GA) module has been developed to facilitate the rapid design of new molecules. The generated molecules were scored for the relevant thermophysical properties using non-linear quantitative structure-property relationship (QSPR) models. The descriptor reduction and model development for the QSPR models were implemented using evolutionary algorithms (EA) and artificial neural networks (ANNs). QSPR models for octanol-water partition coefficients (Kow), melting points (MP), normal boiling points (NBP), Gibbs energy of formation, universal quasi-chemical (UNIQUAC) model parameters, and infinite-dilution activity coefficients of cyclohexane and benzene in various organic solvents were developed in this work. To validate the current design methodology, new chemical penetration enhancers (CPEs) for transdermal insulin delivery and new solvents for extractive distillation of the cyclohexane + benzene system were designed. In general, the use of non-linear QSPR models developed in this work provided predictions better than or as good as existing literature models. In particular, the current models for NBP, Gibbs energy of formation, UNIQUAC model parameters, and infinite-dilution activity coefficients have lower errors on external test sets than the literature models. The current models for MP and Kow are comparable with the best models in the literature. The GA-based design framework implemented in this work successfully identified new CPEs for transdermal delivery of insulin, with permeability values comparable to the best CPEs in the literature. Also, new solvents for extractive distillation of cyclohexane/benzene with selectivities two to four times that of the existing solvents were identified. 
These two case studies validate the ability of the current design framework to identify new molecules with desired target properties.

    Evolutionary Computation

    This book presents several recent advances in Evolutionary Computation, especially evolution-based optimization methods and hybrid algorithms for several applications, from optimization and learning to pattern recognition and bioinformatics. It also presents new algorithms based on several analogies and metaphors, one of which is based on philosophy, specifically on the philosophy of praxis and dialectics. The book also presents interesting applications in bioinformatics, especially the use of particle swarms to discover gene expression patterns in DNA microarrays. It therefore features representative work in the field of evolutionary computation and applied sciences. The intended audience is graduate and undergraduate students, researchers, and anyone who wishes to become familiar with the latest research work in this field.

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, facility layout planning, among others. This is partly because the classic exact methods are constrained with prior assumptions, and partly due to the heuristics being problem-dependent and lacking generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality, which impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC