231 research outputs found

    An Effective Ensemble Approach for Spam Classification

    Get PDF
    The annoyance of spam increasingly plagues both individuals and organizations. Spam classification is an important issue to distinguish the spam with the legitimate email or address. This paper presents a neural network ensemble approach based on a specially designed cooperative coevolution paradigm. Each component network corresponds to a separate subpopulation and all subpopulations are evolved simultaneously. The ensemble performance and the Q-statistic diversity measure are adopted as the objectives, and the component networks are evaluated by using the multi-objective Pareto optimality measure. Experimental results illustrate that the proposed algorithm outperforms the traditional ensemble methods on the spam classification problems

    CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction

    Full text link
    Protein-protein interactions are essential in biochemical processes. Accurate prediction of the protein-protein interaction sites (PPIs) deepens our understanding of biological mechanism and is crucial for new drug design. However, conventional experimental methods for PPIs prediction are costly and time-consuming so that many computational approaches, especially ML-based methods, have been developed recently. Although these approaches have achieved gratifying results, there are still two limitations: (1) Most models have excavated some useful input features, but failed to take coevolutionary features into account, which could provide clues for inter-residue relationships; (2) The attention-based models only allocate attention weights for neighboring residues, instead of doing it globally, neglecting that some residues being far away from the target residues might also matter. We propose a coevolution-enhanced global attention neural network, a sequence-based deep learning model for PPIs prediction, called CoGANPPIS. It utilizes three layers in parallel for feature extraction: (1) Local-level representation aggregation layer, which aggregates the neighboring residues' features; (2) Global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all the residues on the same protein sequences; (3) Coevolutionary information learning layer, which applies CNN & pooling to coevolutionary information to obtain the coevolutionary profile representation. Then, the three outputs are concatenated and passed into several fully connected layers for the final prediction. Application on two benchmark datasets demonstrated a state-of-the-art performance of our model. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code

    Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification

    Get PDF
    We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of adaboost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    Get PDF
    The term big data characterizes the massive amounts of data generation by the advanced technologies in different domains using 4Vs volume, velocity, variety, and veracity-to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-Time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-Time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features, and many domains, including financial, lack the al analytic tools to mine the data for knowledge discovery because of the high-dimensionality. Feature selection is an optimization problem to find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-And-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-To-use distributed, scalable, and fault-Tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-The-Art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions

    Coevolutionary fuzzy modeling

    Get PDF
    This thesis presents Fuzzy CoCo, a novel approach for system design, conducive to explaining human decisions. Based on fuzzy logic and coevolutionary computation, Fuzzy CoCo is a methodology for constructing systems able to accurately predict the outcome of a human decision-making process, while providing an understandable explanation of the underlying reasoning. Fuzzy logic provides a formal framework for constructing systems exhibiting both good numeric performance (precision) and linguistic representation (interpretability). From a numeric point of view, fuzzy systems exhibit nonlinear behavior and can handle imprecise and incomplete information. Linguistically, they represent knowledge in the form of rules, a natural way for explaining decision processes. Fuzzy modeling —meaning the construction of fuzzy systems— is an arduous task, demanding the identification of many parameters. This thesis analyses the fuzzy-modeling problem and different approaches to coping with it, focusing on evolutionary fuzzy modeling —the design of fuzzy inference systems using evolutionary algorithms— which constitutes the methodological base of my approach. In order to promote this analysis the parameters of a fuzzy system are classified into four categories: logic, structural, connective, and operational. The central contribution of this work is the use of an advanced evolutionary technique —cooperative coevolution— for dealing with the simultaneous design of connective and operational parameters. Cooperative coevolutionary fuzzy modeling succeeds in overcoming several limitations exhibited by other standard evolutionary approaches: stagnation, convergence to local optima, and computational costliness. Designing interpretable systems is a prime goal of my approach, which I study thoroughly herein. Based on a set of semantic and syntactic criteria, regarding the definition of linguistic concepts and their causal connections, I propose a number of strategies for producing more interpretable fuzzy systems. These strategies are implemented in Fuzzy CoCo, resulting in a modeling methodology providing high numeric precision, while incurring as little a loss of interpretability as possible. After testing Fuzzy CoCo on a benchmark problem —Fisher's Iris data— I successfully apply the algorithm to model the decision processes involved in two breast-cancer diagnostic problems: the WBCD problem and the Catalonia mammography interpretation problem. For the WBCD problem, Fuzzy CoCo produces systems both of high performance and high interpretability, comparable (if not better) than the best systems demonstrated to date. For the Catalonia problem, an evolved high-performance system was embedded within a web-based tool —called COBRA— for aiding radiologists in mammography interpretation. Several aspects of Fuzzy CoCo are thoroughly analyzed to provide a deeper understanding of the method. These analyses show the consistency of the results. They also help derive a stepwise guide to applying Fuzzy CoCo, and a set of qualitative relationships between some of its parameters that facilitate setting up the algorithm. Finally, this work proposes and explores preliminarily two extensions to the method: Island Fuzzy CoCo and Incremental Fuzzy CoCo, which together with the original CoCo constitute a family of coevolutionary fuzzy modeling techniques. The aim of these extensions is to guide the choice of an adequate number of rules for a given problem. While Island Fuzzy CoCo performs an extended search over different problem sizes, Incremental Fuzzy CoCo bases its search power on a mechanism of incremental evolution
    • …
    corecore