An Effective Ensemble Approach for Spam Classification
The annoyance of spam increasingly plagues both individuals and organizations. Spam classification, which distinguishes spam from legitimate email, is therefore an important problem. This paper presents a neural network ensemble approach based on a specially designed cooperative coevolution paradigm. Each component network corresponds to a separate subpopulation, and all subpopulations are evolved simultaneously. The ensemble performance and the Q-statistic diversity measure are adopted as the objectives, and the component networks are evaluated using the multi-objective Pareto optimality measure. Experimental results show that the proposed algorithm outperforms traditional ensemble methods on the spam classification problems.
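The two objectives can be made concrete. The sketch below makes several assumptions: labels are binary (1 = spam), each component network's predictions on a validation set are available as lists, and the ensemble combines them by majority vote (the abstract does not specify the combiner). It computes Yule's Q-statistic for each pair of networks, averages it over all pairs, and checks Pareto dominance between candidate ensembles that maximize accuracy while minimizing average Q. Lower Q means the networks tend to err on different emails, i.e. the ensemble is more diverse. All function names are hypothetical, and this is not the authors' implementation.

```python
from itertools import combinations
from typing import Sequence, Tuple


def q_statistic(pred_a: Sequence[int], pred_b: Sequence[int], truth: Sequence[int]) -> float:
    """Yule's Q between two classifiers: (N11*N00 - N01*N10) / (N11*N00 + N01*N10)."""
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(pred_a, pred_b, truth):
        ca, cb = a == y, b == y
        if ca and cb:
            n11 += 1          # both correct
        elif not ca and not cb:
            n00 += 1          # both wrong
        elif ca:
            n10 += 1          # only the first is correct
        else:
            n01 += 1          # only the second is correct
    denom = n11 * n00 + n01 * n10
    # Convention assumed here: Q = 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (n11 * n00 - n01 * n10) / denom


def average_q(preds: Sequence[Sequence[int]], truth: Sequence[int]) -> float:
    """Mean pairwise Q over all component networks (needs at least two)."""
    pairs = list(combinations(range(len(preds)), 2))
    return sum(q_statistic(preds[i], preds[j], truth) for i, j in pairs) / len(pairs)


def majority_vote_accuracy(preds: Sequence[Sequence[int]], truth: Sequence[int]) -> float:
    """Accuracy of a majority vote over 0/1 predictions; ties go to class 1 (spam)."""
    correct = 0
    for k, y in enumerate(truth):
        votes = sum(p[k] for p in preds)
        label = 1 if 2 * votes >= len(preds) else 0
        correct += label == y
    return correct / len(truth)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a, b are (accuracy, average Q): maximize accuracy, minimize Q."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better
```

A candidate ensemble is kept on the Pareto front when no other candidate dominates it; this lets the search trade a little accuracy for more diversity instead of collapsing both goals into one weighted score.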
CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction
Protein-protein interactions are essential in biochemical processes. Accurate prediction of protein-protein interaction sites (PPIs) deepens our understanding of biological mechanisms and is crucial for new drug design. However, conventional experimental methods for PPIs prediction are costly and time-consuming, so many computational approaches, especially ML-based methods, have been developed recently. Although these approaches have achieved gratifying results, two limitations remain: (1) most models have extracted useful input features but failed to take coevolutionary features into account, which could provide clues about inter-residue relationships; (2) attention-based models allocate attention weights only to neighboring residues rather than globally, neglecting that residues far from the target residue might also matter. We propose a coevolution-enhanced global attention neural network, a sequence-based deep learning model for PPIs prediction, called CoGANPPIS. It uses three layers in parallel for feature extraction: (1) a local-level representation aggregation layer, which aggregates the neighboring residues' features; (2) a global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all residues on the same protein sequence; (3) a coevolutionary information learning layer, which applies a CNN and pooling to coevolutionary information to obtain the coevolutionary profile representation. The three outputs are then concatenated and passed into several fully connected layers for the final prediction. Experiments on two benchmark datasets demonstrated the state-of-the-art performance of our model. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code
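The contrast between local aggregation and global attention can be illustrated with a minimal pure-Python sketch. It assumes that per-residue feature vectors and a residue-by-residue coevolution matrix are already available. It then adds the coevolution score as a bias to scaled dot-product attention logits. That additive bias is an illustrative assumption, not necessarily how CoGANPPIS combines the two, and the linked repository remains the reference implementation. The function names are hypothetical, and the third branch is a stand-in: it uses the raw coevolution row instead of a learned CNN with pooling.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def local_aggregate(feats: List[Vector], i: int, window: int) -> Vector:
    """Local branch: mean of the features of residues within +/- window of residue i."""
    lo, hi = max(0, i - window), min(len(feats), i + window + 1)
    dim = len(feats[0])
    return [sum(feats[j][d] for j in range(lo, hi)) / (hi - lo) for d in range(dim)]


def global_attention(feats: List[Vector], coev: List[Vector], i: int) -> Vector:
    """Global branch: residue i attends to every residue; coevolution biases the logits."""
    dim = len(feats[0])
    logits = [
        sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(dim) + coev[i][j]
        for j in range(len(feats))
    ]
    weights = softmax(logits)
    return [sum(w * feats[j][d] for j, w in enumerate(weights)) for d in range(dim)]


def residue_representation(feats: List[Vector], coev: List[Vector], i: int, window: int = 1) -> Vector:
    # Concatenate the three views. The coevolution view here is simply row i of the
    # matrix, a placeholder for the CNN + pooling branch described in the abstract.
    return local_aggregate(feats, i, window) + global_attention(feats, coev, i) + list(coev[i])
```

In the full model, each branch is learned, and the concatenated vector feeds the fully connected layers. The sketch only shows how the three views differ in which residues they consult: a strong coevolution score lets residue `i` draw on a residue far outside its local window.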
Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of AdaBoost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on simple sum-rule fusion of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is that some very recent approaches, including the heterogeneous ensemble proposed in this paper, can outperform an SVM classifier (implemented with LibSVM), even when both the kernel and the SVM parameters are carefully tuned for each dataset.
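Fusion by the sum rule works as follows. Each classifier outputs a score per class. The scores are brought to a common scale and summed across classifiers, and the class with the largest total wins. The sketch below assumes that each classifier's scores arrive as a samples-by-classes matrix. It rescales each matrix with min-max normalization over that classifier's own scores. The paper's exact normalization may differ, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows: samples, columns: classes


def minmax_normalize(scores: Matrix) -> Matrix:
    """Rescale one classifier's score matrix to [0, 1] using its global min and max."""
    flat = [s for row in scores for s in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in scores]
    return [[(s - lo) / (hi - lo) for s in row] for row in scores]


def sum_rule(classifier_scores: List[Matrix]) -> List[int]:
    """Sum normalized scores over classifiers and predict the arg-max class per sample."""
    normalized = [minmax_normalize(m) for m in classifier_scores]
    n_samples, n_classes = len(normalized[0]), len(normalized[0][0])
    predictions = []
    for i in range(n_samples):
        totals = [sum(m[i][k] for m in normalized) for k in range(n_classes)]
        # max() returns the first maximal index, so ties go to the lower class index.
        predictions.append(max(range(n_classes), key=totals.__getitem__))
    return predictions
```

For example, with SVM decision values `[[2.0, -2.0], [-1.0, 1.0], [0.5, -0.5]]` and Gaussian-process probabilities `[[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]`, the fused prediction is `[0, 1, 1]`. On the third sample, the confident Gaussian-process vote outweighs the weak SVM preference. Because the rule has no trainable weights, it needs no per-dataset tuning, which is the property a general-purpose ensemble is meant to have.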
Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives
The term big data characterizes the massive amounts of data generated by advanced technologies in different domains. It is described by the 4Vs (volume, velocity, variety, and veracity), which indicate the amount of data that can only be processed via computationally intensive analysis, the speed of its creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and they are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features. Many domains, including finance, lack the analytic tools to mine such data for knowledge discovery because of its high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic, divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
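The divide-and-conquer idea behind cooperative co-evolution can be sketched for feature selection. The feature indices are split into groups, and each group gets its own subpopulation of binary masks. Each partial mask is scored by splicing it into the current best masks of the other groups (the context vector). The fitness function below is a hypothetical stand-in for a classifier's validation accuracy. It rewards selecting a known set of relevant features and penalizes subset size. In practice, each evaluation would train a model, and those independent evaluations are what a MapReduce-style framework would distribute. Group count, population size, and mutation rate are illustrative assumptions.

```python
import random
from typing import Callable, List, Set

Mask = List[int]


def cc_feature_selection(
    n_features: int,
    n_groups: int,
    fitness: Callable[[Mask], float],
    generations: int = 30,
    pop_size: int = 10,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Mask:
    rng = random.Random(seed)
    # Decompose the feature indices into contiguous groups, one sub-problem each.
    bounds = [(g * n_features // n_groups, (g + 1) * n_features // n_groups) for g in range(n_groups)]
    pops = [[[rng.randint(0, 1) for _ in range(hi - lo)] for _ in range(pop_size)] for lo, hi in bounds]
    context = [pop[0][:] for pop in pops]  # current representative of each subpopulation

    def full_mask(g: int, part: Mask) -> Mask:
        # Splice group g's candidate into the representatives of all other groups.
        return [bit for k, rep in enumerate(context) for bit in (part if k == g else rep)]

    for _ in range(generations):
        for g, pop in enumerate(pops):
            scored = sorted(pop, key=lambda part: fitness(full_mask(g, part)), reverse=True)
            # Replace the representative only if the best candidate is at least as good.
            if fitness(full_mask(g, scored[0])) >= fitness(full_mask(g, context[g])):
                context[g] = scored[0][:]
            # Keep the better half and refill with bit-flip mutants of the survivors.
            survivors = scored[: pop_size // 2]
            children = [
                [1 - b if rng.random() < mutation_rate else b for b in rng.choice(survivors)]
                for _ in range(pop_size - len(survivors))
            ]
            pops[g] = survivors + children
    return [bit for rep in context for bit in rep]


# Hypothetical ground truth used only by the toy fitness function.
RELEVANT: Set[int] = {1, 4, 7, 10}


def toy_fitness(mask: Mask) -> float:
    """Stand-in for validation accuracy: +1 per relevant feature, -0.1 per selected feature."""
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)
```

For example, `cc_feature_selection(12, 3, toy_fitness)` searches three 4-bit sub-problems instead of one 12-bit problem. With thousands of features, this decomposition is what keeps each subpopulation's search space manageable.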
Coevolutionary fuzzy modeling
This thesis presents Fuzzy CoCo, a novel approach to system design that is conducive to explaining human decisions. Based on fuzzy logic and coevolutionary computation, Fuzzy CoCo is a methodology for constructing systems able to accurately predict the outcome of a human decision-making process while providing an understandable explanation of the underlying reasoning. Fuzzy logic provides a formal framework for constructing systems exhibiting both good numeric performance (precision) and linguistic representation (interpretability). From a numeric point of view, fuzzy systems exhibit nonlinear behavior and can handle imprecise and incomplete information. Linguistically, they represent knowledge in the form of rules, a natural way of explaining decision processes. Fuzzy modeling (the construction of fuzzy systems) is an arduous task, demanding the identification of many parameters. This thesis analyses the fuzzy-modeling problem and different approaches to coping with it, focusing on evolutionary fuzzy modeling (the design of fuzzy inference systems using evolutionary algorithms), which constitutes the methodological basis of my approach. To support this analysis, the parameters of a fuzzy system are classified into four categories: logic, structural, connective, and operational. The central contribution of this work is the use of an advanced evolutionary technique, cooperative coevolution, for the simultaneous design of connective and operational parameters. Cooperative coevolutionary fuzzy modeling overcomes several limitations exhibited by other standard evolutionary approaches: stagnation, convergence to local optima, and high computational cost. Designing interpretable systems is a prime goal of my approach, and I study it thoroughly herein.
Based on a set of semantic and syntactic criteria regarding the definition of linguistic concepts and their causal connections, I propose a number of strategies for producing more interpretable fuzzy systems. These strategies are implemented in Fuzzy CoCo, resulting in a modeling methodology that provides high numeric precision while incurring as small a loss of interpretability as possible. After testing Fuzzy CoCo on a benchmark problem (Fisher's Iris data), I successfully apply the algorithm to model the decision processes involved in two breast-cancer diagnostic problems: the WBCD problem and the Catalonia mammography interpretation problem. For the WBCD problem, Fuzzy CoCo produces systems of both high performance and high interpretability, comparable to (if not better than) the best systems demonstrated to date. For the Catalonia problem, an evolved high-performance system was embedded within a web-based tool, called COBRA, for aiding radiologists in mammography interpretation. Several aspects of Fuzzy CoCo are thoroughly analyzed to provide a deeper understanding of the method. These analyses show the consistency of the results. They also help derive a stepwise guide to applying Fuzzy CoCo and a set of qualitative relationships between some of its parameters that facilitate setting up the algorithm. Finally, this work proposes and preliminarily explores two extensions to the method, Island Fuzzy CoCo and Incremental Fuzzy CoCo, which together with the original CoCo constitute a family of coevolutionary fuzzy modeling techniques. The aim of these extensions is to guide the choice of an adequate number of rules for a given problem. While Island Fuzzy CoCo performs an extended search over different problem sizes, Incremental Fuzzy CoCo bases its search power on a mechanism of incremental evolution.
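The central idea of cooperative coevolutionary fuzzy modeling is that two species evolve side by side. One species encodes the membership functions (operational parameters) and the other the rule base (connective parameters). An individual of either species is scored by pairing it with the current best individual of the other species. The sketch below applies that idea to a deliberately tiny one-input classifier with three triangular membership functions. The encodings, the truncation-plus-mutation update, the data, and all names are illustrative assumptions, not the thesis's actual Fuzzy CoCo implementation.

```python
import random
from typing import List, Tuple

Centers = List[float]  # operational parameters: one triangle centre per linguistic label
Rules = List[int]      # connective parameters: output class of each label's rule
WIDTH = 0.5            # fixed triangle half-width (assumption)

# Hypothetical 1-D data on a 0.05 grid: class 1 only in the middle of the range.
DATA: List[Tuple[float, int]] = [(k / 20, 1 if 0.35 <= k / 20 <= 0.65 else 0) for k in range(21)]


def membership(x: float, c: float) -> float:
    return max(0.0, 1.0 - abs(x - c) / WIDTH)


def classify(x: float, centers: Centers, rules: Rules) -> int:
    # Rule k: IF x is label_k THEN class rules[k]. Max aggregation per class; ties go to 0.
    act = [0.0, 0.0]
    for c, out in zip(centers, rules):
        act[out] = max(act[out], membership(x, c))
    return 1 if act[1] > act[0] else 0


def accuracy(centers: Centers, rules: Rules) -> float:
    return sum(classify(x, centers, rules) == y for x, y in DATA) / len(DATA)


def fuzzy_coco(generations: int = 40, pop_size: int = 8, seed: int = 1) -> Tuple[Centers, Rules, float]:
    rng = random.Random(seed)
    mf_pop = [sorted(rng.random() for _ in range(3)) for _ in range(pop_size)]
    rule_pop = [[rng.randint(0, 1) for _ in range(3)] for _ in range(pop_size)]
    best_mf, best_rules = mf_pop[0], rule_pop[0]

    for _ in range(generations):
        # Species 1: membership functions, each scored with the best rule base.
        mf_pop.sort(key=lambda m: accuracy(m, best_rules), reverse=True)
        if accuracy(mf_pop[0], best_rules) >= accuracy(best_mf, best_rules):
            best_mf = mf_pop[0]
        # Species 2: rule bases, each scored with the best membership functions.
        rule_pop.sort(key=lambda r: accuracy(best_mf, r), reverse=True)
        if accuracy(best_mf, rule_pop[0]) >= accuracy(best_mf, best_rules):
            best_rules = rule_pop[0]
        # Elitist reproduction: keep the top half, refill with mutated copies.
        half = pop_size // 2
        mf_pop = mf_pop[:half] + [
            sorted(min(1.0, max(0.0, c + rng.gauss(0, 0.1))) for c in rng.choice(mf_pop[:half]))
            for _ in range(pop_size - half)
        ]
        rule_pop = rule_pop[:half] + [
            [1 - r if rng.random() < 0.2 else r for r in rng.choice(rule_pop[:half])]
            for _ in range(pop_size - half)
        ]
    return best_mf, best_rules, accuracy(best_mf, best_rules)
```

Pairing each individual with the other species' current best is the simplest cooperation scheme. It keeps the joint fitness non-decreasing, because a representative is replaced only when the replacement scores at least as well. For reference, the hand-built system `accuracy([0.15, 0.5, 0.85], [0, 1, 0])` classifies all 21 toy points correctly and reads as three linguistic rules (Low → 0, Medium → 1, High → 0).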