1,582 research outputs found

    Schema theory based data engineering in gene expression programming for big data analytics

    Get PDF
    Gene expression programming (GEP) is a data driven evolutionary technique that well suits for correlation mining. Parallel GEPs are proposed to speed up the evolution process using a cluster of computers or a computer with multiple CPU cores. However, the generation structure of chromosomes and the size of input data are two issues that tend to be neglected when speeding up GEP in evolution. To fill the research gap, this paper proposes three guiding principles to elaborate the computation nature of GEP in evolution based on an analysis of GEP schema theory. As a result, a novel data engineered GEP is developed which follows closely the generation structure of chromosomes in parallelization and considers the input data size in segmentation. Experimental results on two data sets with complementary features show that the data engineered GEP speeds up the evolution process significantly without loss of accuracy in data correlation mining. Based on the experimental tests, a computation model of the data engineered GEP is further developed to demonstrate its high scalability in dealing with potential big data using a large number of CPU cores

    Artificial immune systems based committee machine for classification application

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.A new adaptive learning Artificial Immune System (AIS) based committee machine is developed in this thesis. The new proposed approach efficiently tackles the general problem of clustering high-dimensional data. In addition, it helps on deriving useful decision and results related to other application domains such classification and prediction. Artificial Immune System (AIS) is a branch of computational intelligence field inspired by the biological immune system, and has gained increasing interest among researchers in the development of immune-based models and techniques to solve diverse complex computational or engineering problems. This work presents some applications of AIS techniques to health problems, and a thorough survey of existing AIS models and algorithms. The main focus of this research is devoted to building an ensemble model integrating different AIS techniques (i.e. Artificial Immune Networks, Clonal Selection, and Negative Selection) for classification applications to achieve better classification results. A new AIS-based ensemble architecture with adaptive learning features is proposed by integrating different learning and adaptation techniques to overcome individual limitations and to achieve synergetic effects through the combination of these techniques. Various techniques related to the design and enhancements of the new adaptive learning architecture are studied, including a neuro-fuzzy based detector and an optimizer using particle swarm optimization method to achieve enhanced classification performance. An evaluation study was conducted to show the performance of the new proposed adaptive learning ensemble and to compare it to alternative combining techniques. Several experiments are presented using different medical datasets for the classification problem and findings and outcomes are discussed. The new adaptive learning architecture improves the accuracy of the ensemble. Moreover, there is an improvement over the existing aggregation techniques. The outcomes, assumptions and limitations of the proposed methods with its implications for further research in this area draw this research to its conclusion

    An Evolutionary Algorithm to Generate Ellipsoid Detectors for Negative Selection

    Get PDF
    Negative selection is a process from the biological immune system that can be applied to two-class (self and nonself) classification problems. Negative selection uses only one class (self) for training, which results in detectors for the other class (nonself). This paradigm is especially useful for problems in which only one class is available for training, such as network intrusion detection. Previous work has investigated hyper-rectangles and hyper-spheres as geometric detectors. This work proposes ellipsoids as geometric detectors. First, the author establishes a mathematical model for ellipsoids. He develops an algorithm to generate ellipsoids by training on only one class of data. Ellipsoid mutation operators, an objective function, and a convergence technique are described for the evolutionary algorithm that generates ellipsoid detectors. Testing on several data sets validates this approach by showing that the algorithm generates good ellipsoid detectors. Against artificial data sets, the detectors generated by the algorithm match more than 90% of nonself data with no false alarms. Against a subset of data from the 1999 DARPA MIT intrusion detection data, the ellipsoids generated by the algorithm detected approximately 98% of nonself (intrusions) with an approximate 0% false alarm rate

    Evolutionary Computation

    Get PDF
    This book presents several recent advances on Evolutionary Computation, specially evolution-based optimization methods and hybrid algorithms for several applications, from optimization and learning to pattern recognition and bioinformatics. This book also presents new algorithms based on several analogies and metafores, where one of them is based on philosophy, specifically on the philosophy of praxis and dialectics. In this book it is also presented interesting applications on bioinformatics, specially the use of particle swarms to discover gene expression patterns in DNA microarrays. Therefore, this book features representative work on the field of evolutionary computation and applied sciences. The intended audience is graduate, undergraduate, researchers, and anyone who wishes to become familiar with the latest research work on this field

    A brief history of learning classifier systems: from CS-1 to XCS and its variants

    Get PDF
    © 2015, Springer-Verlag Berlin Heidelberg. The direction set by Wilson’s XCS is that modern Learning Classifier Systems can be characterized by their use of rule accuracy as the utility metric for the search algorithm(s) discovering useful rules. Such searching typically takes place within the restricted space of co-active rules for efficiency. This paper gives an overview of the evolution of Learning Classifier Systems up to XCS, and then of some of the subsequent developments of Wilson’s algorithm to different types of learning

    Component Thermodynamical Selection Based Gene Expression Programming for Function Finding

    Get PDF
    Gene expression programming (GEP), improved genetic programming (GP), has become a popular tool for data mining. However, like other evolutionary algorithms, it tends to suffer from premature convergence and slow convergence rate when solving complex problems. In this paper, we propose an enhanced GEP algorithm, called CTSGEP, which is inspired by the principle of minimal free energy in thermodynamics. In CTSGEP, it employs a component thermodynamical selection (CTS) operator to quantitatively keep a balance between the selective pressure and the population diversity during the evolution process. Experiments are conducted on several benchmark datasets from the UCI machine learning repository. The results show that the performance of CTSGEP is better than the conventional GEP and some GEP variations

    Inferring the clonal identity of single cells from RNA-seq data with Unique Molecular Identifiers

    Get PDF
    Cancer is an evolutionary disease, in which heterogeneous populations of tumor cells can emerge, proliferate, and disappear depending on selective and neutral processes. This principle has been observed in many studies of acute myeloid leukemia (AML), which is the most common blood cancer in adults. Clonal heterogeneity and evolution have been proposed to play a role in the high relapse rate of this type of cancer. In order to understand this feature, it is crucial to have adequate clinical and experimental models that can provide enough data to elucidate the evolutionary history of a tumor, such as patient-derived xenografts (PDX). These models can be combined with high-resolution sequencing technologies, such as single-cell RNA-seq, to provide a detailed view of the heterogeneity and molecular features of the tumor. However, adequate analytical tools have to be applied and developed in order to fully exploit such datasets. Here I present the analysis of the clonal heterogeneity of an AML patient and the corresponding PDX model, which was treated with multiple rounds of chemotherapy. This model allowed to study the response of the tumor populations to the pressure induced by the therapy, and the possible evolutionary forces behind it. Datasets for these AML samples were generated with multiple types of sequencing methods, one of which was single-cell RNA sequencing. To enable the analysis of somatic mutations and clonal populations in this kind of data, I developed a software package, which is capable of extracting and proofreading variant sequences by making use of Unique Molecular Identifiers (UMIs), which are sequence barcodes that allow to distinguish reads that come from PCR amplification duplicates. The benefits of employing this proofreading approach for variant calling and for inferring the clonal identity of single cells were demonstrated. Finally, I applied to the analysis of the single-cell data of the AML PDX samples that were treated with chemotherapy, as well as other datasets with UMI-based sequencing
    • 

    corecore