
    Cooperative co-evolution for feature selection in big data with random feature grouping

    © 2020, The Author(s). A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consists of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions perform poorly because of limitations such as not considering feature interactions, dealing with only an even number of features, and decomposing the dataset statically. In this paper, a novel random feature grouping (RFG) method is introduced with three variants to dynamically decompose Big Data datasets and to raise the probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes; the resulting method is called Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experimental analysis was performed using six widely used ML classifiers on seven different datasets from the UCI ML repository and the Princeton University Genomics repository, with and without FS. The experimental results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)] the proposed CCFSRFG-1 outperforms an existing solution (a CC-based FS, called CCEAFS), CCFSRFG-2, and the use of all features in terms of accuracy, sensitivity, and specificity.
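
    The idea of regrouping features at random can be illustrated with a minimal Python sketch. It is not the paper's RFG or any of its three variants, which the abstract does not specify. The function name random_feature_grouping, its parameters, and the round-robin split are assumptions made here for illustration only.

```python
import random


def random_feature_grouping(n_features, n_groups, seed=None):
    """Shuffle feature indices and split them into n_groups subcomponents.

    Hypothetical helper: the name, signature, and splitting rule are
    illustrative assumptions, not taken from the paper.
    """
    if not 1 <= n_groups <= n_features:
        raise ValueError("n_groups must be between 1 and n_features")
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin, so group sizes differ by at
    # most one and an odd number of features is handled without padding.
    return [sorted(indices[g::n_groups]) for g in range(n_groups)]


# Calling the function again with a new seed yields a new decomposition,
# which is one simple way to regroup dynamically between CC cycles.
groups = random_feature_grouping(n_features=7, n_groups=3, seed=0)
print(groups)
```

    Because every call reshuffles the indices, two interacting features that land in different groups in one cycle have a fresh chance of sharing a group in the next.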

    Decomposition for Large-scale Optimization Problems with Overlapping Components

    In this paper we use a divide-and-conquer approach to tackle large-scale optimization problems with overlapping components. Decomposition for an overlapping problem is challenging because its components depend on one another. The existing decomposition methods typically assign all the linked decision variables to one group and thus cannot reduce the original problem size. To address this issue we modify the Recursive Differential Grouping (RDG) method to decompose overlapping problems by breaking the linkage at variables shared by multiple components. To evaluate the efficacy of our method, we extend two existing overlapping benchmark problems to consider various levels of overlap. Experimental results show that our method can greatly improve the search ability of an optimization algorithm via divide-and-conquer, and that it outperforms RDG, random decomposition, and other state-of-the-art methods. We further evaluate our method using the CEC'2013 benchmark problems and show that our method is very competitive when equipped with a component optimizer.

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features. Many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.

    A review of population-based metaheuristics for large-scale black-box global optimization: Part A

    Scalability of optimization algorithms is a major challenge in coping with the ever-growing size of optimization problems in a wide range of application areas, from high-dimensional machine learning to complex large-scale engineering problems. The field of large-scale global optimization is concerned with improving the scalability of global optimization algorithms, particularly population-based metaheuristics. Such metaheuristics have been successfully applied to continuous, discrete, or combinatorial problems ranging from several thousand dimensions to billions of decision variables. In this two-part survey, we review recent studies in the field of large-scale black-box global optimization to help researchers and practitioners gain a bird’s-eye view of the field and learn about its major trends and state-of-the-art algorithms. Part A of the series covers two major algorithmic approaches to large-scale global optimization: problem decomposition and memetic algorithms. Part B of the series covers a range of other algorithmic approaches to large-scale global optimization, describes a wide range of problem areas, and finally touches upon the pitfalls and challenges of current research and identifies several potential areas for future research.

    Cortical Dynamics of Navigation and Steering in Natural Scenes: Motion-Based Object Segmentation, Heading, and Obstacle Avoidance

    Visually guided navigation through a cluttered natural scene is a challenging problem that animals and humans accomplish with ease. The ViSTARS neural model proposes how primates use motion information to segment objects and determine heading for purposes of goal approach and obstacle avoidance in response to video inputs from real and virtual environments. The model produces trajectories similar to those of human navigators. It does so by predicting how computationally complementary processes in cortical areas MT-/MSTv and MT+/MSTd compute object motion for tracking and self-motion for navigation, respectively. The model retina responds to transients in the input stream. Model V1 generates a local speed and direction estimate. This local motion estimate is ambiguous due to the neural aperture problem. Model MT+ interacts with MSTd via an attentive feedback loop to compute accurate heading estimates in MSTd that quantitatively simulate properties of human heading estimation data. Model MT- interacts with MSTv via an attentive feedback loop to compute accurate estimates of speed, direction and position of moving objects. This object information is combined with heading information to produce steering decisions wherein goals behave like attractors and obstacles behave like repellers. These steering decisions lead to navigational trajectories that closely match human performance. National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial Intelligence Agency (NMA201-01-1-2016)

    DG2: A Faster and More Accurate Differential Grouping for Large-Scale Black-Box Optimization

    Identification of variable interaction is essential for an efficient implementation of a divide-and-conquer algorithm for large-scale black-box optimization. In this paper, we propose an improved variant of the differential grouping (DG) algorithm, which has a better efficiency and grouping accuracy. The proposed algorithm, DG2, finds a reliable threshold value by estimating the magnitude of roundoff errors. With respect to efficiency, DG2 reuses the sample points that are generated for detecting interactions and saves up to half of the computational resources on fully separable functions. We mathematically show that the new sampling technique achieves the lower bound with respect to the number of function evaluations. Unlike its predecessor, DG2 checks all possible pairs of variables for interactions and has the capacity to identify overlapping components of an objective function. On the accuracy aspect, DG2 outperforms the state-of-the-art decomposition methods on the latest large-scale continuous optimization benchmark suites. DG2 also performs reliably in the presence of imbalance among contribution of components in an objective function. Another major advantage of DG2 is the automatic calculation of its threshold parameter (ϵ), which makes it parameter-free. Finally, the experimental results show that when DG2 is used within a cooperative co-evolutionary framework, it can generate competitive results as compared to several state-of-the-art algorithms.
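
    The pairwise interaction test that differential grouping builds on can be sketched in a few lines of Python. This is only the basic finite-difference check with a fixed, hand-picked threshold. It does not implement DG2's roundoff-based threshold estimation or its reuse of sample points. The function name interacts, the step size delta, and the default eps are assumptions made here for illustration.

```python
def interacts(f, x, i, j, delta=1.0, eps=1e-9):
    """Return True if variables i and j of f appear to interact at point x.

    Hypothetical helper: compares the change in f caused by perturbing
    variable i before and after variable j has also been perturbed.
    For additively separable variables the two changes are equal.
    """
    base = list(x)
    xi = list(x)
    xi[i] += delta          # perturb i only
    xj = list(x)
    xj[j] += delta          # perturb j only
    xij = list(xi)
    xij[j] += delta         # perturb both i and j
    d1 = f(xi) - f(base)    # effect of i with j at its original value
    d2 = f(xij) - f(xj)     # effect of i with j shifted
    return abs(d1 - d2) > eps


def sphere(x):
    # Fully separable: no pair of variables interacts.
    return sum(v * v for v in x)


def coupled(x):
    # x[0] and x[1] interact through their product; x[2] is separable.
    return x[0] * x[1] + x[2] ** 2


point = [1.0, 1.0, 1.0]
print(interacts(sphere, point, 0, 1))
print(interacts(coupled, point, 0, 1))
print(interacts(coupled, point, 0, 2))
```

    A fixed eps like the one above is fragile on badly scaled functions, which is the weakness the abstract says DG2 addresses by estimating roundoff error instead.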