
    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains. It is described by the 4Vs: volume, velocity, variety, and veracity, which indicate, respectively, an amount of data that can only be processed via computationally intensive analysis, the speed of its creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but only a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytical tools to mine the data for knowledge discovery because of its high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistics-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic that follows a divide-and-conquer strategy, decomposes high-dimensional problems into smaller sub-problems. MapReduce, a programming model, offers a ready-to-use, distributed, scalable, and fault-tolerant infrastructure for parallelizing such algorithms. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
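    The decomposition idea in this abstract can be illustrated with a small, self-contained sketch. It is not the method of any surveyed paper: the contiguous feature grouping, the nearest-centroid fitness with a feature-count penalty, the genetic operators, and every name (`make_data`, `fitness`, `cooperative_coevolution`) are hypothetical choices made for illustration. Each group of features gets its own sub-population of bit masks, and a sub-mask is scored together with the best-known sub-masks of the other groups (the "context vector"). The accuracy count is computed MapReduce-style in-process (map each data partition to its correct-prediction count, then reduce by summation); this only mimics the programming model and does not use Hadoop or any MapReduce framework.

```python
import operator
import random
from functools import reduce
from typing import List, Sequence, Tuple

Dataset = Tuple[List[List[float]], List[int]]


def make_data(n: int, d: int, informative: int, seed: int = 0) -> Dataset:
    """Hypothetical toy data: only the first `informative` features depend on the class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(1.5 * label if j < informative else 0.0, 1.0) for j in range(d)])
        y.append(label)
    return X, y


def fitness(mask: Sequence[bool], X, y, penalty: float = 0.05, n_parts: int = 4) -> float:
    """Nearest-centroid training accuracy minus a penalty on the fraction of features kept."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return -1.0  # an empty subset is never acceptable
    centroids = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]

    def predict(row):
        return min(centroids, key=lambda c: sum((row[j] - m) ** 2 for j, m in zip(idx, centroids[c])))

    # Simulated MapReduce: "map" each partition to its correct count, "reduce" by summing.
    data = list(zip(X, y))
    partitions = [data[k::n_parts] for k in range(n_parts)]
    correct = reduce(operator.add, map(lambda part: sum(predict(r) == t for r, t in part), partitions), 0)
    return correct / len(y) - penalty * len(idx) / len(mask)


def cooperative_coevolution(X, y, n_groups=4, pop_size=10, generations=15, seed=1):
    rng = random.Random(seed)
    d = len(X[0])
    # Decompose the d-bit mask into n_groups contiguous sub-problems.
    bounds = [(g * d // n_groups, (g + 1) * d // n_groups) for g in range(n_groups)]
    pops = [[[rng.random() < 0.5 for _ in range(hi - lo)] for _ in range(pop_size)]
            for lo, hi in bounds]
    # Context vector: best-known sub-mask per group, starting from "all features selected".
    context = [[True] * (hi - lo) for lo, hi in bounds]
    best_fit = fitness([b for part in context for b in part], X, y)

    for _ in range(generations):
        for g in range(n_groups):
            scored = []
            for sub in pops[g]:
                # A sub-mask is scored jointly with the other groups' current best sub-masks.
                trial = context[:g] + [sub] + context[g + 1:]
                f = fitness([b for part in trial for b in part], X, y)
                scored.append((f, sub))
                if f > best_fit:
                    best_fit, context[g] = f, list(sub)
            # Truncation selection, one-point crossover, and bit-flip mutation within the group.
            scored.sort(key=lambda t: t[0], reverse=True)
            parents = [s for _, s in scored[: pop_size // 2]]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(len(a) + 1)
                children.append([not bit if rng.random() < 0.1 else bit for bit in a[:cut] + b[cut:]])
            pops[g] = parents + children

    return [b for part in context for b in part], best_fit


if __name__ == "__main__":
    X, y = make_data(n=60, d=20, informative=4)
    mask, score = cooperative_coevolution(X, y)
    print([j for j, keep in enumerate(mask) if keep], round(score, 3))
```

    Because the context vector only changes when a trial improves on the best score seen so far, the returned subset is never worse, under this fitness, than keeping all features. In a real distributed setting the fitness evaluations themselves would typically be the parallelized part.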

    An embedded feature selection framework for hybrid data

    © 2017, Springer International Publishing AG. In inductive supervised learning, feature selection is the process of selecting a subset of features relevant to the target concept and removing irrelevant and redundant features. Most feature selection methods developed over recent decades can handle only numerical or only categorical features. An exception is Recursive Feature Elimination under the clinical kernel function, an embedded feature selection method; however, it suffers from low classification performance. In this work, we propose several embedded feature selection methods capable of handling both balanced and imbalanced hybrid data sets. In an experimental evaluation on five UCI Machine Learning Repository data sets, we demonstrate the superiority and effectiveness of the proposed methods in terms of dimensionality reduction and classification performance.
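    The clinical kernel mentioned in this abstract, as it is usually defined, lets numerical and categorical features share one kernel by averaging per-feature similarities: for a continuous or ordinal feature with observed range r, the similarity of values a and b is (r - |a - b|) / r, and for a nominal feature it is 1 if the values match and 0 otherwise. The sketch below combines that kernel with recursive feature elimination. It is an illustration only, not the framework proposed in this work: standard RFE ranks features with a criterion derived from a trained SVM, which is swapped here for a simple dependency-free score (mean same-class kernel value minus mean different-class kernel value), and all function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[float, str]


def clinical_kernel(a: Sequence[Value], b: Sequence[Value], features: Sequence[int],
                    kinds: Sequence[str], ranges: Dict[int, float]) -> float:
    """Average per-feature similarity over the active features."""
    total = 0.0
    for j in features:
        if kinds[j] == "nominal":
            total += 1.0 if a[j] == b[j] else 0.0
        else:  # continuous or ordinal: similarity shrinks linearly with distance
            r = ranges[j]
            total += (r - abs(a[j] - b[j])) / r if r > 0 else 1.0
    return total / len(features)


def separation(X, y, features, kinds, ranges) -> float:
    """Hypothetical criterion: mean same-class kernel minus mean different-class kernel."""
    same, diff = [], []
    for i, k in combinations(range(len(X)), 2):
        value = clinical_kernel(X[i], X[k], features, kinds, ranges)
        (same if y[i] == y[k] else diff).append(value)
    return sum(same) / len(same) - sum(diff) / len(diff)


def clinical_kernel_rfe(X, y, kinds: Sequence[str], n_keep: int = 1) -> Tuple[List[int], List[int]]:
    """Return (kept features, features in elimination order); n_keep must be at least 1."""
    d = len(kinds)
    ranges = {j: max(r[j] for r in X) - min(r[j] for r in X) for j in range(d) if kinds[j] != "nominal"}
    remaining, eliminated = list(range(d)), []
    while len(remaining) > n_keep:
        # Remove the feature whose absence leaves the classes best separated.
        worst = max(remaining,
                    key=lambda j: separation(X, y, [f for f in remaining if f != j], kinds, ranges))
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated


if __name__ == "__main__":
    # Toy hybrid data: column 0 continuous, column 1 nominal, column 2 ordinal noise.
    X = [[1.0, "a", 1], [1.2, "a", 3], [0.9, "a", 2],
         [3.0, "b", 1], [3.1, "b", 3], [2.8, "b", 2]]
    y = [0, 0, 0, 1, 1, 1]
    print(clinical_kernel_rfe(X, y, ["continuous", "nominal", "ordinal"]))
```

    Since the kernel is an average of per-feature terms, the criterion is recomputed from scratch for every candidate removal, which costs time quadratic in the number of samples at each step; that is acceptable for a sketch but not for large data sets.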