3 research outputs found

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and MapReduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains. It is described by the 4Vs (volume, velocity, variety, and veracity), which indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features. Many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm that follows a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
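As a rough illustration of the divide-and-conquer idea in this abstract, the sketch below splits a binary feature mask into groups, evolves a small subpopulation per group, and scores each candidate by plugging it into a shared context vector. It is not taken from the article: `toy_fitness`, `cc_feature_selection`, the penalty weight, and the known set of relevant features are all hypothetical stand-ins, and a real system would replace the toy objective with classifier accuracy.

```python
import random

def toy_fitness(mask, relevant, penalty=0.1):
    """Hypothetical objective: reward relevant features, penalise subset size."""
    hits = sum(mask[i] for i in relevant)
    return hits - penalty * sum(mask)

def cc_feature_selection(n_features, groups, relevant,
                         generations=30, pop_size=8, seed=0):
    """Minimal cooperative co-evolution sketch over binary feature masks."""
    rng = random.Random(seed)
    context = [0] * n_features  # best-known full solution (context vector)
    # One subpopulation of partial masks per feature group.
    pops = [[[rng.randint(0, 1) for _ in group] for _ in range(pop_size)]
            for group in groups]

    def evaluate(group, individual):
        # Score a partial mask by inserting it into the context vector.
        trial = list(context)
        for idx, bit in zip(group, individual):
            trial[idx] = bit
        return toy_fitness(trial, relevant)

    for _ in range(generations):
        for group, pop in zip(groups, pops):
            rate = 1.0 / len(group)
            # Bit-flip mutation produces one child per parent.
            children = [[bit ^ int(rng.random() < rate) for bit in ind]
                        for ind in pop]
            ranked = sorted(pop + children,
                            key=lambda ind: evaluate(group, ind),
                            reverse=True)
            pop[:] = ranked[:pop_size]  # keep the best of parents and children
            # The group's best individual updates its slice of the context.
            for idx, bit in zip(group, pop[0]):
                context[idx] = bit
    return context

groups = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
selected = cc_feature_selection(12, groups, relevant={1, 6, 10})
print(selected)
```

The groups here are fixed by hand; how features are assigned to groups is a separate design decision that strongly affects whether interacting features are optimized together.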

    Cooperative co-evolution for feature selection in big data with random feature grouping

    © 2020, The Author(s). A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consists of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions perform poorly because of limitations such as not considering feature interactions, dealing with only an even number of features, and decomposing the dataset statically. In this paper, a novel random feature grouping (RFG) is introduced with its three variants to dynamically decompose Big Data datasets and to ensure the probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes, hence the name Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experimental analysis was performed using six widely used ML classifiers on seven different datasets from the UCI ML repository and Princeton University Genomics repository, with and without FS. The experimental results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)] the proposed CCFSRFG-1 outperforms an existing solution (a CC-based FS called CCEAFS), CCFSRFG-2, and the use of all features in terms of accuracy, sensitivity, and specificity.
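The abstract does not spell out how the three RFG variants work, so the following is only a generic sketch of random grouping: shuffle the feature indices and deal them into groups, which handles odd feature counts and yields a different decomposition on each call. The function name `random_feature_groups` and its parameters are assumptions made for illustration, not the paper's API.

```python
import random

def random_feature_groups(n_features, n_groups, seed=None):
    """Randomly partition feature indices into n_groups subcomponents.

    Hypothetical helper: a plain random partition, not any specific
    RFG variant from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin so group sizes differ by at most one.
    return [sorted(indices[g::n_groups]) for g in range(n_groups)]

feature_groups = random_feature_groups(n_features=11, n_groups=3, seed=42)
for number, members in enumerate(feature_groups):
    print(number, members)
```

Calling the function again with a new seed at each co-evolutionary cycle would regroup the features dynamically, giving interacting features repeated chances to land in the same subcomponent.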

    Decomposition Approaches for Building Design Optimization

    Building performance simulation can be integrated with optimization to achieve high-performance building design objectives, such as low carbon emission and cost-effectiveness, by holistically considering design variables across different disciplines. However, the complexity of the design problem increases greatly with increasing dimensionality. In some cases, solving high-dimensional problems is neither technically feasible nor time-efficient. Decomposition is one way to reduce the complexity and dimensionality of optimization problems. However, a decomposed optimization might converge to a local optimum. Therefore, deploying appropriate decomposition strategies to achieve the global optimum is paramount. This study investigates the deployment of hierarchical and parallel decomposition for building design optimization problems to ensure identification of the global optimum. The feasibility of combining sensitivity analysis and decomposition is also explored. At the end of this study, some recommendations are given to help select an appropriate approach in practice. First, this thesis proposes a hierarchical decomposition. Hierarchical decomposition divides an optimization problem into several interconnected subproblems solved sequentially. The proposed approach is applied to a multi-objective optimization problem that minimizes buildings' operating costs and carbon emissions. The results show that the hierarchical decomposition approach can reduce the number of simulations while achieving global optima. Second, this thesis proposes a parallel decomposition. Parallel decomposition divides the original problem into several smaller subproblems to be solved separately and, potentially, concurrently. The proposed parallel decomposition approach is applied to solve the single-objective optimization problems of a benchmark function and a low-rise office building. The results show that the proposed approach finds the global optimum and takes less computation time than optimization without decomposition. Third, this thesis explores the feasibility of combining sensitivity analysis with decomposition for dimensionality reduction. The efficiency and accuracy of different methods are compared through three case studies. The proposed hierarchical and parallel decomposition approaches can be applied individually or combined into a hybrid decomposition approach. This thesis concludes with some recommendations to help choose a decomposition approach to solve building design optimization problems.
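To make the parallel decomposition idea concrete, here is a minimal sketch under a strong assumption: the objective is fully separable, so each variable can be optimized on its own and the pieces recombined without losing the global optimum. The sphere function, the candidate grids, and the names `solve_subproblem` and `parallel_decomposition` are illustrative choices, not the benchmark or method used in the thesis.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def sphere(x):
    """Separable toy objective (assumed; the thesis benchmark is not named)."""
    return sum(v * v for v in x)

def solve_subproblem(task):
    """Minimise one variable's contribution over its own candidate values."""
    index, candidates = task
    return index, min(candidates, key=lambda v: v * v)

def parallel_decomposition(candidate_grid):
    """Solve one single-variable subproblem per design variable, concurrently."""
    tasks = list(enumerate(candidate_grid))
    with ThreadPoolExecutor() as pool:
        results = dict(pool.map(solve_subproblem, tasks))
    return [results[i] for i in range(len(candidate_grid))]

grid = [[-2, -1, 0, 1, 2], [3, 0.5, -0.25], [1, 2]]
decomposed = parallel_decomposition(grid)
# Reference answer from exhaustive search over the full design space.
brute_force = min(product(*grid), key=sphere)
print(decomposed, sphere(decomposed), sphere(brute_force))
```

The decomposed search evaluates 5 + 3 + 2 candidates instead of the 5 × 3 × 2 combinations of the exhaustive search. When variables interact, this shortcut no longer guarantees the global optimum, which is the risk the abstract raises about decomposed optimization.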