5,726 research outputs found

    Enhancing Big Data Feature Selection Using a Hybrid Correlation-Based Feature Selection

    This study proposes an alternative data extraction method that combines three well-known feature selection methods for handling large and problematic datasets: correlation-based feature selection (CFS), best first search (BFS), and the dominance-based rough set approach (DRSA). The aim is to enhance the classifier’s performance in decision analysis by eliminating uncorrelated and inconsistent data values. The proposed method, named CFS-DRSA, comprises several phases executed in sequence, with the main phases incorporating two crucial feature extraction tasks. First, data reduction implements a CFS method with a BFS algorithm. Second, a data selection process applies DRSA to generate the optimized dataset. In this way, the study aims to reduce computational time complexity and increase classification accuracy. Several datasets with various characteristics and volumes were used in the experimental process to evaluate the proposed method’s credibility. The method’s performance was validated using standard evaluation measures and benchmarked against other established methods such as deep learning (DL). Overall, the proposed method helped the classifier return a significant result, with an accuracy rate of 82.1% for the neural network (NN) classifier, compared with 66.5% for the support vector machine (SVM) and 49.96% for DL. The one-way analysis of variance (ANOVA) statistical result indicates that the proposed method is an alternative extraction tool for those with difficulties acquiring expensive big data analysis tools and those who are new to the data analysis field.
    Funding: Ministry of Higher Education under the Fundamental Research Grant Scheme (FRGS/1/2018/ICT04/UTM/01/1); Universiti Teknologi Malaysia (UTM) under Research University Grant Vot-20H04; Malaysia Research University Network (MRUN) Vot 4L876; SPEV project, University of Hradec Kralove, Faculty of Informatics and Management, Czech Republic (ID: 2102–2021), “Smart Solutions in Ubiquitous Computing Environments”.
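The CFS step named in this abstract can be illustrated with a minimal sketch. It assumes the commonly cited CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), with absolute Pearson correlation as the correlation measure, and it swaps the best first search for a simpler greedy forward search. The function names `cfs_merit` and `greedy_cfs` and the toy data are hypothetical, not taken from the paper, and the DRSA phase is not shown.

```python
from itertools import combinations

import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset (assumes absolute Pearson correlation)."""
    k = len(subset)
    # Mean feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    # Mean feature-feature correlation over all pairs in the subset.
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for i, j in combinations(subset, 2)])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def greedy_cfs(X, y):
    """Greedy forward search (a simplification of best first search)."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best + 1e-12:  # stop when no candidate improves the merit
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


# Toy data: columns are a, a scaled copy of a (redundant), b, and c (irrelevant).
a = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
b = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
c = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
X_demo = np.column_stack([a, 2 * a, b, c])
y_demo = a + b
selected_demo, merit_demo = greedy_cfs(X_demo, y_demo)
print(selected_demo, round(merit_demo, 4))
```

In this toy case the search keeps one copy of `a` together with `b` and rejects both the redundant copy and the irrelevant column, which is the behaviour the merit formula is designed to reward.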

    TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System

    Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. A hybrid feature selection technique comprising three methods, i.e., particle swarm optimization, ant colony algorithm, and genetic algorithm, is utilized to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, remarkably outperforming other classification techniques recently proposed in the literature. Results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. Such a test has rarely been considered in IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier.

    Meta-heuristic algorithms in car engine design: a literature survey

    Meta-heuristic algorithms are often inspired by natural phenomena, including the evolution of species in Darwinian natural selection theory, ant behaviors in biology, flock behaviors of some birds, and annealing in metallurgy. Due to their great potential in solving difficult optimization problems, meta-heuristic algorithms have found their way into automobile engine design. Different optimization problems arise in different areas of car engine management, including calibration, control systems, fault diagnosis, and modeling. In this paper, we review the state-of-the-art applications of different meta-heuristic algorithms in engine management systems. The review covers a wide range of research, including the application of meta-heuristic algorithms in engine calibration, optimizing engine control systems, engine fault diagnosis, and optimizing different parts of engines and modeling. The meta-heuristic algorithms reviewed in this paper include evolutionary algorithms, evolution strategy, evolutionary programming, genetic programming, differential evolution, estimation of distribution algorithm, ant colony optimization, particle swarm optimization, memetic algorithms, and artificial immune system.

    Mapping customer needs to engineering characteristics: an aerospace perspective for conceptual design

    Designing complex engineering systems, such as an aircraft or an aero-engine, is immensely challenging. Formal Systems Engineering (SE) practices are widely used in the aerospace industry throughout the design process to minimise the overall design effort, corrective re-work, and ultimately the development and manufacturing costs. Incorporating the needs and requirements of customers and other stakeholders into the conceptual and early design process is vital for the success and viability of any development programme. This paper presents a formal methodology, the Value-Driven Design (VDD) methodology, that has been developed for collaborative and iterative use in the Extended Enterprise (EE) within the aerospace industry. It has been applied using the Concept Design Analysis (CODA) method to map captured Customer Needs (CNs) into Engineering Characteristics (ECs) and to model an overall ‘design merit’ metric to be used in design assessments, sensitivity analyses, and engineering design optimisation studies. Two case studies of increasing complexity are presented to elucidate the application areas of the CODA method in the context of the VDD methodology for the EE within the aerospace sector.

    Hybrid Mammogram Classification Using Rough Set and Fuzzy Classifier

    We propose a computer-aided detection (CAD) system for the detection and classification of suspicious regions in mammographic images. This system combines a dimensionality reduction module (using principal component analysis), a feature extraction module (using independent component analysis), and a feature subset selection module (using a rough set model). The rough set model is used to reduce the effect of data inconsistency, while a fuzzy classifier is integrated into the system to label subimages as normal or abnormal regions. The experimental results show that this system has an accuracy of 84.03% and a recall of 87.28%.

    Observer-biased bearing condition monitoring: from fault detection to multi-fault classification

    Bearings are simultaneously a fundamental component and one of the principal causes of failure in rotary machinery. The work focuses on the employment of fuzzy clustering for bearing condition monitoring, i.e., fault detection and classification. The output of a clustering algorithm is a data partition (a set of clusters), which is merely a hypothesis on the structure of the data. This hypothesis requires validation by domain experts. In general, clustering algorithms allow only limited use of domain knowledge in the cluster formation process. In this study, a novel method allowing for interactive clustering in bearing fault diagnosis is proposed. The method resorts to shrinkage to generalize an otherwise unbiased clustering algorithm into a biased one. In this way, the method provides a natural and intuitive way to control the cluster formation process, allowing domain knowledge to guide it. The domain expert can select a desirable level of granularity, ranging from fault detection to classification of a variable number of faults, and can select a specific region of the feature space for detailed analysis. Moreover, experimental results under realistic conditions show that the adopted algorithm outperforms the corresponding unbiased algorithm (fuzzy c-means), which is widely used for this type of problem. © 2016 Elsevier Ltd. All rights reserved. Grant number: 145602
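The unbiased baseline named in this abstract, fuzzy c-means, can be sketched in a few lines of NumPy. This is the standard textbook algorithm under assumed defaults (fuzzifier m = 2, random initial memberships); it is not the observer-biased variant the paper proposes. The function name `fuzzy_c_means` and the toy points are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy feature vectors forming two well-separated groups.
X_demo = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
centers_demo, U_demo = fuzzy_c_means(X_demo, c=2)
print(U_demo.argmax(axis=1))
```

Each row of `U` holds the degrees of membership of one sample in every cluster, which is what lets a domain expert inspect borderline samples rather than only hard labels.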

    Modified and Ensemble Intelligent Water Drop Algorithms and Their Applications

    1.1 Introduction. Optimization is a process concerned with finding the best solution to a given problem from among the possible solutions within an affordable time and cost (Weise et al., 2009). The first step in the optimization process is formulating the optimization problem through an objective function and a set of constraints that encompass the problem search space (i.e., regions of feasible solutions). Every alternative (i.e., solution) is represented by a set of decision variables. Each decision variable has a domain, which is a representation of the set of all possible values that the decision variable can take. The second step in optimization starts by utilizing an optimization method (i.e., search method) to find the best candidate solutions. A candidate solution has a configuration of decision variables that satisfies the set of problem constraints and that maximizes or minimizes the objective function (Boussaid et al., 2013). The search converges to the optimal solution (i.e., a local or global optimal solution) by reaching the optimal values of the decision variables. Figure 1.1 depicts a 3D fitness landscape of an optimization problem. It shows the concept of local and global optima, where the local optimal solution is not necessarily the same as the global one (Weise et al., 2009). Optimization can be applied to many real-world problems in various domains. As an example, mathematicians apply optimization methods to identify the best outcome pertaining to some mathematical functions within a range of variables (Vesterstrom and Thomsen, 2004). In the presence of conflicting criteria, engineers use optimization methods t
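The distinction between local and global optima drawn in this excerpt can be made concrete with a small sketch. The objective function, the domain [-2, 2], and the fixed-step hill-climbing search below are all illustrative assumptions, not taken from the source; the point is only that the same search method can end in different optima depending on where it starts.

```python
def objective(x):
    """Illustrative objective with one local and one global minimum."""
    return x**4 - 3 * x**2 + x


def hill_climb(x0, lower=-2.0, upper=2.0, step=0.01, max_iter=10_000):
    """Fixed-step descent over one decision variable with a bounded domain."""
    x = x0
    for _ in range(max_iter):
        # Candidate solutions: neighbours that satisfy the domain constraint.
        neighbours = [n for n in (x - step, x + step) if lower <= n <= upper]
        best = min(neighbours, key=objective)
        if objective(best) >= objective(x):
            break  # no neighbour improves the objective: an optimum is reached
        x = best
    return x


x_from_right = hill_climb(2.0)   # settles in the local minimum
x_from_left = hill_climb(-2.0)   # settles in the global minimum
print(x_from_right, objective(x_from_right))
print(x_from_left, objective(x_from_left))
```

Starting from the right edge of the domain, the search stops in the shallower basin, while starting from the left edge it reaches the deeper one. This is why many optimization methods add some mechanism for escaping local optima.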

    From fuzzy-rough to crisp feature selection

    A central problem in machine learning and pattern recognition is the process of recognizing the most important features in a dataset. This process plays a decisive role in big data processing by reducing the size of datasets. One major drawback of existing feature selection methods is the high chance of redundant features appearing in the final subset; in most cases, finding and removing them can greatly improve the resulting classification accuracy. To tackle this problem on two different fronts, we employed fuzzy-rough sets and perturbation theory. On one side, we used three strategies to improve the performance of fuzzy-rough set-based feature selection methods. The first strategy was to code both features and samples in one binary vector and use a shuffled frog leaping algorithm to choose the best combination, using the fuzzy dependency degree as the fitness function. In the second strategy, we designed a measure to evaluate features based on the fuzzy-rough dependency degree in such a way that redundant features are given lower priority for selection. In the last strategy, we designed a new binary version of the shuffled frog leaping algorithm that employs a fuzzy positive region as its similarity measure, to work in complete harmony with the fitness function (i.e., the fuzzy-rough dependency degree). To extend the applicability of fuzzy-rough set-based feature selection to multi-party medical datasets, we designed a privacy-preserving version of the original method. In addition, we studied the feasibility and applicability of perturbation theory to feature selection, which to the best of our knowledge has never been researched. We introduced a new feature selection method based on perturbation theory that is not only capable of detecting and discarding redundant features but is also very fast and flexible in accommodating the special needs of the application. It employs a clustering algorithm to group similarly behaving features based on the sensitivity of each feature to perturbation, the angle of each feature to the outcome, and the effect of removing each feature on the outcome. It then chooses the feature closest to the centre of each cluster and returns all those features as the final subset. To assess the effectiveness of the proposed methods, we compared the results of each method with well-known feature selection methods on a series of artificially generated datasets and on biological, medical, and cancer datasets adopted from the University of California Irvine machine learning repository, the Arizona State University repository, and the Gene Expression Omnibus repository.
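The dependency degree used as a fitness function in this abstract has a simple crisp counterpart that shows why a redundant feature adds nothing. The sketch below computes the classical rough-set dependency degree, the fraction of samples whose equivalence class under the chosen features maps to a single decision value. It deliberately uses the crisp definition rather than the fuzzy-rough one from the abstract. The function name `dependency_degree` and the toy decision table are hypothetical.

```python
from collections import defaultdict


def dependency_degree(rows, cond_idx, dec_idx):
    """Crisp rough-set dependency degree: |positive region| / |universe|."""
    # Group decision values by equivalence class of the condition features.
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        blocks[key].append(row[dec_idx])
    # A block belongs to the positive region if it has one decision value.
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return positive / len(rows)


# Toy decision table: (f0, f1, f2, decision), where f2 duplicates f0
# and the decision is "yes" when f0 or f1 is 1.
table_demo = [
    (0, 0, 0, "no"),
    (0, 1, 0, "yes"),
    (1, 0, 1, "yes"),
    (1, 1, 1, "yes"),
]
print(dependency_degree(table_demo, [0], 3))        # f0 alone
print(dependency_degree(table_demo, [0, 2], 3))     # f0 plus its duplicate
print(dependency_degree(table_demo, [0, 1], 3))     # f0 and f1
```

Adding the duplicate feature leaves the dependency degree unchanged, whereas adding the complementary feature raises it, which is the signal a dependency-based selector uses to skip redundant features.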

    Review and prioritization of investment projects in the Waste Management organization of Tabriz Municipality with a Rough Sets Theory approach

    Purpose: Prioritization of investment projects is a key step in planning the investment activities of organizations. Choosing suitable projects has a direct impact on the profitability and other strategic goals of organizations. The factors affecting the prioritization of investment projects are complex, and traditional methods alone are not sufficient, so a suitable model is needed for prioritizing projects and investment plans. The purpose of this study is to prioritize 10 projects, and the investment methods for them, considered by the Waste Management Organization of Tabriz Municipality. Methodology: The analysis uses Rough Sets Theory. First, the important investment projects in the field of waste management were determined from the research background and expert opinion, and the weight and priority of the projects were obtained using Rough Sets Theory. Then, the priority of appropriate investment methods (out of 6 methods) for each project was obtained using rough numbers, expert opinion, and other aspects. Findings: The construction of a specialized recycling town, a plastic recycling project, and a tire recycling project are, in that order, the three priority projects of the Tabriz Municipality Waste Management Organization. Three investment methods, civil partnership agreements, BOT, and BOO, can be used for them. Originality/Value: Tabriz Municipality Waste Management is an important and influential organization in the activities of the city, whose project investment methods are mostly based on common contracts and applied in the same way to all projects. This research offers new, more diverse investment methods for these projects based on the Rough Sets technique.
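The rough numbers mentioned in the methodology can be sketched as follows. The sketch assumes the usual rough-number construction from the rough-set literature: each expert rating receives a lower limit (the mean of all ratings not above it) and an upper limit (the mean of all ratings not below it), and the limits are averaged into one interval. Whether the study uses exactly this aggregation is an assumption. The function name `rough_interval` and the sample ratings are hypothetical.

```python
def rough_interval(ratings):
    """Aggregate expert ratings into a rough interval (lower, upper)."""
    lowers, uppers = [], []
    for r in ratings:
        below = [x for x in ratings if x <= r]   # ratings not above r
        above = [x for x in ratings if x >= r]   # ratings not below r
        lowers.append(sum(below) / len(below))   # lower limit of r
        uppers.append(sum(above) / len(above))   # upper limit of r
    n = len(ratings)
    return sum(lowers) / n, sum(uppers) / n


# Hypothetical ratings of one project by three experts on a 1-5 scale.
print(rough_interval([3, 4, 5]))
# Full agreement collapses the interval to a single value.
print(rough_interval([4, 4, 4]))
```

The width of the interval reflects how much the experts disagree, so projects can be ranked on both the level and the consistency of their ratings.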