9 research outputs found

    Parameter-Free Extreme Learning Machine for Imbalanced Classification

    CAUL read and publish agreement 2022

    Fault Detection and Diagnosis with Imbalanced and Noisy Data: A Hybrid Framework for Rotating Machinery

    Fault diagnosis plays an essential role in reducing the maintenance costs of rotating machinery manufacturing systems. In many real applications of fault detection and diagnosis, data tend to be imbalanced, meaning that some fault classes have far fewer samples than the normal class. At the same time, under industrial conditions, accelerometers encounter high levels of disruptive signals, and the collected samples turn out to be heavily noisy. As a consequence, many traditional Fault Detection and Diagnosis (FDD) frameworks classify poorly under real-world circumstances. Three main solutions have been proposed in the literature to cope with this problem: (1) the use of generative algorithms to increase the number of under-represented input samples, (2) the use of a classifier powerful enough to learn from imbalanced and noisy data, and (3) the development of efficient data pre-processing, including feature extraction and data augmentation. This paper proposes a hybrid framework which uses the three aforementioned components to achieve an effective signal-based FDD system for imbalanced conditions. Specifically, it first extracts the fault features, using Fourier and wavelet transforms to make full use of the signals. Then, it employs Wasserstein Generative Adversarial Networks (WGAN) to generate synthetic samples that populate the rare fault class and enhance the training set. Moreover, to achieve higher performance, a novel combination of Convolutional Long Short-term Memory (CLSTM) and Weighted Extreme Learning Machine (WELM) is proposed. To verify the effectiveness of the developed framework, different dataset settings with different imbalance severities and noise degrees were used. The comparative results demonstrate that, in different scenarios, GAN-CLSTM-ELM outperforms the other state-of-the-art FDD frameworks.
    Comment: 23 pages, 11 figures
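The Fourier-based feature extraction step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's pipeline: the function name `dft_magnitudes`, the synthetic sinusoid, and the window length of 64 samples are all assumptions chosen for illustration, and a plain discrete Fourier transform stands in for whatever transform settings the authors used.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return normalised magnitudes of the one-sided DFT of a real signal."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid of k cycles per window.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


# Hypothetical vibration-like signal: a sinusoid with 5 cycles per window.
n = 64
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]

features = dft_magnitudes(signal)
# Index of the frequency bin that carries the most energy.
dominant_bin = max(range(len(features)), key=features.__getitem__)
```

Here `features` would serve as a frequency-domain feature vector for one signal window, and `dominant_bin` picks out the strongest frequency component. A practical implementation would more likely use an FFT routine from a numerical library.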

    Classification technique for minority class on imbalanced dataset with data partitioning method

    Novel Strategies to Accelerate Search Algorithms in Data Reduction

    In our current hyper-connected digital world, where data is growing enormously, instance reduction is an essential pre-processing phase for obtaining cleaner and smaller datasets that are free from noisy, redundant or irrelevant samples (so-called Smart Data). The data after pre-processing may become more reliable, accurate and useful for subsequent data mining tasks. Instance reduction consists of two types, instance selection and instance generation; each can be formulated as a combinatorial or continuous optimisation problem, depending on whether its decision variable is discrete or continuous, respectively. It is an emerging challenge characterised by multimodality and a large number of decision variables. Given such difficulties, derivative-free methods are promising approaches to the problem. They are powerful search algorithms that seek the nearest local optimum and, unlike derivative-based methods, do not necessarily rely on computing the gradient of the objective function. Solutions for instance reduction fall into the intersection of machine learning, data mining and optimisation, where a process from one domain can take part in the execution of another. Thus, the synergy between domains is important for solving the problem more effectively, and this has attracted significant interest from researchers. Among the many derivative-free search approaches, the family of direct search methods has introduced various strategies to tackle numerous modern numerical optimisation problems, of which population-based meta-heuristics and pattern search can be considered two of the most prevalent in the literature. Population-based meta-heuristics are an iterative search framework composed of several subordinate low-level heuristics that control exploration and exploitation for a pool of solution candidates. This set of methods searches for high-quality solutions from multiple points, and thus is usually associated with high computational expense.
    Pattern search methods seek an improved solution from candidates that are generated from different directions. They examine trial solutions sequentially by comparing each trial solution with the 'best' solution found up to the present time. In this dissertation, we investigate these derivative-free search strategies to address instance reduction, a critical optimisation problem in the field of data science. Although many derivative-free methods have proved effective in addressing instance reduction, they are usually time-consuming, especially when handling relatively large datasets. This impediment limits their practicality in many data mining systems and thus necessitates a solution to accelerate the search process. The need for a fast and effective search framework for instance reduction has motivated us to develop novel search strategies in the family of direct search approaches, aiming to retain the high-quality solutions achieved by state-of-the-art techniques in the domain while significantly reducing the runtime of the search process. Three major work packages presented in this thesis cover two direct search approaches for two types of instance reduction, arranged in a progressive order in which findings at an earlier stage contribute to the understanding of the later outcomes. Firstly, a novel evolutionary search framework for instance selection is proposed to balance the number of samples between classes, addressing a case study of imbalanced classification. Secondly, we develop another search framework for instance generation based on single-point search and memetic computing, namely Single-Point Memetic Structure. An accelerated mechanism for computing the objective function is embedded into the proposed search design, thus significantly reducing the runtime.
    Finally, a novel search framework for simultaneous instance selection and generation is designed to handle the instance reduction problem in both combinatorial and continuous search spaces. In summary, the research conducted here introduces a set of novel search strategies for derivative-free methods to tackle instance reduction problems. They are different search frameworks which aim to produce a high-quality reduced set from a relatively large original source within a reasonable amount of time. This is accomplished by taking advantage of either machine learning integration or the Single-Point Memetic Structure with an accelerated mechanism. The use of machine learning in a meta-heuristic search framework greatly speeds up the computation of the objective function, while the Single-Point Memetic Search allows us to reuse virtually all prior calculations when computing the fitness value of newly evolved individuals. Hence, these novel search strategies can save substantial computational cost. Finally, we leverage the insights previously found to propose another novel search framework that handles both instance selection and instance generation simultaneously, and operates in both combinatorial and continuous search spaces. These novel search strategies are examined with a large number of datasets in different hyper-parameter settings. The obtained numerical results are comprehensively analysed and verified by different statistical tests to prove the robustness of the proposed search strategies with respect to other state-of-the-art techniques in the domain.
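The pattern search idea described in the abstract above (generate trial solutions along different directions and compare each one with the best solution found so far) can be sketched as a basic coordinate (compass) search. This is a generic textbook variant, not the thesis's Single-Point Memetic Structure; the function names, the step-halving rule, and the toy quadratic objective are assumptions made for illustration.

```python
def pattern_search(objective, start, step=1.0, tol=1e-6, max_iter=10_000):
    """Minimise `objective` by probing +/- step along each coordinate axis."""
    best = list(start)
    best_value = objective(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for direction in (1.0, -1.0):
                # Build one trial solution by moving along a single axis.
                trial = list(best)
                trial[i] += direction * step
                trial_value = objective(trial)
                # Keep the trial only if it beats the best solution so far.
                if trial_value < best_value:
                    best, best_value = trial, trial_value
                    improved = True
        if not improved:
            # No direction helped: shrink the step and probe more finely.
            step *= 0.5
    return best, best_value


def shifted_quadratic(x):
    """Toy objective (hypothetical) with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


solution, value = pattern_search(shifted_quadratic, [0.0, 0.0])
```

No gradient is ever computed: the search relies only on objective evaluations, which is what makes it derivative-free. Each probe costs one full evaluation of the objective, which hints at why such methods can become slow when the objective is expensive, as it is for instance reduction on large datasets.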
