59 research outputs found

    On the improvement of complexity time and detection rate of outlier detectors : an unsupervised ensemble perspective

    This thesis presents two unsupervised algorithms to detect outlier observations whose aberrant behavior is hidden in lower-dimensional subspaces or cannot be identified with a single detector. In particular, we considered three facets: first, the difficulty a single detector has in identifying different types of outliers; second, the propensity of interesting outliers to hide in low-dimensional subspaces; third, the impact that distinct distance measures have on the outlier detection process. The aim of the proposed algorithms is to improve our understanding of data observations whose outlier behavior is not evident using simple outlier detection algorithms. Accordingly, we addressed three specific problems. First, we propose an ensemble based on different types of outlier detectors, with a set of weights assigned without supervision. Second, we propose an ensemble to identify observations whose outlier behavior is visible only in specific subspaces. Third, we develop a scheme to understand how a single detector or an ensemble of outlier detectors is influenced by the choice of distance metric and its interaction with different dimensionalities, data sizes, parameter settings, or ensemble components. Many algorithms for detecting outliers are available. However, unsupervised ensemble approaches are few, and they are mainly oriented towards the detection of a specific type of outlier. Accordingly, our first goal is to detect, in an unsupervised manner, distinct types of outlying observations.
We propose an approach capable of using the output of different types of detectors, assigning a specific weight to each detector based on an internal (unsupervised) evaluation of how well each algorithm performs on the dataset at hand; furthermore, this approach assigns a second weight to each data observation in order to widen the gap between outliers and inliers, further improving the outlier detection rate. The main contribution of this work is an ensemble of outlier detectors, whose components can be based on different assumptions, with a higher outlier detection rate than comparable single and ensemble approaches. Nonetheless, the processing time of our approach grows linearly with the number of ensemble components; this behavior is not exclusive to our approach and is instead prevalent in the ensemble outlier detection literature. The second part of this thesis focuses on the detection of a complex type of outlier, known in the literature as interesting outliers, which are detectable only in specific subspaces of the data; simple outliers, by contrast, are detectable in the full-dimensional space. Since our first approach was unable to detect this type of outlier efficiently, our second goal is the computationally efficient detection of lower-dimensional outliers. We propose an unsupervised ensemble based on different subspaces and subsamples of the data, which provides a higher detection rate and is computationally more efficient than similar ensemble approaches; in some cases, our approach is even better than a single execution of a simple outlier detection algorithm. The main contribution of this work is the ability to detect lower-dimensional outliers with an improved processing time.
The last section of this thesis studies how distance metric, parameter settings, data size, dimensionality, and number of ensemble components interact in determining the detection rate and processing time of an outlier detector. Hence, our third goal is to improve our understanding of the multiple factors influencing an outlier detection algorithm. A set of experiments has been devised to evaluate both detection rate and processing time. The experiments cover a wide set of synthetic and real-world data scenarios. Our synthetic data experiments allow us to introduce perturbations in the size and dimensionality of the data, while real-world data permits an evaluation of the effect of varying the parameter settings of an algorithm. To the best of our knowledge, this is the first evaluation considering a complete set of factors, mainly distance metrics, influencing the effectiveness and efficiency of an outlier detector. The understanding achieved in this study can be a key step towards the development of new ensemble approaches or the selection and parameterization of existing ones.
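
The idea of an ensemble built on different subspaces and subsamples can be illustrated with a minimal sketch. This is not the thesis's algorithm: the k-nearest-neighbour distance score, the min-max rescaling, the plain averaging, and every name and default below (`knn_score`, `subspace_subsample_scores`, `n_members`, `subspace_dim`, `sample_size`, `k`) are assumptions chosen for illustration.

```python
import math
import random


def knn_score(point, neighbours, k):
    """Distance from `point` to its k-th nearest neighbour (assumed base detector)."""
    dists = sorted(math.dist(point, other) for other in neighbours)
    return dists[min(k, len(dists)) - 1]


def subspace_subsample_scores(data, n_members=25, subspace_dim=2,
                              sample_size=32, k=3, seed=0):
    """Average rescaled kNN scores over random subspaces and subsamples.

    Assumes at least two rows and sample_size >= 2. Higher means more outlying.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    totals = [0.0] * n
    for _ in range(n_members):
        # Each ensemble member sees a random subset of features and of rows.
        dims = rng.sample(range(d), min(subspace_dim, d))
        sample_idx = rng.sample(range(n), min(sample_size, n))
        projected = [[row[j] for j in dims] for row in data]
        raw = []
        for i, point in enumerate(projected):
            # Exclude the point itself so it is never its own neighbour.
            neighbours = [projected[s] for s in sample_idx if s != i]
            raw.append(knn_score(point, neighbours, k))
        # Rescale each member to [0, 1] so no single subspace dominates.
        lo, hi = min(raw), max(raw)
        span = (hi - lo) or 1.0
        for i, score in enumerate(raw):
            totals[i] += (score - lo) / span
    return [total / n_members for total in totals]
```

Scoring every point against a subsample rather than the full data keeps each member's cost proportional to `n * sample_size`, which is one plausible reading of how subsampling helps processing time.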

    Fault Detection and Diagnosis of Electric Drives Using Intelligent Machine Learning Approaches

    Electric motor condition monitoring can detect anomalies in motor performance that have the potential to result in unexpected failure and financial loss. This study examines different fault detection and diagnosis approaches in induction motors and is presented in six chapters. First, an anomaly (outlier) detection technique is applied to increase the accuracy of detecting broken rotor bars. It is shown how the proposed method can significantly improve network reliability by using a one-class classification technique. Then, ensemble-based anomaly detection is used to compare different ensemble learning methods for the detection of broken rotor bars. Finally, a deep neural network is developed to extract significant features to be used as input parameters of the network. A deep autoencoder is then employed to build an advanced model that predicts broken rotor bars and bearing faults occurring in induction motors with high accuracy.

    Unsupervised Ensembles Techniques for Visualization

    In this paper we introduce two unsupervised visualization techniques based on ensemble methods. Unsupervised techniques, which are often quite sensitive to the presence of outliers, are combined with ensemble approaches in order to overcome the influence of outliers. The first technique is based on Principal Component Analysis; the second is known for its topology-preserving characteristics and is based on the combination of the Scale Invariant Map and Maximum Likelihood Hebbian Learning. To show the advantage of these novel ensemble-based techniques, the results of experiments carried out on artificial and real data sets are included.

    Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy

    Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods only deal with static data of relatively low dimensionality. Recently, outlier detection for high-dimensional stream data has emerged as a new research problem. A key observation that motivates this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data. The exhaustive search for the outlying subspaces where projected outliers are embedded is an NP problem. Second, algorithms for handling data streams are constrained to process the streaming data in a single pass, under space and time constraints. Existing methods for outlier detection are found to be ineffective for detecting projected outliers in high-dimensional data streams. In this thesis, we present a new technique, called the Stream Project Outlier deTector (SPOT), which attempts to detect projected outliers in high-dimensional data streams. SPOT employs an innovative window-based time model to capture dynamic statistics from stream data, and a novel data structure containing a set of top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the outlying subspaces where most projected outliers are embedded. The experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers in high-dimensional data streams.
The main contribution of this thesis is that it provides a backbone for tackling the challenging problem of outlier detection in high-dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can potentially be applied to a variety of high-demand applications, such as sensor network data monitoring and online transaction protection.
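
The combination of a single-pass, window-based model with sparse low-dimensional subspaces can be sketched as follows. This is not SPOT itself: the abstract does not specify its data structures, so the fixed-size sliding window, the equal-width grid cells, the hand-picked subspaces (SPOT searches for them with a genetic algorithm), and all names and parameters (`WindowedSubspaceDetector`, `cell_width`, `window`, `min_count`) are illustrative assumptions.

```python
import math
from collections import Counter, deque


class WindowedSubspaceDetector:
    """Flags points that fall in sparse grid cells of chosen subspaces."""

    def __init__(self, subspaces, cell_width=1.0, window=200, min_count=3):
        self.subspaces = [tuple(dims) for dims in subspaces]
        self.cell_width = cell_width
        self.max_window = window
        self.min_count = min_count
        self.window = deque()  # cell keys of the most recent points
        self.counts = [Counter() for _ in self.subspaces]

    def _cells(self, point):
        # Project the point onto each subspace and discretise it to a grid cell.
        return [tuple(math.floor(point[j] / self.cell_width) for j in dims)
                for dims in self.subspaces]

    def update(self, point):
        """Process one point in a single pass; return True if it looks outlying."""
        cells = self._cells(point)
        warmed_up = len(self.window) >= self.max_window
        # Judge the point against the current window before adding it.
        is_outlier = warmed_up and any(
            counter[cell] < self.min_count
            for counter, cell in zip(self.counts, cells))
        self.window.append(cells)
        for counter, cell in zip(self.counts, cells):
            counter[cell] += 1
        # Expire the oldest point so memory stays bounded by the window size.
        if len(self.window) > self.max_window:
            for counter, cell in zip(self.counts, self.window.popleft()):
                counter[cell] -= 1
                if counter[cell] == 0:
                    del counter[cell]
        return is_outlier
```

Memory use is bounded by the window length times the number of subspaces, and each arriving point costs one counter lookup per subspace, which is the kind of budget a one-pass stream setting imposes.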

    ENSEMBLE LEARNING FOR ANOMALY DETECTION WITH APPLICATIONS FOR CYBERSECURITY AND TELECOMMUNICATION
