
    Online fault detection based on typicality and eccentricity data analytics

    Fault detection is a task of major importance in industry nowadays, since it can considerably reduce the risk of accidents involving human lives, in addition to production and, consequently, financial losses. Fault detection systems have therefore been widely studied in the past few years, resulting in many different methods and approaches to this problem. This paper presents a detailed study of fault detection in industrial processes based on the recently introduced typicality and eccentricity data analytics (TEDA) approach. TEDA is a recursive and non-parametric method, first proposed for the general problem of anomaly detection in data streams. It is based on measures of the density and proximity of each incoming data point with respect to the analyzed data set. TEDA is an online autonomous learning algorithm that does not require a priori knowledge about the process, is completely free of user- and problem-defined parameters, and requires very low computational effort, making it very suitable for real-time applications. The results presented here were generated by applying TEDA to a pilot plant for industrial process
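The recursive, parameter-light formulation this abstract describes can be sketched as follows. This is an illustrative reconstruction based on the eccentricity formulas published in the TEDA literature, not the authors' code: the class name, variable names, and the sensitivity value `m` are our own choices, and the `(m² + 1)/(2k)` threshold is the Chebyshev-like "m-sigma" condition commonly paired with TEDA.

```python
import numpy as np

class TEDADetector:
    """Minimal sketch of recursive eccentricity-based anomaly detection
    in the spirit of TEDA (illustrative names, not the original code)."""

    def __init__(self, m=3.0):
        self.k = 0          # number of samples read so far
        self.mean = None    # recursively updated mean
        self.var = 0.0      # recursively updated mean squared distance to the mean
        self.m = m          # "m-sigma" sensitivity of the Chebyshev-like test

    def update(self, x):
        """Read one sample; return True if it is flagged as anomalous."""
        x = np.asarray(x, dtype=float)
        self.k += 1
        if self.k == 1:
            self.mean = x.copy()
            self.var = 0.0
            return False  # a single sample cannot be judged
        # recursive updates of mean and variance (no history stored)
        self.mean = ((self.k - 1) / self.k) * self.mean + x / self.k
        d2 = float(np.dot(x - self.mean, x - self.mean))
        self.var = ((self.k - 1) / self.k) * self.var + d2 / (self.k - 1)
        if self.var == 0.0:
            return False  # all samples coincide so far
        # eccentricity of the newest sample and its normalized form
        ecc = 1.0 / self.k + d2 / (self.k * self.var)
        zeta = ecc / 2.0
        # Chebyshev-like "m-sigma" anomaly condition
        return zeta > (self.m ** 2 + 1) / (2.0 * self.k)
```

Because only the running mean and variance are kept, each update is O(1) in time and memory, which is what makes the approach viable for real-time data streams.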

    An evolving approach to unsupervised and Real-Time fault detection in industrial processes

    Fault detection in industrial processes is a field of application that has been gaining considerable attention in the past few years, resulting in a large variety of techniques and methodologies designed to solve that problem. However, many of the approaches presented in the literature require significant amounts of prior knowledge about the process, such as mathematical models, data distributions, and pre-defined parameters. In this paper, we propose the application of TEDA (Typicality and Eccentricity Data Analytics), a fully autonomous algorithm, to the problem of fault detection in industrial processes. In order to perform fault detection, TEDA analyzes the density of each incoming data sample, which is calculated based on the distance between that sample and all the others read so far. TEDA is an online algorithm that learns autonomously and does not require any previous knowledge about the process nor any user-defined parameters. Moreover, it requires minimal computational effort, enabling its use in real-time applications. The efficiency of the proposed approach is demonstrated on two different real-world industrial plant data streams that provide “normal” and “faulty” data. The results shown in this paper are very encouraging when compared with traditional fault detection approaches

    Typicality distribution function: a new density-based data analytics tool

    In this paper, a new density-based, non-frequentist data analytics tool, called the typicality distribution function (TDF), is proposed. It is a further development of the recently introduced typicality- and eccentricity-based data analytics (TEDA) framework. The newly introduced TDF and its standardized form offer an effective alternative to the widely used probability distribution function (pdf) while remaining free from the restrictive assumptions made and required by the latter. In particular, it offers an exact solution for any amount of non-coinciding data samples (except a single point). By comparison, the well-developed and widely used traditional probability theory and related statistical learning approaches theoretically require an infinitely large amount of data samples/observations, although in practice this requirement is often ignored. Furthermore, TDF does not require the user to pre-select or assume a particular distribution (e.g. Gaussian or other) or a mixture of such distributions, or to pre-define the number of distributions in a mixture. In addition, it does not require the individual data items to be independent. At the same time, the link with traditional statistical approaches such as the well-known “nσ” analysis, the Chebyshev inequality, etc. leads to the interesting conclusion that the same type of analysis can be made using TDF automatically, without the restrictive prior assumptions to which these traditional approaches are tied. TDF can provide valuable information for the analysis of extreme processes and for fault detection and identification, where the number of observations of extreme events or faults is usually disproportionately small. The newly proposed TDF offers a non-parametric, closed-form analytical (quadratic) description extracted exactly from the real data realizations, in contrast to the usual practice where such distributions are pre-assumed or approximated.
    For example, so-called particle filters are also a non-parametric approximation of traditional statistics; however, they suffer from computational complexity and introduce a large number of dummy data points. In addition, for several types of proximity/similarity measures (such as Euclidean, Mahalanobis, and cosine) TDF can be calculated recursively and, thus, very efficiently, making it suitable for real-time and online algorithms. Moreover, a very simple example illustrates that traditional probability theory and related statistical approaches can in some cases lead to paradoxically incorrect results and/or require hard prior assumptions, whereas the newly proposed TDF offers a logically meaningful result and an intuitive interpretation automatically and exactly, without any prior assumptions. Finally, a few simple univariate examples are provided, the process of inference is discussed, and future steps in the development of TDF and TEDA are outlined. Since this is a fundamental theoretical innovation, the areas of application of TDF and TEDA span anomaly detection, clustering, classification, prediction, control, regression, and (Kalman-like) filters. Practical applications can be even wider and, therefore, it is difficult to list all of them

    Unsupervised classification of data streams based on typicality and eccentricity data analytics

    In this paper, we propose a novel approach to unsupervised and online data classification. The algorithm is based on the statistical analysis of selected features and the development of a self-evolving fuzzy rule basis. It starts learning from an empty rule basis and, instead of offline training, it learns “on-the-fly”. It is free of user-defined parameters; thus, the fuzzy rules and the number, size, or radius of the classes do not need to be pre-defined. It is very suitable for the classification of online data streams with real-time constraints. Past data do not need to be stored in memory, since the algorithm is recursive, which makes it memory- and computationally efficient. It is able to handle concept drift and concept evolution due to its evolving nature, which means that not only can existing rules/classes be updated, but new classes can also be created as new concepts emerge from the data. It can perform fuzzy classification/soft labeling, which is preferred over traditional crisp classification in many areas of application. The algorithm was validated on an industrial pilot plant, where the online-calculated period and amplitude of a control signal were used as input to a fault diagnosis application. The approach, however, is generic and can be applied to different problems and with much higher-dimensional inputs. The results obtained from the real data are very significant

    Automatic detection of computer network traffic anomalies based on eccentricity analysis

    In this paper, we propose an approach to the automatic detection of attacks on computer networks using data that combine the traffic generated with 'live' intra-cloud virtual-machine (VM) migration. The method used in this work is the recently introduced typicality and eccentricity data analytics (TEDA) framework. We compare the results of applying TEDA with traditionally used methods such as statistical analysis and k-means clustering. One of the biggest challenges in computer network analysis using statistical or numerical methods is the fact that protocol information is composed of integer/string values and is thus not easy to handle with traditional pattern recognition methods that deal with real values. In this study, we consider as features the tuple {source IP, destination IP, source port, destination port}, extracted from the network flow data, in addition to the traditionally used real values that represent the number of packets per unit time or the quantity of bytes per unit time. Using the entropy of the IP data helps to convert the integer raw data into real-valued signatures. The proposed solution permits building a real-time anomaly detection system and reduces the amount of information necessary for evaluation. In general, traffic-based systems are fast and can be used in real time, but they do not produce good results for attacks whose flow is hidden within the background traffic or within high traffic produced by other applications. We validate our approach on a dataset that includes network port scan (NPS) and network scan (NS) attacks, which can hide their flows within normal traffic, and we observe these attacks together with live migration, which produces a higher traffic flow
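The entropy-based conversion of categorical protocol fields into real-valued signatures can be illustrated with a short sketch. The windowing scheme and the choice of field are assumptions on our part; the idea is simply Shannon entropy over the values observed in one time window:

```python
import math
from collections import Counter

def window_entropy(values):
    """Shannon entropy (in bits) of the categorical values observed in one
    time window, e.g. the source-IP column of the captured flow records."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A window dominated by a single source IP has entropy near zero, while a
# port or network scan touching many distinct addresses drives it upward,
# yielding a real-valued stream a detector such as TEDA can operate on.
```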

    An evolving approach to data streams clustering based on typicality and eccentricity data analytics

    In this paper, we propose an algorithm for the online clustering of data streams. This algorithm, called AutoCloud, is based on the recently introduced concept of typicality and eccentricity data analytics, mainly used for anomaly detection tasks. AutoCloud is an evolving, online, and recursive technique that does not need training or prior knowledge about the data set. Thus, AutoCloud is fully online, requiring no offline processing. It allows the creation and merging of clusters autonomously as new data observations become available. The clusters created by AutoCloud are called data clouds, which are structures without pre-defined shapes or boundaries. AutoCloud allows each data sample to belong to multiple data clouds simultaneously through fuzzy concepts. AutoCloud is also able to handle concept drift and concept evolution, problems that are inherent to data streams in general. Since the algorithm is recursive and online, it is suitable for applications that require a real-time response. We validate our proposal with applications to multiple well-known data sets from the literature
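A minimal sketch of the data-cloud idea follows. It is illustrative only: the real AutoCloud also merges overlapping clouds and supports fuzzy membership in several clouds at once, both of which this sketch omits, and the class names and sensitivity value `m` are our own. Each cloud keeps only a recursive mean and variance; a sample joins the cloud where its tentative normalized eccentricity passes that cloud's Chebyshev-like threshold, otherwise a new cloud is created.

```python
import numpy as np

class Cloud:
    """One data cloud: just a recursive count, mean, and variance."""
    def __init__(self, x):
        self.k, self.mean, self.var = 1, np.asarray(x, dtype=float), 0.0

    def eccentricity_if_added(self, x):
        """Tentatively include x; return its normalized eccentricity
        and the updated (k, mean, var) state."""
        k = self.k + 1
        mean = ((k - 1) / k) * self.mean + x / k
        d2 = float(np.dot(x - mean, x - mean))
        var = ((k - 1) / k) * self.var + d2 / (k - 1)
        if var == 0.0:
            return 0.0, (k, mean, var)
        zeta = (1.0 / k + d2 / (k * var)) / 2.0
        return zeta, (k, mean, var)

class AutoCloudSketch:
    def __init__(self, m=2.0):
        self.m, self.clouds = m, []

    def update(self, x):
        """Assign one sample to a cloud; return the cloud's index."""
        x = np.asarray(x, dtype=float)
        best_i, best_zeta, best_state = None, None, None
        for i, c in enumerate(self.clouds):
            zeta, state = c.eccentricity_if_added(x)
            # Chebyshev-like acceptance threshold for this cloud
            threshold = (self.m ** 2 + 1) / (2.0 * (c.k + 1))
            if zeta <= threshold and (best_zeta is None or zeta < best_zeta):
                best_i, best_zeta, best_state = i, zeta, state
        if best_i is None:
            self.clouds.append(Cloud(x))   # concept evolution: new cloud
            return len(self.clouds) - 1
        c = self.clouds[best_i]
        c.k, c.mean, c.var = best_state    # absorb the sample recursively
        return best_i
```

Because no sample is ever stored, memory use is proportional to the number of clouds, not to the length of the stream.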

    Predictive maintenance: a novel framework for a data-driven, semi-supervised, and partially online prognostic health management application in industries

    Prognostic Health Management (PHM) is a predictive maintenance strategy that is based on Condition Monitoring (CM) data and aims to predict the future states of machinery. The existing literature treats PHM at two levels: methodological and applicative. From the methodological point of view, there are many publications and standards for the design of a PHM system. From the applicative point of view, many papers address the improvement of techniques adopted for realizing PHM tasks without covering the whole process. In these cases, most applications rely on a large amount of historical data to train models for diagnostic and prognostic purposes. Very often, industries are not able to obtain such data, so the most widely adopted approaches, based on batch and offline analysis, cannot be applied. In this paper, we present a novel framework and architecture that support the initial application of PHM from the machinery producers’ perspective. The proposed framework is based on an edge-cloud infrastructure that allows streaming analysis to be performed at the edge in order to reduce the quantity of data stored in permanent memory, to track the health status of the machinery at any point in time, and to discover novel and anomalous behaviors. Collecting the data from multiple machines into a cloud server allows more accurate diagnostic and prognostic models to be trained on a larger amount of data, and their results are then used to predict the health status in real time at the edge. The resulting PHM system would allow industries to monitor and supervise a network of machinery placed in different locations and can thus bring several benefits to both machinery producers and users. After a brief literature review of signal processing, feature extraction, diagnostics, and prognostics, including incremental and semi-supervised approaches for anomaly and novelty detection applied to data streams, a case study is presented.
    It was conducted on data collected from a test rig and shows the potential of the proposed framework in terms of its ability to detect changes in operating conditions and abrupt faults, as well as its storage savings. The main outcome of our work, and its major novel aspect, is the design of a framework for a PHM system based on specific requirements that originate directly from the industrial field, together with indications of which techniques can be adopted to achieve such goals

    Detecting anomalous behaviour using heterogeneous data

    In this paper, we propose a method to detect anomalous behaviour using heterogeneous data. The method detects anomalies based on the recently introduced approach known as Recursive Density Estimation (RDE) and the so-called eccentricity. It does not require prior assumptions to be made about the type of the data distribution. A simplified form of the well-known Chebyshev condition (inequality) is used for the standardised eccentricity, and it applies to any type of distribution. The method is applied to three datasets, which include credit card, loyalty card, and GPS data. Experimental results show that the proposed method may simplify the complex real cases of forensic investigation that require processing huge amounts of heterogeneous data to find anomalies. The proposed method can simplify the tedious job of processing the data and assist the human expert in making important decisions. In our future research, more data types will be considered, such as natural language (e.g. email, Twitter, SMS) and images
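The Chebyshev inequality mentioned above bounds how much of any distribution can lie far from its mean, regardless of shape: at most a 1/m² fraction of samples can be more than m standard deviations away. A plain-vanilla version of such a check looks like the sketch below; note that RDE's actual condition is phrased in terms of the standardised eccentricity rather than raw distances, so this is an illustration of the inequality itself, not the paper's exact test.

```python
def chebyshev_outlier(x, mean, var, m=3.0):
    """Flag x when it lies more than m standard deviations from the mean.
    By Chebyshev's inequality, at most a 1/m^2 fraction of samples from
    *any* distribution can do so, so no Gaussian (or other distributional)
    assumption is needed."""
    return (x - mean) ** 2 > (m ** 2) * var
```

The squared form avoids a square root and works directly with a recursively maintained variance.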