
    Unsupervised classification of data streams based on typicality and eccentricity data analytics

    In this paper, we propose a novel approach to unsupervised and online data classification. The algorithm is based on the statistical analysis of selected features and the development of a self-evolving fuzzy rule base. It starts learning from an empty rule base and, instead of offline training, learns “on-the-fly”. It is parameter-free: the fuzzy rules and the number, size, and radius of the classes do not need to be pre-defined. It is therefore very suitable for the classification of online data streams with real-time constraints. Past data do not need to be stored in memory, since the algorithm is recursive, which makes it efficient in both memory and computational power. It is able to handle concept drift and concept evolution due to its evolving nature, meaning that not only can existing rules/classes be updated, but new classes can be created as new concepts emerge from the data. It can perform fuzzy classification/soft labeling, which is preferred over traditional crisp classification in many areas of application. The algorithm was validated on an industrial pilot plant, where the online-calculated period and amplitude of a control signal were used as input to a fault diagnosis application. The approach, however, is generic and can be applied to different problems and to much higher-dimensional inputs. The results obtained from the real data are very significant.
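    The abstract does not reproduce the update equations, but a minimal sketch of this kind of online, parameter-free evolving classifier can be given, assuming the Euclidean TEDA eccentricity from the authors' related work: each class (“data cloud”) keeps only a recursively updated count, mean, and mean squared norm; samples receive soft labels from normalized typicality; and a new class is created when every existing class finds the sample eccentric. All names and the m-sigma threshold below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class EvolvingClassifierSketch:
    """Sketch of an online, parameter-free evolving classifier.

    Each class ("data cloud") keeps a sample count, a mean, and a mean
    squared norm, all updated recursively, so no past data is stored.
    """

    def __init__(self, m=3.0):
        self.m = m          # sensitivity of the Chebyshev-like outlier test
        self.clouds = []    # one dict per class: {'k', 'mu', 'X'}

    def _eccentricity(self, c, x):
        # TEDA eccentricity w.r.t. one cloud (Euclidean case):
        # xi = 1/k + ||x - mu||^2 / (k * var), with var = X - ||mu||^2
        var = max(c['X'] - c['mu'] @ c['mu'], 1e-12)
        return 1.0 / c['k'] + (x - c['mu']) @ (x - c['mu']) / (c['k'] * var)

    def learn(self, x):
        x = np.asarray(x, dtype=float)
        if not self.clouds:
            self.clouds.append({'k': 1, 'mu': x.copy(), 'X': x @ x})
            return 0, [1.0]
        xi = [self._eccentricity(c, x) for c in self.clouds]
        # Sample is an outlier for cloud c if xi/2 > (m^2 + 1) / (2 * k_c);
        # if it is an outlier for every cloud, a new class emerges.
        if all(e / 2 > (self.m ** 2 + 1) / (2 * c['k'])
               for e, c in zip(xi, self.clouds)):
            self.clouds.append({'k': 1, 'mu': x.copy(), 'X': x @ x})
        else:
            c = self.clouds[int(np.argmin(xi))]     # closest cloud absorbs x
            c['k'] += 1
            c['mu'] += (x - c['mu']) / c['k']       # recursive mean
            c['X'] += (x @ x - c['X']) / c['k']     # recursive mean squared norm
        # Soft labels: normalized typicality (1 - eccentricity) per cloud
        tau = np.maximum(1 - np.array([self._eccentricity(c, x) for c in self.clouds]), 0)
        soft = tau / tau.sum() if tau.sum() > 0 else np.full(len(tau), 1 / len(tau))
        return int(np.argmax(soft)), soft.tolist()
```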

    Local modes-based free-shape data partitioning

    In this paper, a new data partitioning algorithm, named “local modes-based data partitioning”, is proposed. This algorithm is entirely data-driven and free from user input and prior assumptions. It automatically derives the modes of the empirically observed density of the data samples and forms parameter-free data clouds around them; the identified focal points induce a Voronoi tessellation of the data space. The proposed algorithm has two versions, namely offline and evolving. Both versions are able to work separately, starting “from scratch”, and they can also be combined into a hybrid. Numerical experiments demonstrate the validity of the proposed algorithm as a fully autonomous partitioning technique and show better performance than alternative algorithms.
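    As a rough illustration of the offline version, a local-mode partitioner can be sketched in a few lines: rank every sample by an empirical density, keep the samples that are denser than all of their nearest neighbours as focal points, and assign each sample to its closest focal point, which yields the Voronoi-like data clouds mentioned above. The density estimate and the neighbourhood size below are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def local_modes_partition(X, n_neighbors=5):
    """Offline sketch: find local density modes and partition around them."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    density = 1.0 / (1.0 + d.sum(axis=1) / max(n - 1, 1))      # empirical density per sample
    nn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]           # nearest neighbours (skip self)
    focal = np.array([i for i in range(n)
                      if density[i] >= density[nn[i]].max()])  # local modes
    labels = focal[np.argmin(d[:, focal], axis=1)]             # nearest focal point (Voronoi)
    return X[focal], labels

# Usage: focal_points, labels = local_modes_partition(points)
```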

    A comparative study of autonomous learning outlier detection methods applied to fault detection

    Outlier detection is a problem that has been widely studied in the past few years due to its great applicability to real-world problems (e.g., finance, social networks, climate, security). Fault detection in industrial processes is one of these problems, and several methods have been proposed in the literature to address it. In this paper we present a comparative analysis of three recently introduced outlier detection methods: RDE, RDE with forgetting, and TEDA. The methods were applied to the data set provided by the DAMADICS benchmark, a well-known source of real data for fault detection applications; the results, however, can be extended to similar problems in the area. We compare the main features of each method as well as the results obtained with them.
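    The abstract names the three methods without restating their update rules. As a reference point, a commonly cited form of RDE keeps a recursive mean and mean squared norm and flags samples whose density drops below a threshold, and the “forgetting” variant replaces the 1/k weights with a fixed exponential factor. The formulas and the threshold below follow this usual presentation and should be checked against the paper.

```python
import numpy as np

def rde_stream(samples, forgetting=None, threshold=0.75):
    """Sketch of Recursive Density Estimation over a data stream.

    Density of sample x: D = 1 / (1 + ||x - mu||^2 + S - ||mu||^2),
    where mu (mean) and S (mean squared norm) are updated recursively.
    With 0 < forgetting < 1, older samples are down-weighted exponentially
    ("RDE with forgetting"); otherwise exact 1/k weights are used.
    """
    mu, S, flags = None, None, []
    for k, x in enumerate(samples, start=1):
        x = np.asarray(x, dtype=float)
        if mu is None:
            mu, S = x.copy(), float(x @ x)
            flags.append(False)
            continue
        w = forgetting if forgetting is not None else 1.0 / k
        mu = (1 - w) * mu + w * x
        S = (1 - w) * S + w * float(x @ x)
        density = 1.0 / (1.0 + float((x - mu) @ (x - mu)) + S - float(mu @ mu))
        flags.append(density < threshold)   # low density -> candidate fault
    return flags
```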

    Online fault detection based on typicality and eccentricity data analytics

    Fault detection is a task of major importance in industry nowadays, since it can considerably reduce the risk of accidents involving human lives, as well as production and, consequently, financial losses. Fault detection systems have therefore been widely studied in the past few years, resulting in many different methods and approaches to the problem. This paper presents a detailed study of fault detection in industrial processes based on the recently introduced typicality and eccentricity data analytics (TEDA) approach. TEDA is a recursive and non-parametric method, first proposed for the general problem of anomaly detection in data streams. It is based on measures of the density and proximity of each newly read data point with respect to the data set analyzed so far. TEDA is an online autonomous learning algorithm that does not require a priori knowledge about the process, is completely free of user- and problem-defined parameters, and requires very low computational effort, making it very suitable for real-time applications. The results presented here were generated by applying TEDA to a pilot plant for an industrial process.
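    For reference, the eccentricity ξ and typicality τ used by TEDA are usually presented as below (reproduced here for the Euclidean case from the TEDA literature; the exact recursive forms used in the paper should be checked against it):

```latex
% Eccentricity of x among k samples, as a ratio of accumulated distances:
\xi_k(x) = \frac{2\sum_{i=1}^{k} d(x, x_i)}{\sum_{i=1}^{k}\sum_{j=1}^{k} d(x_i, x_j)},
\qquad \tau_k(x) = 1 - \xi_k(x)

% Closed recursive form for the squared Euclidean distance,
% with mean \mu_k and variance \sigma_k^2:
\xi_k(x) = \frac{1}{k} + \frac{(x - \mu_k)^\top (x - \mu_k)}{k\,\sigma_k^2}
```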

    Typicality distribution function: a new density-based data analytics tool

    In this paper a new density-based, non-frequentist data analytics tool, called the typicality distribution function (TDF), is proposed. It is a further development of the recently introduced typicality- and eccentricity-based data analytics (TEDA) framework. The newly introduced TDF and its standardized form offer an effective alternative to the widely used probability distribution function (pdf) while remaining free from the restrictive assumptions made and required by the latter. In particular, it offers an exact solution for any number of non-coinciding data samples (except a single point). For comparison, the well-developed and widely used traditional probability theory and related statistical learning approaches (theoretically) require an infinitely large number of data samples/observations, although in practice this requirement is often ignored. Furthermore, TDF does not require the user to pre-select or assume a particular distribution (e.g., Gaussian or other) or a mixture of such distributions, or to pre-define the number of distributions in a mixture. In addition, it does not require the individual data items to be independent. At the same time, the link with traditional statistical approaches such as the well-known “nσ” analysis, the Chebyshev inequality, etc. leads to the interesting conclusion that the same type of analysis can be made using TDF automatically, without the restrictive prior assumptions listed above to which these traditional approaches are tied. TDF can provide valuable information for the analysis of extreme processes and for fault detection and identification, where the number of observations of extreme events or faults is usually disproportionally small. The newly proposed TDF offers a non-parametric, closed-form analytical (quadratic) description extracted exactly from the real data realizations, in contrast to the usual practice where such distributions are pre-assumed or approximated. For example, so-called particle filters are also a non-parametric approximation of traditional statistics; however, they suffer from computational complexity and introduce a large number of dummy data. In addition, for several types of proximity/similarity measures (such as Euclidean, Mahalanobis, and cosine) TDF can be calculated recursively and thus very efficiently, which makes it suitable for real-time and online algorithms. Moreover, a very simple example illustrates that traditional probability theory and related statistical approaches can in some cases lead to paradoxically incorrect results and/or require hard prior assumptions, whereas the newly proposed TDF offers a logically meaningful result and an intuitive interpretation automatically and exactly, without any prior assumptions. Finally, a few simple univariate examples are provided, the process of inference is discussed, and the future steps of the development of TDF and TEDA are outlined. Since it is a new fundamental theoretical innovation, the areas of application of TDF and TEDA can span anomaly detection, clustering, classification, prediction, control, regression, and (Kalman-like) filters. Practical applications can be even wider and, therefore, it is difficult to list all of them.
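    A minimal univariate sketch of the idea, assuming the Euclidean eccentricity shown earlier: the typicality of a point follows in closed (quadratic) form from the data mean and variance alone, and normalizing over the observed samples yields a distribution that plays the role the pdf plays in the frequentist setting. The normalization convention below (typicalities over the sample summing to one) is an assumption for illustration.

```python
import numpy as np

def typicality_distribution(data, x):
    """Sketch of a TDF-style normalized typicality at query points x.

    Euclidean eccentricity: xi(z) = 1/k + (z - mu)^2 / (k * var), so the
    typicality tau = 1 - xi is quadratic in z. Over the k observed samples
    the typicalities sum to k - 2, which gives the normalizing constant.
    """
    data = np.asarray(data, dtype=float)
    k, mu, var = len(data), data.mean(), data.var()
    xi = lambda z: 1.0 / k + (z - mu) ** 2 / (k * var)
    tau = 1.0 - xi(np.asarray(x, dtype=float))
    return np.maximum(tau, 0.0) / (k - 2)   # exact normalization, needs k > 2

# Example: TDF values at two points for a small univariate sample
print(typicality_distribution([1.0, 2.0, 2.5, 3.0, 10.0], [2.0, 10.0]))
```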

    Cybernetics of the mind: learning individual's perceptions autonomously

    In this article, we describe an approach to computational modeling and autonomous learning of the perception of sensory inputs by individuals. A hierarchical process of summarization of heterogeneous raw data is proposed. At the lower level of the hierarchy, the raw data autonomously form semantically meaningful concepts. Instead of clustering based on visual or audio similarity, the concepts at the second level of the hierarchy are formed from observed physiological variables (PVs), such as heart rate and skin conductance, and are mapped to the emotional state of the individual. Wearable sensors were used in the experiments.

    Detecting anomalous behaviour using heterogeneous data

    In this paper, we propose a method to detect anomalous behaviour using heterogeneous data. The method detects anomalies based on the recently introduced Recursive Density Estimation (RDE) approach and the so-called eccentricity, and does not require prior assumptions about the type of the data distribution. A simplified form of the well-known Chebyshev condition (inequality) is used for the standardised eccentricity; it applies to any type of distribution. The method is applied to three datasets, which include credit card, loyalty card, and GPS data. Experimental results show that the proposed method may simplify the complex real cases of forensic investigation that require processing huge amounts of heterogeneous data to find anomalies. The proposed method can simplify the tedious job of processing the data and assist the human expert in making important decisions. In our future research, more data types will be included, such as natural language (e.g., email, Twitter, SMS) and images.
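    The simplified Chebyshev condition referred to here is commonly written on the standardized eccentricity ς(x) = 1 + ||x − μ||²/σ², with a sample declared anomalous when ς > m² + 1; this reproduces the classical “mσ” rule without assuming any particular distribution. A recursive sketch under that assumption (names and default threshold are illustrative):

```python
import numpy as np

def eccentricity_anomalies(stream, m=3.0):
    """Sketch of RDE/eccentricity anomaly detection with a Chebyshev test."""
    mu, S, anomalies = None, None, []
    for k, x in enumerate(stream, start=1):
        x = np.asarray(x, dtype=float)
        if mu is None:
            mu, S = x.copy(), float(x @ x)
            anomalies.append(False)
            continue
        mu += (x - mu) / k                        # recursive mean
        S += (float(x @ x) - S) / k               # recursive mean squared norm
        var = max(S - float(mu @ mu), 1e-12)
        std_ecc = 1.0 + float((x - mu) @ (x - mu)) / var
        anomalies.append(std_ecc > m ** 2 + 1)    # Chebyshev-like condition
    return anomalies
```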

    An evolving approach to unsupervised and real-time fault detection in industrial processes

    Fault detection in industrial processes is a field of application that has gained considerable attention in the past few years, resulting in a large variety of techniques and methodologies designed to solve the problem. However, many of the approaches presented in the literature require substantial prior knowledge about the process, such as mathematical models, data distributions, and pre-defined parameters. In this paper, we propose the application of TEDA (Typicality and Eccentricity Data Analytics), a fully autonomous algorithm, to the problem of fault detection in industrial processes. In order to perform fault detection, TEDA analyzes the density of each incoming data sample, calculated from the distances between that sample and all samples read so far. TEDA is an online algorithm that learns autonomously and does not require any previous knowledge about the process nor any user-defined parameters. Moreover, it requires minimal computational effort, enabling its use in real-time applications. The efficiency of the proposed approach is demonstrated on two different real-world industrial plant data streams that provide “normal” and “faulty” data. The results shown in this paper are very encouraging when compared with traditional fault detection approaches.
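    As a usage illustration only (the paper's decision logic may differ), a TEDA-style detector is often wrapped so that a fault is declared only after several consecutive eccentric samples, which suppresses isolated false alarms on real plant data. The patience parameter below is an assumption for illustration.

```python
def label_stream(anomaly_flags, patience=3):
    """Turn per-sample anomaly flags into 'normal'/'faulty' labels,
    declaring a fault only after `patience` consecutive flags."""
    labels, run = [], 0
    for flagged in anomaly_flags:
        run = run + 1 if flagged else 0
        labels.append('faulty' if run >= patience else 'normal')
    return labels

# Example, reusing the eccentricity detector sketched earlier:
# flags = eccentricity_anomalies(sensor_stream, m=3.0)
# print(label_stream(flags))
```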