33 research outputs found

An evolving approach to data streams clustering based on typicality and eccentricity data analytics

Authors: Angelov P.P., Bezerra C.G., Costa B.S.J., Guedes L.A.
Publication venue: 'Elsevier BV'
Publication date: 31/05/2020

In this paper we propose an algorithm for online clustering of data streams. This algorithm is called AutoCloud and is based on the recently introduced concept of Typicality and Eccentricity Data Analytics, mainly used for anomaly detection tasks. AutoCloud is an evolving, online and recursive technique that does not need training or prior knowledge about the data set. Thus, AutoCloud is fully online, requiring no offline processing. It allows creation and merging of clusters autonomously as new data observations become available. The clusters created by AutoCloud are called data clouds, which are structures without pre-defined shape or boundaries. AutoCloud allows each data sample to belong to multiple data clouds simultaneously using fuzzy concepts. AutoCloud is also able to handle concept drift and concept evolution, which are problems that are inherent in data streams in general. Since the algorithm is recursive and online, it is suitable for applications that require a real-time response. We validate our proposal with applications to multiple well-known data sets from the literature.

Lancaster E-Prints
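
The abstract above describes AutoCloud as recursive and online. As a rough illustration of that idea, and not the AutoCloud algorithm itself, the sketch below keeps a running mean and variance of a stream and scores a sample with an eccentricity value. The update formulas follow the form commonly given in the TEDA literature; they are an assumption here, since the abstract does not state them, and the class and method names are hypothetical.

```python
from typing import List, Sequence


class RunningStats:
    """Recursively tracked mean and variance of a stream of vectors.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self) -> None:
        self.k = 0                    # number of samples seen so far
        self.mean: List[float] = []   # running mean vector
        self.var = 0.0                # running scalar variance

    def update(self, x: Sequence[float]) -> None:
        """Fold one new sample into the statistics without storing the stream."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = list(x)
            self.var = 0.0
            return
        # Assumed recursive form: mean_k = (k-1)/k * mean_{k-1} + x_k / k
        self.mean = [(k - 1) / k * m + xi / k for m, xi in zip(self.mean, x)]
        # Assumed recursive form: var_k = (k-1)/k * var_{k-1} + ||x_k - mean_k||^2 / (k-1)
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        self.var = (k - 1) / k * self.var + dist2 / (k - 1)

    def eccentricity(self, x: Sequence[float]) -> float:
        """Assumed form: 1/k + ||mean_k - x||^2 / (k * var_k)."""
        if self.k < 2 or self.var == 0.0:
            raise ValueError("need at least two distinct samples")
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        return 1.0 / self.k + dist2 / (self.k * self.var)


stats = RunningStats()
for sample in ([1.0], [2.0], [3.0]):
    stats.update(sample)
print(stats.mean, stats.var, stats.eccentricity([3.0]))
```

Each update touches only the current sample and the stored statistics, so memory use does not grow with the length of the stream. A sample far from the running mean receives a larger eccentricity than one close to it.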

Applications of Autonomous Data Partitioning

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2018

In this chapter, the algorithm summaries of both the offline and evolving versions of the proposed autonomous data partitioning (ADP) algorithm described in chapter 7 are provided. Numerical examples based on well-known benchmark datasets are presented for evaluating the performance of the ADP algorithm on data partitioning. Furthermore, numerical examples on semi-supervised classification are also conducted as a potential application of the ADP algorithm. State-of-the-art approaches are used for comparison. Numerical experiments demonstrate that the ADP algorithm is able to produce high-quality data partitioning results in a highly efficient, objective manner. The ADP algorithm can also be used for classification even when there is very little supervision available. The pseudo-code of the main procedure of the ADP algorithm and the MATLAB implementations can be found in appendices B.2 and C.2, respectively. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints

Applications of Autonomous Learning Multi-model Systems

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2018

In this chapter, the algorithm summaries of the autonomous learning multi-model systems of zero-order (ALMMo-0) and first-order (ALMMo-1) described in Chap. 8 are provided. Numerical examples based on well-known benchmark datasets are presented for evaluating the classification performance of the ALMMo-0 and ALMMo-1 systems. Real-world problems are also used for evaluating the performance of the ALMMo-1 system on regression. Numerical experiments and the comparison with the state-of-the-art approaches demonstrate that ALMMo systems can produce highly accurate classification and regression results on various problems after a very efficient training process. Furthermore, ALMMo systems can learn from streaming data on a sample-by-sample basis, self-evolve their system structure and self-update their meta-parameters continuously with newly observed data, which makes them a very attractive solution for various real-world applications. The pseudo-code of the main procedure of the ALMMo-0 system and the MATLAB implementation are provided in appendices B.3 and C.3, and the corresponding pseudo-code and MATLAB implementation of ALMMo-1 systems are provided in appendices B.4 and C.4, respectively. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints

Transparent Deep Rule-Based Classifiers

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2018

In this chapter, a new type of deep rule-based (DRB) classifier with a multi-layer architecture is presented for image classification, which combines computer vision techniques with a massively parallel set of zero-order fuzzy rules as its learning engine. Owing to their prototype-based nature, DRB classifiers are able to identify a transparent and human-understandable fuzzy rule-based (FRB) system structure from the data through an autonomous, non-iterative, non-parametric and highly parallel online learning process, and offer extremely high classification accuracy. The DRB classifier can start “from scratch”, and conduct classification from the very first image of each class in the same way as humans do. The DRB classifier can also learn in a semi-supervised mode initialized with only a small proportion of the labelled data and continue in a fully unsupervised mode after that. The ability of semi-supervised learning further allows the DRB classifier to learn new classes actively without human experts’ involvement. Thanks to the prototype-based nature of the DRB classifier, it is free from prior assumptions about the type of the data distribution and its random or deterministic nature, and there are no requirements to make ad hoc decisions. Its supervised and semi-supervised learning processes are fully transparent and human-interpretable. The semi-supervised DRB classifiers can perform classification on out-of-sample images and also support recursive online training on a sample-by-sample basis or a batch-by-batch basis. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints

Anomaly Detection—Empirical Approach

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

In this chapter, the empirical approach to the problem of anomaly detection is presented, which is data driven and free from pre-defined models and user- and problem-specific parameters. The well-known Chebyshev inequality has been simplified by using the standardized eccentricity. An autonomous anomaly detection method is proposed, which is composed of two stages. In the first stage, all the potential global anomalies are selected based on the data density and/or on the typicality, and in the second stage, the local anomalies are identified based on the data clouds formed from the potential global anomalies. In addition, a fully autonomous approach for the problem of fault detection has been outlined, which can also be extended to a fully autonomous fault detection and isolation approach. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints
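
The Chebyshev inequality mentioned in the abstract above bounds the share of samples lying more than n standard deviations from the mean by 1/n². The sketch below shows one way such a bound can become a simple anomaly test. It assumes the standardized eccentricity is defined as 1 + ||x - mean||² / variance, so that "more than n standard deviations away" reads as "eccentricity greater than 1 + n²". That definition and the function name are assumptions made for illustration; the abstract does not spell them out, and this is not the two-stage method of the chapter.

```python
from typing import List, Sequence


def flag_anomalies(data: Sequence[Sequence[float]], n: float = 3.0) -> List[bool]:
    """Flag samples whose (assumed) standardized eccentricity exceeds 1 + n^2."""
    if not data:
        return []
    count = len(data)
    dim = len(data[0])
    # Global mean vector of the data set.
    mean = [sum(row[j] for row in data) / count for j in range(dim)]
    # Squared Euclidean distance of every sample to the mean.
    sq_dists = [sum((row[j] - mean[j]) ** 2 for j in range(dim)) for row in data]
    # Scalar variance: average squared distance to the mean.
    variance = sum(sq_dists) / count
    if variance == 0.0:
        return [False] * count  # all samples identical, nothing stands out
    threshold = 1.0 + n ** 2
    # Assumed definition: eccentricity = 1 + ||x - mean||^2 / variance.
    return [1.0 + d / variance > threshold for d in sq_dists]


readings = [[0.0]] * 9 + [[10.0]]
print(flag_anomalies(readings, n=2.0))
```

With n = 2, only the last reading is flagged. The threshold depends on the chosen n alone and makes no assumption about the distribution of the data.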

Applications of Semi-supervised Deep Rule-Based Classifiers

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2018

In this chapter, the algorithm summary of the main procedure of the semi-supervised deep rule-based (SS_DRB) classifier described in Chap. 9 is provided, which serves as a powerful extension of the DRB classifier. The offline learning process of the SS_DRB classifier is illustrated and the performance of the SS_DRB algorithm is evaluated based on benchmark image sets. Numerical examples and comparison with the state-of-the-art semi-supervised learning approaches demonstrate that the SS_DRB classifier can achieve highly accurate classification results with only a handful of labelled training images, and that it consistently outperforms the alternative approaches. The pseudo-code of the main procedure of the SS_DRB classifier and the MATLAB implementations can be found in appendices B.6 and C.6, respectively. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints

Applications of Deep Rule-Based Classifiers

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2018

In this chapter, the algorithm summary of the main procedure of the deep rule-based (DRB) classifier described in Chap. 9 is provided. Numerical examples based on popular benchmark image sets, including handwritten digit recognition, remote sensing scene classification, face recognition and object recognition, are presented for evaluating the performance of the DRB algorithm on image classification, and the state-of-the-art approaches are used for comparison. Numerical experiments show that the DRB classifier is able to perform highly accurate classification in various image classification problems, and also demonstrate the advantages of its prototype-based nature and transparency over the existing approaches. The pseudo-code of the main procedure of the DRB classifier and the MATLAB implementations can be found in appendices B.5 and C.5, respectively. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints

Brief Introduction to Statistical Machine Learning

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2018

In this chapter, an overview of the theory of probability, statistical and machine learning is given, covering the main ideas and the most popular and widely used methods in this area. As a starting point, randomness and determinism as well as the nature of real-world problems are discussed. Then, the basic and well-known topics of traditional probability theory and statistics, including the probability mass and distribution, probability density and moments, density estimation, Bayesian and other branches of probability theory, are recalled, followed by an analysis. The well-known data pre-processing techniques and unsupervised and supervised machine learning methods are covered. These include a brief introduction to distance metrics, normalization and standardization, feature selection and orthogonalization, as well as a review of the most representative clustering, classification, regression and prediction approaches of various types. Finally, the topic of image processing is briefly covered, including the popular image transformation techniques and a number of image feature extraction techniques at three different levels. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints

Brief Introduction to Computational Intelligence

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2018

This chapter provides a detailed introduction to the basic concepts and the general principles of the fuzzy sets and systems theory. Three major types of FRB systems are covered and their differences are analyzed. The design of FRB systems is also covered. The chapter then moves on to ANNs, which include feedforward neural networks and three types of deep learning models. Both FRB systems and ANNs have been proven to be universal approximators and can be designed based on the data. FRB systems have a transparent, human-interpretable internal representation and can take advantage of human domain expert knowledge. They are excellent in dealing with uncertainties, and they can self-organize and self-update both their structures and parameters in an online, dynamic environment. While ANNs are excellent in providing high precision in most cases, they are fragile when facing new data patterns. They are typical examples of “black box” systems; their training process is usually limited to offline mode and requires a huge amount of computational resources and data. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints

Data Partitioning—Empirical Approach

Authors: Angelov P.P., Gu X.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

In this chapter, a new empirical approach, named autonomous data partitioning, is proposed to partition the data autonomously by creating a Voronoi tessellation around the objectively identified prototypes to form data clouds, which transform the large amount of raw data into a much smaller (manageable) number of more representative aggregations with semantic meaning. The proposed empirical algorithm has two forms/types, namely, the offline version and the evolving version. The offline version is based on the ranks of the observations in terms of their multimodal typicality values and local ensemble properties. The evolving version is for streaming data processing and works with the data density. It is able to start “from scratch”, but can create a hybrid with the offline version as well. Moreover, an algorithm is proposed to guarantee the local optimality of the autonomous data partitioning approach allowing the proposed approach to end up with a locally optimal structure of data clouds represented by their focal points/prototypes, which is then ready to be used for analysis, building a multi-model classifier, predictor, controller or for fault isolation. © 2019, Springer Nature Switzerland AG

Lancaster E-Prints
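
Forming data clouds by a Voronoi tessellation around prototypes, as the abstract above describes, amounts to assigning each sample to its nearest prototype. The sketch below illustrates only that assignment step. The prototypes are supplied by hand, Euclidean distance is assumed, and the function name is hypothetical; how the ADP algorithm identifies its prototypes is not shown here.

```python
from typing import List, Sequence


def assign_to_prototypes(
    points: Sequence[Sequence[float]],
    prototypes: Sequence[Sequence[float]],
) -> List[int]:
    """Return, for each point, the index of its nearest prototype.

    Points sharing an index fall in the same Voronoi cell, i.e. the same
    data cloud in the sense used above. Ties go to the lower index.
    """
    if not prototypes:
        raise ValueError("at least one prototype is required")
    labels = []
    for p in points:
        # Squared Euclidean distance is enough for picking the nearest one.
        dists = [sum((a - b) ** 2 for a, b in zip(p, q)) for q in prototypes]
        labels.append(dists.index(min(dists)))
    return labels


prototypes = [(0.0, 0.0), (10.0, 10.0)]  # hand-picked for illustration
points = [(1.0, 1.0), (9.0, 8.0), (0.0, 2.0), (6.0, 6.0)]
print(assign_to_prototypes(points, prototypes))
```

No cluster shape or boundary is specified in advance: the cells follow entirely from where the prototypes lie.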