
    A new density estimation neural network to detect abnormal condition in streaming data

    With the development of monitoring technologies, large volumes of measured data pour into monitoring systems and form high-volume, open-ended data streams. An abnormal condition of the monitored system can usually be characterized by a density variation in the measured data stream. However, traditional density estimation methods cannot dynamically track the density variation of a data stream because of limits on processing time and computation memory. In this paper, we propose a new density estimation neural network that continuously estimates the density of streaming data within a time-based sliding window. The network has a feedforward structure composed of a discretization layer, an input layer, and a summation layer. In the discretization layer, the value range of the data stream is discretized into network nodes at equal intervals. Measured data in the predefined time window are pushed into the input layer and updated as the window slides. In the summation layer, the activation results between input neurons and discretization neurons are summed and multiplied by a weight factor. The network outputs kernel density estimates for the sliding segment of the data stream and realizes a one-pass estimation algorithm that consumes constant computation memory. Through subnet separation and local activation, the computational load of the network is significantly reduced so that it can keep pace with the data stream. Two nonlinear statistics, quantile and entropy, which can be computed consecutively from the density estimates output by the network, serve as condition indicators to track the density variation of the data stream. The proposed method is evaluated on a simulated data stream consisting of two mixture distributions and on a pressure data stream measured from a centrifugal compressor. The results show that the underlying anomalies are successfully detected.
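    The grid-plus-sliding-window idea in this abstract can be illustrated with a short sketch. The code below is an assumption-laden reconstruction, not the paper's network: it keeps per-node kernel sums (the "summation layer") over an equally spaced grid (the "discretization layer"), adds each arriving sample's kernel contributions only to nearby nodes (mimicking "local activation"), and retracts expired samples as the time window slides. The class name, the Gaussian kernel, and all parameters are illustrative choices; a quantile indicator could be read off the same grid estimate in the same way as the entropy shown here.

        import math
        from collections import deque

        class StreamingGridKDE:
            def __init__(self, lo, hi, n_nodes=201, bandwidth=0.1, window=60.0):
                self.x0 = lo
                self.step = (hi - lo) / (n_nodes - 1)
                self.nodes = [lo + i * self.step for i in range(n_nodes)]
                self.h = bandwidth
                self.window = window            # time-based window length
                self.sums = [0.0] * n_nodes     # per-node accumulated kernel activations
                self.buf = deque()              # (timestamp, value) pairs inside the window

            def _kernel(self, u):
                # Gaussian kernel (an assumed choice; the paper does not fix one here)
                return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

            def _activate(self, x, sign):
                # "Local activation": only nodes within ~4 bandwidths of x are touched.
                radius = int(4.0 * self.h / self.step) + 1
                center = round((x - self.x0) / self.step)
                for i in range(max(0, center - radius),
                               min(len(self.nodes), center + radius + 1)):
                    self.sums[i] += sign * self._kernel((x - self.nodes[i]) / self.h)

            def update(self, t, x):
                self.buf.append((t, x))
                self._activate(x, +1.0)
                while self.buf and self.buf[0][0] < t - self.window:
                    _, old = self.buf.popleft()
                    self._activate(old, -1.0)   # retract samples that left the window

            def density(self):
                # One-pass estimate: grid of kernel density values for the current window
                n = max(len(self.buf), 1)
                return [s / (n * self.h) for s in self.sums]

            def entropy(self):
                # Entropy over the grid estimate, used as a condition indicator
                return -sum(p * math.log(p) * self.step
                            for p in self.density() if p > 0.0)

    Because each sample only ever touches a bounded neighborhood of grid nodes on insertion and on eviction, both time per sample and memory stay constant regardless of stream length, which is the property the abstract claims for the one-pass algorithm.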

    Offline and Online Density Estimation for Large High-Dimensional Data

    Density estimation has wide applications in machine learning and data analysis, including clustering, classification, multimodality analysis, bump hunting, and anomaly detection. In high-dimensional space, the sparsity of data in local neighborhoods makes many parametric and nonparametric density estimation methods inefficient. This work presents the development of computationally efficient algorithms for high-dimensional density estimation based on Bayesian sequential partitioning (BSP). A copula transform is used to separate the estimation of the marginal and joint densities, with the purpose of reducing computational complexity and estimation error. Using this separation, a parallel implementation of the density estimation algorithm on a 4-core CPU is presented, along with example applications of high-dimensional density estimation to density-based classification and clustering. Another challenge in density estimation arises with online sources of data, where data arrive over an open-ended and non-stationary stream; this calls for efficient algorithms for online density estimation. An online density estimator must provide up-to-date estimates of the density within the available computing resources and the requirements of the application. In response, the BBSP method for online density estimation is introduced. It collects and processes the data in blocks of fixed size and then takes a weighted average over the block-wise density estimates. The proper choice of block size is discussed via simulations on streams of synthetic and real datasets. Further, to improve the efficiency of offline and online density estimation, a progressive update of the binary partitions in BBSP is proposed, which, as simulation results show, leads to improved accuracy as well as speed-ups for various block sizes.
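    The block-wise averaging behind BBSP can be sketched independently of the BSP estimator itself. In the sketch below, a plain histogram stands in for the Bayesian sequential partitioning estimate of each block, and an exponential decay factor stands in for the paper's weighting scheme; both substitutions, along with the function name and parameters, are assumptions made for illustration.

        import numpy as np

        def online_block_density(blocks, edges, decay=0.9):
            # Running weighted average of block-wise density estimates:
            # newer blocks count more, older blocks are geometrically down-weighted.
            avg = np.zeros(len(edges) - 1)
            total_weight = 0.0
            for block in blocks:
                # Stand-in for the per-block BSP estimate: a normalized histogram.
                est, _ = np.histogram(block, bins=edges, density=True)
                avg = decay * avg + est
                total_weight = decay * total_weight + 1.0
                yield avg / total_weight    # up-to-date estimate after each block

        # Example: ten fixed-size blocks drawn from a standard normal stream.
        rng = np.random.default_rng(0)
        edges = np.linspace(-4.0, 4.0, 41)
        stream = (rng.normal(size=500) for _ in range(10))
        for estimate in online_block_density(stream, edges):
            pass    # `estimate` is the current density over the bin grid

    The fixed block size bounds both memory and the work done per update, and the decay parameter plays the role the abstract assigns to block weighting: it controls how quickly old blocks stop influencing the estimate on a non-stationary stream.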

    Towards Kernel Density Estimation over Streaming Data

    A variety of real-world applications rely heavily on the analysis of transient data streams. Due to the rigid processing requirements of data streams, common analysis techniques known from data mining are not directly applicable. A fundamental building block of many data mining and analysis approaches is density estimation, which provides a well-defined estimate of a continuous data distribution, a fact that makes its adaptation to data streams desirable. A convenient method for density estimation uses kernels; however, its computational complexity collides with the processing requirements of data streams. In this work, we present a new approach to this problem that combines linear processing cost with a constant amount of allocated memory, and that even supports dynamic memory adaptation to changing system resources. Our kernel density estimators over streaming data are related to M-Kernels, a previously proposed technique, but substantially improve on them in terms of accuracy as well as processing time. The results of an experimental study with synthetic and real-world data streams substantiate the efficiency and effectiveness of our approach and its superiority to M-Kernels with respect to estimation quality and processing time.
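    The constant-memory idea behind M-Kernel-style estimators can be sketched as follows. This is not the authors' algorithm, and the merge rule only imitates the M-Kernel principle: keep at most `budget` weighted Gaussian kernels sorted by location and, when the buffer overflows, merge the two closest kernels into one at their weighted mean. Shrinking `budget` at runtime by repeated merging would mimic the dynamic memory adaptation mentioned in the abstract; all names and parameters here are illustrative.

        import bisect
        import math

        class BudgetKDE:
            def __init__(self, budget=100, bandwidth=0.2):
                self.budget = budget
                self.h = bandwidth
                self.kernels = []   # (location, weight) pairs, sorted by location

            def insert(self, x):
                # Each new sample enters as a unit-weight kernel; cost is linear
                # in the (constant-size) kernel list.
                locs = [m for m, _ in self.kernels]
                self.kernels.insert(bisect.bisect(locs, x), (x, 1.0))
                if len(self.kernels) > self.budget:
                    self._merge_closest()

            def _merge_closest(self):
                # Merge the adjacent pair with the smallest gap into one kernel
                # at the pair's weighted mean, preserving total weight.
                i = min(range(len(self.kernels) - 1),
                        key=lambda j: self.kernels[j + 1][0] - self.kernels[j][0])
                (m1, w1), (m2, w2) = self.kernels[i], self.kernels[i + 1]
                self.kernels[i:i + 2] = [((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2)]

            def pdf(self, x):
                # Evaluate the weighted Gaussian mixture at x.
                if not self.kernels:
                    return 0.0
                total = sum(w for _, w in self.kernels)
                norm = math.sqrt(2.0 * math.pi) * self.h * total
                return sum(w * math.exp(-0.5 * ((x - m) / self.h) ** 2)
                           for m, w in self.kernels) / norm

    The merge step is where accuracy is traded for the constant memory bound: a smaller budget means coarser merged kernels, which is exactly the quality dimension on which the paper reports improving over M-Kernels.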