
    Robust Kernel Density Function Estimation

    Classical kernel density estimation is a commonly used method for estimating a density function. It is now evident that the accuracy of this technique is easily affected by outliers. To remedy this problem, Kim and Scott (2008) proposed an Iteratively Re-weighted Least Squares (IRWLS) algorithm for Robust Kernel Density Estimation (RKDE). However, the weakness of the IRWLS-based estimator is its very long computation time. This shortcoming has inspired us to propose new non-iterative, unsupervised approaches that are faster, more accurate and more flexible. The proposed estimators are based on our newly developed Robust Kernel Weight Function (RKWF) and Robust Density Weight Function (RDWF). The basic idea of the RKWF-based method is to first define a function that measures the outlying distance of each observation. The resulting distances are then transformed into robust weights. The RDWF is motivated by the statement of Chandola et al. (2009) that normal (clean) data appear in the high-probability region of a stochastic model, while outliers appear in its low-probability region. Based on this notion, we use a pilot (preliminary) estimate of the density function as an initial similarity (or distance) measure between each observation and its neighbours. The modified similarity measures produce the robust weights. Subsequently, the robust weights are incorporated in the kernel function to formulate the robust density function estimate. An extensive simulation study has been carried out to assess the performance of the RKWF-based and RDWF-based estimators. The RKDEs based on RKWF and RDWF perform as well as the classical Kernel Density Estimator (KDE) on outlier-free data sets. On contaminated data sets, they are faster, more accurate and more reliable than the IRWLS approach.
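The pilot-density idea described above can be illustrated with a minimal sketch. It is not the RKWF or RDWF of the thesis, whose exact formulas the abstract does not give. It assumes a one-dimensional Gaussian kernel and a hypothetical weighting rule in which each observation's weight is proportional to the classical (pilot) KDE evaluated at that observation. The names `weighted_kde` and `pilot_density_weights`, the data and the bandwidth `h` are all illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_kde(x, data, weights, h):
    """Weighted KDE at x: (1/h) * sum_i w_i * K((x - x_i) / h).

    With uniform weights 1/n this reduces to the classical KDE.
    """
    return sum(w * gaussian_kernel((x - xi) / h)
               for xi, w in zip(data, weights)) / h


def pilot_density_weights(data, h):
    """Hypothetical weighting rule (an assumption, not the thesis's RDWF):
    weight each point in proportion to the pilot KDE at that point."""
    n = len(data)
    uniform = [1.0 / n] * n
    pilot = [weighted_kde(xi, data, uniform, h) for xi in data]
    total = sum(pilot)
    return [p / total for p in pilot]


# Illustrative data: six clustered points and one outlier at 8.0.
data = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 8.0]
h = 0.5  # bandwidth chosen by hand for the illustration

uniform = [1.0 / len(data)] * len(data)
robust = pilot_density_weights(data, h)

classical_at_outlier = weighted_kde(8.0, data, uniform, h)
robust_at_outlier = weighted_kde(8.0, data, robust, h)
```

In this sketch the outlier sits in a low-density region of the pilot estimate, so it receives a weight below `1/n` and contributes less mass to the reweighted estimate than to the classical one. The weights are computed in a single pass, with no iteration.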
The classical kernel density estimation approach is widely used in various formulas and methods. Unfortunately, many researchers are not aware that the KDE is easily affected by outliers. We have proposed an RKDE that is more efficient and consumes less time. Our work on the RKDE and its corresponding robust weights has motivated us to develop alternative location and scale estimators. The classical location and scale estimators are modified by incorporating the robust weights and the RKDE. To evaluate the efficiency of the proposed method, comprehensive contaminated models are designed and simulated. The accuracy of the proposed method is compared with that of location and scale estimators based on the M, Minimum Covariance Determinant (MCD) and Minimum Volume Ellipsoid (MVE) estimators. The simulation study demonstrates that, on the whole, the proposed method is more accurate than the competing methods. The research also develops two new approaches for detecting outliers and potential outliers in unimodal and multimodal distributions. The first method, intended for unimodal distributions, incorporates the distance of each observation from the center of the data set. The second method is usable not only for unimodal but also for multimodal distributions. It incorporates robust weights, whereby high weights are assigned to normal (clean) observations and low weights to outlying ones. In this thesis, we also illustrate that the sensitivity of the RKDE depends on the setting of the tuning constants of the employed loss function. The results of the study indicate that the proposed methods are capable of labelling normal observations and potential outliers in a data set. Additionally, they are able to assign anomaly scores to normal and outlying observations.
Finally, this thesis also addresses the estimation of Mutual Information (MI) for mixture distributions, which are prone to create two distant groups in the data. The formulation of MI involves estimation of the density function. For bivariate random variables, the MI estimate involves bivariate density estimation, which in turn employs an estimate of the covariance matrix. The sensitivity of the covariance matrix to the presence of outliers has motivated us to substitute it with robust estimates derived from MCD and MVE. The efficiency of the modified MI estimate is evaluated based on its accuracy. For this evaluation, mixtures of bivariate normal distributions with different contribution percentages are simulated. Simulation results show that the new formulation of MI increases the accuracy of mutual information estimation.
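The sensitivity of a covariance-based MI estimate to outliers can be seen in a small sketch. It does not reproduce the thesis's estimator. It assumes the closed form for a bivariate normal pair, MI = -0.5 ln(1 - rho^2), with rho estimated by the classical sample correlation. The data and the function names `sample_correlation` and `gaussian_mi` are illustrative assumptions.

```python
import math


def sample_correlation(xs, ys):
    """Classical (non-robust) Pearson correlation of two samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(rho):
    """Mutual information (in nats) of a bivariate normal with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


# Clean data: four points with zero sample correlation.
clean_x = [-1.0, -1.0, 1.0, 1.0]
clean_y = [-1.0, 1.0, -1.0, 1.0]

# Contaminated data: the same points plus one distant outlier.
cont_x = clean_x + [10.0]
cont_y = clean_y + [10.0]

mi_clean = gaussian_mi(sample_correlation(clean_x, clean_y))
mi_contaminated = gaussian_mi(sample_correlation(cont_x, cont_y))
```

A single distant point moves the plug-in MI from zero to more than one nat here, which is the kind of distortion that replacing the classical covariance with an MCD- or MVE-based estimate is meant to limit.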

    A Local Density-Based Approach for Local Outlier Detection

    This paper presents a simple but effective density-based outlier detection approach using local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on the extended nearest neighbors of the object. Instead of using only the k nearest neighbors, we further consider the reverse nearest neighbors and shared nearest neighbors of an object for density distribution estimation. Some theoretical properties of the proposed RDOS, including its expected value and false alarm probability, are derived. A comprehensive experimental study on both synthetic and real-life data sets demonstrates that our approach is more effective than state-of-the-art outlier detection methods. Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letters
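A relative-density score of this kind can be sketched as follows. This is a simplified illustration, not the paper's RDOS. It uses only the k nearest neighbors (no reverse or shared nearest neighbors), one-dimensional data and a Gaussian kernel with a fixed bandwidth. The function names, the data and the values of `k` and `h` are assumptions made for the example.

```python
import math


def knn_indices(data, i, k):
    """Indices of the k points closest to data[i], excluding i itself."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: abs(data[j] - data[i]))
    return others[:k]


def local_density(data, i, k, h):
    """Local Gaussian KDE at data[i], built from the point and its k neighbors."""
    points = [data[i]] + [data[j] for j in knn_indices(data, i, k)]
    norm = len(points) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((data[i] - p) / h) ** 2) for p in points) / norm


def relative_density_score(data, i, k, h):
    """Mean local density of the neighbors divided by the point's own density.

    Values near 1 suggest the point is as dense as its neighborhood;
    values well above 1 suggest a local outlier.
    """
    neighbors = knn_indices(data, i, k)
    mean_neighbor = sum(local_density(data, j, k, h) for j in neighbors) / len(neighbors)
    return mean_neighbor / local_density(data, i, k, h)


# Illustrative data: five evenly spaced points and one isolated point.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0]
k, h = 2, 0.5

score_inlier = relative_density_score(data, 2, k, h)   # point 0.2
score_outlier = relative_density_score(data, 5, k, h)  # point 5.0
```

The isolated point has a much lower local density than its nearest neighbors, so its score is well above 1, while the point in the middle of the cluster scores close to 1.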

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled or not well defined. This situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general OCC problem through a taxonomy of study based on the availability of training data, the algorithms used and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures