
    Robust Kernel Density Function Estimation

    Get PDF
    The classical kernel density estimation technique is the most commonly used method to estimate a density function. It is now evident that the accuracy of such a density estimate is easily affected by outliers. To remedy this problem, Kim and Scott (2008) proposed an Iteratively Re-weighted Least Squares (IRWLS) algorithm for Robust Kernel Density Estimation (RKDE). However, the weakness of the IRWLS-based estimator is its very long computation time. This shortcoming has inspired us to propose new non-iterative, unsupervised approaches which are faster, more accurate and more flexible. The proposed estimators are based on our newly developed Robust Kernel Weight Function (RKWF) and Robust Density Weight Function (RDWF). The basic idea of the RKWF-based method is to first define a function which measures the outlying distance of each observation; the resulting distances are then transformed into robust weights. The observation by Chandola et al. (2009) that normal (clean) data appear in high-probability regions of a stochastic model, while outliers appear in low-probability regions, motivated the development of the RDWF. Based on this notion, we employ a pilot (preliminary) density estimate as the initial similarity (or distance) measure of each observation with its neighbours. The modified similarity measures produce the robust weights used to estimate the density function robustly. Subsequently, the robust weights are incorporated in the kernel function to formulate the robust density estimator. An extensive simulation study has been carried out to assess the performance of the RKWF-based and RDWF-based estimators. The RKDEs based on RKWF and RDWF perform as well as the classical Kernel Density Estimator (KDE) on outlier-free data sets, and they are faster, more accurate and more reliable than the IRWLS approach on contaminated data sets.
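The weight-then-smooth idea described above can be sketched as follows. This is an illustrative simplification, not the thesis code: a pilot density estimate stands in for the RDWF similarity measure, and a Gaussian kernel with a fixed bandwidth is assumed.

```python
import numpy as np

def weighted_kde(x_eval, data, weights, bandwidth):
    """Gaussian KDE whose kernels carry per-observation weights (summing to 1)."""
    u = (x_eval[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return (kernels * weights[None, :]).sum(axis=1) / bandwidth

def density_weights(data, bandwidth):
    """RDWF-style weights (illustrative): a pilot density estimate serves as the
    similarity measure, so observations in low-density regions (outliers) get
    small weights; the weights are normalised to sum to one."""
    uniform = np.full(len(data), 1.0 / len(data))
    pilot = weighted_kde(data, data, uniform, bandwidth)
    return pilot / pilot.sum()
```

On a contaminated sample, an outlier far from the bulk receives a weight well below the median, so its kernel contributes little to the final estimate, while the estimator still integrates to one.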
The classical kernel density estimation approach is widely used in various formulas and methods. Unfortunately, many researchers are not aware that the KDE is easily affected by outliers. We have proposed the RKDE, which is more efficient and consumes less time. Our work on the RKDE and the corresponding robust weights has motivated us to develop alternative location and scale estimators: a modification is made to the classical location and scale estimators by incorporating the robust weights and the RKDE. To evaluate the efficiency of the proposed method, comprehensive contaminated models are designed and simulated. The accuracy of the proposed method was compared with location and scale estimators based on M, Minimum Covariance Determinant (MCD) and Minimum Volume Ellipsoid (MVE) estimators. The simulation study demonstrates that, on the whole, the accuracy of the proposed method is better than that of the competing methods. The research also develops two new approaches for the detection of outliers and potential outliers in unimodal and multimodal distributions. The distance of observations from the centre of the data set is incorporated in the formulation of the first outlier detection method for unimodal distributions. The second method defines an approach that is usable not only for unimodal but also for multimodal distributions. This approach incorporates robust weights, whereby high weights are assigned to normal (clean) observations and low weights to outlying ones. In this thesis, we also illustrate that the sensitivity of the RKDE depends on the setting of the tuning constants of the employed loss function. The results of the study indicate that the proposed methods are capable of labelling normal observations and potential outliers in a data set. Additionally, they are able to assign anomaly scores to normal and outlying observations.
Finally, this thesis also addresses the estimation of Mutual Information (MI) for mixture distributions, which are prone to creating two distant groups in the data. The formulation of MI involves estimating density functions: the MI estimate for bivariate random variables requires a bivariate density estimate, which in turn employs an estimate of the covariance matrix. The sensitivity of the covariance matrix to the presence of outliers has motivated us to substitute it with robust estimates derived from MCD and MVE. The efficiency of the modified MI estimate is evaluated based on its accuracy. To do this, mixtures of bivariate normal distributions with different mixing percentages are simulated. Simulation results show that the new formulation of MI increases the accuracy of mutual information estimation.
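In the bivariate normal case the MI depends on the covariance matrix alone, which is what makes the robust-covariance substitution possible. A minimal sketch of that closed form (the MCD/MVE plug-in itself is not implemented here; any robust 2x2 covariance estimate can be passed in place of the sample covariance):

```python
import numpy as np

def bivariate_normal_mi(cov):
    """MI of a bivariate normal from its 2x2 covariance: -0.5 * log(1 - rho^2).
    Substituting a robust covariance estimate (e.g. MCD or MVE) for the sample
    covariance makes the resulting MI estimate resistant to outliers."""
    rho2 = cov[0, 1] ** 2 / (cov[0, 0] * cov[1, 1])
    return -0.5 * np.log(1.0 - rho2)
```

With zero correlation the MI is zero, and it grows without bound as the correlation approaches one, matching the usual Gaussian MI formula.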

    Local Multiplicative Bias Correction for Asymmetric Kernel Density Estimators

    Get PDF
    We consider semiparametric asymmetric kernel density estimators when the unknown density has support on [0, ∞). We provide a unifying framework which contains asymmetric kernel versions of several semiparametric density estimators considered previously in the literature. This framework allows us to use popular parametric models in a nonparametric fashion and yields estimators which are robust to misspecification. We further develop a specification test to determine whether a density belongs to a particular parametric family. The proposed estimators outperform rival non- and semiparametric estimators in finite samples and are simple to implement. We provide applications to loss data from a large Swiss health insurer and to Brazilian income data.
    Keywords: semiparametric density estimation; asymmetric kernel; income distribution; loss distribution; health insurance; specification testing
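A multiplicative bias correction of this kind starts from a parametric fit and smooths the ratio of the data to that fit with an asymmetric kernel. The sketch below is illustrative, not the paper's estimator: it assumes an exponential parametric start (chosen arbitrarily) and a Chen-type gamma kernel on [0, ∞).

```python
import numpy as np
from math import lgamma

def gamma_kernel(x, t, b):
    """Chen-type gamma kernel in t, with shape x/b + 1 and scale b, so the
    kernel adapts to the evaluation point x >= 0 and respects the support."""
    a = x / b + 1.0
    return np.exp((a - 1.0) * np.log(t) - t / b - lgamma(a) - a * np.log(b))

def mbc_density(x, data, b):
    """Hjort-Glad-style multiplicative bias correction sketch: fit a parametric
    start (exponential MLE here, an illustrative choice) and multiply it by a
    gamma-kernel estimate of the correction factor f / f_param."""
    scale = data.mean()                                  # exponential MLE
    f_par = lambda t: np.exp(-t / scale) / scale
    correction = np.mean(gamma_kernel(x, data, b) / f_par(data))
    return f_par(x) * correction
```

When the parametric start is correctly specified the correction factor is close to one, so the semiparametric estimate inherits the parametric fit's accuracy; under misspecification the kernel term repairs the fit nonparametrically.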

    Rainbow plots, Bagplots and Boxplots for Functional Data

    Get PDF
    We propose new tools for visualizing large numbers of functional data in the form of smooth curves or surfaces. The proposed tools include functional versions of the bagplot and boxplot, and make use of the first two robust principal component scores, Tukey's data depth and highest density regions. By-products of our graphical displays are outlier detection methods for functional data. We compare these new outlier detection methods with existing methods for detecting outliers in functional data and show that our methods are better able to identify the outliers.
    Keywords: highest density regions; robust principal component analysis; kernel density estimation; outlier detection; Tukey's halfspace depth
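The outlier-detection by-product can be sketched in a few lines: project each curve onto its first two principal component scores and flag curves whose kernel density of scores is lowest. This is a simplified illustration, assuming classical PCA and a product Gaussian kernel, whereas the paper uses robust principal components and highest density regions.

```python
import numpy as np

def functional_outliers(curves, alpha=0.05):
    """HDR-style flagging sketch: rows of `curves` are discretised functions.
    Project onto the first two principal components (classical PCA here; the
    paper's method is robust), estimate a bivariate Gaussian-kernel density of
    the scores, and flag curves whose score density falls below the alpha
    quantile of all score densities."""
    centred = curves - curves.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:2].T                         # n x 2 score matrix
    h = scores.std(axis=0) * len(scores) ** (-1.0 / 6)  # rule-of-thumb bandwidths
    u = (scores[:, None, :] - scores[None, :, :]) / h
    dens = np.exp(-0.5 * (u**2).sum(-1)).mean(axis=1) / (2.0 * np.pi * h.prod())
    return dens < np.quantile(dens, alpha)
```

A curve shifted far from the bulk lands in a low-density region of the score plane and is flagged, while ordinary sampling noise is not.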

    Statistical computation of Boltzmann entropy and estimation of the optimal probability density function from statistical sample

    Full text link
    In this work, we investigate the statistical computation of the Boltzmann entropy of statistical samples. For this purpose, we use both the histogram and the kernel function to estimate the probability density function of statistical samples. We find that, due to coarse-graining, the entropy is a monotonically increasing function of the bin width for the histogram, or of the bandwidth for kernel estimation, which makes it difficult to select an optimal bin width or bandwidth for computing the entropy. Fortunately, we notice that there exists a minimum of the first derivative of the entropy for both histogram and kernel estimation, and this minimum point asymptotically points to the optimal bin width or bandwidth. We have verified these findings in a large number of numerical experiments. Hence, we suggest that the minimum of the first derivative of the entropy be used as a selector for the optimal bin width or bandwidth of density estimation. Moreover, the optimal bandwidth selected by the minimum of the first derivative of the entropy is purely data-based, independent of the unknown underlying probability density distribution, which makes it superior to existing estimators. Our results are not restricted to the one-dimensional case but can also be extended to multivariate cases. It should be emphasized, however, that we do not provide a rigorous mathematical proof of these findings, and we leave these issues to those who are interested in them.
    Comment: 8 pages, 6 figures, MNRAS, in press
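The histogram variant of the selector can be sketched directly: compute the plug-in entropy over a grid of bin widths and pick the width at which the finite-difference derivative of the entropy is smallest. This is an illustrative sketch of the abstract's recipe, not the paper's code; the grid of candidate widths is an assumption.

```python
import numpy as np

def hist_entropy(data, width):
    """Plug-in differential entropy from a histogram with the given bin width."""
    edges = np.arange(data.min(), data.max() + width, width)
    dens, _ = np.histogram(data, bins=edges, density=True)
    dens = dens[dens > 0]
    return -np.sum(dens * np.log(dens)) * width

def select_width(data, widths):
    """Selector sketched in the abstract: the entropy grows with bin width, so
    take the width at which its first (finite-difference) derivative is minimal."""
    ent = np.array([hist_entropy(data, w) for w in widths])
    deriv = np.diff(ent) / np.diff(widths)
    return widths[np.argmin(deriv)]
```

The selector uses only the sample itself, which is the data-based property the abstract emphasises; nothing about the underlying density is assumed.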

    Density and Hazard Rate Estimation for Censored and α-mixing Data Using Gamma Kernels

    Get PDF
    In this paper we consider nonparametric estimation of the density and hazard rate function for right-censored α-mixing survival time data using kernel smoothing techniques. Since survival times are positive, with potentially a high concentration at zero, one has to take the boundary bias problem into account when the functions are estimated in the boundary region. In this paper, gamma kernel estimators of the density and the hazard rate function are proposed. The estimators use adaptive weights depending on the point at which the function is estimated, and they are robust to the boundary bias problem. For both estimators, the mean squared error properties, including the rate of convergence, the almost sure consistency and the asymptotic normality, are investigated. The results of a simulation demonstrate the excellent performance of the proposed estimators.
    Keywords: gamma kernel; Kaplan-Meier; density and hazard function; mean integrated squared error; consistency; asymptotic normality
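The gamma-kernel idea can be illustrated as follows: the kernel's shape parameter moves with the evaluation point, so no kernel mass spills below zero. This is a sketch under simplifying assumptions: uncensored data and the empirical survival function in place of the Kaplan-Meier estimator the paper uses.

```python
import numpy as np
from math import lgamma

def gamma_kde(x, data, b):
    """Gamma-kernel density estimate at x >= 0: the kernel is a gamma pdf in
    the data with shape x/b + 1 and scale b, avoiding boundary bias at zero."""
    a = x / b + 1.0
    logk = (a - 1.0) * np.log(data) - data / b - lgamma(a) - a * np.log(b)
    return float(np.mean(np.exp(logk)))

def gamma_hazard(x, data, b):
    """Hazard-rate sketch for uncensored data: f_hat / S_hat, with the empirical
    survival function (a Kaplan-Meier estimate would replace it under censoring)."""
    surv = np.mean(data > x)
    return gamma_kde(x, data, b) / surv
```

For a unit exponential sample the true hazard is constant and equal to one, which the estimator recovers to within sampling error.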

    Bootstrap CI and test statistics for kernel density estimates using Stata

    Get PDF
    In recent years, non-parametric density estimation has been extensively employed in several fields as a powerful descriptive tool, far more informative and robust than histograms. Moreover, the increased computational power of modern computers has made non-parametric density estimation a relatively "cheap" computation, helping to easily detect unexpected aspects of the distribution such as bimodality. However, it is often neglected that non-parametric methods can only provide an estimate of the true density, whose reliability depends on various factors, such as the number of data available and the bandwidth. We focus here on kernel density estimation and discuss the problem of computing bootstrap confidence intervals and test statistics for point-wise density estimation using Stata. Construction of confidence intervals and tests of hypotheses about the true density are carried out using an asymptotically pivotal studentized statistic, after computing a suitable estimator for its variance. The issue of asymptotic bias correction is also discussed and tackled.
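The bootstrap construction can be sketched outside Stata in a few lines. Note this uses the simpler percentile interval rather than the studentized statistic the abstract describes, and the bandwidth and replication count are illustrative choices.

```python
import numpy as np

def kde(x, data, h):
    """Gaussian kernel density estimate at a single point x."""
    u = (x - data) / h
    return float(np.mean(np.exp(-0.5 * u**2)) / (h * np.sqrt(2.0 * np.pi)))

def bootstrap_ci(x, data, h, level=0.95, B=500, seed=0):
    """Percentile bootstrap CI for the density at x: resample the data with
    replacement, re-estimate, and take the empirical quantiles of the
    replicates. (A studentized version would pivot each replicate by an
    estimate of its standard error.)"""
    rng = np.random.default_rng(seed)
    reps = np.array([kde(x, rng.choice(data, size=len(data)), h)
                     for _ in range(B)])
    lo, hi = np.quantile(reps, [(1.0 - level) / 2.0, (1.0 + level) / 2.0])
    return lo, hi
```

The interval brackets the point estimate and shrinks as the sample size grows; the studentized variant additionally improves coverage accuracy, which is why the Stata routine prefers it.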

    Particle approximations of the score and observed information matrix for parameter estimation in state space models with linear computational cost

    Full text link
    Poyiadjis et al. (2011) show how particle methods can be used to estimate both the score and the observed information matrix for state space models. These methods either suffer from a computational cost that is quadratic in the number of particles, or produce estimates whose variance increases quadratically with the amount of data. This paper introduces an alternative approach for estimating these terms at a computational cost that is linear in the number of particles. The method is derived using a combination of kernel density estimation, to avoid the particle degeneracy that causes the quadratically increasing variance, and Rao-Blackwellisation. Crucially, we show the method is robust to the choice of bandwidth within the kernel density estimation, as it has good asymptotic properties regardless of this choice. Our estimates of the score and observed information matrix can be used within both online and batch procedures for estimating parameters for state space models. Empirical results show improved parameter estimates compared to existing methods at a significantly reduced computational cost. Supplementary materials including code are available.
    Comment: Accepted to the Journal of Computational and Graphical Statistics

    The performance of mutual information for mixture of bivariate normal distributions based on robust kernel estimation

    Get PDF
    Mutual Information (MI) measures the degree of association between variables in nonlinear as well as linear models. It can also be used to measure the dependency between variables in a mixture distribution. The MI is estimated from the estimated values of the joint density function and the marginal density functions of X and Y. A variety of methods for estimating the density function have been recommended. In this paper, we consider only the kernel method for estimating the density function. However, the classical kernel density estimator is not reliable when dealing with mixture density functions, which are prone to creating two distant groups in the data. In this situation, a robust kernel density estimator is proposed to acquire a more efficient MI estimate for mixture distributions. The performance of the robust MI is investigated extensively by Monte Carlo simulations. The results of the study offer a substantial improvement over the existing techniques.
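The kernel plug-in route the abstract describes, estimating MI from joint and marginal kernel density estimates, can be sketched with the classical (non-robust) estimator; the paper's contribution is to replace this with a robust kernel variant. Bandwidths and the product-Gaussian kernel are illustrative assumptions.

```python
import numpy as np

def kernel_mi(x, y, h):
    """Plug-in kernel MI estimate: the sample average of
    log f(x, y) / (f(x) f(y)), with all three densities estimated by
    (product) Gaussian kernels sharing a common bandwidth h. This is the
    classical version; a robust KDE would replace it for mixture data."""
    kx = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    ky = np.exp(-0.5 * ((y[:, None] - y[None, :]) / h) ** 2)
    c = h * np.sqrt(2.0 * np.pi)
    fx = kx.mean(axis=1) / c           # marginal density of x at each sample
    fy = ky.mean(axis=1) / c           # marginal density of y at each sample
    fxy = (kx * ky).mean(axis=1) / c**2  # joint density at each sample pair
    return float(np.mean(np.log(fxy / (fx * fy))))
```

For independent variables the estimate is near zero, and it grows with the strength of dependence, which is the behaviour the Monte Carlo study exploits to compare the classical and robust versions.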