Multivariate Density Estimation and Visualization
This chapter examines the use of flexible methods to approximate an unknown density function, and techniques appropriate for visualization of densities in up to four dimensions. The statistical analysis of data is a multilayered endeavor. Data must be carefully examined and cleaned to avoid spurious findings. A preliminary examination of data by graphical means is useful for this purpose. Graphical exploration of data was popularized by Tukey (1977) in his book on exploratory data analysis (EDA). Modern data mining packages also include an array of graphical tools such as the histogram, which is the simplest example of a density estimator. Exploring data is particularly challenging when the sample size is massive or if the number of variables exceeds a handful. In either situation, the use of nonparametric density estimation can aid in the fundamental goal of understanding the important features hidden in the data. The following sections describe the algorithms and theory of nonparametric density estimation, as well as the visualization of multivariate data and density estimates. For simplicity, the discussion will assume the data and functions are continuous. Extensions to discrete and mixed data are straightforward.
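The abstract above calls the histogram the simplest density estimator. As a rough illustration only (not taken from the chapter), the following sketch builds an equal-width histogram and rescales the bin counts so the bar areas sum to one. The function name `histogram_density`, the choice of equal-width bins spanning the sample range, and the example data are all assumptions made for this sketch.

```python
def histogram_density(data, num_bins):
    """Equal-width histogram density estimate (illustrative sketch).

    Returns (edges, heights): heights[k] is the estimated density on
    the bin [edges[k], edges[k + 1]). Assumes max(data) > min(data).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The sample maximum falls on the right edge; put it in the last bin.
        k = min(int((x - lo) / width), num_bins - 1)
        counts[k] += 1
    n = len(data)
    # Dividing by n * width turns counts into a density that integrates to 1.
    heights = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, heights


if __name__ == "__main__":
    edges, heights = histogram_density([0.0, 0.1, 0.2, 0.9, 1.0], num_bins=2)
    print(edges, heights)
```

The number of bins plays the role of a smoothing parameter: fewer bins give a smoother but coarser estimate, more bins a noisier one.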
Bayesian multivariate mixed-scale density estimation
Although continuous density estimation has received abundant attention in the
Bayesian nonparametrics literature, there is limited theory on multivariate
mixed scale density estimation. In this note, we consider a general framework
to jointly model continuous, count and categorical variables under a
nonparametric prior, which is induced through rounding latent variables having
an unknown density with respect to Lebesgue measure. For the proposed class of
priors, we provide sufficient conditions for large support, strong consistency
and rates of posterior contraction. These conditions allow one to convert
sufficient conditions obtained in the setting of multivariate continuous
density estimation to the mixed scale case. To illustrate the procedure, a
rounded multivariate nonparametric mixture of Gaussians is introduced and
applied to a crime and communities dataset.
Nonparametric density estimation for multivariate bounded data
We propose a new nonparametric estimator for the density function of multivariate bounded data. As frequently observed in practice, the variables may be partially bounded (e.g., nonnegative) or completely bounded (e.g., in the unit interval). In addition, the variables may have a point mass. We reduce the conditions on the underlying density to a minimum by proposing a nonparametric approach. By using a gamma, a beta, or a local linear kernel (also called boundary kernels), in a product kernel, the suggested estimator becomes simple in implementation and robust to the well-known boundary bias problem. We investigate the mean integrated squared error properties, including the rate of convergence, uniform strong consistency and asymptotic normality. We establish consistency of the least squares cross-validation method to select optimal bandwidth parameters. A detailed simulation study investigates the performance of the estimators. Applications using lottery and corporate finance data are provided. Keywords: asymmetric kernels, multivariate boundary bias, nonparametric multivariate density estimation, asymptotic properties, bandwidth selection, least squares cross-validation
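To make the idea of a product of boundary kernels concrete, here is a minimal sketch for data in the unit hypercube. It assumes a commonly used beta-kernel form, in which the kernel at evaluation point x with bandwidth b is the Beta(x/b + 1, (1 - x)/b + 1) density evaluated at each observation. That form, the function names `beta_pdf` and `beta_product_kde`, and the fixed bandwidths are assumptions for illustration and are not necessarily the estimator proposed in the paper. Bandwidth selection by least squares cross-validation is not shown.

```python
import math


def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t.

    Simplification for this sketch: returns 0.0 unless 0 < t < 1, so
    observations are assumed to lie strictly inside the unit interval.
    """
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(t)
                    + (b - 1.0) * math.log(1.0 - t))


def beta_product_kde(x, sample, bandwidths):
    """Product beta-kernel density estimate at the point x in [0, 1]^d.

    x          : sequence of d coordinates
    sample     : list of observations, each a sequence of d coordinates
    bandwidths : one positive bandwidth per coordinate (assumed given)
    """
    total = 0.0
    for obs in sample:
        prod = 1.0
        for xj, oj, bj in zip(x, obs, bandwidths):
            # The kernel shape depends on the evaluation point xj, so the
            # kernel never places mass outside [0, 1].
            prod *= beta_pdf(oj, xj / bj + 1.0, (1.0 - xj) / bj + 1.0)
        total += prod
    return total / len(sample)


if __name__ == "__main__":
    data = [(0.2, 0.7), (0.4, 0.5), (0.9, 0.1)]
    print(beta_product_kde((0.0, 0.5), data, bandwidths=(0.1, 0.1)))
```

Because the kernel's support matches the support of the data, the estimate can be evaluated at a boundary point such as x = 0 without spilling weight outside the unit interval, which is the motivation for boundary kernels.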
