1,573 research outputs found
Bayesian Adaptive Bandwidth Kernel Density Estimation of Irregular Multivariate Distributions
Kernel density estimation is an important technique for understanding the distributional properties of data. Some investigations have found that the estimation of a global bandwidth can be heavily affected by observations in the tail. We propose to categorize data into low- and high-density regions, to which we assign two different bandwidths called the low-density adaptive bandwidths. We derive the posterior of the bandwidth parameters through the Kullback-Leibler information. A Bayesian sampling algorithm is presented to estimate the bandwidths. Monte Carlo simulations are conducted to examine the performance of the proposed Bayesian sampling algorithm in comparison with the performance of the normal reference rule and a Bayesian sampling algorithm for estimating a global bandwidth. According to Kullback-Leibler information, the kernel density estimator with low-density adaptive bandwidths estimated through the proposed Bayesian sampling algorithm outperforms the density estimators with bandwidth estimated through the two competitors. We apply the low-density adaptive kernel density estimator to the estimation of the bivariate density of daily stock-index returns observed from the U.S. and Australian stock markets. The derived conditional distribution of the Australian stock-index return for a given daily return in the U.S. market enables market analysts to understand how the former market is associated with the latter.conditional density; global bandwidth; Kullback-Leibler information; marginal likelihood; Markov chain Monte Carlo; S&P500 index
Structure in the 3D Galaxy Distribution: I. Methods and Example Results
Three methods for detecting and characterizing structure in point data, such
as that generated by redshift surveys, are described: classification using
self-organizing maps, segmentation using Bayesian blocks, and density
estimation using adaptive kernels. The first two methods are new, and allow
detection and characterization of structures of arbitrary shape and at a wide
range of spatial scales. These methods should elucidate not only clusters, but
also the more distributed, wide-ranging filaments and sheets, and further allow
the possibility of detecting and characterizing an even broader class of
shapes. The methods are demonstrated and compared in application to three data
sets: a carefully selected volume-limited sample from the Sloan Digital Sky
Survey redshift data, a similarly selected sample from the Millennium
Simulation, and a set of points independently drawn from a uniform probability
distribution -- a so-called Poisson distribution. We demonstrate a few of the
many ways in which these methods elucidate large scale structure in the
distribution of galaxies in the nearby Universe.Comment: Re-posted after referee corrections along with partially re-written
introduction. 80 pages, 31 figures, ApJ in Press. For full sized figures
please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pd
Locally adaptive factor processes for multivariate time series
In modeling multivariate time series, it is important to allow time-varying
smoothness in the mean and covariance process. In particular, there may be
certain time intervals exhibiting rapid changes and others in which changes are
slow. If such time-varying smoothness is not accounted for, one can obtain
misleading inferences and predictions, with over-smoothing across erratic time
intervals and under-smoothing across times exhibiting slow variation. This can
lead to mis-calibration of predictive intervals, which can be substantially too
narrow or wide depending on the time. We propose a locally adaptive factor
process for characterizing multivariate mean-covariance changes in continuous
time, allowing locally varying smoothness in both the mean and covariance
matrix. This process is constructed utilizing latent dictionary functions
evolving in time through nested Gaussian processes and linearly related to the
observed data with a sparse mapping. Using a differential equation
representation, we bypass usual computational bottlenecks in obtaining MCMC and
online algorithms for approximate Bayesian inference. The performance is
assessed in simulations and illustrated in a financial application
Semiparametric posterior limits
We review the Bayesian theory of semiparametric inference following Bickel
and Kleijn (2012) and Kleijn and Knapik (2013). After an overview of efficiency
in parametric and semiparametric estimation problems, we consider the
Bernstein-von Mises theorem (see, e.g., Le Cam and Yang (1990)) and generalize
it to (LAN) regular and (LAE) irregular semiparametric estimation problems. We
formulate a version of the semiparametric Bernstein-von Mises theorem that does
not depend on least-favourable submodels, thus bypassing the most restrictive
condition in the presentation of Bickel and Kleijn (2012). The results are
applied to the (regular) estimation of the linear coefficient in partial linear
regression (with a Gaussian nuisance prior) and of the kernel bandwidth in a
model of normal location mixtures (with a Dirichlet nuisance prior), as well as
the (irregular) estimation of the boundary of the support of a monotone family
of densities (with a Gaussian nuisance prior).Comment: 47 pp., 1 figure, submitted for publication. arXiv admin note:
substantial text overlap with arXiv:1007.017
Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling
This article develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well
- …