Forest Density Estimation
We study graph estimation and density estimation in high dimensions, using a
family of density estimators based on forest-structured undirected graphical
models. For density estimation, we do not assume the true distribution
corresponds to a forest; rather, we form kernel density estimates of the
bivariate and univariate marginals, and apply Kruskal's algorithm to estimate
the optimal forest on held-out data. We prove an oracle inequality on the
excess risk of the resulting estimator relative to the risk of the best forest.
For graph estimation, we consider the problem of estimating forests with
restricted tree sizes. We prove that finding a maximum weight spanning forest
with restricted tree size is NP-hard, and develop an approximation algorithm
for this problem. Viewing the tree size as a complexity parameter, we then
select a forest using data splitting, and prove bounds on excess risk and
structure selection consistency of the procedure. Experiments with simulated
data and microarray data indicate that the methods are a practical alternative
to Gaussian graphical models.
Comment: Extended version of earlier paper titled "Tree density estimation
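The forest-selection step named in this abstract can be illustrated with a minimal sketch of Kruskal's algorithm for a maximum weight spanning forest. The function name `max_weight_forest`, the weight matrix, and the rule of skipping non-positive weights are assumptions made for illustration; the abstract does not specify how the edge weights are computed, and this is not the authors' implementation.

```python
from typing import List, Tuple


def max_weight_forest(weights: List[List[float]]) -> List[Tuple[int, int]]:
    """Greedy (Kruskal) maximum weight spanning forest.

    weights[i][j] is a hypothetical symmetric edge score between variables
    i and j (for example, an estimated dependence measure). Edges with a
    non-positive score are never added, which is an assumption of this
    sketch, so the result may be a forest rather than a single tree.
    """
    n = len(weights)
    parent = list(range(n))  # union-find structure over the n variables

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Candidate edges (upper triangle only), heaviest first.
    edges = sorted(
        ((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )

    forest: List[Tuple[int, int]] = []
    for w, i, j in edges:
        if w <= 0:
            break  # every remaining edge is no heavier, so stop here
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            forest.append((i, j))
    return forest


# Hypothetical scores for four variables; variable 3 has only negative scores.
W = [
    [0.0, 0.9, 0.5, -0.1],
    [0.9, 0.0, 0.7, -0.2],
    [0.5, 0.7, 0.0, -0.3],
    [-0.1, -0.2, -0.3, 0.0],
]
print(max_weight_forest(W))
```

In this example the edge between variables 0 and 2 is rejected because it would close a cycle, and variable 3 stays isolated because all of its scores are negative.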
Quasi-concave density estimation
Maximum likelihood estimation of a log-concave probability density is
formulated as a convex optimization problem and shown to have an equivalent
dual formulation as a constrained maximum Shannon entropy problem. Closely
related maximum Rényi entropy estimators that impose weaker concavity
restrictions on the fitted density are also considered, notably a minimum
Hellinger discrepancy estimator that constrains the reciprocal of the
square root of the density to be concave. A limiting form of these estimators
constrains solutions to the class of quasi-concave densities.
Comment: Published at http://dx.doi.org/10.1214/10-AOS814 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
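As a rough illustration of the convex formulation mentioned in this abstract, one standard way to write log-concave maximum likelihood is sketched below. The notation is an assumption, not taken from the abstract: $X_1, \dots, X_n$ are the observations and the density is written as $f = e^{-g}$ with $g$ convex.

```latex
% Log-concave MLE as a convex program (assumed notation: f = e^{-g}, g convex).
\[
  \min_{g \ \text{convex}} \;
  \frac{1}{n} \sum_{i=1}^{n} g(X_i) \;+\; \int e^{-g(x)} \, dx .
\]
% The integral term plays the role of the constraint \int f = 1:
% at the minimizer, e^{-g} integrates to one.
```

The first term is the negative average log-likelihood of $f = e^{-g}$, and both terms are convex in $g$, which is what makes the problem a convex program.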
Admissible predictive density estimation
Let $X|\mu\sim N_p(\mu,v_xI)$ and $Y|\mu\sim N_p(\mu,v_yI)$ be independent
$p$-dimensional multivariate normal vectors with common unknown mean $\mu$.
Based on observing $X=x$, we consider the problem of estimating the true
predictive density $p(y|\mu)$ of $Y$ under expected Kullback-Leibler loss. Our
focus here is the characterization of admissible procedures for this problem.
We show that the class of all generalized Bayes rules is a complete class, and
that the easily interpretable conditions of Brown and Hwang [Statistical
Decision Theory and Related Topics (1982) III 205-230] are sufficient for a
formal Bayes rule to be admissible.
Comment: Published at http://dx.doi.org/10.1214/07-AOS506 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
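For readers unfamiliar with the loss used in this abstract, the expected Kullback-Leibler criterion can be written out as follows. The notation is an assumption made for illustration: $X$ is the observed vector, $Y$ the future vector, $\mu$ the common mean, and $\hat{p}(y \mid x)$ a candidate predictive density.

```latex
% Kullback-Leibler loss of a predictive density, for a fixed observation x.
\[
  L\bigl(\mu, \hat{p}(\cdot \mid x)\bigr)
  = \int p(y \mid \mu) \,
    \log \frac{p(y \mid \mu)}{\hat{p}(y \mid x)} \, dy .
\]
% Expected loss (risk): average the loss over the sampling distribution of X.
\[
  R_{\mathrm{KL}}(\mu, \hat{p})
  = \int p(x \mid \mu) \, L\bigl(\mu, \hat{p}(\cdot \mid x)\bigr) \, dx .
\]
```

A procedure $\hat{p}$ is admissible under this criterion if no other procedure has risk at most $R_{\mathrm{KL}}(\mu, \hat{p})$ for every $\mu$ and strictly smaller risk for some $\mu$.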
Empirical Bayes conditional density estimation
Nonparametric estimation of the conditional density of a response, given a
vector of explanatory variables, is a classical problem of prominent
importance in prediction, since the conditional density describes the
association between the response and the predictors more comprehensively
than, for instance, the regression function. The problem has applications in
fields such as economics, actuarial science, and medicine. We investigate
empirical Bayes estimation of conditional densities and establish that an
automatic, data-driven selection of the prior hyper-parameters in infinite
mixtures of Gaussian kernels with predictor-dependent mixing weights can lead
to estimators that perform on par with frequentist estimators in two
respects. First, they are minimax-optimal (up to logarithmic factors) and
rate adaptive over classes of locally Hölder smooth conditional densities.
Second, they perform an adaptive dimension reduction when the response is
independent of (some of) the explanatory variables, which then carry no
information about the response and are irrelevant for estimating its
conditional density.