Forest Density Estimation
We study graph estimation and density estimation in high dimensions, using a
family of density estimators based on forest structured undirected graphical
models. For density estimation, we do not assume the true distribution
corresponds to a forest; rather, we form kernel density estimates of the
bivariate and univariate marginals, and apply Kruskal's algorithm to estimate
the optimal forest on held-out data. We prove an oracle inequality on the
excess risk of the resulting estimator relative to the risk of the best forest.
For graph estimation, we consider the problem of estimating forests with
restricted tree sizes. We prove that finding a maximum weight spanning forest
with restricted tree size is NP-hard, and develop an approximation algorithm
for this problem. Viewing the tree size as a complexity parameter, we then
select a forest using data splitting, and prove bounds on excess risk and
structure selection consistency of the procedure. Experiments with simulated
data and microarray data indicate that the methods are a practical alternative
to Gaussian graphical models.
Comment: Extended version of earlier paper titled "Tree density estimation".
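As a rough illustration of the pipeline the abstract describes, the sketch below forms kernel density estimates of the univariate and bivariate marginals, turns them into pairwise mutual-information edge weights, and runs Kruskal's algorithm to grow a maximum-weight spanning forest. The grid-based mutual-information approximation, the default bandwidths, and the optional cap on the number of edges are simplifying assumptions; this is not the authors' implementation, and the held-out-data step is omitted.

```python
# Hedged sketch: KDE-based mutual-information edge weights + Kruskal's
# algorithm for a maximum-weight spanning forest. Illustrative only.
import numpy as np
from scipy.stats import gaussian_kde

def mi_from_kde(xi, xj, grid_size=30):
    """Approximate I(Xi; Xj) on a grid from kernel density estimates."""
    gi = np.linspace(xi.min(), xi.max(), grid_size)
    gj = np.linspace(xj.min(), xj.max(), grid_size)
    GI, GJ = np.meshgrid(gi, gj, indexing="ij")
    pts = np.vstack([GI.ravel(), GJ.ravel()])
    p_ij = gaussian_kde(np.vstack([xi, xj]))(pts)   # bivariate marginal on the grid
    p_i = gaussian_kde(xi)(GI.ravel())              # univariate marginals on the grid
    p_j = gaussian_kde(xj)(GJ.ravel())
    p_ij /= p_ij.sum()                              # normalise over the grid
    ref = p_i * p_j
    ref /= ref.sum()
    return float(np.sum(p_ij * np.log(p_ij / ref)))

def kruskal_forest(X, max_edges=None):
    """Greedy maximum-weight spanning forest over the columns of X."""
    d = X.shape[1]
    edges = [(mi_from_kde(X[:, a], X[:, b]), a, b)
             for a in range(d) for b in range(a + 1, d)]
    edges.sort(reverse=True)                        # heaviest edges first
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    forest = []
    for w, a, b in edges:
        if w <= 0 or (max_edges is not None and len(forest) >= max_edges):
            break
        ra, rb = find(a), find(b)
        if ra != rb:                                # adding the edge keeps it a forest
            parent[ra] = rb
            forest.append((a, b, w))
    return forest
```

In the paper, the tree size is treated as a complexity parameter chosen by data splitting; the max_edges cap above is only a crude stand-in for that restriction.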
Quasi-concave density estimation
Maximum likelihood estimation of a log-concave probability density is
formulated as a convex optimization problem and shown to have an equivalent
dual formulation as a constrained maximum Shannon entropy problem. Closely
related maximum Rényi entropy estimators that impose weaker concavity
restrictions on the fitted density are also considered, notably a minimum
Hellinger discrepancy estimator that constrains the reciprocal of the
square-root of the density to be concave. A limiting form of these estimators
constrains solutions to the class of quasi-concave densities.
Comment: Published at http://dx.doi.org/10.1214/10-AOS814 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
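As a concrete illustration of the convex formulation, the following one-dimensional sketch (assuming the cvxpy modelling library, which the paper does not prescribe) represents the log-density by its values on a grid, imposes concavity through non-positive second differences, and uses the standard penalised objective (1/n) Σ φ(X_i) − ∫ exp(φ) in place of an explicit normalisation constraint.

```python
# Minimal sketch of one-dimensional log-concave maximum likelihood as a
# convex program; grid size and solver defaults are illustrative assumptions.
import numpy as np
import cvxpy as cp

def log_concave_mle_1d(x, grid_size=200):
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min(), x.max(), grid_size)
    delta = grid[1] - grid[0]
    # count how many observations fall nearest to each grid point
    idx = np.argmin(np.abs(grid[None, :] - x[:, None]), axis=1)
    counts = np.bincount(idx, minlength=grid_size)

    phi = cp.Variable(grid_size)                  # log-density values on the grid
    loglik = (counts @ phi) / len(x)              # (1/n) sum_i phi(X_i)
    normaliser = delta * cp.sum(cp.exp(phi))      # Riemann approximation of ∫ exp(phi)
    concave = [cp.diff(phi, k=2) <= 0]            # non-positive second differences

    cp.Problem(cp.Maximize(loglik - normaliser), concave).solve()
    return grid, np.exp(phi.value)                # fitted density on the grid

# e.g. grid, density = log_concave_mle_1d(np.random.randn(500))
```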
Trimmed Density Ratio Estimation
Density ratio estimation is a vital tool in both the machine learning and
statistics communities. However, due to the unbounded nature of the density ratio,
the estimation procedure can be vulnerable to corrupted data points, which
often pushes the estimated ratio toward infinity. In this paper, we present a
robust estimator which automatically identifies and trims outliers. The
proposed estimator has a convex formulation, and the global optimum can be
obtained via subgradient descent. We analyze the parameter estimation error of
this estimator in high-dimensional settings. Experiments are conducted to verify the effectiveness of the estimator.
Comment: Made minor revisions; restructured the introductory section.
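The sketch below conveys the trimming idea in a deliberately simplified form: a log-linear ratio model exp(θᵀx) is fitted by gradient ascent on a KLIEP-style objective, and at every step the fraction of numerator points with the largest current scores is dropped, mimicking the removal of points that would push the estimated ratio toward infinity. The model, the objective, and the trimming rule are illustrative assumptions and do not reproduce the paper's convex formulation, its subgradient method, or its high-dimensional analysis.

```python
# Simplified, hedged illustration of trimming in density-ratio estimation.
import numpy as np

def trimmed_ratio_fit(x_nu, x_de, trim_frac=0.05, lr=0.01, n_steps=500):
    """Fit r(x) ~ exp(theta @ x) for p_nu(x) / p_de(x), trimming high scores."""
    n_nu, d = x_nu.shape
    keep = max(1, int(round((1.0 - trim_frac) * n_nu)))
    theta = np.zeros(d)
    for _ in range(n_steps):
        scores = x_nu @ theta
        kept = np.argsort(scores)[:keep]        # drop the largest-ratio points
        w_de = np.exp(x_de @ theta)             # unnormalised ratios on denominator data
        w_de /= w_de.sum()
        # gradient of mean(theta @ x_kept) - log(mean(exp(theta @ x_de)))
        grad = x_nu[kept].mean(axis=0) - w_de @ x_de
        theta += lr * grad
    return theta                                # estimated ratio: np.exp(x @ theta)
```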
Discriminative Density-ratio Estimation
Covariate shift is a challenging problem in supervised learning that results
from the discrepancy between the training and test distributions. An effective
approach that has recently drawn considerable attention in the research
community is to reweight the training samples to minimize that discrepancy. In
particular, many methods are based on developing density-ratio (DR) estimation
techniques that apply to both regression and classification problems. Although
these methods work well for regression problems, their performance on
classification problems is not satisfactory. A key reason is that these methods
focus on matching the sample marginal distributions without preserving the
separation between classes in the reweighted
space. In this paper, we propose a novel method for Discriminative
Density-ratio (DDR) estimation that addresses the aforementioned problem and
aims at estimating the density-ratio of joint distributions in a class-wise
manner. The proposed algorithm is an iterative procedure that alternates
between estimating the class information for the test data and estimating a new
density ratio for each class. To incorporate the estimated class information of
the test data, a soft matching technique is proposed. In addition, we employ a
stopping criterion based on mutual information that terminates the iterative
procedure while yielding a decision boundary that lies in a sparse region.
Experiments on synthetic and benchmark datasets demonstrate the superiority of
the proposed method in terms of both accuracy and robustness.
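To make the alternation concrete, here is a schematic sketch under several simplifying assumptions: soft class labels for the test data come from a logistic-regression classifier trained on the currently weighted training set, the per-class density ratios are formed from Gaussian kernel density estimates rather than the soft-matching DR estimator proposed in the paper, and a fixed iteration count stands in for the mutual-information stopping criterion.

```python
# Rough sketch of class-wise, alternating density-ratio reweighting.
# Classifier, KDE-based ratios, and the fixed iteration count are assumptions.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LogisticRegression

def ddr_reweight(X_tr, y_tr, X_te, n_iters=5):
    classes = np.unique(y_tr)
    w = np.ones(len(X_tr))                          # importance weights on training data
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr, sample_weight=w)
        p_te = clf.predict_proba(X_te)              # soft class information for test data
        for k, c in enumerate(classes):
            tr = (y_tr == c)
            # class-conditional test density (softly weighted) over training density
            p_test_c = gaussian_kde(X_te.T, weights=p_te[:, k])(X_tr[tr].T)
            p_train_c = gaussian_kde(X_tr[tr].T)(X_tr[tr].T)
            w[tr] = p_test_c / np.maximum(p_train_c, 1e-12)
        w *= len(X_tr) / w.sum()                    # keep the average weight at one
    return w, clf
```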