Conditional Density Estimation with Dimensionality Reduction via Squared-Loss Conditional Entropy Minimization
Regression aims at estimating the conditional mean of output given input.
However, regression is not informative enough if the conditional density is
multimodal, heteroscedastic, or asymmetric. In such a case, estimating the
conditional density itself is preferable, but conditional density estimation
(CDE) is challenging in high-dimensional space. A naive approach to coping with
high-dimensionality is to first perform dimensionality reduction (DR) and then
execute CDE. However, such a two-step process does not perform well in practice
because the error incurred in the first DR step can be magnified in the second
CDE step. In this paper, we propose a novel single-shot procedure that performs
CDE and DR simultaneously in an integrated way. Our key idea is to formulate DR
as the problem of minimizing a squared-loss variant of conditional entropy,
which is itself solved via CDE. Thus, no additional CDE step is needed after DR. We
demonstrate the usefulness of the proposed method through extensive experiments
on various datasets, including humanoid robot transition and computer art.
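The abstract's motivation is that a conditional mean can hide multimodal structure, which a conditional density estimate reveals. The following is a minimal sketch of least-squares conditional density estimation (LSCDE), a squared-loss CDE method with a closed-form solution, that makes this concrete. It is an illustration under stated assumptions and not the paper's joint DR/CDE procedure. There is no dimensionality reduction step, x and y are one-dimensional, and the function names and hyperparameters (`n_centers`, `sigma`, `lam`) are hypothetical choices.

```python
import numpy as np


def lscde_fit(x, y, n_centers=100, sigma=0.3, lam=0.1, seed=0):
    """Fit p(y|x) ~ sum_l alpha_l K(x, u_l) K(y, v_l) by regularized least squares.

    x, y: 1-D arrays of paired samples. n_centers, sigma and lam are
    illustrative defaults, not tuned values.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=min(n_centers, len(x)), replace=False)
    u, v = x[idx], y[idx]  # kernel centres placed on a random subsample
    kx = np.exp(-(x[:, None] - u[None, :]) ** 2 / (2 * sigma**2))
    ky = np.exp(-(y[:, None] - v[None, :]) ** 2 / (2 * sigma**2))
    # h_l: empirical mean of each basis function at the observed (x_i, y_i).
    h = (kx * ky).mean(axis=0)
    # H_lm = mean_i K(x_i,u_l) K(x_i,u_m) * integral of K(y,v_l) K(y,v_m) dy;
    # the y-integral of two Gaussians has the closed form used here.
    int_y = np.sqrt(np.pi) * sigma * np.exp(-(v[:, None] - v[None, :]) ** 2 / (4 * sigma**2))
    H = (kx.T @ kx) / len(x) * int_y
    alpha = np.linalg.solve(H + lam * np.eye(len(u)), h)
    alpha = np.maximum(alpha, 0.0)  # clip negative weights so the estimate is non-negative
    return u, v, alpha, sigma


def lscde_predict(model, x0, y_grid):
    """Evaluate the normalized conditional density p(y | x = x0) on y_grid."""
    u, v, alpha, sigma = model
    w = alpha * np.exp(-(x0 - u) ** 2 / (2 * sigma**2))
    ky = np.exp(-(y_grid[:, None] - v[None, :]) ** 2 / (2 * sigma**2))
    # Each y-kernel integrates to sqrt(2*pi)*sigma, so normalize analytically.
    return (ky @ w) / (np.sqrt(2 * np.pi) * sigma * w.sum())
```

Consider data where y is +1 or -1 with equal probability regardless of x. The conditional mean is then near 0, yet the estimated density at y = 0 is close to zero and peaks near the two modes, which is exactly the information a regression fit discards.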
High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation
The ratio between two probability density functions is an important component
of various tasks, including selection bias correction, novelty detection and
classification. Recently, several estimators of this ratio have been proposed.
Most of these methods fail if the sample space is high-dimensional, and hence
require a dimension reduction step, the result of which can be a significant
loss of information. Here we propose a simple-to-implement, fully nonparametric
density ratio estimator that expands the ratio in terms of the eigenfunctions
of a kernel-based operator; these functions reflect the underlying geometry of
the data (e.g., submanifold structure), often leading to better estimates
without an explicit dimension reduction step. We show how our general framework
can be extended to address another important problem, the estimation of a
likelihood function in situations where that function cannot be
well-approximated by an analytical form. One is often faced with this situation
when performing statistical inference with data from the sciences, due to the
complexity of the data and of the processes that generated those data. We
emphasize applications where using existing likelihood-free methods of
inference would be challenging due to the high dimensionality of the sample
space, but where our spectral series method yields a reasonable estimate of the
likelihood function. We provide theoretical guarantees and illustrate the
effectiveness of our proposed method with numerical experiments.
Comment: With supplementary material
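The following is a minimal one-dimensional sketch of a spectral series density ratio estimator in the spirit described above. The ratio beta(x) = f(x)/g(x) is expanded in eigenfunctions of a Gaussian kernel operator built on the denominator sample (drawn from g). These eigenfunctions are extended to new points with the Nystrom formula. Because beta_j = integral of beta * psi_j * g = E_f[psi_j(X)], each coefficient is a plain sample mean over the numerator sample. The bandwidth `h`, the truncation level `n_terms`, and the function names are hypothetical. The paper's tuning procedure and its likelihood extension are not reproduced.

```python
import numpy as np


def gaussian_kernel(a, b, h):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * h**2))


def spectral_ratio_fit(x_g, x_f, n_terms=6, h=1.0):
    """Estimate beta(x) = f(x)/g(x) from samples x_g ~ g and x_f ~ f."""
    n = len(x_g)
    evals, evecs = np.linalg.eigh(gaussian_kernel(x_g, x_g, h) / n)
    top = np.argsort(evals)[::-1][:n_terms]  # keep the largest eigenvalues
    lam, vecs = evals[top], evecs[:, top]
    psi_at_g = np.sqrt(n) * vecs  # orthonormal with respect to the empirical g

    def psi(x):
        # Nystrom extension of each eigenfunction to arbitrary points x.
        return gaussian_kernel(x, x_g, h) @ psi_at_g / (n * lam)

    # beta_j = E_f[psi_j(X)], estimated by a sample mean over x_f.
    coef = psi(x_f).mean(axis=0)
    return lambda x: psi(np.asarray(x, dtype=float)) @ coef
```

For g = N(0, 1) and f = N(0.5, 1) the true ratio is exp(0.5x - 0.125), so the estimate should increase in x and be near 0.88 at x = 0. No explicit dimension reduction is involved. The kernel eigenfunctions adapt to wherever the denominator sample lives.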
A Generic Path Algorithm for Regularized Statistical Estimation
Regularization is widely used in statistics and machine learning to prevent
overfitting and steer the solution toward prior information. In general, a
regularized estimation problem minimizes the sum of a loss function and a
penalty term. The penalty term is usually weighted by a tuning parameter and
encourages certain constraints on the parameters to be estimated. Particular
choices of constraints lead to the popular lasso, fused-lasso, and other
generalized ℓ1-penalized regression methods. Although there has been a lot
of research in this area, developing efficient optimization methods for many
nonseparable penalties remains a challenge. In this article we propose an exact
path solver based on ordinary differential equations (EPSODE) that works for
any convex loss function and can deal with generalized ℓ1 penalties as well
as more complicated regularization such as inequality constraints encountered
in shape-restricted regressions and nonparametric density estimation. In the
path-following process, the solution path hits, exits, and slides along the
various constraints and vividly illustrates the tradeoffs between goodness of
fit and model parsimony. In practice, the EPSODE can be coupled with AIC, BIC,
or cross-validation to select an optimal tuning parameter. Our
applications to generalized ℓ1-regularized generalized linear models,
shape-restricted regressions, Gaussian graphical models, and nonparametric
density estimation showcase the potential of the EPSODE algorithm.
Comment: 28 pages, 5 figures
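EPSODE handles non-smooth penalties and inequality constraints, and this sketch does not. It shows only the core idea of tracing a regularized solution path by integrating an ODE in the tuning parameter, using the simplest smooth case as an assumption: ridge regression with objective (1/2)||y - X b||^2 + (rho/2)||b||^2. Differentiating the optimality condition (X'X + rho I) b(rho) = X'y with respect to rho gives db/drho = -(X'X + rho I)^(-1) b(rho). This is integrated with classical fourth-order Runge-Kutta steps starting from the least-squares solution at rho = 0. The function name `ridge_path_ode` and the step count are hypothetical.

```python
import numpy as np


def ridge_path_ode(X, y, rho_max=10.0, n_steps=200):
    """Trace the ridge solution path b(rho) on [0, rho_max] by solving an ODE."""
    G = X.T @ X
    p = G.shape[0]
    # At rho = 0 the path starts at ordinary least squares (X assumed full column rank).
    beta = np.linalg.solve(G, X.T @ y)

    def deriv(rho, b):
        # Implicit differentiation of (G + rho I) b = X'y.
        return -np.linalg.solve(G + rho * np.eye(p), b)

    h = rho_max / n_steps
    rho = 0.0
    rhos, path = [rho], [beta.copy()]
    for _ in range(n_steps):
        # One classical Runge-Kutta (RK4) step along the path.
        k1 = deriv(rho, beta)
        k2 = deriv(rho + h / 2, beta + h / 2 * k1)
        k3 = deriv(rho + h / 2, beta + h / 2 * k2)
        k4 = deriv(rho + h, beta + h * k3)
        beta = beta + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        rho += h
        rhos.append(rho)
        path.append(beta.copy())
    return np.array(rhos), np.array(path)
```

The endpoint can be checked against the closed form (X'X + rho_max I)^(-1) X'y. As the abstract suggests for EPSODE, the stored grid of (rho, b) pairs could then be scanned with AIC, BIC, or cross-validation to choose the tuning parameter.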
Robust and Sparse Regression via γ-divergence
For high-dimensional data, many sparse regression methods have been proposed.
However, they may not be robust against outliers. Recently, the use of density
power weights has been studied for robust parameter estimation, and the
corresponding divergences have been discussed. One such divergence is the
γ-divergence, and the robust estimator using the γ-divergence is
known for its strong robustness. In this paper, we consider robust and
sparse regression based on the γ-divergence. We extend the
γ-divergence to the regression problem and show that it is strongly
robust under heavy contamination even when outliers are heterogeneous. The
loss function is constructed from an empirical estimate of the
γ-divergence with sparse regularization, and the parameter estimate is
defined as the minimizer of the loss function. To obtain the robust and sparse
estimate, we propose an efficient update algorithm that monotonically
decreases the loss function. In particular, we discuss a linear
regression problem with ℓ1 regularization in detail. In numerical
experiments and real data analyses, we see that the proposed method outperforms
existing robust and sparse methods.
Comment: 25 pages
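The following is a minimal sketch of the robust part of this idea, assuming a Gaussian linear model with constant variance and omitting the sparse ℓ1 term entirely. Under those assumptions, the empirical γ-type loss reduces, up to constants, to L(b, s2) = -(1/γ) log sum_i exp(-γ r_i^2 / (2 s2)) + log(s2) / (2(1 + γ)), where r_i = y_i - z_i'b. Setting its gradients to zero gives a weighted least-squares equation for b and the scale update s2 = (1 + γ) sum_i w_i r_i^2, with normalized weights w_i proportional to exp(-γ r_i^2 / (2 s2)). This is our own derivation and fixed-point iteration, not a transcription of the paper's update algorithm. The names `gamma_regression`, `gamma`, `n_iter` and `tol` are hypothetical.

```python
import numpy as np


def gamma_regression(X, y, gamma=0.5, n_iter=200, tol=1e-10):
    """Robust linear regression via an empirical gamma-type loss (no sparsity penalty)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # start from ordinary least squares
    sigma2 = np.mean((y - Z @ beta) ** 2)
    for _ in range(n_iter):
        r = y - Z @ beta
        # Density-power weights proportional to f(y_i | x_i)^gamma:
        # observations with large residuals get exponentially small weight.
        logw = -gamma * r**2 / (2.0 * sigma2)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # With the weights held fixed, beta solves a weighted least-squares problem.
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        # Closed-form scale update from the stationarity condition in sigma^2.
        sigma2 = (1.0 + gamma) * np.sum(w * (y - Z @ new_beta) ** 2)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, sigma2
```

Suppose 10% of the responses are shifted by a large constant. Ordinary least squares then moves its intercept toward the outliers. The γ-weighted fit gives those points vanishing weight and stays close to the clean-data coefficients and noise scale. Adding an ℓ1 penalty to the weighted least-squares step would be the natural route toward the sparse version.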
On the use of robust regression in econometrics
The use of robust regression estimators has gained popularity among applied econometricians. The main argument invoked to justify the use of the robust estimators is that they provide efficiency gains in the presence of outliers or non-normal errors. Unfortunately, most practitioners seem to be unaware of the fact that heteroskedastic and skewed errors can dramatically affect the properties of these estimators. In this paper we reconsider the interpretation of the specific robust estimator that has become popular in applied econometrics, and conclude that its use in this context cannot be generally recommended.
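The point about skewed errors can be illustrated with a small simulation. As an assumption, the sketch uses a Huber M-estimator fitted by iteratively reweighted least squares with a fixed, MAD-based scale. The abstract does not name the estimator it studies, so this is not necessarily the one examined in the paper. With mean-zero but right-skewed errors, least squares recovers the conditional-mean intercept, while the Huber intercept settles on a different location of the error distribution. The slope is unaffected here only because the simulated errors are homoskedastic. The function name `huber_irls` and the tuning constant `k` are illustrative choices.

```python
import numpy as np


def huber_irls(X, y, k=1.345, n_iter=100):
    """Huber M-estimate of a linear regression via IRLS with a fixed robust scale."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS starting values
    r = y - Z @ beta
    scale = np.median(np.abs(r - np.median(r))) / 0.6745  # normalized MAD, held fixed
    for _ in range(n_iter):
        u = (y - Z @ beta) / scale
        # Huber weights: 1 inside [-k, k], k/|u| outside.
        w = np.minimum(1.0, k / np.maximum(np.abs(u), 1e-12))
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return beta
```

With errors drawn as Exponential(1) minus 1, the OLS intercept stays near the true value of 1, whereas the Huber intercept is pulled noticeably below it. The two estimators then answer different questions, which is the interpretation issue the abstract raises.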