Fast conditional density estimation for quantitative structure-activity relationships
Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals for activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
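One way to turn class probability estimates into a conditional density, in the spirit of the kernel estimator described above, is to discretize the activity range into bins, treat each bin as a class, and place a kernel at each bin centre weighted by the predicted probability of that bin. The sketch below assumes Gaussian kernels, a fixed bandwidth, and given bin centres; these choices and all function names are hypothetical and not necessarily the paper's exact estimator. It also shows how a prediction interval follows by inverting the resulting CDF.

```python
import math


def _normalize(probs):
    # Class probability estimates may not sum exactly to 1; rescale them.
    total = sum(probs)
    return [p / total for p in probs]


def conditional_density(y, bin_centers, class_probs, bandwidth):
    """Mixture of Gaussian kernels centred on the bin centres,
    weighted by the classifier's probability for each bin."""
    weights = _normalize(class_probs)
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    return sum(w * math.exp(-0.5 * ((y - c) / bandwidth) ** 2) / norm
               for c, w in zip(bin_centers, weights))


def conditional_cdf(y, bin_centers, class_probs, bandwidth):
    """CDF of the same mixture: a weighted sum of normal CDFs."""
    weights = _normalize(class_probs)
    return sum(w * 0.5 * (1.0 + math.erf((y - c) / (bandwidth * math.sqrt(2.0))))
               for c, w in zip(bin_centers, weights))


def prediction_interval(bin_centers, class_probs, bandwidth, level=0.9, tol=1e-9):
    """Central prediction interval obtained by inverting the mixture CDF by bisection."""
    lower = min(bin_centers) - 10.0 * bandwidth   # CDF is essentially 0 here
    upper = max(bin_centers) + 10.0 * bandwidth   # CDF is essentially 1 here

    def quantile(q):
        a, b = lower, upper
        while b - a > tol:
            mid = 0.5 * (a + b)
            if conditional_cdf(mid, bin_centers, class_probs, bandwidth) < q:
                a = mid
            else:
                b = mid
        return 0.5 * (a + b)

    alpha = (1.0 - level) / 2.0
    return quantile(alpha), quantile(1.0 - alpha)
```

For instance, `class_probs` could be one row of scikit-learn's `RandomForestClassifier.predict_proba` output, with `bin_centers` the midpoints of the bins used to label the training activities. The bandwidth controls how much probability mass leaks between neighbouring bins.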
Targeted Maximum Likelihood Estimation using Exponential Families
Targeted maximum likelihood estimation (TMLE) is a general method for
estimating parameters in semiparametric and nonparametric models. Each
iteration of TMLE involves fitting a parametric submodel that targets the
parameter of interest. We investigate the use of exponential families to define
the parametric submodel. This implementation of TMLE gives a general approach
for estimating any smooth parameter in the nonparametric model. A computational
advantage of this approach is that each iteration of TMLE involves estimation
of a parameter in an exponential family, which is a convex optimization problem
for which software implementing reliable and computationally efficient methods
exists. We illustrate the method in three estimation problems, involving the
mean of an outcome missing at random, the parameter of a median regression
model, and the causal effect of a continuous exposure, respectively. We conduct
a simulation study comparing different choices for the parametric submodel,
focusing on the first of these problems. To the best of our knowledge, this is
the first study investigating robustness of TMLE to different specifications of
the parametric submodel. We find that the choice of submodel can have an
important impact on the behavior of the estimator in finite samples.
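The logistic fluctuation below is one common parametric submodel for the first of these problems, the mean of an outcome missing at random; fitting it is a one-dimensional convex problem (a Bernoulli regression with an offset), which illustrates the computational point above. It is a sketch, not the paper's implementation: the function names are hypothetical, the initial outcome regression `q_init` and missingness probabilities `g` are assumed to have been estimated elsewhere, and the paper's exponential-family submodels may be parameterized differently.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_missing_outcome_mean(y, observed, q_init, g, max_iter=50, tol=1e-10):
    """Targeting step for psi = E[E(Y | W, observed)] with Y missing at random.

    y        : outcomes in [0, 1]; entries for unobserved units are ignored
    observed : booleans, True where Y was observed
    q_init   : initial estimates of E[Y | W, observed] in (0, 1), for every unit
    g        : estimated P(observed | W) in (0, 1], for every unit
    """
    offsets = [logit(q) for q in q_init]
    clever = [1.0 / gi for gi in g]            # covariate H(W) = 1 / g(W)
    eps = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for yi, oi, off, h in zip(y, observed, offsets, clever):
            if not oi:
                continue                       # only observed outcomes enter the fit
            q = expit(off + eps * h)
            score += h * (yi - q)
            info += h * h * q * (1.0 - q)
        step = score / info                    # Newton step on the concave log-likelihood
        eps += step
        if abs(step) < tol:
            break
    # Updated regression evaluated for every unit, then averaged.
    q_star = [expit(off + eps * h) for off, h in zip(offsets, clever)]
    return sum(q_star) / len(q_star), eps
```

Because `g` is held fixed, one fit of the fluctuation parameter `eps` already solves the efficient score equation; Newton's method can diverge if all observed outcomes sit at 0 or 1, which this sketch does not guard against.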
Efficient prediction for linear and nonlinear autoregressive models
Conditional expectations given past observations in stationary time series
are usually estimated directly by kernel estimators, or by plugging in kernel
estimators for transition densities. We show that, for linear and nonlinear
autoregressive models driven by independent innovations, appropriate smoothed
and weighted von Mises statistics of residuals estimate conditional
expectations at faster, parametric rates and are asymptotically efficient. The
proof is based on a uniform stochastic expansion for smoothed and weighted von
Mises processes of residuals. We consider, in particular, estimation of
conditional distribution functions and of conditional quantile functions.
Comment: Published at http://dx.doi.org/10.1214/009053606000000812 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
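For a linear AR(1) model X_t = θX_{t-1} + ε_t with independent innovations, E[h(X_{n+1}) | X_n = x] = E[h(θx + ε)], so plugging in the least-squares estimate of θ and averaging h over the residuals gives a residual-based estimator of the conditional expectation, with conditional distribution and quantile functions as special cases. The sketch below is simplified under stated assumptions: it omits the smoothing and weighting the paper uses to reach efficiency, assumes a zero-mean AR(1) without intercept, and its function names are hypothetical.

```python
import math
import random


def simulate_ar1(theta, n, seed=0):
    """Simulate X_t = theta * X_{t-1} + eps_t with standard normal innovations."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(theta * x[-1] + rng.gauss(0.0, 1.0))
    return x


def fit_ar1(x):
    """Least-squares estimate of theta (no intercept) and the residuals."""
    num = sum(prev * cur for prev, cur in zip(x[:-1], x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    theta = num / den
    residuals = [cur - theta * prev for prev, cur in zip(x[:-1], x[1:])]
    return theta, residuals


def conditional_expectation(h, x_last, theta, residuals):
    """Estimate E[h(X_{n+1}) | X_n = x_last] = E[h(theta * x_last + eps)]
    by averaging h over theta_hat * x_last + residual."""
    return sum(h(theta * x_last + e) for e in residuals) / len(residuals)


def conditional_cdf(y, x_last, theta, residuals):
    # Conditional distribution function: h is the indicator of (-inf, y].
    return conditional_expectation(lambda v: 1.0 if v <= y else 0.0,
                                   x_last, theta, residuals)


def conditional_quantile(p, x_last, theta, residuals):
    """Smallest attainable value v with estimated F(v | x_last) >= p, for 0 < p <= 1."""
    values = sorted(theta * x_last + e for e in residuals)
    k = max(math.ceil(p * len(values)) - 1, 0)
    return values[k]
```

Unlike a kernel estimator of the transition density, this uses all residuals for every conditioning value x, which is what allows the parametric rate.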
Inference on Counterfactual Distributions
Counterfactual distributions are important ingredients for policy analysis
and decomposition analysis in empirical economics. In this article we develop
modeling and inference tools for counterfactual distributions based on
regression methods. The counterfactual scenarios that we consider consist of
ceteris paribus changes in either the distribution of covariates related to the
outcome of interest or the conditional distribution of the outcome given
covariates. For either of these scenarios we derive joint functional central
limit theorems and bootstrap validity results for regression-based estimators
of the status quo and counterfactual outcome distributions. These results allow
us to construct simultaneous confidence sets for function-valued effects of the
counterfactual changes, including the effects on the entire distribution and
quantile functions of the outcome as well as on related functionals. These
confidence sets can be used to test functional hypotheses such as no-effect,
positive effect, or stochastic dominance. Our theory applies to general
counterfactual changes and covers the main regression methods including
classical, quantile, duration, and distribution regressions. We illustrate the
results with an empirical application to wage decompositions using data for the
United States.
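With a discrete covariate, the counterfactual distribution that combines the conditional outcome distribution of a reference group with the covariate distribution of a target group is F_C(y) = Σ_x F_ref(y | x) P_target(X = x). The sketch below estimates F_ref(y | x) by cell-wise empirical CDFs, a simple nonparametric stand-in for the regression methods the paper covers. It does not implement the paper's inference or bootstrap, it requires every target covariate value to appear in the reference group, and its function names are hypothetical.

```python
from collections import Counter, defaultdict


def make_counterfactual_cdf(ref_x, ref_y, target_x):
    """Counterfactual CDF F_C(y) = sum_x F_ref(y | x) * P_target(X = x).

    The conditional distribution of Y given X comes from the reference group;
    the covariate distribution comes from the target group (discrete X only).
    """
    cells = defaultdict(list)
    for x, y in zip(ref_x, ref_y):
        cells[x].append(y)
    weights = Counter(target_x)
    n_target = len(target_x)
    unsupported = set(weights) - set(cells)
    if unsupported:
        # F_ref(y | x) is not identified for covariate values unseen in the reference group
        raise ValueError(f"covariate values missing from reference group: {unsupported}")

    def cdf(y):
        total = 0.0
        for x, count in weights.items():
            ys = cells[x]
            cond_cdf = sum(1 for v in ys if v <= y) / len(ys)   # empirical F_ref(y | x)
            total += (count / n_target) * cond_cdf
        return total

    return cdf


def quantile_from_cdf(cdf, support, p):
    """Smallest support point v with cdf(v) >= p."""
    for v in sorted(support):
        if cdf(v) >= p:
            return v
    return max(support)
```

Passing the reference group's own covariates as `target_x` recovers the status quo distribution, so a quantile effect is the difference between `quantile_from_cdf` applied to the counterfactual and to the status quo CDF.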
As a part of developing the main results, we introduce distribution
regression as a comprehensive and flexible tool for modeling and estimating the
entire conditional distribution. We show that distribution regression
encompasses the Cox duration regression and represents a useful alternative to
quantile regression. We establish functional central limit theorems and
bootstrap validity results for the empirical distribution regression process
and various related functionals.
Comment: 55 pages, 1 table, 3 figures; supplementary appendix with additional
results available from the authors' website.
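Distribution regression estimates F(t | x) by fitting a separate binary regression of the indicator 1{Y ≤ t} on the covariates at each threshold t. The sketch below uses a logit link with a single covariate, fitted by Newton-Raphson; the link, the thresholds, the simulated data, and all function names are assumptions for illustration, and it omits the paper's functional inference and bootstrap.

```python
import math
import random


def expit(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(xs, ds, n_iter=100, tol=1e-10):
    """Newton-Raphson for P(D = 1 | x) = expit(a + b * x) with one covariate."""
    a = b = 0.0
    for _ in range(n_iter):
        ga = gb = haa = hab = hbb = 0.0
        for x, d in zip(xs, ds):
            p = expit(a + b * x)
            ga += d - p                  # score for the intercept
            gb += (d - p) * x            # score for the slope
            w = p * (1.0 - p)
            haa += w                     # entries of the 2x2 information matrix
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det  # solve information * step = score
        db = (haa * gb - hab * ga) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def distribution_regression(xs, ys, thresholds):
    """One binary regression of 1{Y <= t} on x per threshold t."""
    return {t: fit_logit(xs, [1.0 if y <= t else 0.0 for y in ys]) for t in thresholds}


def dr_conditional_cdf(fits, t, x):
    """Estimated F(t | x) = P(Y <= t | X = x) from the fit at threshold t."""
    a, b = fits[t]
    return expit(a + b * x)


def simulate_location_model(n, seed=0):
    """Toy data: Y = X + N(0, 0.5^2) noise with X uniform on [-1, 1]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys
```

Because each threshold is fitted separately, the estimated F(t | x) need not be monotone in t in small samples; a common remedy is to sort (rearrange) the estimates across thresholds. Thresholds should also stay inside the range of the observed outcomes, since at extreme thresholds the indicator can be perfectly separated and the logit fit diverges.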