Optimal inference in a class of regression models
We consider the problem of constructing confidence intervals (CIs) for a
linear functional of a regression function, such as its value at a point, the
regression discontinuity parameter, or a regression coefficient in a linear or
partly linear regression. Our main assumption is that the regression function
is known to lie in a convex function class, which covers most smoothness and/or
shape assumptions used in econometrics. We derive finite-sample optimal CIs and
sharp efficiency bounds under normal errors with known variance. We show that
these results translate to uniform (over the function class) asymptotic results
when the error distribution is not known. When the function class is
centrosymmetric, these efficiency bounds imply that minimax CIs are close to
efficient at smooth regression functions. This implies, in particular, that it
is impossible to form tighter CIs by using data-dependent tuning parameters
while maintaining coverage over the whole function class. We specialize
our results to inference on the regression discontinuity parameter, and
illustrate them in simulations and an empirical application.
Comment: 39 pages plus supplementary material
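As a rough illustration of the fixed-length CI construction described above (a sketch, not the authors' code): given an affine estimator of the linear functional with standard error se and a worst-case bias bound max_bias implied by the convex function class, the honest CI uses the 1 - alpha quantile of |N(max_bias/se, 1)| as its critical value. The estimate, standard error, and bias bound in the usage line are hypothetical.

```python
import numpy as np
from scipy import stats, optimize

def folded_normal_cv(t, alpha=0.05):
    """1 - alpha quantile of |N(t, 1)|, the critical value for bias/sd ratio t."""
    f = lambda c: stats.norm.cdf(c - t) - stats.norm.cdf(-c - t) - (1 - alpha)
    return optimize.brentq(f, 0.0, t + 10.0)

def bias_aware_ci(estimate, se, max_bias, alpha=0.05):
    """Fixed-length CI that keeps coverage for any bias in [-max_bias, max_bias]."""
    cv = folded_normal_cv(max_bias / se, alpha)
    return estimate - cv * se, estimate + cv * se

# Hypothetical inputs: an RD point estimate, its standard error, and a
# worst-case bias bound derived from the assumed smoothness constant.
print(bias_aware_ci(estimate=1.8, se=0.4, max_bias=0.25))
```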
Selection of tuning parameters in bridge regression models via Bayesian information criterion
We consider bridge linear regression modeling, which can produce either a
sparse or a non-sparse model. A crucial point in the model-building process is
the selection of adjustable parameters, namely a regularization parameter and a
tuning parameter, in bridge regression models. The choice of these parameters
can be viewed as a model selection and evaluation problem. We propose a model
selection criterion for evaluating bridge regression models based on a Bayesian
approach. This criterion enables us to select the adjustable parameters
objectively. We investigate the effectiveness of the proposed modeling strategy
through some numerical examples.
Comment: 20 pages, 5 figures
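A minimal sketch of grid-based selection of the regularization parameter lam and the bridge exponent q with a BIC-style score (the paper's Bayesian criterion may differ; the degrees-of-freedom proxy and the derivative-free optimizer below are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def bridge_fit(X, y, lam, q):
    """Minimize ||y - X b||^2 + lam * sum(|b_j|^q); local solution when q < 1."""
    obj = lambda b: np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b) ** q)
    return minimize(obj, np.zeros(X.shape[1]), method="Powell").x

def bic_score(X, y, beta, tol=1e-3):
    """BIC-style score using the count of non-negligible coefficients as df."""
    n = len(y)
    rss = np.sum((y - X @ beta) ** 2)
    df = np.sum(np.abs(beta) > tol)
    return n * np.log(rss / n) + df * np.log(n)

def select_bridge(X, y, lams, qs):
    """Return the (lam, q) pair with the smallest score on the grid."""
    scored = [(bic_score(X, y, bridge_fit(X, y, lam, q)), lam, q)
              for lam in lams for q in qs]
    _, lam, q = min(scored, key=lambda t: t[0])
    return lam, q

# Toy usage on simulated data with a sparse true coefficient vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + rng.normal(size=100)
print(select_bridge(X, y, lams=[0.1, 1.0, 10.0], qs=[0.5, 1.0, 2.0]))
```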
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees
The rising volume of datasets has made training machine learning (ML) models
a major computational cost in the enterprise. Given the iterative nature of
model and parameter tuning, many analysts use a small sample of their entire
data during their initial stage of analysis to make quick decisions (e.g., what
features or hyperparameters to use) and use the entire dataset only in later
stages (i.e., when they have converged to a specific model). This sampling,
however, is performed in an ad hoc fashion. Most practitioners cannot precisely
capture the effect of sampling on the quality of their model, and eventually on
their decision-making process during the tuning phase. Moreover, without
systematic support for sampling operators, many optimizations and reuse
opportunities are lost.
In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML
training. BlinkML allows users to make error-computation tradeoffs: instead of
training a model on their full data (i.e., full model), BlinkML can quickly
train an approximate model with quality guarantees using a sample. The quality
guarantees ensure that, with high probability, the approximate model makes the
same predictions as the full model. BlinkML currently supports any ML model
that relies on maximum likelihood estimation (MLE), which includes Generalized
Linear Models (e.g., linear regression, logistic regression, max entropy
classifier, Poisson regression) as well as PPCA (Probabilistic Principal
Component Analysis). Our experiments show that BlinkML can speed up the
training of large-scale ML tasks by 6.26x-629x while guaranteeing the same
predictions, with 95% probability, as the full model.
Comment: 22 pages, SIGMOD 2019
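BlinkML's actual algorithm is not reproduced here; the sketch below only illustrates the general idea of sample-based MLE training with a prediction-agreement check, assuming logistic regression, the inverse observed Fisher information as the coefficient covariance, and a normal approximation for the linear predictor. The 10% sampling fraction and the 95% threshold are arbitrary choices for the toy example.

```python
import numpy as np
from scipy import optimize, stats

def fit_logistic(X, y):
    """MLE for logistic regression; returns coefficients and their covariance."""
    def nll(b):
        z = X @ b
        return np.sum(np.logaddexp(0.0, z) - y * z)
    b_hat = optimize.minimize(nll, np.zeros(X.shape[1]), method="BFGS").x
    p = 1.0 / (1.0 + np.exp(-X @ b_hat))
    info = X.T @ (X * (p * (1 - p))[:, None])   # observed Fisher information
    return b_hat, np.linalg.inv(info)

def agreement_prob(X_new, b_hat, cov):
    """Per-row probability that the predicted class would not flip."""
    margin = np.abs(X_new @ b_hat)
    sd = np.sqrt(np.einsum("ij,jk,ik->i", X_new, cov, X_new))
    return stats.norm.cdf(margin / sd)

# Toy usage: train on a 10% sample, then check what fraction of predictions
# is already stable at the 95% level before deciding on a larger sample.
rng = np.random.default_rng(1)
X = rng.normal(size=(20000, 4))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=20000) > 0).astype(float)
idx = rng.choice(len(y), size=2000, replace=False)
b_hat, cov = fit_logistic(X[idx], y[idx])
print(np.mean(agreement_prob(X, b_hat, cov) >= 0.95))
```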