Stein-Rule Estimation under an Extended Balanced Loss Function
This paper extends the balanced loss function to a more general setup. The ordinary least squares and Stein-rule estimators are evaluated under this general loss function with a quadratic loss structure in a linear regression model. Their risks are derived when the disturbances in the linear regression model are not necessarily normally distributed. The dominance of the ordinary least squares and Stein-rule estimators over each other, and the effect of departures from the normality assumption on the risk properties, are studied.
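For orientation, one common (unextended) form of the balanced loss function and of the Stein-rule estimator is recalled below; the extended loss studied in the paper generalizes this weighting, so the display is an illustrative assumption rather than the paper's exact definition. For the linear model \(y = X\beta + \varepsilon\) with OLS estimator \(b\),
\[
  L_w(\hat{\beta}) \;=\; w\,(y - X\hat{\beta})'(y - X\hat{\beta}) \;+\; (1 - w)\,(\hat{\beta} - \beta)'X'X(\hat{\beta} - \beta), \qquad 0 \le w \le 1,
\]
\[
  b \;=\; (X'X)^{-1}X'y, \qquad \hat{\beta}_{SR} \;=\; \Big[\,1 - k\,\frac{(y - Xb)'(y - Xb)}{b'X'Xb}\,\Big]\, b,
\]
where \(w\) balances goodness of fit against precision of estimation and \(k \ge 0\) is a nonstochastic characterizing scalar of the Stein-rule family.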
On concentration for (regularized) empirical risk minimization
Rates of convergence for empirical risk minimizers have been well studied in
the literature. In this paper, we aim to provide a complementary set of
results, in particular by showing that after normalization, the risk of the
empirical minimizer concentrates on a single point. Such results have been
established by~\cite{chatterjee2014new} for constrained estimators in the
normal sequence model. We first generalize and sharpen this result to
regularized least squares with convex penalties, making use of a "direct"
argument based on Borell's theorem. We then study generalizations to other loss
functions, including the negative log-likelihood for exponential families
combined with a strictly convex regularization penalty. The results in this
general setting are based on more "indirect" arguments as well as on
concentration inequalities for maxima of empirical processes.
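To fix ideas, the following is a schematic version of the kind of concentration statement meant here, in the Gaussian sequence model setting of \cite{chatterjee2014new}; constants, regularity conditions, and the exact deviation bound are omitted and should be read as assumptions for illustration. Observing \(y = \theta^* + \varepsilon\) with \(\varepsilon \sim N(0, \sigma^2 I_n)\) and a closed convex constraint set \(K\),
\[
  \hat{\theta} \;=\; \operatorname*{arg\,min}_{\theta \in K} \|y - \theta\|^2, \qquad
  f(t) \;=\; \mathbb{E}\Big[\sup_{u \in K - \theta^*,\ \|u\| \le t} \langle \varepsilon, u \rangle\Big] - \frac{t^2}{2}, \qquad
  t_* \;=\; \operatorname*{arg\,max}_{t \ge 0} f(t),
\]
and with high probability \(\|\hat{\theta} - \theta^*\|\) lies within a small multiplicative factor of the deterministic value \(t_*\); it is in this sense that the normalized risk of the empirical minimizer concentrates on a single point.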
Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss
We study the problem of model selection type aggregation with respect to the
Kullback-Leibler divergence for various probabilistic models. Rather than
considering a convex combination of the initial estimators,
our aggregation procedures rely on the convex combination of the logarithms of
these functions. The first method is designed for probability density
estimation as it gives an aggregate estimator that is also a proper density
function, whereas the second method concerns spectral density estimation and
has no such mass-conserving feature. We select the aggregation weights based on
a penalized maximum likelihood criterion. We give sharp oracle inequalities
that hold with high probability, with a remainder term that is decomposed into
a bias and a variance part. We also show the optimality of the remainder terms
by providing the corresponding lower bound results.
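As a rough sketch of what a convex combination of logarithms looks like in the density-estimation case (the exact normalization and the penalized maximum likelihood criterion used in the paper are not reproduced here and should be read as assumptions),
\[
  \log f_\lambda(x) \;=\; \sum_{j=1}^{N} \lambda_j \log f_j(x) \;-\; \log \int \exp\Big(\sum_{j=1}^{N} \lambda_j \log f_j(u)\Big)\, du, \qquad \lambda_j \ge 0,\ \ \sum_{j=1}^{N} \lambda_j = 1,
\]
so that \(f_\lambda\) is a geometric (rather than arithmetic) mixture of the initial estimators and is itself a proper density; the weights \(\lambda\) are then selected by penalized maximum likelihood.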
Portfolio Diversification and Value at Risk Under Thick-Tailedness
We present a unified approach to value at risk analysis under heavy-tailedness, using new majorization theory that we develop for linear combinations of thick-tailed random variables. Among other results, we show that the stylized fact that portfolio diversification is always preferable is reversed for extremely heavy-tailed risks or returns. The stylized facts on diversification are nevertheless robust to thick-tailedness of risks or returns as long as their distributions are not extremely long-tailed. We further demonstrate that the value at risk is a coherent measure of risk if distributions of risks are not extremely heavy-tailed. However, coherency of the value at risk is always violated under extreme thick-tailedness. Extensions of the results to the case of dependence, including convolutions of α-symmetric distributions and models with common shocks, are provided.
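A standard special case, offered only as an illustrative assumption and not taken from the paper, makes the reversal concrete: for i.i.d. symmetric \(\alpha\)-stable risks \(X_1, \dots, X_n\),
\[
  \frac{X_1 + \cdots + X_n}{n} \;\overset{d}{=}\; n^{1/\alpha - 1}\, X_1,
  \qquad\text{so}\qquad
  \mathrm{VaR}_q\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big) \;=\; n^{1/\alpha - 1}\, \mathrm{VaR}_q(X_1).
\]
For \(\alpha > 1\) the factor \(n^{1/\alpha - 1}\) is less than one and diversification reduces the value at risk, whereas for \(\alpha < 1\) (extremely heavy tails) it exceeds one: diversification increases the value at risk and subadditivity, hence coherency, fails.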
Lasso type classifiers with a reject option
We consider the problem of binary classification where one can, for a
particular cost, choose not to classify an observation. We present a simple
proof of an oracle inequality for the excess risk of structural risk minimizers using a lasso-type penalty. Published at http://dx.doi.org/10.1214/07-EJS058 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
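For context, the classical Chow-type formulation of classification with a reject option is recalled below; it is given as background and is not necessarily the exact loss used in the paper. With rejection cost \(d \in (0, 1/2)\) and regression function \(\eta(x) = P(Y = 1 \mid X = x)\),
\[
  \ell_d\big(g(x), y\big) \;=\;
  \begin{cases}
    0 & \text{if } g(x) = y,\\
    d & \text{if } g(x) = \text{reject},\\
    1 & \text{otherwise},
  \end{cases}
  \qquad
  g^*(x) \;=\;
  \begin{cases}
    1 & \text{if } \eta(x) \ge 1 - d,\\
    0 & \text{if } \eta(x) \le d,\\
    \text{reject} & \text{otherwise},
  \end{cases}
\]
so the Bayes rule abstains exactly when neither class is sufficiently likely relative to the rejection cost.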
Kullback-Leibler aggregation and misspecified generalized linear models
In a regression setup with deterministic design, we study the pure
aggregation problem and introduce a natural extension from the Gaussian
distribution to distributions in the exponential family. While this extension
bears strong connections with generalized linear models, it does not require
identifiability of the parameter or even that the model on the systematic
component is true. It is shown that this problem can be solved by constrained
and/or penalized likelihood maximization and we derive sharp oracle
inequalities that hold both in expectation and with high probability. Finally, all the bounds are proved to be optimal in a minimax sense. Published at http://dx.doi.org/10.1214/11-AOS961 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
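The shape of the oracle inequalities in question can be sketched as follows; the constant, the exact divergence, and the conditions are illustrative assumptions rather than statements taken from the paper. For a dictionary \(f_1, \dots, f_M\) and an aggregate \(\hat{f}\) built from \(n\) observations,
\[
  \mathrm{KL}\big(f, \hat{f}\big) \;\le\; \min_{1 \le j \le M} \mathrm{KL}\big(f, f_j\big) \;+\; C\,\frac{\log M}{n},
\]
both in expectation and with high probability, the remainder term of order \(\log M / n\) being the minimax-optimal price of model-selection-type aggregation.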
Towards Machine Wald
The past century has seen a steady increase in the need of estimating and
predicting complex systems and making (possibly critical) decisions with
limited information. Although computers have made possible the numerical
evaluation of sophisticated statistical models, these models are still designed
\emph{by humans} because there is currently no known recipe or algorithm for
dividing the design of a statistical model into a sequence of arithmetic
operations. Indeed, enabling computers to \emph{think} as \emph{humans} do when faced with uncertainty is challenging in several major ways: (1) Finding optimal statistical models remains to be formulated as a well-posed
problem when information on the system of interest is incomplete and comes in
the form of a complex combination of sample data, partial knowledge of
constitutive relations and a limited description of the distribution of input
random variables. (2) The space of admissible scenarios along with the space of
relevant information, assumptions, and/or beliefs, tend to be infinite
dimensional, whereas calculus on a computer is necessarily discrete and finite.
With this purpose in mind, this paper explores the foundations of a rigorous framework
for the scientific computation of optimal statistical estimators/models and
reviews their connections with Decision Theory, Machine Learning, Bayesian
Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty
Quantification, and Information-Based Complexity.
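As a pointer to the Wald-style decision-theoretic formulation alluded to in the title (the notation below is generic and not taken from the paper), an optimal statistical estimator can be framed as a worst-case-optimal decision rule over the set of admissible scenarios,
\[
  \delta^* \;\in\; \operatorname*{arg\,min}_{\delta \in \mathcal{D}} \; \sup_{\mu \in \mathcal{A}} \; \mathbb{E}_{d \sim \mu}\big[\, L\big(\delta(d),\, q(\mu)\big) \,\big],
\]
where \(\mathcal{A}\) is the (typically infinite-dimensional) set of scenarios compatible with the sample data, the partial constitutive relations, and the available distributional information, \(q(\mu)\) is the quantity of interest, and \(L\) is a loss function; the computational challenge described above is to reduce such infinite-dimensional min-max problems to discrete, finite ones.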
On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
Deep learning has been applied to various tasks in the field of machine
learning and has shown superiority to other common procedures such as kernel
methods. To provide a better theoretical understanding of the reasons for its
success, we discuss the performance of deep learning and other methods on a
nonparametric regression problem with Gaussian noise. Whereas existing
theoretical studies of deep learning have been based mainly on mathematical
theories of well-known function classes such as H\"{o}lder and Besov classes,
we focus on function classes with discontinuity and sparsity, which are those
naturally assumed in practice. To highlight the effectiveness of deep learning,
we compare deep learning with linear estimators, a representative class of shallow estimators. It is shown that the minimax risk of a linear
estimator on the convex hull of a target function class does not differ from
that of the original target function class. This results in the suboptimality
of linear methods over a simple but non-convex function class, on which deep
learning can attain nearly the minimax-optimal rate. In addition to this
extreme case, we consider function classes with sparse wavelet coefficients. On
these function classes, deep learning also attains the minimax rate up to log
factors of the sample size, and linear methods are still suboptimal if the
assumed sparsity is strong. We also point out that the parameter sharing of
deep neural networks can remarkably reduce the complexity of the model in our
setting.
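The comparison made in the abstract can be summarized in one chain (notation generic: \(\mathcal{F}\) is the target class, \(\mathrm{conv}(\mathcal{F})\) its convex hull, and the infima range over linear estimators and over all estimators respectively):
\[
  \inf_{\hat{f}\,\mathrm{linear}} \sup_{f \in \mathcal{F}} \mathbb{E}\,\|\hat{f} - f\|^2
  \;=\;
  \inf_{\hat{f}\,\mathrm{linear}} \sup_{f \in \mathrm{conv}(\mathcal{F})} \mathbb{E}\,\|\hat{f} - f\|^2
  \;\ge\;
  \inf_{\hat{f}} \sup_{f \in \mathrm{conv}(\mathcal{F})} \mathbb{E}\,\|\hat{f} - f\|^2,
\]
so when \(\mathcal{F}\) is non-convex and its convex hull is substantially larger, linear (shallow) methods are forced to pay the slower minimax rate of the hull, while an estimator adapted to \(\mathcal{F}\) itself, such as a suitably constructed deep network, can attain the faster minimax rate of \(\mathcal{F}\) up to logarithmic factors.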