    Stein-Rule Estimation under an Extended Balanced Loss Function

    This paper extends the balanced loss function to a more general setup. The ordinary least squares and Stein-rule estimators are evaluated under this general loss function with a quadratic loss structure in a linear regression model. Their risks are derived when the disturbances in the linear regression model are not necessarily normally distributed. The dominance of the ordinary least squares and Stein-rule estimators over each other, and the effect of departures from the normality assumption on their risk properties, are studied.
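
    For orientation, a common form of Zellner's balanced loss and of the Stein-rule estimator in the model y = Xβ + ε is sketched below; the weights and notation are generic, the paper's extended loss generalizes this weighting, and the exact specification should be read from the paper itself.

    ```latex
    % A common form of the balanced loss (the paper's extension generalizes the
    % weighting) and of the Stein-rule shrinkage of OLS; w, k, n, p are generic.
    \[
    L_w(\hat\beta,\beta) \;=\; w\,(y - X\hat\beta)'(y - X\hat\beta)
      \;+\; (1-w)\,(\hat\beta - \beta)'X'X(\hat\beta - \beta),
    \qquad 0 \le w \le 1,
    \]
    \[
    \hat\beta_S \;=\; \Bigl[\,1 \;-\; \frac{k}{n-p+2}\,
      \frac{(y - Xb)'(y - Xb)}{b'X'Xb}\Bigr]\, b,
    \qquad b = (X'X)^{-1}X'y,\quad k > 0,
    \]
    % so that w = 1 weights goodness of fit, w = 0 weights precision of
    % estimation, and the OLS and Stein-rule risks are compared under the
    % combined loss.
    ```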

    On concentration for (regularized) empirical risk minimization

    Rates of convergence for empirical risk minimizers have been well studied in the literature. In this paper, we aim to provide a complementary set of results, in particular by showing that after normalization, the risk of the empirical minimizer concentrates on a single point. Such results have been established by \cite{chatterjee2014new} for constrained estimators in the normal sequence model. We first generalize and sharpen this result to regularized least squares with convex penalties, making use of a "direct" argument based on Borell's theorem. We then study generalizations to other loss functions, including the negative log-likelihood for exponential families combined with a strictly convex regularization penalty. The results in this general setting are based on more "indirect" arguments as well as on concentration inequalities for maxima of empirical processes.
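
    As a schematic illustration of the setting (the notation below is introduced for illustration and is not the paper's): in the Gaussian model with a convex penalty, the regularized least-squares estimator and the type of concentration statement look as follows.

    ```latex
    % Gaussian observation model with a convex penalty; t^* denotes a
    % deterministic value around which the error concentrates (illustrative).
    \[
    y = \mu^* + \sigma\varepsilon,\quad \varepsilon \sim N(0, I_n),\qquad
    \hat\mu \in \arg\min_{\mu}\Bigl\{\tfrac12\|y-\mu\|_2^2
      + \lambda\,\mathrm{pen}(\mu)\Bigr\},
    \]
    \[
    \Pr\Bigl(\bigl|\,\|\hat\mu-\mu^*\|_2 - t^*\,\bigr| > s\Bigr)
    \;\le\; C\exp\bigl(-c\,s^2/\sigma^2\bigr),
    \]
    % i.e. the loss behaves like a deterministic quantity plus lower-order
    % fluctuations, rather than merely being bounded at the minimax rate.
    ```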

    Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss

    We study the problem of model selection type aggregation with respect to the Kullback-Leibler divergence for various probabilistic models. Rather than considering a convex combination of the initial estimators $f_1, \ldots, f_N$, our aggregation procedures rely on the convex combination of the logarithms of these functions. The first method is designed for probability density estimation, as it gives an aggregate estimator that is also a proper density function, whereas the second method concerns spectral density estimation and has no such mass-conserving feature. We select the aggregation weights based on a penalized maximum likelihood criterion. We give sharp oracle inequalities that hold with high probability, with a remainder term that is decomposed into a bias and a variance part. We also show the optimality of the remainder terms by providing the corresponding lower bound results.
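
    Schematically, aggregating the logarithms rather than the densities themselves yields a geometric-mixture estimator of the following form (notation introduced here for illustration); the weights are then selected by penalized maximum likelihood over the simplex.

    ```latex
    % Log-convex (geometric) aggregation of candidate densities f_1, ..., f_N.
    \[
    \hat f_\lambda(x) \;=\;
    \frac{\exp\bigl(\sum_{j=1}^N \lambda_j \log f_j(x)\bigr)}
         {\int \exp\bigl(\sum_{j=1}^N \lambda_j \log f_j(u)\bigr)\,du},
    \qquad \lambda_j \ge 0,\ \ \sum_{j=1}^N \lambda_j = 1,
    \]
    % which is itself a proper probability density; the spectral-density
    % variant described in the abstract omits this renormalization.
    ```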

    Portfolio Diversification and Value at Risk Under Thick-Tailedness

    We present a unified approach to value at risk analysis under heavy-tailedness, using new majorization theory that we develop for linear combinations of thick-tailed random variables. Among other results, we show that the stylized fact that portfolio diversification is always preferable is reversed for extremely heavy-tailed risks or returns. The stylized facts on diversification are nevertheless robust to thick-tailedness of risks or returns as long as their distributions are not extremely long-tailed. We further demonstrate that the value at risk is a coherent measure of risk if the distributions of risks are not extremely heavy-tailed. However, coherency of the value at risk is always violated under extreme thick-tailedness. Extensions of the results to the case of dependence, including convolutions of α-symmetric distributions and models with common shocks, are provided.
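
    The reversal of the diversification stylized fact can be illustrated in the benchmark case of i.i.d. symmetric stable risks (a standard example, not the paper's general majorization argument):

    ```latex
    % Scaling property of i.i.d. symmetric alpha-stable risks under a
    % portfolio w_1, ..., w_n with w_i >= 0 and sum_i w_i = 1.
    \[
    \sum_{i=1}^n w_i X_i \;\overset{d}{=}\;
    \Bigl(\sum_{i=1}^n w_i^{\alpha}\Bigr)^{1/\alpha} X_1 .
    \]
    % For alpha > 1 the scale factor is at most 1, so diversification lowers
    % the value at risk; for alpha < 1 it exceeds 1.  Example: n = 2,
    % w_1 = w_2 = 1/2, alpha = 1/2 gives (2 * 2^{-1/2})^{2} = 2, i.e. the
    % equally weighted portfolio carries twice the VaR of a single risk.
    ```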

    Lasso type classifiers with a reject option

    We consider the problem of binary classification where one can, for a particular cost, choose not to classify an observation. We present a simple proof for the oracle inequality for the excess risk of structural risk minimizers using a lasso type penalty. Published at http://dx.doi.org/10.1214/07-EJS058 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
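
    A minimal sketch of the reject-option idea with an l1 (lasso-type) penalized score: this is an illustration under assumed names and thresholds, not the estimator or penalty analyzed in the paper.

    ```python
    # Sketch: binary classification with a reject option and an l1 penalty.
    # Reject cost d < 1/2; we abstain when the estimated class probability is
    # too close to 1/2 (Chow's rule).  Names, data, and threshold are
    # illustrative assumptions, not the paper's procedure.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    d = 0.2                      # cost of rejecting instead of classifying
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # l1-penalized logistic regression as a lasso-type scoring rule
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X, y)

    p = clf.predict_proba(X)[:, 1]          # estimated P(Y = 1 | X)
    predict = np.where(p >= 0.5, 1, 0)      # plug-in decision
    reject = np.minimum(p, 1.0 - p) > d     # abstain when estimated risk > d

    # Empirical risk: a misclassification costs 1, a rejection costs d
    risk = np.mean(np.where(reject, d, (predict != y).astype(float)))
    print(f"rejected fraction: {reject.mean():.2f}, empirical risk: {risk:.3f}")
    ```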

    Kullback-Leibler aggregation and misspecified generalized linear models

    In a regression setup with deterministic design, we study the pure aggregation problem and introduce a natural extension from the Gaussian distribution to distributions in the exponential family. While this extension bears strong connections with generalized linear models, it does not require identifiability of the parameter or even that the model on the systematic component is true. It is shown that this problem can be solved by constrained and/or penalized likelihood maximization and we derive sharp oracle inequalities that hold both in expectation and with high probability. Finally, all the bounds are proved to be optimal in a minimax sense. Published at http://dx.doi.org/10.1214/11-AOS961 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
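
    Schematically (with notation introduced here, not taken from the paper), given candidate functions f_1, ..., f_M for the systematic component and an exponential-family likelihood with cumulant function b, the constrained/penalized aggregation step can be written as:

    ```latex
    % Penalized likelihood maximization over an aggregation set Lambda
    % (e.g. the simplex); f_theta is a convex combination of the candidates.
    \[
    \mathsf{f}_\theta = \sum_{j=1}^M \theta_j f_j, \qquad
    \hat\theta \in \arg\max_{\theta \in \Lambda}
    \Bigl\{ \tfrac1n \sum_{i=1}^n \bigl[\, Y_i\,\mathsf{f}_\theta(x_i)
            - b\bigl(\mathsf{f}_\theta(x_i)\bigr) \bigr]
            \;-\; \mathrm{pen}(\theta) \Bigr\},
    \]
    % with oracle inequalities controlling the Kullback-Leibler risk of
    % f_{\hat\theta} relative to the best candidate or combination, without
    % assuming the exponential-family model is correctly specified.
    ```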

    Towards Machine Wald

    The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed \emph{by humans} because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to \emph{think} as \emph{humans} do when faced with uncertainty is challenging in several major ways: (1) finding optimal statistical models has yet to be formulated as a well-posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations, and a limited description of the distribution of input random variables; (2) the space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite-dimensional, whereas calculus on a computer is necessarily discrete and finite. To this end, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification, and Information-Based Complexity.
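
    One way to make the notion of an "optimal statistical model/estimator" concrete is the Wald-style worst-case formulation sketched below; the symbols are generic and chosen here for illustration only.

    ```latex
    % Minimax (Wald-type) selection of an estimator theta from a class Theta,
    % against all scenarios mu in the admissible set A consistent with the
    % available (incomplete) information; d denotes the observed data and
    % phi the quantity of interest.
    \[
    \theta^{\dagger} \in \arg\min_{\theta \in \Theta}\;
    \sup_{\mu \in \mathcal{A}}\;
    \mathbb{E}_{(\phi, d) \sim \mu}
      \bigl[\,\mathrm{Loss}\bigl(\phi,\ \theta(d)\bigr)\bigr].
    \]
    % The computational difficulty highlighted in the abstract is that A and
    % Theta are typically infinite-dimensional, so this min-sup must be
    % reduced to a finite problem a computer can solve.
    ```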

    On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

    Deep learning has been applied to various tasks in machine learning and has shown superiority over other common procedures such as kernel methods. To provide a better theoretical understanding of the reasons for its success, we discuss the performance of deep learning and other methods on a nonparametric regression problem with Gaussian noise. Whereas existing theoretical studies of deep learning have been based mainly on mathematical theories of well-known function classes such as H\"{o}lder and Besov classes, we focus on function classes with discontinuity and sparsity, which are natural assumptions in practice. To highlight the effectiveness of deep learning, we compare deep learning with a class of linear estimators representative of shallow estimators. It is shown that the minimax risk of a linear estimator over the convex hull of a target function class does not differ from that over the original target function class. This results in the suboptimality of linear methods on a simple but non-convex function class, on which deep learning can attain a nearly minimax-optimal rate. In addition to this extreme case, we consider function classes with sparse wavelet coefficients. On these function classes, deep learning also attains the minimax rate up to logarithmic factors in the sample size, and linear methods remain suboptimal if the assumed sparsity is strong. We also point out that the parameter sharing of deep neural networks can remarkably reduce the complexity of the model in our setting.
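
    The convex-hull statement in the abstract can be written schematically as follows (quadratic risk under Gaussian noise; the notation is introduced here):

    ```latex
    % For linear estimators the worst-case quadratic risk over a class F
    % equals that over its convex hull, since the bias term is convex in the
    % target function.
    \[
    \inf_{\hat f\ \mathrm{linear}}\ \sup_{f^{\circ} \in F}\
    \mathbb{E}\,\|\hat f - f^{\circ}\|^2
    \;=\;
    \inf_{\hat f\ \mathrm{linear}}\ \sup_{f^{\circ} \in \mathrm{conv}(F)}\
    \mathbb{E}\,\|\hat f - f^{\circ}\|^2 .
    \]
    % Hence a non-convex class F with a much richer convex hull forces every
    % linear (shallow) estimator to pay for conv(F), whereas deep networks
    % can adapt to F itself and attain a nearly minimax-optimal rate.
    ```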