455 research outputs found

    Isotonic distributional regression

    Get PDF
    Isotonic distributional regression (IDR) is a powerful non-parametric technique for the estimation of conditional distributions under order restrictions. In a nutshell, IDR learns conditional distributions that are calibrated, and simultaneously optimal relative to comprehensive classes of relevant loss functions, subject to isotonicity constraints in terms of a partial order on the covariate space. Non-parametric isotonic quantile regression and non-parametric isotonic binary regression emerge as special cases. For prediction, we propose an interpolation method that generalizes extant specifications under the pool adjacent violators algorithm. We recommend the use of IDR as a generic benchmark technique in probabilistic forecast problems, as it does not involve any parameter tuning nor implementation choices, except for the selection of a partial order on the covariate space. The method can be combined with subsample aggregation, with the benefits of smoother regression functions and gains in computational efficiency. In a simulation study, we compare methods for distributional regression in terms of the continuous ranked probability score (CRPS) and 2 estimation error, which are closely linked. In a case study on raw and post-processed quantitative precipitation forecasts from a leading numerical weather prediction system, IDR is competitive with state of the art techniques

    Isotonic Distributional Regression

    Get PDF
    Distributional regression estimates the probability distribution of a response variable conditional on covariates. The estimated conditional distribution comprehensively summarizes the available information on the response variable, and allows to derive all statistical quantities of interest, such as the conditional mean, threshold exceedance probabilities, or quantiles. This thesis develops isotonic distributional regression, a method for estimating conditional distributions under the assumption of a monotone relationship between covariates and a response variable. The response variable is univariate and real-valued, and the covariates lie in a partially ordered set. The monotone relationship is formulated in terms of stochastic order constraints, that is, the response variable increases in a stochastic sense as the covariates increase in the partial order. This assumption alone yields a shape-constrained non-parametric estimator, which does not involve any tuning parameters. The estimation of distributions under stochastic order restrictions has already been studied for various stochastic orders, but so far only with totally ordered covariates. Apart from considering more general partially ordered covariates, the first main contribution of this thesis lies in a shift of focus from estimation to prediction. Distributional regression is the backbone of probabilistic forecasting, which aims at quantifying the uncertainty about a future quantity of interest comprehensively in the form of probability distributions. When analyzed with respect to predominant criteria for probabilistic forecast quality, isotonic distributional regression is shown to have desirable properties. In addition, this thesis develops an efficient algorithm for the computation of isotonic distributional regression, and proposes an estimator under a weaker, previously not thoroughly studied stochastic order constraint. A main application of isotonic distributional regression is the uncertainty quantification for point forecasts. Such point forecasts sometimes stem from external sources, like physical models or expert surveys, but often they are generated with statistical models. The second contribution of this thesis is the extension of isotonic distributional regression to allow covariates that are point predictions from a regression model, which may be trained on the same data to which isotonic distributional regression is to be applied. This combination yields a so-called distributional index model. Asymptotic consistency is proved under suitable assumptions, and real data applications demonstrate the usefulness of the method. Isotonic distributional regression provides a benchmark in forecasting problems, as it allows to quantify the merits of a specific, tailored model for the application at hand over a generic method which only relies on monotonicity. In such comparisons it is vital to assess the significance of forecast superiority or of forecast misspecification. The third contribution of this thesis is the development of new, safe methods for forecast evaluation, which require no or minimal assumptions on the data generating processes

    Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

    Full text link
    Random Forests (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data, which can also be used for targets other than the original mean estimation. We propose a novel forest construction for multivariate responses based on their joint conditional distribution, independent of the estimation target and the data model. It uses a new splitting criterion based on the MMD distributional metric, which is suitable for detecting heterogeneity in multivariate distributions. The induced weights define an estimate of the full conditional distribution, which in turn can be used for arbitrary and potentially complicated targets of interest. The method is very versatile and convenient to use, as we illustrate on a wide range of examples. The code is available as Python and R packages drf

    Distributional Regression for Data Analysis

    Full text link
    Flexible modeling of how an entire distribution changes with covariates is an important yet challenging generalization of mean-based regression that has seen growing interest over the past decades in both the statistics and machine learning literature. This review outlines selected state-of-the-art statistical approaches to distributional regression, complemented with alternatives from machine learning. Topics covered include the similarities and differences between these approaches, extensions, properties and limitations, estimation procedures, and the availability of software. In view of the increasing complexity and availability of large-scale data, this review also discusses the scalability of traditional estimation methods, current trends, and open challenges. Illustrations are provided using data on childhood malnutrition in Nigeria and Australian electricity prices.Comment: Accepted for publication in Annual Review of Statistics and its Applicatio

    GAMLSS: a distributional regression approach

    Get PDF
    A tutorial of the generalized additive models for location, scale and shape (GAMLSS) is given here using two examples. GAMLSS is a general framework for performing regression analysis where not only the location (e.g., the mean) of the distribution but also the scale and shape of the distribution can be modelled by explanatory variables

    Isotonic Distributional Regression

    Get PDF
    Isotonic distributional regression (IDR) is a powerful nonparametric technique for the estimation of conditional distributions under order restrictions. In a nutshell, IDR learns conditional distributions that are calibrated, and simultaneously optimal relative to comprehensive classes of relevant loss functions, subject to isotonicity constraints in terms of a partial order on the covariate space. Nonparametric isotonic quantile regression and nonparametric isotonic binary regression emerge as special cases. For prediction, we propose an interpolation method that generalizes extant specifications under the pool adjacent violators algorithm. We recommend the use of IDR as a generic benchmark technique in probabilistic forecast problems, as it does not involve any parameter tuning nor implementation choices, except for the selection of a partial order on the covariate space. The method can be combined with subsample aggregation, with the benefits of smoother regression functions and gains in computational efficiency. In a simulation study, we compare methods for distributional regression in terms of the continuous ranked probability score (CRPS) and L2L_2 estimation error, which are closely linked. In a case study on raw and postprocessed quantitative precipitation forecasts from a leading numerical weather prediction system, IDR is competitive with state of the art techniques

    Contributions to distributional regression models. Applications in biomedicine

    Get PDF
    The present thesis makes statistical contributions in the field of frequentist and Bayesian distributional regression (DR) for univariate and bivariate responses. This work also proposes the inclusion of functional data into the DR models. The proposed methodologies are applied to several real biomedical studies, with emphasis in diabetes research

    Bayesian Measurement Error Correction in Structured Additive Distributional Regression with an Application to the Analysis of Sensor Data on Soil-Plant Variability

    Full text link
    The flexibility of the Bayesian approach to account for covariates with measurement error is combined with semiparametric regression models for a class of continuous, discrete and mixed univariate response distributions with potentially all parameters depending on a structured additive predictor. Markov chain Monte Carlo enables a modular and numerically efficient implementation of Bayesian measurement error correction based on the imputation of unobserved error-free covariate values. We allow for very general measurement errors, including correlated replicates with heterogeneous variances. The proposal is first assessed by a simulation trial, then it is applied to the assessment of a soil-plant relationship crucial for implementing efficient agricultural management practices. Observations on multi-depth soil information forage ground-cover for a seven hectares Alfalfa stand in South Italy were obtained using sensors with very refined spatial resolution. Estimating a functional relation between ground-cover and soil with these data involves addressing issues linked to the spatial and temporal misalignment and the large data size. We propose a preliminary spatial interpolation on a lattice covering the field and subsequent analysis by a structured additive distributional regression model accounting for measurement error in the soil covariate. Results are interpreted and commented in connection to possible Alfalfa management strategies
    • …
    corecore