
    Modeling Persistent Trends in Distributions

    We present a nonparametric framework to model a short sequence of probability distributions that vary both due to underlying effects of sequential progression and confounding noise. To distinguish between these two types of variation and estimate the sequential-progression effects, our approach leverages an assumption that these effects follow a persistent trend. This work is motivated by the recent rise of single-cell RNA-sequencing experiments over a brief time course, which aim to identify genes relevant to the progression of a particular biological process across diverse cell populations. While classical statistical tools focus on scalar-response regression or order-agnostic differences between distributions, it is desirable in this setting to consider both the full distributions and the structure imposed by their ordering. We introduce a new regression model for ordinal covariates where responses are univariate distributions and the underlying relationship reflects consistent changes in the distributions over increasing levels of the covariate. This concept is formalized as a "trend" in distributions, which we define as an evolution that is linear under the Wasserstein metric. Implemented via a fast alternating projections algorithm, our method exhibits numerous strengths in simulations and analyses of single-cell gene expression data.
    Comment: To appear in the Journal of the American Statistical Association
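
    A minimal sketch of the core idea, assuming univariate samples at each ordinal level: in one dimension, a trend that is linear under the Wasserstein metric corresponds to quantile functions that move linearly with the level, so a crude estimate of the trend component can be obtained by least squares in quantile space. The paper's alternating-projections estimator is not reproduced here; function and variable names are illustrative.

        import numpy as np

        def fit_wasserstein_linear_trend(samples, n_grid=99):
            # Empirical quantile functions, one row per ordinal level.
            grid = np.linspace(0.01, 0.99, n_grid)
            Q = np.array([np.quantile(s, grid) for s in samples])   # (levels, n_grid)
            # Linear-in-level fit at every quantile grid point: in 1-D this is a
            # Wasserstein-linear trend in the sequence of distributions.
            t = np.arange(len(samples), dtype=float)
            X = np.column_stack([np.ones_like(t), t])
            coef, *_ = np.linalg.lstsq(X, Q, rcond=None)             # (2, n_grid)
            fitted = X @ coef
            # Re-impose monotonicity so each fitted curve is a valid quantile function.
            fitted = np.maximum.accumulate(fitted, axis=1)
            return grid, fitted

        # Example: three ordered cell populations whose expression drifts upward.
        rng = np.random.default_rng(0)
        samples = [rng.normal(loc=mu, size=200) for mu in (0.0, 0.4, 0.8)]
        grid, fitted = fit_wasserstein_linear_trend(samples)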

    Extreme Value Statistics of the Total Energy in an Intermediate Complexity Model of the Mid-latitude Atmospheric Jet. Part I: Stationary case

    A baroclinic model for the atmospheric jet at mid-latitudes is used as a stochastic generator of time series of the total energy of the system. Statistical inference of extreme values is applied to sequences of yearly maxima of the time series, in the rigorous setting provided by extreme value theory; in particular, the Generalized Extreme Value (GEV) family of distributions is used. Several physically realistic values of the parameter T_E, which describes the forced equator-to-pole temperature gradient and sets the average baroclinicity in the atmospheric model, are examined. The location and scale GEV parameters are found to have a piecewise smooth, monotonically increasing dependence on T_E. This agrees with the similar dependence on T_E observed in the same system when other dynamically and physically relevant observables are considered. The GEV shape parameter also increases with T_E but is always negative, as required a priori by the boundedness of the total energy of the system. The sensitivity of the statistical inference process is studied with respect to the procedure used to select the maxima: the roles of both the length of the maxima sequences and the length of the data blocks over which the maxima are computed are critically analyzed. Issues related to model sensitivity are also explored by varying the resolution of the system.
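
    A minimal sketch of the block-maxima / GEV fit described above, using SciPy in place of the paper's inference pipeline; the synthetic "energy" series is only a placeholder for the model output.

        import numpy as np
        from scipy.stats import genextreme

        # Placeholder for the model's total-energy series: 100 "years" of daily values.
        rng = np.random.default_rng(1)
        energy = rng.gumbel(loc=100.0, scale=5.0, size=(100, 365))
        yearly_maxima = energy.max(axis=1)

        # Maximum-likelihood GEV fit; SciPy's shape parameter c equals minus the usual xi.
        c, loc, scale = genextreme.fit(yearly_maxima)
        xi = -c
        print(f"location={loc:.2f}  scale={scale:.2f}  shape xi={xi:.3f}")

        # Example diagnostic: the 100-year return level (0.99 quantile of yearly maxima).
        return_level_100 = genextreme.ppf(0.99, c, loc=loc, scale=scale)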

    Extreme Value GARCH modelling with Bayesian Inference

    Extreme value theory is widely used in financial applications such as risk analysis, forecasting and pricing models. One of the major difficulties in applications to finance and economics is that the assumption of independence of time series observations is generally not satisfied, so that dependent extremes may not necessarily lie in the domain of attraction of the classical generalised extreme value distribution. This study examines a conditional extreme value distribution with the added specification that the extreme values (maxima or minima) follow a conditional autoregressive heteroscedasticity process. The dependence is modelled by allowing the location and scale parameters of the extreme value distribution to vary with time. The resulting combined model, GEV-GARCH, is developed by implementing the GARCH volatility mechanism in these extreme value model parameters. Bayesian inference is used for parameter estimation, and posterior inference is obtained through the Markov chain Monte Carlo (MCMC) method. The model is first applied to simulated data to verify its stability and the reliability of the parameter estimation method. Real stock returns are then used to assess whether the model is appropriate in practice. A comparison is made between the GEV-GARCH and traditional GARCH models. Both produce similar conditional volatility estimates; however, the GEV-GARCH model captures and explains extreme quantiles better than the GARCH model because it extrapolates tail behaviour more reliably.
    Keywords: extreme value distribution, dependency, Bayesian, MCMC, return quantile
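
    A hedged sketch of what a GEV-GARCH likelihood can look like, here with the GEV scale driven by a GARCH(1,1)-style recursion; the exact parameterization in the paper may differ, and the Bayesian step (priors plus an MCMC sampler such as random-walk Metropolis over these parameters) is not shown.

        import numpy as np
        from scipy.stats import genextreme

        def gev_garch_negloglik(params, maxima):
            # params = (mu, xi, omega, alpha, beta): GEV location and shape plus
            # GARCH(1,1)-style coefficients driving the squared GEV scale.
            mu, xi, omega, alpha, beta = params
            n = len(maxima)
            sigma2 = np.empty(n)
            sigma2[0] = np.var(maxima)
            for t in range(1, n):
                sigma2[t] = omega + alpha * (maxima[t - 1] - mu) ** 2 + beta * sigma2[t - 1]
            if np.any(sigma2 <= 0):
                return np.inf
            # SciPy's shape parameter c is minus the usual GEV shape xi.
            ll = genextreme.logpdf(maxima, c=-xi, loc=mu, scale=np.sqrt(sigma2))
            return -np.sum(ll) if np.all(np.isfinite(ll)) else np.inf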

    Estimation of Extreme Quantiles for Functions of Dependent Random Variables

    We propose a new method for estimating extreme quantiles of a function of several dependent random variables. In contrast to the conventional approach based on extreme value theory, we do not impose the condition that the tail of the underlying distribution admits an approximate parametric form, and, furthermore, our estimation makes use of the full observed data. The proposed method is semiparametric, as no parametric forms are assumed for the marginal distributions. However, we select appropriate bivariate copulas to model the joint dependence structure, taking advantage of recent developments in constructing large-dimensional vine copulas. A sample quantile computed from a large bootstrap sample drawn from the fitted joint distribution is then taken as the estimator of the extreme quantile. This estimator is proved to be consistent. The reliable and robust performance of the proposed method is further illustrated by simulation.
    Comment: 18 pages, 2 figures
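
    A simplified sketch of the bootstrap-from-fitted-joint idea for two variables, using a Gaussian copula with empirical marginals as a stand-in for the vine copulas used in the paper; function and variable names are illustrative.

        import numpy as np
        from scipy.stats import norm

        def extreme_quantile_of_sum(x, y, q=0.999, n_boot=200_000, seed=0):
            rng = np.random.default_rng(seed)
            n = len(x)
            # Pseudo-observations (ranks scaled to (0, 1)) for the copula fit.
            u = (np.argsort(np.argsort(x)) + 0.5) / n
            v = (np.argsort(np.argsort(y)) + 0.5) / n
            # Gaussian-copula correlation estimated from the normal scores.
            rho = np.corrcoef(norm.ppf(u), norm.ppf(v))[0, 1]
            # Draw a large sample from the fitted copula ...
            z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_boot)
            u_sim, v_sim = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])
            # ... map it back through the empirical marginal quantile functions ...
            x_sim, y_sim = np.quantile(x, u_sim), np.quantile(y, v_sim)
            # ... and take the sample quantile of the target function of (X, Y).
            return np.quantile(x_sim + y_sim, q)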

    Fixed Effect Estimation of Large T Panel Data Models

    This article reviews recent advances in fixed effect estimation of panel data models for long panels, where the number of time periods is relatively large. We focus on semiparametric models with unobserved individual and time effects, where the distribution of the outcome variable conditional on covariates and unobserved effects is specified parametrically, while the distribution of the unobserved effects is left unrestricted. Compared to existing reviews on long panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011), we discuss models with both individual and time effects, split-panel jackknife bias corrections, unbalanced panels, distribution and quantile effects, and other extensions. Understanding and correcting the incidental parameter bias caused by the estimation of many fixed effects is our main focus, and the unifying theme is that the order of this bias is given by the simple formula p/n for all models discussed, where p is the number of estimated parameters and n is the total sample size.
    Comment: 40 pages, 1 table
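
    A minimal sketch of the split-panel jackknife bias correction mentioned above, assuming a user-supplied fixed-effect estimator estimate(y, x) that acts on an N x T panel; names are illustrative.

        def split_panel_jackknife(estimate, y, x):
            # Full-panel estimate plus estimates on the two half-panels obtained by
            # splitting along the time dimension; the combination removes the leading
            # O(p/n) incidental-parameter bias.
            T = y.shape[1]
            theta_full = estimate(y, x)
            theta_first = estimate(y[:, : T // 2], x[:, : T // 2])
            theta_second = estimate(y[:, T // 2:], x[:, T // 2:])
            return 2.0 * theta_full - 0.5 * (theta_first + theta_second)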

    Quantile Regression in Risk Calibration

    Financial risk control has always been challenging and has become an even harder problem now that joint extreme events occur more frequently. For decision makers and government regulators, it is therefore important to obtain accurate information on the interdependency of risk factors. Given a stressful situation for one market participant, one would like to measure how this stress affects other factors. The CoVaR (Conditional VaR) framework has been developed for this purpose. The basic technical elements of CoVaR estimation are two levels of quantile regression: one on market risk factors, the other on an individual risk factor. Tests on the functional form of the two-level quantile regression reject linearity. A flexible semiparametric modelling framework for CoVaR is therefore proposed, and a partial linear model (PLM) is analyzed. When the method is applied to stock data covering the crisis period, the PLM outperforms during the crisis, as justified by backtesting procedures. Moreover, using data on global stock market indices, an analysis of the marginal contribution of risk (MCR), defined as the local first-order derivative of the quantile curve, sheds some light on the source of the global market risk.
    Keywords: CoVaR, Value-at-Risk, quantile regression, locally linear quantile regression, partial linear model, semiparametric model
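
    A sketch of the baseline two-level linear quantile regression behind CoVaR; the paper replaces the second, linear stage with a partial linear model, which is not reproduced here. The inputs firm_ret and system_ret are assumed to be named pandas Series and state_vars a DataFrame of lagged state variables; all names are illustrative.

        import pandas as pd
        import statsmodels.api as sm
        from statsmodels.regression.quantile_regression import QuantReg

        def covar_linear(system_ret, firm_ret, state_vars, tau=0.05):
            # Level 1: the firm's VaR as the tau-quantile of its returns given the state.
            X1 = sm.add_constant(state_vars)
            var_firm = QuantReg(firm_ret, X1).fit(q=tau).predict(X1)
            # Level 2: the system's tau-quantile given the firm's return and the state.
            X2 = sm.add_constant(pd.concat([firm_ret, state_vars], axis=1))
            fit2 = QuantReg(system_ret, X2).fit(q=tau)
            # CoVaR: evaluate the level-2 fit with the firm's return set at its VaR.
            X2_stress = X2.copy()
            X2_stress[firm_ret.name] = var_firm
            return fit2.predict(X2_stress)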

    Local bilinear multiple-output quantile/depth regression

    A new quantile regression concept, based on a directional version of Koenker and Bassett's traditional single-output one, was introduced in [Ann. Statist. (2010) 38 635-669] for multiple-output location/linear regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to unknown nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear (actually, bilinear) versions of those contours, both of which allow one to asymptotically recover the conditional halfspace depth contours that completely characterize the response's conditional distributions. Bahadur representation and asymptotic normality results are established. Illustrations are provided on both simulated and real data.
    Comment: Published at http://dx.doi.org/10.3150/14-BEJ610 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
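
    A hedged local-constant sketch of the underlying directional quantile regression (the paper's local bilinear contours are not reproduced): for a unit direction u, the tau-quantile hyperplane is obtained by a quantile regression of u'Y on the projection of Y onto u's orthogonal complement, with kernel weights in the covariate making the fit local. Names and the kernel choice are illustrative.

        import numpy as np
        from scipy.optimize import minimize

        def local_directional_quantile(Y, X, x0, u, tau=0.2, bandwidth=0.5):
            u = u / np.linalg.norm(u)
            # Orthonormal basis of the orthogonal complement of u.
            Gamma = np.linalg.svd(u.reshape(-1, 1), full_matrices=True)[0][:, 1:]
            y_u, y_perp = Y @ u, Y @ Gamma
            w = np.exp(-0.5 * ((X - x0) / bandwidth) ** 2)   # Gaussian kernel weights in X

            def weighted_check_loss(theta):
                # Koenker-Bassett check loss, localized at x0 through the weights.
                resid = y_u - theta[0] - y_perp @ theta[1:]
                return np.sum(w * resid * (tau - (resid < 0)))

            theta0 = np.zeros(1 + Gamma.shape[1])
            return minimize(weighted_check_loss, theta0, method="Nelder-Mead").x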

    Efficient semiparametric estimation of a partially linear quantile regression model

    This paper is concerned with estimating a conditional quantile function that is assumed to be partially linear. The paper develops a simple estimator of the parametric component of the conditional quantile. The semiparametric efficiency bound for the parametric component is derived, and two types of efficient estimators are considered. Asymptotic properties of the proposed estimators are established under regularity conditions. Monte Carlo experiments indicate that the proposed estimators perform well in small samples.
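
    A simple series-approximation sketch of the partially linear quantile model Q_tau(y | x, z) = x'beta + g(z): approximate g with a polynomial basis in z and run one quantile regression. This is an illustrative estimator, not the paper's semiparametrically efficient one; names are illustrative.

        import numpy as np
        from statsmodels.regression.quantile_regression import QuantReg

        def plqr_fit(y, x, z, tau=0.5, degree=4):
            # Polynomial series basis for the nonparametric component g(z);
            # the k = 0 term doubles as the intercept.
            basis = np.column_stack([z ** k for k in range(degree + 1)])
            design = np.column_stack([x, basis])
            fit = QuantReg(y, design).fit(q=tau)
            beta = fit.params[: x.shape[1]]     # parametric component of interest
            gamma = fit.params[x.shape[1]:]     # series coefficients approximating g
            return beta, gamma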