
    Specification analysis of linear quantile models

    This paper introduces a nonparametric test for the correct specification of a linear conditional quantile function over a continuum of quantile levels. The test may be applied to assess the validity of post-estimation inferences regarding the effect of conditioning variables on the distribution of outcomes. We show that the use of an orthogonal projection on the tangent space of nuisance parameters at each quantile index both improves power and facilitates the simulation of critical values via a simple multiplier bootstrap procedure. Monte Carlo evidence and an application to the empirical analysis of age-earnings curves are included. Escanciano acknowledges the support of the Spanish Plan Nacional de I+D+I, reference number SEJ2007-62908.
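
    A minimal Python sketch of the multiplier-bootstrap idea used to simulate critical values for a sup-type quantile-process statistic, assuming hypothetical inputs: an array of per-observation contributions to the test process on a grid of quantile levels and the observed sup statistic. In the paper's procedure these contributions would first be orthogonally projected on the tangent space of the nuisance parameters; that step is omitted here.

    import numpy as np

    def multiplier_bootstrap_pvalue(contrib, stat_obs, n_boot=999, rng=None):
        """Multiplier-bootstrap p-value for a sup-type process statistic.

        contrib  : (n, m) per-observation contributions to the test process on a
                   grid of m quantile levels, approximately mean zero under the null.
        stat_obs : observed statistic, e.g. max_t |n**-0.5 * sum_i contrib[i, t]|.
        """
        rng = np.random.default_rng(rng)
        n = contrib.shape[0]
        exceed = 0
        for _ in range(n_boot):
            xi = rng.standard_normal(n)              # i.i.d. multipliers
            boot_process = (xi @ contrib) / np.sqrt(n)
            exceed += np.max(np.abs(boot_process)) >= stat_obs
        return (1 + exceed) / (1 + n_boot)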

    A Test of the Conditional Independence Assumption in Sample Selection Models

    Identification in most sample selection models depends on the independence of the regressors and the error terms conditional on the selection probability. All quantile and mean functions are parallel in these models; this implies that quantile estimators cannot reveal any heterogeneity, which is ruled out by assumption. Quantile estimators are nevertheless useful for testing the conditional independence assumption because they are consistent under the null hypothesis. We propose tests of the Kolmogorov–Smirnov type based on the conditional quantile regression process. Monte Carlo simulations show that their size is satisfactory and their power sufficient to detect deviations under plausible data-generating processes. We apply our procedures to female wage data from the 2011 Current Population Survey and show that homogeneity is clearly rejected. Copyright © 2014 John Wiley & Sons, Ltd.
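
    The Python sketch below illustrates a Kolmogorov-Smirnov-type statistic on the conditional quantile regression process: under conditional independence the quantile-regression slopes should not vary across quantile levels, so the statistic measures their largest scaled deviation from the median-regression slopes. This is a simplified illustration, not the authors' exact test (critical values would come from resampling, which is omitted), and it assumes statsmodels is available.

    import numpy as np
    import statsmodels.api as sm

    def ks_slope_statistic(y, X, taus=np.linspace(0.1, 0.9, 17)):
        """Largest deviation of quantile-regression slopes from the
        median-regression slopes across a grid of quantile levels."""
        Xc = sm.add_constant(np.asarray(X))
        n = len(y)
        beta_med = sm.QuantReg(y, Xc).fit(q=0.5).params[1:]   # drop the intercept
        devs = []
        for tau in taus:
            beta_tau = sm.QuantReg(y, Xc).fit(q=tau).params[1:]
            devs.append(np.sqrt(n) * np.max(np.abs(beta_tau - beta_med)))
        return max(devs)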

    Towards Improving Drought Forecasts Across Different Spatial and Temporal Scales

    Recent water scarcities across the southwestern U.S., with severe effects on the living environment, motivate the development of new methodologies for reliable drought forecasting at the seasonal scale. Reliable forecasts of hydrologic variables are, in general, a prerequisite for appropriate water resources planning and for developing effective allocation policies. This study aims at developing new techniques with specific probabilistic features to improve the reliability of hydrologic forecasts, particularly drought forecasts. Future drought status is determined by hydrologic variables that are estimated by hydrologic models ranging from simple to complex in structure. Since the predictions of hydrologic models are prone to several sources of uncertainty, a number of techniques have been examined in recent years that combine the predictions of single or multiple hydrologic models into an ensemble of forecasts that addresses the inherent uncertainties. However, the imperfect structure of hydrologic models usually leads to systematic bias in hydrologic predictions, which carries over into the forecast ensembles. This study proposes a post-processing method that is applied to raw forecasts of hydrologic variables and develops the entire forecast distribution around the initial single-value prediction. To establish the probability density function (PDF) of the forecast, a class of multivariate distribution functions, the so-called copula functions, is incorporated into the post-processing procedure. The performance of the new post-processing technique is tested on 2500 hypothetical case studies and on streamflow forecasts for the Sprague River Basin in southern Oregon. Judged by both deterministic and probabilistic verification measures, quantile mapping, a traditional post-processing technique, cannot generate forecasts of the same quality as the copula-based method. The post-processing technique is then extended to study drought forecasts specifically, across different spatial and temporal scales. In the proposed drought forecasting model, future drought status is evaluated based on the drought status of past seasons, while the correlations between the drought variables of consecutive seasons are preserved by copula functions. The main benefit of the new forecast model is its probabilistic treatment of future droughts. It develops the conditional probability of drought status in the forecast season and generates the PDF and cumulative distribution function (CDF) of future droughts given the past status. The conditional PDF returns the most probable drought in the future along with an assessment of the uncertainty around that value. Using the conditional CDF for the forecast season, the model can generate maps of drought status across the basin with a particular chance of occurrence in the future. In a further analysis of the conditional CDF for the forecast season, the chance of a particular drought in the forecast period can be approximated given the drought status of earlier seasons. The forecast methodology developed in this study shows promising results for hydrologic forecasts, and its probabilistic features provide a useful direction for future studies.
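
    As a rough illustration of copula-based post-processing, the Python sketch below fits a Gaussian copula to historical (observation, raw forecast) pairs and returns conditional quantiles of the observed variable given a new raw forecast. The dissertation works with a broader family of copulas and with drought-specific variables; the function and variable names here are hypothetical.

    import numpy as np
    from scipy import stats

    def copula_postprocess(hist_obs, hist_fc, new_fc,
                           probs=np.linspace(0.05, 0.95, 19)):
        """Gaussian-copula sketch: conditional quantiles of the observation
        given a single raw forecast, fitted from historical pairs."""
        def to_normal_scores(x, ref):
            # empirical probability integral transform, then normal quantile
            ranks = np.searchsorted(np.sort(ref), x, side="right") / (len(ref) + 1)
            return stats.norm.ppf(ranks)

        z_obs = to_normal_scores(hist_obs, hist_obs)
        z_fc = to_normal_scores(hist_fc, hist_fc)
        rho = np.corrcoef(z_obs, z_fc)[0, 1]          # copula correlation
        z_new = to_normal_scores(np.atleast_1d(new_fc), hist_fc)[0]
        # conditional normal scores given the new forecast
        z_cond = stats.norm.ppf(probs, loc=rho * z_new, scale=np.sqrt(1 - rho**2))
        # map back to the observation scale via empirical quantiles
        return np.quantile(hist_obs, stats.norm.cdf(z_cond))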

    Testing Statistical Hypotheses for Latent Variable Models and Some Computational Issues

    In this dissertation, I address unorthodox statistical problems concerning goodness-of-fit tests in the latent variable context and efficient statistical computation. In epidemiological and biomedical studies, observations with measurement error are quite common, especially when it is difficult to calibrate true signals accurately. In the first problem, I develop a statistical test for the equality of two distributions when the observed contaminated data follow the classical additive measurement error model. Standard two-sample homogeneity tests, such as the Kolmogorov-Smirnov, Anderson-Darling, or von Mises tests, are not consistent when observations are subject to measurement error. To develop a consistent test, the characteristic functions of the unobservable true random variables are first estimated from the contaminated data, and the test statistic is then defined as the integrated difference between the two estimated characteristic functions. It is shown that when the sample size is large and the null hypothesis holds, the test statistic converges to an integral of a squared Gaussian process. However, evaluating this distribution to obtain the rejection region is not simple, so I propose a bootstrap approach to compute the p-value of the test statistic. The operating characteristics of the proposed test are assessed and compared with other approaches via extensive simulation studies. The proposed method is then applied to the National Health and Nutrition Examination Survey (NHANES) dataset. Although estimation of regression parameters in the presence of exposure measurement error has been studied, this testing problem is new. In the next problem, I consider the stochastic frontier model (SFM), a widely used model for measuring firms' efficiency. In productivity or cost studies in econometrics, there is a discrepancy between the theoretically optimal output and the actual output for a given amount of inputs; this gap is called technical inefficiency. To assess this inefficiency, the stochastic frontier model includes the gap as a latent variable in addition to the usual statistical noise. Since the gap cannot be observed, estimation and inference depend on the distributional assumption for the technical inefficiency term, which is usually taken to be exponential or half-normal. I therefore develop a Bayesian test of whether this parametric assumption is correct. I construct, as the alternative, a broad semiparametric family that approximates or contains the true distribution, and then define a Bayes factor. I show consistency of the Bayes factor under certain conditions and present finite sample performance via Monte Carlo simulations. The second part of my dissertation concerns statistical computation. Frequentist standard errors are of interest for evaluating the uncertainty of an estimator and are used in many inference problems. Here I consider standard error calculation for Bayes estimators. Except in some special scenarios, estimating the frequentist variability of an estimator typically involves bootstrapping to approximate its sampling distribution. For a Bayesian model fitted by Markov chain Monte Carlo (MCMC), combining MCMC with the bootstrap makes computing the standard error of the Bayes estimator expensive and often impractical, because the MCMC must be rerun on each bootstrapped dataset. To overcome this difficulty, I propose using importance sampling to reduce the computational burden, and I apply the technique to several examples, including logistic regression, a linear measurement error model, a Weibull regression model, and a vector autoregressive model. In the second computational problem, I explore binary regression with a flexible skew-probit link function, which contains the traditional probit link as a special case. The skew-probit model is useful for modelling the success probability of binary or count responses when the success probability is not a symmetric function of continuous regressors. I investigate the parameter identifiability of the skew-probit model and demonstrate that the maximum likelihood estimator (MLE) of the skewness parameter is highly biased. I then develop a penalized likelihood approach based on three penalty functions to reduce the finite sample bias of the MLE, compare the performance of each penalized MLE through extensive simulations, and analyze heart-disease data using the proposed approaches.
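
    A minimal Python sketch of the importance-sampling idea for frequentist standard errors of a Bayes estimator: the MCMC is run once on the original data, and each bootstrap replicate reuses the same posterior draws, reweighted by the likelihood ratio of the resampled data to the original data. The function names and the self-normalized weighting are illustrative assumptions, not the dissertation's exact algorithm.

    import numpy as np

    def bootstrap_se_via_is(theta_draws, loglik, data, n_boot=200, rng=None):
        """Bootstrap standard error of the posterior mean without rerunning MCMC.

        theta_draws : (S, p) posterior draws from a single MCMC run on `data`.
        loglik      : function(theta, dataset) -> log-likelihood (scalar).
        data        : array of observations supporting integer indexing.
        """
        rng = np.random.default_rng(rng)
        n = len(data)
        ll_orig = np.array([loglik(th, data) for th in theta_draws])
        estimates = []
        for _ in range(n_boot):
            boot = data[rng.integers(0, n, size=n)]     # nonparametric bootstrap
            ll_boot = np.array([loglik(th, boot) for th in theta_draws])
            logw = ll_boot - ll_orig
            w = np.exp(logw - logw.max())
            w /= w.sum()                                # self-normalized weights
            estimates.append(w @ theta_draws)           # reweighted posterior mean
        return np.std(np.asarray(estimates), axis=0)    # bootstrap standard error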

    Testing Monotonicity in Unobservables with Panel Data

    Monotonicity in a scalar unobservable is a crucial identifying assumption for an important class of nonparametric structural models accommodating unobserved heterogeneity. Tests for this monotonicity have previously been unavailable. This paper proposes and analyzes tests for scalar monotonicity using panel data for structures with and without time-varying unobservables, either partially or fully nonseparable between observables and unobservables. Our nonparametric tests are computationally straightforward, have well behaved limiting distributions under the null, are consistent against precisely specified alternatives, and have standard local power properties. We provide straightforward bootstrap methods for inference. Monte Carlo experiments show that, for empirically relevant sample sizes, these methods reasonably control the level of the test and that our tests have useful power. We apply our tests to study asset returns and demand for ready-to-eat cereals.

    Tests estadísticos basados en proyecciones aleatorias

    Random projections project high-dimensional data onto a randomly chosen lower-dimensional subspace. They are used in problems that require handling reduced-dimensional data in a computationally efficient manner while preserving the local structure of the original high-dimensional data. They are applied according to two paradigms, both of which start by choosing a statistic appropriate for the one-dimensional version of the problem: (i) compute the statistic on a reduced number of one-dimensional random projections and summarize the resulting values in a single value; (ii) compute the expected value, given the sample, of the statistic. In this thesis we use (i) to propose a new procedure that detects outliers in Gaussian high-dimensional data (by means of sequential analysis) and (ii) to introduce a novel projection-based class of uniformity tests on the hypersphere. Simulation studies corroborate our theoretical findings, and applications to real datasets illustrate the performance of the proposed methods.
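
    A short Python sketch of paradigm (i): a one-dimensional statistic (here a Kolmogorov-Smirnov test of standard normality, chosen only for illustration) is computed on k random projections and the results are summarized by a Bonferroni-corrected minimum p-value. The procedures developed in the thesis (sequential outlier detection and uniformity tests on the hypersphere) use different statistics and summaries.

    import numpy as np
    from scipy import stats

    def random_projection_test(X, k=25, rng=None):
        """Bonferroni-combined p-value over k one-dimensional random projections;
        the null here is that the rows of X are multivariate standard normal."""
        rng = np.random.default_rng(rng)
        n, d = X.shape
        pvals = []
        for _ in range(k):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)                  # random direction on the sphere
            proj = X @ u                            # one-dimensional projection
            pvals.append(stats.kstest(proj, "norm").pvalue)
        return min(1.0, k * min(pvals))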

    STK /WST 795 Research Reports

    These documents contain the honours research reports for each year for the Department of Statistics. Honours Research Reports - University of Pretoria, 20XX. Statistics. BSc (Hons) Mathematical Statistics, BCom (Hons) Statistics, BCom (Hons) Mathematical Statistics. Unrestricted.

    Prediction of nonlinear nonstationary time series data using a digital filter and support vector regression

    Volatility is a key parameter when measuring the size of the errors made in modelling returns and other nonlinear nonstationary time series data. The Autoregressive Integrated Moving Average (ARIMA) model is a linear time series model, whereas for nonlinear systems the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) and Markov Switching GARCH (MS-GARCH) models have been widely applied. In statistical learning theory, Support Vector Regression (SVR) plays an important role in predicting nonlinear and nonstationary time series data. We propose a new model class that combines a novel derivative of Empirical Mode Decomposition (EMD), the averaging intrinsic mode function (aIMF), with a novel multiclass SVR using mean reversion and the coefficient of variation (CV) to predict financial data, i.e. EUR-USD exchange rates. The proposed aIMF smooths and reduces noise, while the multiclass SVR model predicts the exchange rates. Our simulation results show that our model significantly outperforms state-of-the-art ARIMA, GARCH, MS-GARCH, Markov Switching Regression (MSR), and Markov chain Monte Carlo (MCMC) regression models.
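
    A generic Python sketch of the EMD-plus-SVR idea: decompose the series, discard the highest-frequency intrinsic mode function as a crude denoising step, and fit an SVR on lagged values of the reconstruction for a one-step-ahead forecast. This is not the authors' aIMF or multiclass SVR; it assumes the PyEMD package (distributed as EMD-signal) and scikit-learn, and the hyperparameters are placeholders.

    import numpy as np
    from PyEMD import EMD                  # pip install EMD-signal
    from sklearn.svm import SVR

    def emd_svr_forecast(series, n_lags=5, drop_imfs=1):
        """Denoise via EMD (drop the highest-frequency IMFs), fit an SVR on
        lagged values, and return a one-step-ahead prediction."""
        series = np.asarray(series, dtype=float)
        imfs = EMD()(series)                     # rows ordered high to low frequency
        denoised = imfs[drop_imfs:].sum(axis=0)  # crude denoising step
        # lagged feature matrix for one-step-ahead regression
        X = np.column_stack([denoised[i:len(denoised) - n_lags + i]
                             for i in range(n_lags)])
        y = denoised[n_lags:]
        model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)
        return model.predict(denoised[-n_lags:].reshape(1, -1))[0]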