
    Deriving The GLS Transformation Parameter In Elementary Panel Data Models

    The Generalized Least Squares (GLS) transformation that eliminates serial correlation in the error terms is central to a complete understanding of the relationship between the pooled OLS, random effects, and fixed effects estimators. A significant hurdle to attaining that understanding is the calculation of the parameter that delivers the desired transformation. This paper derives this critical parameter in the benchmark case typically used to introduce these estimators, using nothing more than elementary statistics (mean, variance, and covariance) and the quadratic formula.
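    The abstract does not state the parameter itself. As a hedged illustration, the sketch below uses the textbook quasi-demeaning formula for the benchmark random-effects setup (the paper's own derivation and notation may differ) and checks numerically that the transformation removes the serial correlation:

```python
import numpy as np

# Benchmark random-effects error: v_it = c_i + u_it, so
# Var(v) = sigma_c^2 + sigma_u^2 and Cov(v_it, v_is) = sigma_c^2 for t != s.
# The standard quasi-demeaning parameter (textbook form) is
#   theta = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_c^2)).
def gls_theta(sigma_c2, sigma_u2, T):
    return 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_c2))

# Numerical check that quasi-demeaned errors are serially uncorrelated.
rng = np.random.default_rng(0)
N, T = 20000, 5
sigma_c2, sigma_u2 = 2.0, 1.0
c = rng.normal(0, np.sqrt(sigma_c2), size=(N, 1))
u = rng.normal(0, np.sqrt(sigma_u2), size=(N, T))
v = c + u
theta = gls_theta(sigma_c2, sigma_u2, T)
w = v - theta * v.mean(axis=1, keepdims=True)   # quasi-demeaning
corr = np.corrcoef(w[:, 0], w[:, 1])[0, 1]
print(round(theta, 3), round(corr, 3))          # corr should be near 0
```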

    On developing ridge regression parameters : a graphical investigation

    In this paper we review some existing estimators of the ridge parameter and propose some new ones. In all, 19 different estimators have been studied. The investigation has been carried out using Monte Carlo simulations. A large number of models have been investigated, in which the variance of the random error, the number of variables included in the model, the correlations among the explanatory variables, the sample size, and the unknown coefficient vector were varied. For each model we performed 2,000 replications and present the results in both figures and tables. Based on the simulation study, we find that increasing the number of correlated variables, the variance of the random error, or the correlation between the independent variables has a negative effect on the mean squared error. When the sample size increases, the mean squared error decreases even when the correlation between the independent variables and the variance of the random error are large. In all situations, the proposed estimators have smaller mean squared error than ordinary least squares and the other existing estimators.
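    A minimal sketch of this kind of Monte Carlo comparison, not the thesis's 19 estimators: the Hoerl-Kennard-Baldwin rule k = p·σ̂²/(β̂'β̂) stands in for the studied ridge-parameter estimators, and the sample size, correlation level, and replication count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho, reps = 50, 4, 0.95, 500
beta = np.ones(p)
# Equicorrelated regressors induce multicollinearity.
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
L = np.linalg.cholesky(cov)

mse_ols = mse_ridge = 0.0
for _ in range(reps):
    X = rng.standard_normal((n, p)) @ L.T
    y = X @ beta + rng.standard_normal(n)
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_ols
    s2 = resid @ resid / (n - p)
    k = p * s2 / (b_ols @ b_ols)           # HKB ridge parameter
    b_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    mse_ols += np.sum((b_ols - beta) ** 2) / reps
    mse_ridge += np.sum((b_ridge - beta) ** 2) / reps

print(f"MSE OLS  : {mse_ols:.3f}")
print(f"MSE ridge: {mse_ridge:.3f}")       # typically smaller under collinearity
```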

    Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression

    Variance estimation is a fundamental problem in statistical modeling. In ultrahigh dimensional linear regressions, where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regressions make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the noise level. In this paper, we propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation (RCV), to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator, which fits the selected variables in the first stage, and the plug-in one-stage estimators using LASSO and SCAD are also studied and compared. Their performance can be improved by the proposed RCV method.
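    A minimal sketch of the two-stage RCV idea: select variables on one half of the data, then refit by OLS on the other half so that stage-1 spurious correlations cannot deflate the residuals. A simple marginal correlation screen stands in here for the selection method (the paper studies LASSO/SCAD-type selectors), and the selection size s and split are illustrative assumptions:

```python
import numpy as np

def rcv_sigma2(X, y, s=5, seed=0):
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    half1, half2 = idx[: n // 2], idx[n // 2:]
    estimates = []
    for sel, fit in ((half1, half2), (half2, half1)):
        # Stage 1: screen variables on one half (marginal correlation).
        corr = np.abs(X[sel].T @ (y[sel] - y[sel].mean()))
        keep = np.argsort(corr)[-s:]
        # Stage 2: refit OLS with the selected variables on the OTHER half.
        Xf = X[fit][:, keep]
        b, *_ = np.linalg.lstsq(Xf, y[fit], rcond=None)
        resid = y[fit] - Xf @ b
        estimates.append(resid @ resid / (len(fit) - s))
    return float(np.mean(estimates))

# Example: n=200 observations, p=1000 predictors, 3 true signals, sigma^2 = 1.
rng = np.random.default_rng(2)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = X[:, 0] + X[:, 1] - X[:, 2] + rng.standard_normal(n)
s2_hat = rcv_sigma2(X, y)
print(round(s2_hat, 2))   # should be near the true noise variance 1.0
```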

    Modelling High Frequency Financial Count Data

    This thesis comprises two papers concerning the modelling of financial count data. The papers advance the integer-valued moving average (INMA) model, a special case of the integer-valued autoregressive moving average (INARMA) model class, and apply the models to the number of stock transactions in intra-day data. Paper [1] advances the INMA model to describe the number of transactions in stocks in intra-day data. The conditional mean and variance properties are discussed and model extensions to include, e.g., explanatory variables are offered. Least squares and generalized method of moments estimators are presented. In a small Monte Carlo study a feasible least squares estimator comes out as the best choice. Empirically, we find support for the use of long-lag moving average models in a Swedish stock series. There is evidence of asymmetric effects of news about prices on the number of transactions. Paper [2] introduces a bivariate integer-valued moving average (BINMA) model and applies it to the number of stock transactions in intra-day data. The BINMA model allows for both positive and negative correlations between the count data series. The study shows that the correlation between series in the BINMA model is always smaller than one in absolute value. The conditional mean, variance, and covariance are given. Model extensions to include explanatory variables are suggested. Using the BINMA model for AstraZeneca and Ericsson B, it is found that there is a positive correlation between the stock transaction series. Empirically, we find support for the use of long-lag bivariate moving average models for the two series.
    Keywords: Count data; Intra-day; High frequency; Time series; Estimation; Long memory; Finance
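    INMA models are built from binomial thinning. A toy simulation of an INMA(1) process, X_t = b ∘ e_{t-1} + e_t with Poisson innovations (parameter values are illustrative, not from the papers), whose implied lag-1 autocorrelation is b/(1+b):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, b, T = 4.0, 0.5, 200_000
e = rng.poisson(lam, T + 1)          # i.i.d. Poisson(lam) innovations
thinned = rng.binomial(e[:-1], b)    # binomial thinning: b ∘ e_{t-1}
x = thinned + e[1:]                  # INMA(1) counts

acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
# mean ≈ lam*(1+b) = 6.0, lag-1 autocorrelation ≈ b/(1+b) = 1/3
print(round(x.mean(), 2), round(acf1, 3))
```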

    Contribution of ridge type estimators in regression analysis

    Regression analysis is one of the most widely used statistical techniques for analyzing multifactor data. Its broad appeal results from the conceptually simple process of using an equation to express the relationship between a set of variables. Regression analysis is also interesting theoretically because of the elegant underlying mathematics. Successful use of regression analysis requires an appreciation of both the theory and the practical problems that often arise when the technique is employed with real-world data. In the model fitting process the most frequently applied and most popular estimation procedure is Ordinary Least Square Estimation (OLSE). The significant advantage of OLSE is that it provides minimum variance unbiased linear estimates of the parameters in the linear regression model. In many situations, both experimental and non-experimental, the independent variables tend to be correlated among themselves; inter-correlation, or multicollinearity, among the independent variables is then said to exist. A variety of interrelated problems are created when multicollinearity exists. In particular, in the model building process, multicollinearity among the independent variables causes high variance (if OLSE is used) even though the estimators are still the minimum variance unbiased estimators in the class of linear unbiased estimators. The main objective of this study is to show that unbiased estimation does not mean good estimation when the regressors are correlated among themselves, i.e., when multicollinearity exists. Instead, the study motivates the use of biased (ridge type) estimation, which accepts a small bias in exchange for low variance, and together these can give a low mean square error. This study also reveals the importance of the theoretical results already obtained, and gives a path for researchers to apply the theoretical results in practical situations.
    Keywords: Multicollinearity, Least Square Estimation, Restricted Least Square
    South Eastern University of Sri Lanka, Oluvil #32360, Sri Lanka
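    The bias-variance argument can be made concrete in the canonical (eigen) coordinates of X'X, where the ridge MSE decomposes term by term into variance plus squared bias. The ill-conditioned spectrum below is invented for illustration, not taken from the study:

```python
import numpy as np

# Canonical form: with eigenvalues l_i of X'X and canonical coefficients a_i,
#   MSE(k) = sum_i [ s2 * l_i / (l_i + k)^2  +  k^2 * a_i^2 / (l_i + k)^2 ],
# which at k = 0 reduces to the OLS variance s2 * sum_i 1 / l_i.
s2 = 1.0
l = np.array([10.0, 1.0, 0.05, 0.01])   # near-collinear: tiny eigenvalues
a = np.array([1.0, 1.0, 1.0, 1.0])

def ridge_mse(k):
    return float(np.sum((s2 * l + k**2 * a**2) / (l + k) ** 2))

mse_ols = ridge_mse(0.0)                # s2 * sum(1/l) = 121.1 here
ks = np.linspace(0.0, 1.0, 1001)
best = min(ks, key=ridge_mse)           # small positive k beats k = 0
print(round(mse_ols, 1), round(ridge_mse(best), 2))
```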

    Variance Components Estimation In Mixed Linear Models

    This work aims to introduce a new method of estimating the variance components in mixed linear models. The approach is developed first for models with three variance components, and then attention is devoted to the general case of models with an arbitrary number of variance components. In our approach, we construct and apply a finite sequence of orthogonal transformations, here named sub-diagonalizations, to the covariance structure of the mixed linear model, producing a set of Gauss-Markov sub-models which are used to create pooled estimators for the variance components. In order to reduce the bias, we apply the sub-diagonalizations to the corresponding restricted model, that is, its projection onto the orthogonal subspace generated by the columns of its mean design matrix; the Gauss-Markov sub-models are thus centered. The resulting estimator is called Sub-D. Finally, the numerical behavior of the proposed estimator is examined for models with three variance components, comparing its performance to that of the REML and ANOVA estimators. Numerical results show that Sub-D produces reasonable and comparable estimates, sometimes slightly better than those obtained with REML and mostly better than those obtained with ANOVA. Due to the correlation between the sub-models, the estimated variability of Sub-D is slightly larger than that of the REML estimator. In an attempt to solve this problem, a new estimator is introduced.
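    The classical ANOVA comparator mentioned above can be illustrated in the simplest balanced one-way random-effects model, y_ij = mu + a_i + e_ij (a minimal sketch; Sub-D itself is not reproduced here, and all parameter values are invented):

```python
import numpy as np

# ANOVA (method-of-moments) estimators for the one-way random-effects model:
#   sigma_e^2_hat = MSE,   sigma_a^2_hat = (MSA - MSE) / n.
rng = np.random.default_rng(4)
g, n = 500, 8                        # groups, observations per group
sig_a2, sig_e2 = 3.0, 1.0
a = rng.normal(0, np.sqrt(sig_a2), size=(g, 1))
y = 5.0 + a + rng.normal(0, np.sqrt(sig_e2), size=(g, n))

group_means = y.mean(axis=1)
msa = n * np.sum((group_means - y.mean()) ** 2) / (g - 1)   # between-group MS
mse = np.sum((y - group_means[:, None]) ** 2) / (g * (n - 1))  # within-group MS
sig_e2_hat = mse
sig_a2_hat = (msa - mse) / n
print(round(sig_a2_hat, 2), round(sig_e2_hat, 2))   # near (3.0, 1.0)
```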

    The shuffle estimator for explainable variance in fMRI experiments

    In computational neuroscience, it is important to estimate well the proportion of signal variance in the total variance of neural activity measurements. This explainable variance measure helps neuroscientists assess the adequacy of predictive models that describe how images are encoded in the brain. Complicating the estimation problem are strong noise correlations, which may confound the neural responses corresponding to the stimuli. If not properly taken into account, the correlations could inflate the explainable variance estimates and suggest prediction accuracies that are not actually attainable. We propose a novel method to estimate the explainable variance in functional MRI (fMRI) brain activity measurements when there are strong correlations in the noise. Our shuffle estimator is nonparametric, unbiased, and built upon the random effect model reflecting the randomization in the fMRI data collection process. Leveraging symmetries in the measurements, our estimator is obtained by appropriately permuting the measurement vector in such a way that the noise covariance structure is intact but the explainable variance is changed after the permutation. This difference is then used to estimate the explainable variance. We validate the properties of the proposed method in simulation experiments. For the image-fMRI data, we show that the shuffle estimates can explain the variation in prediction accuracy for voxels within the primary visual cortex (V1) better than alternative parametric methods.
    Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), http://dx.doi.org/10.1214/13-AOAS681, by the Institute of Mathematical Statistics (http://www.imstat.org)
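    A toy caricature of the shuffle principle (not the paper's estimator; the noise model and statistic here are invented for illustration): permute the data so that the noise distribution is unchanged while the stimulus-locked signal is destroyed, then read off the explainable variance from the change in a variance statistic. Here the noise is exchangeable within each repeat (a shared per-repeat component plus white noise), so shuffling stimulus labels within a repeat leaves its distribution intact:

```python
import numpy as np

rng = np.random.default_rng(5)
R, S = 8, 20000                       # repeats, stimuli
sig_mu2, sig_c2, sig_e2 = 2.0, 4.0, 1.0
mu = rng.normal(0, np.sqrt(sig_mu2), S)        # stimulus-locked signal
c = rng.normal(0, np.sqrt(sig_c2), (R, 1))     # shared (correlated) noise
y = mu + c + rng.normal(0, np.sqrt(sig_e2), (R, S))

def between_stim_var(data):
    # Variance across stimuli of the repeat-averaged response.
    return float(data.mean(axis=0).var())

v_orig = between_stim_var(y)
# Shuffle stimulus labels independently within each repeat: the exchangeable
# noise distribution is preserved, the signal alignment is destroyed.
y_sh = np.array([row[rng.permutation(S)] for row in y])
v_sh = between_stim_var(y_sh)
sig_mu2_hat = (v_orig - v_sh) / (1 - 1 / R)
print(round(sig_mu2_hat, 2))   # near the true explainable variance 2.0
```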