Deriving The GLS Transformation Parameter In Elementary Panel Data Models
The Generalized Least Squares (GLS) transformation that eliminates serial correlation in the error terms is central to a complete understanding of the relationship between the pooled OLS, random effects, and fixed effects estimators. A significant hurdle to attaining that understanding is the calculation of the parameter that delivers the desired transformation. This paper derives this critical parameter in the benchmark case typically used to introduce these estimators, using nothing more than elementary statistics (mean, variance, and covariance) and the quadratic formula.
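For orientation, here is a compact statement of the benchmark setup and of the transformation parameter in question (standard random effects notation assumed here; the paper's own elementary derivation via the quadratic formula is not reproduced):
\[
y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + u_{it},
\qquad \operatorname{Var}(c_i)=\sigma_c^2,\quad \operatorname{Var}(u_{it})=\sigma_u^2,\quad t=1,\dots,T,
\]
\[
\theta \;=\; 1-\sqrt{\frac{\sigma_u^2}{\sigma_u^2+T\,\sigma_c^2}},
\qquad
\tilde y_{it} = y_{it}-\theta\,\bar y_i,\qquad
\tilde{\mathbf{x}}_{it} = \mathbf{x}_{it}-\theta\,\bar{\mathbf{x}}_i .
\]
OLS on the quasi-demeaned data is the random effects GLS estimator; setting \(\theta = 0\) collapses to pooled OLS, while \(\theta \to 1\) approaches the fixed effects (within) estimator.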
On developing ridge regression parameters: a graphical investigation
In this paper we review some existing and propose some new estimators for estimating the ridge parameter. All in all, 19 different estimators have been studied. The investigation has been carried out using Monte Carlo simulations. A large number of different models have been investigated, where the variance of the random error, the number of variables included in the model, the correlations among the explanatory variables, the sample size, and the unknown coefficient vector were varied. For each model we have performed 2000 replications and presented the results in terms of both figures and tables. Based on the simulation study, we found that increasing the number of correlated variables, the variance of the random error, and the correlation between the independent variables all have a negative effect on the mean squared error. When the sample size increases, the mean squared error decreases even when the correlation between the independent variables and the variance of the random error are large. In all situations, the proposed estimators have smaller mean squared error than ordinary least squares and the other existing estimators.
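As a concrete illustration of what a ridge parameter estimator is, here is a minimal Python sketch of the ridge fit together with one classical data-driven choice of the parameter (a Hoerl-Kennard-Baldwin style k built from the OLS fit); the 19 estimators actually compared in the paper are not reproduced here.

import numpy as np

def ridge_fit(X, y, k):
    # Ridge estimator: beta_hat(k) = (X'X + k*I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def hkb_ridge_parameter(X, y):
    # One classical proposal: k = p * sigma_hat^2 / (b_ols' b_ols),
    # with sigma_hat^2 and b_ols taken from the ordinary least squares fit.
    n, p = X.shape
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.sum((y - X @ b_ols) ** 2) / (n - p)
    return p * sigma2 / (b_ols @ b_ols)

# Toy use: strongly correlated regressors, where ridge typically beats OLS in MSE.
rng = np.random.default_rng(0)
n, p = 50, 4
Z = rng.normal(size=(n, 1))
X = 0.95 * Z + 0.05 * rng.normal(size=(n, p))
y = X @ np.ones(p) + rng.normal(scale=2.0, size=n)
beta_ridge = ridge_fit(X, y, hkb_ridge_parameter(X, y))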
Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression
Variance estimation is a fundamental problem in statistical modeling. In ultrahigh dimensional linear regressions, where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regressions make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the noise level. In this paper, we propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation (RCV), to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator, which refits the variables selected in the first stage, and the plug-in one-stage estimators using LASSO and SCAD are also studied and compared. Their performances can be improved by the proposed RCV method.
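A minimal sketch of the data-splitting idea described above, assuming a lasso-based selector in the first stage (scikit-learn's LassoCV); this illustrates the two-fold refit-and-swap scheme, not the authors' exact implementation.

import numpy as np
from sklearn.linear_model import LassoCV

def rcv_sigma2(X, y, seed=0):
    # Refitted cross-validation style noise-variance estimate (two-fold sketch).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half1, half2 = np.array_split(idx, 2)
    estimates = []
    for sel_part, fit_part in [(half1, half2), (half2, half1)]:
        # Stage 1: variable selection on one half of the data.
        selected = np.flatnonzero(LassoCV(cv=5).fit(X[sel_part], y[sel_part]).coef_)
        if selected.size == 0:
            estimates.append(np.var(y[fit_part], ddof=1))
            continue
        # Stage 2: refit only the selected variables by OLS on the other half,
        # so spuriously selected variables no longer "explain" the realized noise.
        Xf = X[fit_part][:, selected]
        beta, *_ = np.linalg.lstsq(Xf, y[fit_part], rcond=None)
        resid = y[fit_part] - Xf @ beta
        estimates.append(resid @ resid / (len(fit_part) - selected.size))
    return float(np.mean(estimates))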
Modelling High Frequency Financial Count Data
This thesis comprises two papers concerning the modelling of financial count data. The papers advance the integer-valued moving average (INMA) model, a special case of the integer-valued autoregressive moving average (INARMA) model class, and apply the models to the number of stock transactions in intra-day data. Paper [1] advances the INMA model to model the number of transactions in stocks in intra-day data. The conditional mean and variance properties are discussed, and model extensions to include, e.g., explanatory variables are offered. Least squares and generalized method of moments estimators are presented. In a small Monte Carlo study, a feasible least squares estimator comes out as the best choice. Empirically, we find support for the use of long-lag moving average models in a Swedish stock series. There is evidence of asymmetric effects of news about prices on the number of transactions. Paper [2] introduces a bivariate integer-valued moving average (BINMA) model and applies it to the number of stock transactions in intra-day data. The BINMA model allows for both positive and negative correlations between the count data series. The study shows that the correlation between series in the BINMA model is always smaller than 1 in absolute value. The conditional mean, variance, and covariance are given. Model extensions to include explanatory variables are suggested. Using the BINMA model for AstraZeneca and Ericsson B, it is found that there is positive correlation between the stock transaction series. Empirically, we find support for the use of long-lag bivariate moving average models for the two series.
Keywords: Count data; Intra-day; High frequency; Time series; Estimation; Long memory; Finance
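To make the model class concrete, here is a toy simulation of the simplest INMA(1) building block with Poisson innovations and binomial thinning (a sketch only; the long-lag and bivariate BINMA extensions studied in the two papers are not reproduced).

import numpy as np

def simulate_inma1(n, lam, alpha, seed=0):
    # INMA(1): X_t = alpha o eps_{t-1} + eps_t, with eps_t ~ Poisson(lam),
    # where "o" is binomial thinning: each of the eps_{t-1} past events
    # survives into period t independently with probability alpha.
    rng = np.random.default_rng(seed)
    eps = rng.poisson(lam, size=n + 1)
    carried_over = rng.binomial(eps[:-1], alpha)
    return carried_over + eps[1:]

x = simulate_inma1(10_000, lam=3.0, alpha=0.4)
# Implied moments for this toy model: E[X_t] = Var(X_t) = (1 + alpha) * lam and
# Cov(X_t, X_{t-1}) = alpha * lam, which the sample moments of x should approximate.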
Contribution of ridge type estimators in regression analysis
Regression analysis is one of the most widely used statistical techniques for analyzing multifactor data. Its broad appeal results from the conceptually simple process of using an equation to express the relationship between a set of variables. Regression analysis is also interesting theoretically because of the elegant underlying mathematics. Successful use of regression analysis requires an appreciation of both the theory and the practical problems that often arise when the technique is employed with real-world data.
In the model fitting process, the most frequently applied and most popular estimation procedure is Ordinary Least Squares Estimation (OLSE). The significant advantage of OLSE is that it provides minimum variance unbiased linear estimates for the parameters in the linear regression model.
In many situations, both experimental and non-experimental, the independent variables tend to be correlated among themselves. Then inter-correlation, or multicollinearity, among the independent variables is said to exist. A variety of interrelated problems are created when multicollinearity exists. In particular, in the model building process, multicollinearity among the independent variables causes high variance (if OLSE is used), even though the estimators are still the minimum variance unbiased estimators in the class of linear unbiased estimators.
The main objective of this study is to show that unbiased estimation does not mean good estimation when the regressors are correlated among themselves, that is, when multicollinearity exists. Instead, the study motivates the use of biased estimation (ridge-type estimation), which accepts a small bias in exchange for a low variance, and together these can give a low mean squared error, as formalized below.
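The trade-off invoked here can be written compactly (generic ridge notation assumed; k is the ridge parameter):
\[
\hat{\boldsymbol{\beta}}(k) = (X'X + kI)^{-1}X'\mathbf{y}, \qquad k \ge 0,
\]
\[
\operatorname{MSE}\bigl(\hat{\boldsymbol{\beta}}(k)\bigr)
= \operatorname{tr}\operatorname{Var}\bigl(\hat{\boldsymbol{\beta}}(k)\bigr)
+ \bigl\|\operatorname{Bias}\bigl(\hat{\boldsymbol{\beta}}(k)\bigr)\bigr\|^{2}.
\]
At k = 0 the bias term vanishes but the variance term blows up as X'X approaches singularity under multicollinearity; the classical Hoerl-Kennard existence result guarantees some k > 0 for which the total mean squared error falls below that of OLS.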
This study also reveals the importance of the theoretical results already obtained, and gives researchers a path for applying those theoretical results in practical situations.
Keywords: Multicollinearity, Least Square Estimation, Restricted Least Square
South Eastern University of Sri Lanka, Oluvil #32360, Sri Lanka
Variance Components Estimation In Mixed Linear Models
This work aims to introduce a new method of estimating the variance components in mixed linear models. The approach is developed first for models with three variance components, and then attention is devoted to the general case of models with an arbitrary number of variance components.
In our approach, we construct and apply a finite sequence of orthogonal transformations, here named sub-diagonalizations, to the covariance structure of the mixed linear model, producing a set of Gauss-Markov sub-models which are used to create pooled estimators for the variance components. Indeed, in order to reduce the bias, we apply the sub-diagonalizations to the corresponding restricted model, that is, its projection onto the orthogonal subspace generated by the columns of its mean design matrix. Thus, the Gauss-Markov sub-models will be centered. The resulting estimator is called Sub-D.
Finally, the numerical behavior of the proposed estimator is examined for the case of models with three variance components, comparing its performance to that obtained with the REML and ANOVA estimators. Numerical results show that Sub-D produces reasonable and comparable estimates, sometimes slightly better than those obtained with REML and mostly better than those obtained with ANOVA.
Due to the correlation between the sub-models, the estimated variability of Sub-D will be slightly larger than that of the REML estimator. In an attempt to solve this problem, a new estimator is introduced.
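For reference, the covariance structure on which such variance-component methods operate is the usual mixed linear model form (generic notation assumed here, not taken from the thesis):
\[
\mathbf{y} = X\boldsymbol{\beta} + \sum_{i=1}^{r} Z_i\mathbf{u}_i + \mathbf{e},
\qquad
\operatorname{Var}(\mathbf{y}) = \sum_{i=1}^{r} \sigma_i^{2}\, Z_i Z_i' + \sigma_e^{2} I,
\]
so that r = 2 corresponds to the three-variance-component case \((\sigma_1^2, \sigma_2^2, \sigma_e^2)\) examined numerically above.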
The shuffle estimator for explainable variance in fMRI experiments
In computational neuroscience, it is important to estimate well the proportion of signal variance in the total variance of neural activity measurements. This explainable variance measure helps neuroscientists assess the adequacy of predictive models that describe how images are encoded in the brain. Complicating the estimation problem are strong noise correlations, which may confound the neural responses corresponding to the stimuli. If not properly taken into account, the correlations could inflate the explainable variance estimates and suggest prediction accuracies that are not actually attainable. We propose a novel method to estimate the explainable variance in functional MRI (fMRI) brain activity measurements when there are strong correlations in the noise. Our shuffle estimator is nonparametric, unbiased, and built upon the random effects model reflecting the randomization in the fMRI data collection process. Leveraging symmetries in the measurements, our estimator is obtained by appropriately permuting the measurement vector in such a way that the noise covariance structure is intact but the explainable variance is changed after the permutation. This difference is then used to estimate the explainable variance. We validate the properties of the proposed method in simulation experiments. For the image-fMRI data, we show that the shuffle estimates can explain the variation in prediction accuracy for voxels within the primary visual cortex (V1) better than alternative parametric methods.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS681 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
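To illustrate the permute-and-difference idea in the simplest possible setting, here is a toy Python sketch under an i.i.d.-noise assumption (emphatically not the published shuffle estimator, which constructs permutations that preserve a correlated noise covariance): compare the variance of stimulus-mean responses on the original data with the same statistic after repeats have been shuffled across stimuli, and use the difference.

import numpy as np

def toy_permute_and_difference(Y, n_shuffles=200, seed=0):
    # Y has shape (n_stimuli, n_repeats). Shuffling all entries across stimuli
    # destroys the stimulus signal while keeping the marginal distribution, so
    # the drop in the variance of stimulus means estimates the signal variance.
    rng = np.random.default_rng(seed)
    S, R = Y.shape
    var_of_means = lambda Z: np.var(Z.mean(axis=1), ddof=1)
    observed = var_of_means(Y)
    shuffled = np.mean([var_of_means(rng.permutation(Y.ravel()).reshape(S, R))
                        for _ in range(n_shuffles)])
    # Under the i.i.d. toy model, observed - shuffled ~= sigma_signal^2 * (1 - 1/R).
    return (observed - shuffled) / (1.0 - 1.0 / R)

rng = np.random.default_rng(1)
signal = rng.normal(scale=1.0, size=(20, 1))           # per-stimulus signal, variance 1
Y = signal + rng.normal(scale=2.0, size=(20, 10))      # repeats with i.i.d. noise
print(toy_permute_and_difference(Y))                   # near the signal variance (1.0), up to sampling noise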