Spatial Weighting Matrix Selection in Spatial Lag Econometric Model
This paper investigates the choice of spatial weighting matrix in a spatial lag model framework. In the empirical literature the choice of spatial weighting matrix has been characterized by a great deal of arbitrariness. The number of possible spatial weighting matrices is large, which until recently was considered to preclude investigation into the appropriateness of the empirical choices. Recently Kostov (2010) proposed a new approach that transforms the problem into an equivalent variable selection problem. This transformation allows a wide range of variable selection methods to be applied to the high-dimensional problem of selecting a spatial weighting matrix. This article expands the transformation approach into a two-step selection procedure, aimed at reducing the arbitrariness in the selection of the spatial weighting matrix in spatial econometrics. The suggested approach consists of a screening step that reduces the number of candidate spatial weighting matrices, followed by an estimation step that selects the final model. An empirical application of the proposed methodology is presented, in which a range of different combinations of screening and estimation methods are employed and found to produce similar results. The proposed methodology is shown to be able to approximate, and provide indications of, what the ‘true’ spatial weighting matrix could be even when it is not amongst the considered alternatives. The similarity of the results obtained with different methods suggests that their relative computational costs may be the primary consideration in choosing among them. Some further extensions and applications are also discussed.
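The reformulation can be illustrated with a toy sketch (all names and parameters below are hypothetical, and the Lasso here is only a stand-in for the screening/estimation methods the paper compares; the sketch also deliberately ignores the endogeneity of the spatial lag, which a full procedure must address): each candidate matrix W_k contributes one spatially lagged regressor W_k·y, so choosing among candidate matrices becomes selecting among these columns.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)

def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour spatial weighting matrix."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a unit is not its own neighbour
    W = np.zeros_like(d)
    rows = np.repeat(np.arange(len(d)), k)
    W[rows, np.argsort(d, axis=1)[:, :k].ravel()] = 1.0 / k
    return W

# Toy data from a spatial lag model y = rho*W*y + X*beta + eps, i.e.
# y = (I - rho*W)^(-1) (X*beta + eps), with the k=5 matrix as the "truth"
n = 400
coords = rng.uniform(size=(n, 2))
candidates = {k: knn_weights(coords, k) for k in (1, 5, 15)}
X = rng.normal(size=(n, 2))
beta, rho = np.array([1.0, 1.0]), 0.5
y = np.linalg.solve(np.eye(n) - rho * candidates[5],
                    X @ beta + rng.normal(size=n))

# Variable-selection reformulation: one lagged column W_k @ y per candidate
lags = np.column_stack([W @ y for W in candidates.values()])
coefs = LassoCV(cv=5).fit(np.column_stack([X, lags]), y).coef_
lag_coefs = dict(zip(candidates, coefs[2:]))    # coefficient per candidate k
```

In this sketch the column generated by the true matrix should attract the dominant coefficient, mimicking how the screening step discards implausible candidates.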
Conditional Transformation Models
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models. The novel class of semiparametric regression models
proposed herein allows transformation functions to depend on explanatory
variables. These transformation functions are estimated by regularised
optimisation of scoring rules for probabilistic forecasts, e.g. the continuous
ranked probability score. The corresponding estimated conditional distribution
functions are consistent. Conditional transformation models are potentially
useful for describing possible heteroscedasticity, comparing spatially varying
distributions, identifying extreme events, deriving prediction intervals and
selecting variables beyond mean regression effects. An empirical investigation
based on a heteroscedastic varying coefficient simulation model demonstrates
that semiparametric estimation of conditional distribution functions can be
more beneficial than kernel-based non-parametric approaches or parametric
generalised additive models for location, scale and shape.
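As an illustration of the scoring-rule machinery mentioned above, the continuous ranked probability score has a well-known closed form for a Gaussian predictive distribution (Gneiting and Raftery), which the following sketch computes and checks against the defining integral CRPS(F, y) = ∫ (F(t) − 1{t ≥ y})² dt (the Gaussian forecast is purely illustrative; the paper's models are semiparametric):

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                    - 1 / np.sqrt(np.pi))

# Sanity check against the defining integral on a fine, wide grid
mu, sigma, y = 0.0, 1.0, 0.7
t = np.linspace(-15, 15, 400_001)
crps_closed = crps_gaussian(mu, sigma, y)
crps_num = np.sum((norm.cdf(t, mu, sigma) - (t >= y)) ** 2) * (t[1] - t[0])
```

The score rewards both calibration and sharpness: it is smallest when the observation falls in the centre of a concentrated forecast distribution, which is why minimising it over a class of transformation functions yields sensible conditional distribution estimates.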
Understanding and teaching unequal probability of selection
This paper focuses on econometrics pedagogy. It demonstrates the importance of including probability weights in regression analysis using data from surveys that do not use simple random samples (SRS). We use concrete, numerical examples and simulation to show how to effectively teach this difficult material to a student audience. We relax the assumption of simple random sampling and show how unequal probability of selection can lead to biased, inconsistent OLS slope estimates. We then explain and apply probability weighted least squares, showing how weighting the observations by the reciprocal of the probability of inclusion in the sample improves performance. The exposition is non-mathematical and relies heavily on intuitive, visual displays to make the content accessible to students. This paper will enable professors to incorporate unequal probability of selection into their courses and allow students to use best practice techniques in analyzing data from complex surveys. The primary delivery vehicle is Microsoft Excel®. Two user-defined array functions, SAMPLE and LINESTW, are included in a prepared Excel workbook. We replicate all results in Stata® and offer a do file for easy analysis in Stata. Documented code in Excel and Stata allows users to see each step in the sampling and probability weighted least squares algorithms. All files and code are available at www.depauw.edu/learn/stata.
Keywords: unequal probability; complex survey; simulation; weighted regression
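The weighting step can be sketched in a few lines (a toy simulation with hypothetical parameters, not the paper's Excel/Stata implementation): when selection into the sample depends on the outcome, unweighted OLS is inconsistent, while weighting each observation by the reciprocal of its known inclusion probability restores consistency.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: true model y = 2 + 3x + noise
N = 200_000
x = rng.uniform(0, 10, N)
y = 2 + 3 * x + rng.normal(0, 3, N)

# Informative (non-SRS) sampling: inclusion probability rises with y,
# which biases the unweighted OLS slope
p = np.clip(0.01 * y, 0.02, 0.6)            # known inclusion probabilities
sel = rng.random(N) < p
xs, ys, ps = x[sel], y[sel], p[sel]

def least_squares(x, y, w=None):
    """(Weighted) least squares for an intercept-plus-slope model."""
    if w is None:
        w = np.ones_like(x)
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

b_ols = least_squares(xs, ys)               # ignores the design -> biased slope
b_pwls = least_squares(xs, ys, 1.0 / ps)    # probability weighted least squares
```

The weighted fit recovers a slope close to the population value of 3, while the unweighted slope is attenuated, because high-y units within each x are over-represented in the sample.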
Inference on Treatment Effects After Selection Amongst High-Dimensional Controls
We propose robust methods for inference on the effect of a treatment variable
on a scalar outcome in the presence of very many controls. Our setting is a
partially linear model with possibly non-Gaussian and heteroscedastic
disturbances. Our analysis allows the number of controls to be much larger than
the sample size. To make informative inference feasible, we require the model
to be approximately sparse; that is, we require that the effect of confounding
factors can be controlled for up to a small approximation error by conditioning
on a relatively small number of controls whose identities are unknown. The
latter condition makes it possible to estimate the treatment effect by
selecting approximately the right set of controls. We develop a novel
estimation and uniformly valid inference method for the treatment effect in
this setting, called the "post-double-selection" method. Our results apply to
Lasso-type methods used for covariate selection as well as to any other model
selection method that is able to find a sparse model with good approximation
properties.
The main attractive feature of our method is that it allows for imperfect
selection of the controls and provides confidence intervals that are valid
uniformly across a large class of models. In contrast, standard post-model
selection estimators fail to provide uniform inference even in simple cases
with a small, fixed number of controls. Thus our method resolves the problem of
uniform inference after model selection for a large, interesting class of
models. We illustrate the use of the developed methods with numerical
simulations and an application to the effect of abortion on crime rates.
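A minimal sketch of the post-double-selection idea (simulated data, with sklearn's cross-validated Lasso as one possible covariate-selection method; all parameter values are illustrative): select controls that predict the outcome, select controls that predict the treatment, then run OLS of the outcome on the treatment plus the union of the two selected sets.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Hypothetical sparse design: y = alpha*d + X @ beta + e,  d = X @ gamma + v
n, p, alpha_true = 500, 200, 1.0
X = rng.normal(size=(n, p))
gamma = np.zeros(p); gamma[:5] = 1.0      # confounders drive the treatment...
beta = np.zeros(p);  beta[:5] = 1.0       # ...and the outcome
d = X @ gamma + rng.normal(size=n)
y = alpha_true * d + X @ beta + rng.normal(size=n)

# Step 1: Lasso of y on X -> controls that predict the outcome
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Step 2: Lasso of d on X -> controls that predict the treatment
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)
# Step 3: OLS of y on d plus the UNION of the selected controls
keep = np.union1d(sel_y, sel_d)
Z = np.column_stack([np.ones(n), d, X[:, keep]])
alpha_pds = np.linalg.lstsq(Z, y, rcond=None)[0][1]

# For contrast: OLS that omits the controls entirely is badly biased upward
alpha_naive = np.linalg.lstsq(np.column_stack([np.ones(n), d]),
                              y, rcond=None)[0][1]
```

Taking the union is the point of the "double" selection: a control missed in one step because its coefficient is moderate can still be caught in the other, which is what protects the final inference against imperfect model selection.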
GAMLSS for high-dimensional data – a flexible approach based on boosting
Generalized additive models for location, scale and shape (GAMLSS) are a popular semi-parametric modelling approach that, in contrast to conventional GAMs, relates not only the expected mean but every distribution parameter (e.g. location, scale and shape) to a set of covariates. Current fitting procedures for GAMLSS are infeasible for high-dimensional data setups and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high-dimensional GAMLSS that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The proposed algorithm was applied to data of the Munich Rental Guide, which is used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net-rent predictions that resulted from the high-dimensional GAMLSS were found to be highly competitive, while covariate-specific prediction intervals showed a major improvement over classical GAMs.
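A stripped-down sketch of the boosting idea (cyclic componentwise updates for a Normal distribution's mean and log-scale predictors, on simulated data; an illustration of the general approach, not the authors' implementation): in each iteration the negative gradient of the negative log-likelihood with respect to each distribution parameter is fitted by simple one-covariate base learners, and only the best-fitting covariate is updated, so estimation and variable selection happen simultaneously.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy heteroscedastic data (illustrative, not the Munich rent data):
# the mean depends on x0, log(sigma) on x1, the other 8 covariates are noise
n, p = 1000, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + np.exp(0.5 * X[:, 1]) * rng.normal(size=n)

def best_component(X, u):
    """Componentwise linear base learner: best single-covariate LS fit to u."""
    coef = (X.T @ u) / np.einsum('ij,ij->j', X, X)
    j = np.argmax(coef ** 2 * np.einsum('ij,ij->j', X, X))  # max SSE reduction
    return j, coef[j]

# Cyclic gradient boosting for the Normal's mu and log(sigma) predictors
beta_mu, beta_sig, nu = np.zeros(p), np.zeros(p), 0.1
for _ in range(300):           # in practice the stopping iteration is tuned
    mu, sigma = X @ beta_mu, np.exp(X @ beta_sig)
    j, c = best_component(X, (y - mu) / sigma ** 2)         # -dNLL/dmu
    beta_mu[j] += nu * c
    mu = X @ beta_mu
    j, c = best_component(X, ((y - mu) / sigma) ** 2 - 1)   # -dNLL/dlog(sigma)
    beta_sig[j] += nu * c
```

Because only one covariate per distribution parameter is updated per iteration, uninformative covariates accumulate (near-)zero coefficients, which is the built-in variable selection the abstract describes.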