Subset hypotheses testing and instrument exclusion in the linear IV regression

Author: Firmin Doko Tchatoka

This paper investigates the asymptotic size properties of robust subset tests when instruments are left out of the analysis. Recently, robust subset procedures have been developed for testing hypotheses specified on subsets of the structural parameters or on the parameters associated with the included exogenous variables. It has been shown that they never over-reject the true parameter values even when nuisance parameters are not identified. However, their robustness to instrument exclusion has not been investigated. Instrument exclusion is an important problem in econometrics, and there are at least two reasons to be concerned. First, it is difficult in practice to assess whether an instrument has been omitted. For example, some components of the “identifying” instruments that are excluded from the structural equation may be quite uncertain or “left out” of the analysis. Second, in many instrumental variable (IV) applications, an infinite number of instruments are available for use in large-sample estimation. This is particularly the case with most time series models: if a given variable, say X_t, is a legitimate instrument, so too are its lags X_{t-1}, X_{t-2}. Hence, instrument exclusion seems highly likely in most practical situations. After formulating a general asymptotic framework which allows one to study this issue in a convenient way, I consider two main setups: (1) the missing instruments are (possibly) relevant, and (2) they are asymptotically weak. In both setups, I show that all subset procedures studied are in general consistent against instrument inclusion (hence asymptotically invalid for the subset hypothesis of interest). I characterize cases where consistency may not hold, but the asymptotic distribution is modified in a way that would lead to size distortions in large samples. I propose a “rule of thumb” which allows practitioners to determine whether or not a missing instrument is detrimental to subset procedures. I present a Monte Carlo experiment confirming that the subset procedures are unreliable when instruments are missing.

RePEc: Research Papers in Economics

Uniform Inference after Pretesting for Exogeneity

Authors: Doko Tchatoka Firmin; Wang Wenjie
Publication date: 24/03/2020

Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables (IVs) to decide whether the ordinary least squares (OLS) or the two-stage least squares (2SLS) method is appropriate. Guggenberger (2010) shows that the second-stage t-test, based on the outcome of a Durbin-Wu-Hausman type pretest for exogeneity in the first stage, has extreme size distortion with asymptotic size equal to 1 when the standard asymptotic critical values are used. In this paper, we first show that the standard residual bootstrap procedures (with either independent or dependent draws of disturbances) are not viable solutions to this extreme size-distortion problem. Then, we propose a novel hybrid bootstrap approach, which combines the residual-based bootstrap with an adjusted Bonferroni size-correction method. We establish uniform validity of this hybrid bootstrap in the sense that it yields a two-stage test with correct asymptotic size. Monte Carlo simulations confirm our theoretical findings. In particular, our proposed hybrid method achieves remarkable power gains over the 2SLS-based t-test, especially when IVs are not very strong.
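
The naive two-stage procedure that this abstract criticizes can be sketched as follows. This is only an illustration of the pretest rule (a regression-based Durbin-Wu-Hausman type pretest followed by an OLS or 2SLS t-statistic), not the authors' hybrid bootstrap. It assumes one endogenous regressor, one instrument, no intercept and homoskedastic errors; the function names, the simulated design and the 1.96 pretest critical value are hypothetical choices made for the example.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def pretest_t_stat(y, x, z, beta0, pretest_crit=1.96):
    """Naive two-stage rule: DWH-type pretest, then an OLS or 2SLS t-statistic.

    Hypothetical setup: one regressor x, one instrument z, no intercept.
    """
    n = len(y)
    # First stage: regress x on z and keep the residuals v_hat.
    pi_hat = dot(z, x) / dot(z, z)
    v_hat = [xi - pi_hat * zi for xi, zi in zip(x, z)]

    # Augmented regression y = b*x + rho*v_hat + error (2x2 normal equations).
    sxx, sxv, svv = dot(x, x), dot(x, v_hat), dot(v_hat, v_hat)
    sxy, svy = dot(x, y), dot(v_hat, y)
    det = sxx * svv - sxv ** 2
    b = (svv * sxy - sxv * svy) / det
    rho = (sxx * svy - sxv * sxy) / det
    ssr = sum((yi - b * xi - rho * vi) ** 2 for yi, xi, vi in zip(y, x, v_hat))
    # Pretest statistic: t-ratio of rho, the coefficient on the first-stage residuals.
    dwh_t = rho / math.sqrt(ssr / (n - 2) * sxx / det)

    if abs(dwh_t) > pretest_crit:
        # Exogeneity rejected: just-identified IV (2SLS) estimator.
        method = "2SLS"
        beta_hat = dot(z, y) / dot(z, x)
        denom = dot(z, x) ** 2 / dot(z, z)  # x' P_z x
    else:
        # Exogeneity not rejected: OLS estimator.
        method = "OLS"
        beta_hat = sxy / sxx
        denom = sxx  # x' x
    resid_ss = sum((yi - beta_hat * xi) ** 2 for yi, xi in zip(y, x))
    se = math.sqrt(resid_ss / (n - 1) / denom)
    return method, (beta_hat - beta0) / se


def simulate(n, beta, rho_uv, seed):
    """Hypothetical design: x = z + v, y = beta*x + u, with corr(u, v) = rho_uv."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zi, vi, ei = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        ui = rho_uv * vi + math.sqrt(1 - rho_uv ** 2) * ei
        xi = zi + vi
        y.append(beta * xi + ui)
        x.append(xi)
        z.append(zi)
    return y, x, z


y, x, z = simulate(n=2000, beta=1.0, rho_uv=0.8, seed=0)
method, t_stat = pretest_t_stat(y, x, z, beta0=1.0)
print(method, round(t_stat, 3))
```

In this sketch the second-stage statistic is compared with standard normal critical values regardless of which estimator the pretest selected, which is the step the abstract identifies as the source of the size distortion.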

Munich RePEc Personal Archive

Exogeneity, weak identification and instrument selection in econometrics

Author: Doko Tchatoka Sabro Firmin
Publication date: 01/02/2010

The last decade has seen growing interest in the econometric literature in the problems posed by weak instrumental variables, that is, situations where the instrumental variables are weakly correlated with the variable to be instrumented. Indeed, it is well known that when instruments are weak, the distributions of the Student, Wald, likelihood ratio and Lagrange multiplier statistics are no longer standard and often depend on nuisance parameters. Several empirical studies, notably on models of returns to education [Angrist and Krueger (1991, 1995), Angrist et al. (1999), Bound et al. (1995), Dufour and Taamouti (2007)] and asset pricing (C-CAPM) [Hansen and Singleton (1982, 1983), Stock and Wright (2000)], where the instrumental variables are weakly correlated with the variable to be instrumented, have shown that using these statistics often leads to unreliable results. One remedy for this problem is the use of identification-robust tests [Anderson and Rubin (1949), Moreira (2003), Kleibergen (2002), Dufour and Taamouti (2007)]. However, there is no econometric literature on the quality of identification-robust procedures when the available instruments are endogenous, or both endogenous and weak. This raises the question of what happens to identification-robust inference procedures when some instrumental variables assumed to be exogenous are in fact not. More precisely, what happens if an invalid instrumental variable is added to a set of valid instruments? Do these procedures behave differently? And if instrument endogeneity poses major difficulties for statistical inference, can one propose test procedures that select instruments when they are both strong and valid?
Is it possible to propose instrument selection procedures that remain valid even under weak identification? This thesis focuses on structural models (simultaneous equations models) and answers these questions through four essays. The first essay is published in Journal of Statistical Planning and Inference 138 (2008) 2649 – 2661. In this essay, we analyze the effects of instrument endogeneity on two identification-robust test statistics: the Anderson and Rubin statistic (AR, 1949) and the Kleibergen statistic (K, 2002), with or without weak instruments. First, when the parameter that controls instrument endogeneity is fixed (does not depend on the sample size), we show that all these procedures are in general consistent against the presence of invalid instruments (that is, they detect the presence of invalid instruments) regardless of instrument quality (strong or weak). We also describe cases where this consistency may fail, but the asymptotic distribution is modified in a way that could lead to size distortions even in large samples. This includes, in particular, cases where the two-stage least squares estimator remains consistent but the tests are asymptotically invalid. Second, when the instruments are locally exogenous (that is, the endogeneity parameter converges to zero as the sample size increases), we show that these test statistics converge to noncentral chi-square distributions, whether the instruments are strong or weak. We also characterize the situations where the noncentrality parameter is zero and the asymptotic distribution of the statistics remains the same as in the case of valid instruments (despite the presence of invalid instruments).
The second essay studies the impact of weak instruments on Durbin-Wu-Hausman (DWH) type specification tests as well as the Revankar and Hartley (1973) test. We provide a finite- and large-sample analysis of the distribution of these tests under the null hypothesis (level) and the alternative (power), including cases where identification is deficient or weak (weak instruments). Our finite-sample analysis provides several insights as well as extensions of earlier procedures. Indeed, the characterization of the finite-sample distribution of these statistics allows the construction of exact Monte Carlo tests for exogeneity even with non-Gaussian errors. We show that these tests are typically robust to weak instruments (the level is controlled). In addition, we provide a characterization of the power of the tests that clearly exhibits the factors determining power. We show that the tests have no power when all instruments are weak [similar to Guggenberger (2008)]. However, power exists as long as at least one instrument is strong. The conclusion of Guggenberger (2008) concerns the case where all instruments are weak (a case of minor practical interest). Our asymptotic theory under weaker assumptions confirms the finite-sample theory. Furthermore, we present a Monte Carlo analysis indicating that: (1) the ordinary least squares estimator is more efficient than the two-stage least squares estimator when instruments are weak and endogeneity is moderate [a conclusion similar to that of Kiviet and Niemczyk (2007)]; (2) pretest estimators based on exogeneity tests perform very well relative to two-stage least squares. This suggests that the instrumental variables method should be applied only if one is certain of having strong instruments.
Hence, the conclusions of Guggenberger (2008) are mixed and could be misleading. We illustrate our theoretical results through simulation experiments and two empirical applications: the relation between trade openness and economic growth and the well-known problem of returns to education. The third essay extends the Wald-type exogeneity test proposed by Dufour (1987) to cases where the regression errors have a non-normal distribution. We propose a new version of the earlier test that is valid even with non-Gaussian errors. Unlike the usual exogeneity test procedures (Durbin-Wu-Hausman and Revankar-Hartley tests), the Wald test makes it possible to address a common problem in empirical work, namely testing the partial exogeneity of a subset of variables. We propose two new pretest estimators based on the Wald test that perform better (in terms of mean squared error) than the usual IV estimator when the instrumental variables are weak and endogeneity is moderate. We also show that this test can serve as an instrument selection procedure. We illustrate the theoretical results with two empirical applications: the well-known wage equation model [Angrist and Krueger (1991, 1999)] and returns to scale [Nerlove (1963)]. Our results suggest that a mother's education would explain her son's dropping out of school, that output is an endogenous variable in the estimation of the firm's cost, and that the price of fuel is a valid instrument for output. The fourth essay solves two very important problems in the econometric literature. First, although the original or extended Wald test allows one to build confidence regions and to test linear restrictions on covariances, it assumes that the model parameters are identified.
When identification is weak (instruments weakly correlated with the variable to be instrumented), this test is in general no longer valid. This essay develops an identification-robust (weak instruments) inference procedure that allows one to build confidence regions for the covariance matrix between the regression errors and the (possibly endogenous) explanatory variables. We provide analytical expressions for the confidence regions and characterize necessary and sufficient conditions under which they are bounded. The proposed procedure remains valid even in small samples and is also asymptotically robust to heteroskedasticity and autocorrelation of the errors. Next, the results are used to develop identification-robust partial exogeneity tests. Monte Carlo simulations indicate that these tests control the level and have power even if the instruments are weak. This allows us to propose a valid instrument selection procedure even when there is an identification problem. The instrument selection procedure is based on two new pretest estimators that combine the usual IV estimator and partial IV estimators. Our simulations show that: (1) like the ordinary least squares estimator, partial IV estimators are more efficient than the usual IV estimator when instruments are weak and endogeneity is moderate; (2) pretest estimators perform very well overall compared with the usual IV estimator. We illustrate our theoretical results with two empirical applications: the relation between trade openness and economic growth and the model of returns to education.
In the first application, earlier studies concluded that the instruments were not too weak [Dufour and Taamouti (2007)], whereas they are very weak in the second [Bound (1995), Doko and Dufour (2009)]. In line with our theoretical results, we find unbounded confidence regions for the covariance in the case where the instruments are fairly weak.

The last decade has seen growing interest in so-called weak instrument problems in the econometric literature, i.e. situations where instruments are poorly correlated with endogenous explanatory variables. More generally, these can be viewed as situations where model parameters are not identified or nearly so (see Dufour and Hsiao, 2008). It is well known that when instruments are weak, standard test statistics in structural models (such as the Student, Wald, likelihood ratio and Lagrange multiplier criteria) have non-standard limiting distributions that often depend heavily on nuisance parameters. Several empirical studies, including the estimation of returns to education [Angrist and Krueger (1991, 1995), Angrist et al. (1999), Bound et al. (1995), Dufour and Taamouti (2007)] and asset pricing models (C-CAPM) [Hansen and Singleton (1982, 1983), Stock and Wright (2000)], have shown that the above procedures are unreliable in the presence of weak identification. As a result, identification-robust tests [Anderson and Rubin (1949), Moreira (2003), Kleibergen (2002), Dufour and Taamouti (2007)] are often used to make reliable inference. However, little is known about the quality of these procedures when the instruments are invalid or both weak and invalid. This raises the following question: what happens to inference procedures when some instruments are endogenous or both weak and endogenous? In particular, what happens if an invalid instrument is added to a set of valid instruments? How robust are these inference procedures to instrument endogeneity?
Do alternative inference procedures behave differently? If instrument endogeneity makes statistical inference unreliable, can we propose procedures for selecting "good instruments" (i.e. strong and valid instruments)? Can we propose an instrument selection procedure that remains valid even in the presence of weak identification? This thesis focuses on structural models and answers these questions through four chapters. The first chapter is published in Journal of Statistical Planning and Inference 138 (2008) 2649 – 2661. In this chapter, we analyze the effects of instrument endogeneity on two identification-robust procedures: the Anderson and Rubin (1949, AR) and Kleibergen (2002, K) test statistics, with or without weak instruments. First, when the level of instrument endogeneity is fixed (does not depend on the sample size), we show that all these procedures are in general consistent against the presence of invalid instruments (hence asymptotically invalid for the hypothesis of interest), whether the instruments are "strong" or "weak". We also describe situations where this consistency may not hold, but the asymptotic distribution is modified in a way that would lead to size distortions in large samples. These include, in particular, cases where the 2SLS estimator remains consistent, but the tests are asymptotically invalid. Second, when the instruments are locally exogenous (the level of instrument endogeneity approaches zero as the sample size increases), we find asymptotic noncentral chi-square distributions with or without weak instruments, and describe situations where the non-centrality parameter is zero and the asymptotic distribution remains the same as in the case of valid instruments (despite the presence of invalid instruments). The second chapter analyzes the effects of weak identification on Durbin-Wu-Hausman (DWH) specification tests and the Revankar-Hartley exogeneity test.
We propose a finite- and large-sample analysis of the distribution of DWH tests under the null hypothesis (level) and the alternative hypothesis (power), including when identification is deficient or weak (weak instruments). Our finite-sample analysis provides several new insights and extensions of earlier procedures. The characterization of the finite-sample distribution of the test statistics allows the construction of exact identification-robust exogeneity tests even with non-Gaussian errors (Monte Carlo tests) and shows that such tests are typically robust to weak instruments (level is controlled). Furthermore, we provide a characterization of the power of the tests, which clearly exhibits the factors that determine power. We show that DWH tests have no power when all instruments are weak [similar to Guggenberger (2008)]. However, power does exist as soon as we have at least one strong instrument. The conclusions of Guggenberger (2008) focus on the case where all instruments are weak (a case of little practical interest). Our asymptotic distributional theory under weaker assumptions confirms the finite-sample theory. Moreover, we present simulation evidence indicating that: (1) over a wide range of cases, including weak IV and moderate endogeneity, OLS performs better than 2SLS [a finding similar to Kiviet and Niemczyk (2007)]; (2) pretest-estimators based on exogeneity tests have an excellent overall performance compared with the usual IV estimator. We illustrate our theoretical results through simulation experiments and two empirical applications: the relation between trade and economic growth and the widely studied problem of returns to education. In the third chapter, we extend the generalized Wald partial exogeneity test [Dufour (1987)] to non-Gaussian errors. Testing whether a subset of explanatory variables is exogenous is an important challenge in econometrics. This problem arises in much applied work.
For example, in the well-known wage model, one would like to assess whether mother’s education is exogenous without imposing additional assumptions on ability and schooling. In the growth model, the exogeneity of the instrument constructed from geographical characteristics for the trade share is often questioned and needs to be tested without constraining trade share and the other variables. Standard exogeneity tests of the type proposed by Durbin-Wu-Hausman and Revankar-Hartley cannot solve such problems. A potential cure for dealing with partial exogeneity is the use of the generalized linear Wald (GW) method (Dufour, 1987). The GW procedure, however, assumes the normality of model errors, and it is not clear how robust this test is to non-Gaussian errors. We develop in this chapter a modified version of the earlier procedure which is valid even when model errors are not normally distributed. We present simulation evidence indicating that when identification is strong, the standard GW-test is size distorted in the presence of non-Gaussian errors. Furthermore, our analysis of the performance of different pretest-estimators based on GW-tests allows us to propose two new pretest-estimators of the structural parameter. The Monte Carlo simulations indicate that these pretest-estimators perform better than 2SLS over a wide range of cases. Therefore, this can be viewed as a variable selection procedure where a GW-test is used in the first stage to decide which variables should be instrumented and which ones are valid instruments. We illustrate our theoretical results through two empirical applications: the well-known wage equation and the returns to scale in electricity supply. The results show that the GW-tests cannot reject the exogeneity of mother’s education, i.e. mother’s education may constitute a valid IV for schooling. However, the output in the cost equation is endogenous and the price of fuel is a valid IV for estimating the returns to scale.
The fourth chapter develops identification-robust inference for the covariances between errors and regressors of an IV regression. The results are then applied to develop partial exogeneity tests and partial IV pretest-estimators which are more efficient than the usual IV estimator. When more than one stochastic explanatory variable is involved in the model, it is often necessary to determine which ones are independent of the disturbances. This problem arises in many empirical applications. For example, in the New Keynesian Phillips Curve, one would like to assess whether the interest rate is exogenous without imposing additional assumptions on the inflation rate and the other variables. Standard Durbin-Wu-Hausman (DWH) tests, which are commonly used in applied work, are inappropriate to deal with such a problem. The generalized Wald (GW) procedure (Dufour, 1987), which typically allows the construction of confidence sets as well as testing linear restrictions on covariances, assumes that the available instruments are strong. When the instruments are weak, the GW-test is in general size distorted. As a result, its application in models where instruments are possibly weak (returns to education, trade and economic growth, life cycle labor supply, New Keynesian Phillips Curve, pregnancy and the demand for cigarettes) may be misleading. To address this problem, we develop a finite- and large-sample valid procedure for building confidence sets for covariances allowing for the presence of weak instruments. We provide analytic forms of the confidence sets and characterize necessary and sufficient conditions under which they are bounded. Moreover, we propose two new pretest-estimators of structural parameters based on the above procedure. Both estimators combine 2SLS and partial IV-estimators.
The Monte Carlo experiment shows that: (1) partial IV-estimators outperform 2SLS when the instruments are weak; (2) pretest-estimators have an excellent overall performance (bias and MSE) compared with 2SLS. Therefore, this can be viewed as a variable selection method where projection-based techniques are used to decide which variables should be instrumented and which ones are valid instruments. We illustrate our results through two empirical applications: the relation between trade and economic growth and the widely studied problem of returns to education. The results show unbounded confidence sets, suggesting that the IVs are relatively poor in these models, as questioned in the literature [Bound (1995)].

Dépôt Institutionnel Numérique

Size-Corrected Wild Bootstrap Tests after Pretesting for Exogeneity with Heteroskedastic or Clustered Data

Authors: Doko Tchatoka Firmin; Wang Wenjie
Publication date: 15/12/2023

Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables to decide whether the ordinary least squares or the two-stage least squares (2SLS) method is appropriate. Guggenberger (2010) shows that the second-stage t-test, based on the outcome of a Durbin-Wu-Hausman type pretest for exogeneity in the first stage, has extreme size distortion with asymptotic size equal to 1 when the standard asymptotic critical values are used. In this paper, we first show that, both conditional and unconditional on the data, the standard wild bootstrap procedures are invalid for the two-stage testing and a closely related shrinkage method, and therefore are not viable solutions to this size-distortion problem. Then, we propose a novel size-corrected wild bootstrap approach, which combines certain wild bootstrap critical values with an appropriate size-correction method. We establish uniform validity of this procedure under either conditional heteroskedasticity or clustering in the sense that the resulting tests achieve correct asymptotic size. Monte Carlo simulations confirm our theoretical findings. In particular, our proposed method has remarkable power gains over the standard 2SLS-based t-test in many settings, especially when the identification is not strong.

Munich RePEc Personal Archive

Identification-robust inference for endogeneity parameters in linear structural models

Authors: Doko Tchatoka Firmin; Dufour Jean-Marie
Publication date: 16/08/2012

We provide a generalization of the Anderson-Rubin (AR) procedure for inference on parameters which represent the dependence between possibly endogenous explanatory variables and disturbances in a linear structural equation (endogeneity parameters). We focus on second-order dependence and stress the distinction between regression and covariance endogeneity parameters. Such parameters have intrinsic interest (because they measure the effect of "common factors" which induce simultaneity) and play a central role in selecting an estimation method (because they determine "simultaneity biases" associated with least-squares methods). We observe that endogeneity parameters may not be identifiable and we give the relevant identification conditions. We develop identification-robust finite-sample tests for joint hypotheses involving structural and regression endogeneity parameters, as well as marginal hypotheses on regression endogeneity parameters. For Gaussian errors, we provide tests and confidence sets based on standard-type Fisher critical values. For a wide class of parametric non-Gaussian errors (possibly heavy-tailed), we also show that exact Monte Carlo procedures can be applied using the statistics considered. As a special case, this result also holds for usual AR-type tests on structural coefficients. For covariance endogeneity parameters, we supply an asymptotic (identification-robust) distributional theory. Tests for partial exogeneity hypotheses (for individual potentially endogenous explanatory variables) are covered as instances of the class of proposed procedures. The proposed procedures are applied to two empirical examples: the relation between trade and economic growth, and the widely studied problem of returns to education.

Munich RePEc Personal Archive

Instrument endogeneity and identification-robust tests: some analytical results

Authors: Doko Tchatoka Firmin Sabro; Dufour Jean-Marie
Publication date: 31/05/2008

When some explanatory variables in a regression are correlated with the disturbance term, instrumental variable methods are typically employed to make reliable inferences. Furthermore, to avoid difficulties associated with weak instruments, identification-robust methods are often proposed. However, it is hard to assess whether an instrumental variable is valid in practice because instrument validity is based on the questionable assumption that some of them are exogenous. In this paper, we focus on structural models and analyze the effects of instrument endogeneity on two identification-robust procedures, the Anderson-Rubin (1949, AR) and the Kleibergen (2002, K) tests, with or without weak instruments. Two main setups are considered: (1) the level of “instrument” endogeneity is fixed (does not depend on the sample size), and (2) the instruments are locally exogenous, i.e. the parameter which controls instrument endogeneity approaches zero as the sample size increases. In the first setup, we show that both test procedures are in general consistent against the presence of invalid instruments (hence asymptotically invalid for the hypothesis of interest), whether the instruments are “strong” or “weak”. We also describe cases where test consistency may not hold, but the asymptotic distribution is modified in a way that would lead to size distortions in large samples. These include, in particular, cases where the 2SLS estimator remains consistent, but the AR and K tests are asymptotically invalid. In the second setup, we find (non-degenerate) asymptotic non-central chi-square distributions in all cases, and describe cases where the non-centrality parameter is zero and the asymptotic distribution remains the same as in the case of valid instruments (despite the presence of invalid instruments). Overall, our results underscore the importance of checking for the presence of possibly invalid instruments when applying “identification-robust” tests.
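
As a hedged illustration of the first setup (fixed instrument endogeneity), the sketch below computes the AR statistic at the true parameter value with a single instrument and no included exogenous regressors. In that case AR(beta0) = (n - 1) u0'P_z u0 / u0'M_z u0 with u0 = y - x beta0, and under a valid instrument it is compared with an F(1, n - 1) critical value (about 3.84 at the 5% level in large samples). The data-generating process, the function names and the parameter `delta` are assumptions made for this example and are not taken from the paper.

```python
import random


def ar_statistic(y, x, z, beta0):
    """Anderson-Rubin statistic for H0: beta = beta0 with one instrument z."""
    n = len(y)
    u0 = [yi - beta0 * xi for yi, xi in zip(y, x)]
    zu = sum(zi * ui for zi, ui in zip(z, u0))
    zz = sum(zi * zi for zi in z)
    uu = sum(ui * ui for ui in u0)
    explained = zu * zu / zz   # u0' P_z u0
    residual = uu - explained  # u0' M_z u0
    return explained / (residual / (n - 1))


def simulate_iv(n, beta, delta, seed):
    """Hypothetical design: z = w + delta*u, so the instrument is invalid if delta != 0."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        w, u, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        zi = w + delta * u       # delta controls instrument endogeneity
        xi = zi + 0.5 * u + v    # x is endogenous in both designs
        y.append(beta * xi + u)
        x.append(xi)
        z.append(zi)
    return y, x, z


# AR evaluated at the true beta: valid instrument versus invalid instrument.
ar_valid = ar_statistic(*simulate_iv(2000, 1.0, 0.0, seed=0), beta0=1.0)
ar_invalid = ar_statistic(*simulate_iv(2000, 1.0, 0.5, seed=0), beta0=1.0)
print(round(ar_valid, 2), round(ar_invalid, 2))
```

With a fixed `delta` different from zero, the statistic at the true parameter value grows with the sample size, which is the sense in which the test is consistent against invalid instruments and therefore invalid for the hypothesis on beta.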

Munich RePEc Personal Archive