1,192 research outputs found

    Categorical Data

    A very brief survey of regression for categorical data. Categorical outcome (or discrete outcome or qualitative response) regression models are models for a discrete dependent variable recording in which of two or more categories an outcome of interest lies. For binary data (two categories), probit and logit models or semiparametric methods are used. For multinomial data (more than two categories) that are unordered, common models are multinomial and conditional logit, nested logit, multinomial probit, and random parameters logit. The last two models are estimated using simulation or Bayesian methods. For ordered data, standard multinomial models are ordered logit and probit; count models are used if the ordered discrete data are actually counts.
    Keywords: binary data, multinomial, logit, probit, count data

    A Survival Analysis of Australian Equity Mutual Funds

    Determining which types of mutual (or managed) investment funds are good financial investments is complicated by potential survivorship biases. This project adds to a small recent international literature on the patterns and determinants of mutual fund survivorship. We use statistical techniques for survival data that are rarely applied in finance. Of specific interest is the hazard rate of fund closure, which gives the variation over time in the conditional probability of fund closure given fund survival to date. For a sample of 251 retail investment funds in Australia from 1980 to 1999 we identify a hump-shaped hazard function that reaches its maximum after about five or six years, a pattern similar to the UK findings of Lunde, Timmermann and Blake (1999). We also consider the impact of monthly and annual fund performance (gross and relative to a market benchmark). Returns relative to the benchmark are much more important than gross returns, with higher relative returns associated with a lower hazard of fund closure. There appears to be an asymmetric response to performance, with positive shocks having a larger impact on the hazard rate than negative shocks.
    Keywords: mutual funds; survivorship bias; duration analysis; Cox regression
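
    To make the hazard-rate idea above concrete, here is a minimal sketch of a nonparametric life-table hazard: closures in a given year divided by the funds still at risk in that year. It is not the Cox regression used in the paper, and the function name `discrete_hazard` and the fund data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def discrete_hazard(durations, closed):
    """Life-table hazard: closures in year t divided by funds still at risk in year t."""
    closures = Counter(t for t, c in zip(durations, closed) if c)
    hazard = {}
    for t in sorted(closures):
        at_risk = sum(1 for d in durations if d >= t)  # funds that survived to year t
        hazard[t] = closures[t] / at_risk
    return hazard

# Hypothetical fund ages in years; closed=False marks funds still operating (censored).
durations = [2, 3, 3, 5, 5, 5, 6, 8, 8, 10]
closed = [True, True, False, True, True, False, True, False, True, False]
hazard = discrete_hazard(durations, closed)
```

    Censored funds count toward the at-risk set up to their last observed year but never toward closures, which is what separates a hazard estimate from a simple closure frequency.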

    Robust Inference with Clustered Data

    In this paper we survey methods to control for regression model error that is correlated within groups or clusters, but is uncorrelated across groups or clusters. Failure to control for the clustering can lead to understatement of standard errors and overstatement of statistical significance, as emphasized most notably in empirical studies by Moulton (1990) and Bertrand, Duflo and Mullainathan (2004). We emphasize OLS estimation with statistical inference based on minimal assumptions regarding the error correlation process. Complications we consider include cluster-specific fixed effects, few clusters, multi-way clustering, more efficient feasible GLS estimation, and adaptation to nonlinear and instrumental variables estimators.
    Keywords: cluster robust, random effects, fixed effects, differences in differences, cluster bootstrap, few clusters, multi-way clusters.
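
    The contrast between default and cluster-robust standard errors can be sketched for the simplest case, a single regressor with an intercept. This is an illustrative sketch rather than the paper's code: the function name `slope_with_se` is hypothetical, and no finite-sample correction is applied, although statistical packages typically add one.

```python
import math

def slope_with_se(x, y, groups):
    """OLS slope of y on x (with intercept), plus default and cluster-robust SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Default SE: assumes independent, homoskedastic errors.
    se_default = math.sqrt(sum(u * u for u in resid) / (n - 2) / sxx)

    # Cluster-robust SE: add up the scores (x_i - x_bar) * u_i within each cluster,
    # then square the cluster totals (no finite-sample correction, by assumption).
    score = {}
    for xi, ui, g in zip(x, resid, groups):
        score[g] = score.get(g, 0.0) + (xi - x_bar) * ui
    se_cluster = math.sqrt(sum(s * s for s in score.values())) / sxx
    return slope, se_default, se_cluster
```

    When every observation is its own cluster, `se_cluster` reduces to the heteroskedasticity-robust (White) standard error without a degrees-of-freedom adjustment.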

    Bootstrap-Based Improvements for Inference with Clustered Errors

    Microeconometrics researchers have increasingly realized the essential need to account for any within-group dependence in estimating standard errors of regression parameter estimates. The typical preferred solution is to calculate cluster-robust or sandwich standard errors that permit quite general heteroskedasticity and within-cluster error correlation, but presume that the number of clusters is large. In applications with few (5-30) clusters, standard asymptotic tests can over-reject considerably. We investigate more accurate inference using cluster bootstrap-t procedures that provide asymptotic refinement. These procedures are evaluated using Monte Carlo experiments, including the much-cited differences-in-differences example of Bertrand, Duflo and Mullainathan (2004). In situations where standard methods lead to rejection rates of ten percent or more for tests of nominal size 0.05, our methods can reduce this to five percent. In principle a pairs cluster bootstrap should work well, but in practice a wild cluster bootstrap performs better.
    Keywords: clustered errors; random effects; cluster robust; sandwich; bootstrap; bootstrap-t; clustered bootstrap; pairs bootstrap; wild bootstrap.
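
    A wild cluster bootstrap-t test of the kind described above might look roughly like the following sketch for a single-regressor model. Several choices here are assumptions rather than details taken from the abstract: Rademacher (plus or minus one) weights drawn once per cluster, the null hypothesis of a zero slope imposed when forming residuals, a p-value computed as the plain share of exceedances, and the hypothetical names `cluster_t` and `wild_cluster_p_value`.

```python
import math
import random

def cluster_t(x, y, groups):
    """t-statistic for the OLS slope (H0: slope = 0) using a cluster-robust SE."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    score = {}
    for xi, yi, g in zip(x, y, groups):
        u = (yi - y_bar) - slope * (xi - x_bar)          # OLS residual
        score[g] = score.get(g, 0.0) + (xi - x_bar) * u  # within-cluster score sum
    se = math.sqrt(sum(s * s for s in score.values())) / sxx
    return slope / se

def wild_cluster_p_value(x, y, groups, reps=999, seed=0):
    """Wild cluster bootstrap-t p-value for H0: slope = 0, with the null imposed."""
    rng = random.Random(seed)
    t_obs = cluster_t(x, y, groups)
    y_bar = sum(y) / len(y)
    resid0 = [yi - y_bar for yi in y]  # residuals from the restricted (slope = 0) fit
    labels = sorted(set(groups))
    exceed = 0
    for _ in range(reps):
        # One sign flip per cluster keeps the within-cluster error correlation intact.
        sign = {g: rng.choice((-1.0, 1.0)) for g in labels}
        y_star = [y_bar + sign[g] * u for u, g in zip(resid0, groups)]
        if abs(cluster_t(x, y_star, groups)) >= abs(t_obs):
            exceed += 1
    return exceed / reps
```

    Bootstrapping the t-statistic rather than the coefficient is what gives the procedure its asymptotic refinement. The number of distinct sign patterns is only 2 to the power of the number of clusters, so with very few clusters the bootstrap distribution is coarse.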

    Estimating user-defined nonlinear regression models in Stata and in Mata

    This talk will overview how to estimate nonlinear regression models that are not covered by Stata's many built-in estimation commands. Mata command optimize will be emphasized, and Stata command ml will also be covered. The material is drawn from chapter 11 of A.C. Cameron and P.K. Trivedi (2008), Microeconometrics using Stata, Stata Press.

    Bootstrap-Based Improvements for Inference with Clustered Errors

    Researchers have increasingly realized the need to account for within-group dependence in estimating standard errors of regression parameter estimates. The usual solution is to calculate cluster-robust standard errors that permit heteroskedasticity and within-cluster error correlation, but presume that the number of clusters is large. Standard asymptotic tests can over-reject, however, with few (5-30) clusters. We investigate inference using cluster bootstrap-t procedures that provide asymptotic refinement. These procedures are evaluated using Monte Carlos, including the example of Bertrand, Duflo and Mullainathan (2004). Rejection rates of ten percent using standard methods can be reduced to the nominal size of five percent using our methods.

    Robust Inference with Multi-way Clustering

    In this paper we propose a new variance estimator for OLS as well as for nonlinear estimators such as logit, probit and GMM, that provides cluster-robust inference when there is two-way or multi-way clustering that is non-nested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g. Liang and Zeger (1986), Arellano (1987)) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state-year effects example of Bertrand et al. (2004) to two dimensions; and by application to two studies in the empirical public/labor literature where two-way clustering is present.
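
    For the two-way case, the estimator is commonly written as a combination of three one-way cluster-robust variance matrices, which is why packages that already support one-way clustering can compute it. The notation below is an assumption made for illustration: G and H denote the two clustering dimensions, and G ∩ H denotes clusters formed by their intersection.

```latex
\hat{V}_{\text{two-way}}[\hat{\beta}]
  = \hat{V}_{G}[\hat{\beta}]
  + \hat{V}_{H}[\hat{\beta}]
  - \hat{V}_{G \cap H}[\hat{\beta}]
```

    Subtracting the intersection term avoids double counting pairs of observations that share both a G-cluster and an H-cluster.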

    Bivariate Count Data Regression Using Series Expansions: With Applications

    Most research on count data regression models, i.e. models in which the dependent variable takes only non-negative integer values or count values, has focused on the univariate case. Very little attention has been given to joint modeling of two or more counts. We propose parametric regression models for bivariate counts based on squared polynomial expansions around a baseline density. The models are more flexible than the current leading bivariate count model, the bivariate Poisson. The models are applied to data on the use of prescribed and nonprescribed medications.

    Modeling the Differences in Counted Outcomes using Bivariate Copula Models: with Application to Mismeasured Counts

    This paper makes three contributions. First, it uses copula functions to obtain a flexible bivariate parametric model for nonnegative integer-valued data (counts). Second, it recovers the distribution of the difference in the two counts from a specified bivariate count distribution. Third, the methods are applied to counts that are measured with error. Specifically, we model the determinants of the difference between the self-reported number of doctor visits (measured with error) and true number of doctor visits (also available in the data used).
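
    The first two contributions can be sketched as follows: a copula joins two count marginals into a joint distribution, and the distribution of the difference is obtained by summing the joint probabilities along a diagonal. The Frank copula, the Poisson marginals, the truncation point and all function names are assumptions made for this illustration, not choices confirmed by the abstract.

```python
import math

def poisson_cdf(k, lam):
    """P(Y <= k) for a Poisson(lam) count; 0 for k < 0."""
    if k < 0:
        return 0.0
    term = total = math.exp(-lam)
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return min(total, 1.0)

def frank_copula(u, v, theta):
    """Frank copula C(u, v); theta != 0 controls the strength of dependence."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def joint_pmf(y1, y2, lam1, lam2, theta):
    """P(Y1 = y1, Y2 = y2) by rectangle differencing of the copula-based joint CDF."""
    def cdf(a, b):
        return frank_copula(poisson_cdf(a, lam1), poisson_cdf(b, lam2), theta)
    return cdf(y1, y2) - cdf(y1 - 1, y2) - cdf(y1, y2 - 1) + cdf(y1 - 1, y2 - 1)

def difference_pmf(d, lam1, lam2, theta, max_count=60):
    """P(Y1 - Y2 = d), summing the joint pmf along one diagonal (truncated sum)."""
    return sum(joint_pmf(y2 + d, y2, lam1, lam2, theta)
               for y2 in range(max(0, -d), max_count + 1))
```

    Because counts are discrete, the joint probability comes from differencing the copula at adjacent CDF values rather than from a copula density, which is the main practical difference from the continuous case.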