A double-hurdle count model for completed fertility data from the developing world
This paper reports a study on the socio-economic determinants of completed fertility in Mexico. An innovative Poisson Double-Hurdle count model is developed for the analysis. This methodological approach allows low and high order parities to be determined by two different data generating mechanisms, and explicitly accounts for potential endogenous switching between regimes. Unobserved heterogeneity is properly controlled for. Special attention is given to studying how socio-economic characteristics such as religion and ethnic group affect the likelihood of transition from low to high order parities. Findings indicate that education and Catholicism are associated with reductions in the likelihood of transition from parities lower than four to high order parities. Being an indigenous language speaker, in contrast, increases the odds of a large family. Keywords: completed fertility, count data models, double-hurdle model
Estimation of ordinal response models, accounting for sample selection bias
Studying behaviour in economics, sociology, and statistics often involves fitting a model in which the outcome is an ordinal response which is only observed for a subsample of subjects. (For example, questions about health satisfaction in a survey might be asked only of respondents who have a particular health condition.) In this situation, estimation of the ordinal response model without taking account of this "sample selection" effect, using e.g. -ologit- or -oprobit-, may lead to biased parameter estimates. (In the earlier example, unobserved factors that increase the chances of having the health condition may be correlated with the unobserved factors that affect health satisfaction.) The program -gllamm- can be used to estimate ordinal response models accounting for sample selection, by ML. This paper describes a "wrapper" program, -osm-, that calls -gllamm- to fit the model. It accepts data in a simple structure, has a straightforward syntax and, moreover, reports output in a manner that is easily interpretable. One important feature of -osm- is that the log-likelihood can be evaluated using adaptive quadrature.
Dealing with the cryptic survey: Processing labels and value labels with Mata
Survey data often come as a plain table containing cryptic variable names, numbers, and letters. To make sense of the data, the researcher is given a questionnaire or a code book that contains a list of variable names, their description, and an interpretation of the values (either a number or a string) that each variable can take. Code books are commonly provided as plain text or in PDF format. Hence, the researcher is left "free" to type labels and value labels one by one. This often leads to bad research habits, such as "cutting" and "processing" the piece of the survey the researcher needs in the short run and leaving the rest for future processing. Obviously, this is boring and time consuming, and it eventually leads to the creation of various versions of the same survey, an inability to track important changes, and an incapacity to reproduce research results, because the researcher cannot recreate the analyzed dataset step by step from the original source. In this talk, I will discuss how to recover the information that is contained in questionnaires or code books and how to process this information in a clean, fast, and efficient way with Mata.
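As a rough illustration of the idea, the sketch below parses a small plain-text code book and emits Stata labelling commands. It is written in Python rather than Mata, and the code book layout, the sample variables `q1` and `q2`, the `_lbl` suffix, and the helper names `parse_codebook` and `to_stata` are all assumptions made for this example, not the approach presented in the talk.

```python
import re

# Hypothetical code book layout: "name: description", then indented "code = text" lines.
CODEBOOK = """\
q1: Sex of respondent
    1 = Male
    2 = Female
q2: Region of residence
    1 = North
    2 = South
    9 = Not stated
"""

def parse_codebook(text):
    """Return {variable: {"label": str, "values": {code: text}}}."""
    variables = {}
    current = None
    for line in text.splitlines():
        var_match = re.match(r"^(\w+):\s*(.+)$", line)
        val_match = re.match(r"^\s+(\d+)\s*=\s*(.+)$", line)
        if var_match:
            # A new variable starts: remember its name and description.
            current = var_match.group(1)
            variables[current] = {"label": var_match.group(2).strip(), "values": {}}
        elif val_match and current is not None:
            # An indented line attaches a value label to the current variable.
            code = int(val_match.group(1))
            variables[current]["values"][code] = val_match.group(2).strip()
    return variables

def to_stata(variables):
    """Build Stata 'label' commands from the parsed code book."""
    lines = []
    for name, info in variables.items():
        lines.append(f'label variable {name} "{info["label"]}"')
        if info["values"]:
            pairs = " ".join(
                f'{code} "{text}"' for code, text in sorted(info["values"].items())
            )
            lines.append(f"label define {name}_lbl {pairs}")
            lines.append(f"label values {name} {name}_lbl")
    return lines

if __name__ == "__main__":
    print("\n".join(to_stata(parse_codebook(CODEBOOK))))
```

Generating the commands from the code book, instead of typing them, keeps the labelling step reproducible from the original source.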
Bivariate dynamic probit models for panel data
In this talk, I will discuss the main methodological features of the bivariate dynamic probit model for panel data. I will present an example using simulated data, giving special emphasis to the initial conditions problem in dynamic models and the difference between true and spurious state dependence. The model is fit by maximum simulated likelihood.
Non-pecuniary returns to higher education: the effect on smoking intensity in the UK
This paper investigates whether higher education (HE) produces non-pecuniary returns via a reduction in the intensity of consumption of health-damaging substances. In particular, it focuses on current smoking intensity of the British individuals sampled in the 29-year follow-up survey of the 1970 British Cohort Study. We estimate endogenous dummy ordinal response models for cigarette consumption and show that HE is endogenous with respect to smoking intensity and that even when endogeneity is accounted for, HE is found to have a strong negative effect on smoking intensity. Moreover, pecuniary channels, such as occupation and income, mediate only a minor part of the effect of HE. Our results are robust to modelling individual self-selection into current smoking participation (at age 29) and to estimating a dynamic model in which past smoking levels affect current smoking levels.
Selection-endogenous ordered probit and dynamic ordered probit models
In this presentation we define two qualitative response models: 1) a Selection Endogenous Dummy Ordered Probit model (SED-OP); 2) a Selection Endogenous Dummy Dynamic Selection Ordered Probit model (SED-DOP). The SED-OP model is a three-equation model consisting of an endogenous dummy equation, a selection equation, and a main equation which has an ordinal response form. The main feature of the model is that the endogenous dummy enters both the selection equation and the main equation. The dynamic SED-DOP model allows both the selection equation and the ordered equation to be dynamic by including lagged individual behaviour. Initial conditions are properly accounted for and free correlation among unobservables entering each of the three equations is allowed. We show how these models can be estimated in Stata using Maximum Simulated Likelihood.
FIML estimation of an endogenous switching model for count data
We develop FIML code for estimating a Poisson Count data model with lognormal unobserved heterogeneity and an endogenous dummy variable as proposed by Terza (1998). Gauss-Hermite quadrature is used for calculating the log-likelihood and a -ml d0- method is employed. We present an example and discuss the problems found during the development of the code.
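To illustrate the quadrature step only, the sketch below integrates a Poisson probability over standard normal heterogeneity with Gauss-Hermite quadrature. It is a minimal Python sketch, not the Stata code described above: it leaves out the endogenous dummy and the switching equation, and the function names and the parameters `xb` (linear index) and `sigma` (heterogeneity scale) are assumptions made for this example. Nodes are found by bisection between the roots of the previous Hermite polynomial, a simple but slow choice.

```python
import math

def hermite_phys(n, x):
    """Evaluate the physicists' Hermite polynomial H_n(x) by recurrence."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def gauss_hermite(n):
    """Nodes and weights for integrals of exp(-t^2) * f(t) over the real line."""
    roots = []
    for k in range(1, n + 1):
        # Roots of H_k interlace those of H_(k-1) and lie inside +/- sqrt(2k+1).
        bound = math.sqrt(2.0 * k + 1.0)
        edges = [-bound] + roots + [bound]
        new_roots = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            f_lo = hermite_phys(k, lo)
            for _ in range(80):  # plain bisection inside each bracket
                mid = 0.5 * (lo + hi)
                f_mid = hermite_phys(k, mid)
                if (f_mid > 0) == (f_lo > 0):
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    norm = 2.0 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n ** 2
    weights = [norm / hermite_phys(n - 1, r) ** 2 for r in roots]
    return roots, weights

def poisson_lognormal_pmf(y, xb, sigma, nodes, weights):
    """P(y) when the Poisson mean is exp(xb + sigma * e), with e ~ N(0, 1)."""
    total = 0.0
    for t, w in zip(nodes, weights):
        log_mu = xb + sigma * math.sqrt(2.0) * t  # change of variable e = sqrt(2) * t
        log_p = y * log_mu - math.exp(log_mu) - math.lgamma(y + 1)
        total += w * math.exp(log_p)
    return total / math.sqrt(math.pi)

def log_likelihood(ys, xbs, sigma, n_nodes=20):
    """Sum of log probabilities over observations (the quantity to maximise)."""
    nodes, weights = gauss_hermite(n_nodes)
    return sum(
        math.log(poisson_lognormal_pmf(y, xb, sigma, nodes, weights))
        for y, xb in zip(ys, xbs)
    )
```

With `sigma` set to zero the integral collapses to the ordinary Poisson probability, which gives a quick sanity check on the nodes and weights.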
Endogenous Treatment Effects for Count Data Models with Sample Selection or Endogenous Participation
In this paper we propose an estimator, based on maximum simulated likelihood, for models in which an endogenous dichotomous treatment affects a count outcome in the presence of either sample selection or endogenous participation. We allow for the treatment to have an effect on both the sample selection (or participation) rule and the main outcome. Applications of this model are frequent in, but are not limited to, health economics. We show an application of the model using data from Kenkel and Terza (2001), who investigate the effect of physician advice on the amount of alcohol consumption. Our estimates suggest that in these data (i) neglecting treatment endogeneity leads to a wrongly signed effect of physician advice on drinking intensity, (ii) neglecting endogenous participation leads to an upward biased estimate of the treatment effect of physician advice on drinking intensity. Keywords: count data, drinking, endogenous participation, maximum simulated likelihood, sample selection, treatment effects
Missing ordinal covariates with informative selection
This paper considers the problem of parameter estimation in a model for a continuous response variable y when an important ordinal explanatory variable x is missing for a large proportion of the sample. Non-missingness of x, or sample selection, is correlated with the response variable and/or with the unobserved values the ordinal explanatory variable takes when missing. We suggest solving the endogenous selection, or 'not missing at random' (NMAR), problem by modelling the informative selection mechanism, the ordinal explanatory variable, and the response variable together. The use of the method is illustrated by re-examining the problem of the ethnic gap in school achievement at age 16 in England using linked data from the National Pupil Database (NPD), the Longitudinal Study of Young People in England (LSYPE), and the Census 2001. Keywords: missing covariate, sample selection, latent class models, ordinal variables, NMAR