    Well-posedness of measurement error models for self-reported data

    It is widely accepted that the inverse problem of estimating the distribution of a latent variable X* from an observed sample of X, a contaminated measurement of X*, is ill-posed. This paper shows that measurement error models for self-reported data are well-posed, provided the probability of reporting truthfully is nonzero, a property observed in validation studies. This optimistic result suggests that one should not ignore the point mass at zero in the error distribution when modeling measurement errors in self-reported data. We also illustrate that classical measurement error models may in fact be conditionally well-posed given prior information on the distribution of the latent variable X*. Using both a Monte Carlo study and an empirical application, we show that failing to account for this property can lead to significant bias in the estimated distribution of X*.
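
The point mass at zero described in this abstract can be illustrated with a small simulation. This is only a sketch under assumed values, not the paper's estimator: the function names, the standard normal latent variable, the truth-telling probability of 0.4, and the normal noise are all hypothetical choices made for illustration.

```python
import random


def simulate_self_reports(n, p_truth, noise_sd, seed=0):
    """Draw latent values X* and self-reports X.

    With probability p_truth the respondent reports X* exactly, so the
    error X - X* is exactly zero; otherwise normal noise is added.
    All distributional choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    reported = [
        x if rng.random() < p_truth else x + rng.gauss(0.0, noise_sd)
        for x in latent
    ]
    return latent, reported


def share_exact(latent, reported):
    """Fraction of observations whose measurement error is exactly zero."""
    exact = sum(1 for a, b in zip(latent, reported) if a == b)
    return exact / len(latent)


latent, reported = simulate_self_reports(n=20000, p_truth=0.4, noise_sd=1.0)
share = share_exact(latent, reported)
print(f"share of truthful reports: {share:.3f}")
```

In this sketch the latent values are visible, which is what a validation study provides; the share of exact matches estimates the mass of the error distribution at zero.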

    Estimation of Nonlinear Models with Mismeasured Regressors Using Marginal Information

    We consider the estimation of nonlinear models with mismeasured explanatory variables, when information on the marginal distribution of the true values of these variables is available. We derive a semi-parametric MLE that is shown to be root-n consistent and asymptotically normally distributed. In a simulation experiment we find that the finite sample distribution of the estimator is close to the asymptotic approximation. The semi-parametric MLE is applied to a duration model for AFDC welfare spells with misreported welfare benefits. The marginal distribution of the correctly measured welfare benefits is obtained from an administrative source.

    Identification and Inference of Nonlinear Models Using Two Samples with Arbitrary Measurement Errors

    This paper considers identification and inference of a general latent nonlinear model using two samples, where a covariate contains arbitrary measurement errors in both samples, and neither sample contains an accurate measurement of the corresponding true variable. The primary sample consists of some dependent variables, some error-free covariates and an error-ridden covariate, where the measurement error has an unknown distribution and could be arbitrarily correlated with the latent true values. The auxiliary sample consists of another noisy measurement of the mismeasured covariate and some error-free covariates. We first show that a general latent nonlinear model is nonparametrically identified using the two samples when both could have nonclassical errors, without requiring instrumental variables or independence between the two samples. When the two samples are independent and the latent nonlinear model is parameterized, we propose sieve quasi maximum likelihood estimation (MLE) for the parameter of interest, and establish its root-n consistency and asymptotic normality under possible misspecification, and its semiparametric efficiency under correct specification. We also provide a sieve likelihood ratio model selection test to compare two possibly misspecified parametric latent models. A small Monte Carlo simulation and an empirical example are presented. Keywords: Data combination, Nonlinear errors-in-variables model, Nonclassical measurement error, Nonparametric identification, Misspecified parametric latent model, Sieve likelihood estimation and inference

    Identifying the returns to lying when the truth is unobserved

    Consider an observed binary regressor D and an unobserved binary variable D*, both of which affect some other variable Y. This paper considers nonparametric identification and estimation of the effect of D on Y, conditioning on D* = 0. For example, suppose Y is a person's wage, the unobserved D* indicates whether the person has been to college, and the observed D indicates whether the individual claims to have been to college. This paper then identifies and estimates the difference in average wages between those who falsely claim college experience and those who tell the truth about not having been to college. We estimate these average returns to lying to be about 7% to 20%. Nonparametric identification without observing D* is obtained either by observing a variable V that is roughly analogous to an instrument for ordinary measurement error, or by imposing restrictions on model error moments.

    Nonparametric identification of dynamic models with unobserved state variables

    We consider the identification of a Markov process {W_t, X_t^*} for t = 1, 2, ..., T when only {W_t} for t = 1, 2, ..., T is observed. In structural dynamic models, W_t denotes the sequence of choice variables and observed state variables of an optimizing agent, while X_t^* denotes the sequence of serially correlated state variables. The Markov setting allows the distribution of the unobserved state variable X_t^* to depend on W_{t-1} and X_{t-1}^*. We show that the joint distribution of (W_t, X_t^*, W_{t-1}, X_{t-1}^*) is identified from the observed distribution of (W_{t+1}, W_t, W_{t-1}, W_{t-2}, W_{t-3}) under reasonable assumptions. Identification of the joint distribution of (W_t, X_t^*, W_{t-1}, X_{t-1}^*) is a crucial input in methodologies for estimating dynamic models based on the "conditional-choice-probability (CCP)" approach pioneered by Hotz and Miller.

    Misclassification Errors and the Underestimation of U.S. Unemployment Rates

    Using recent results in the measurement error literature, we show that the official U.S. unemployment rates substantially underestimate the true levels of unemployment, due to misclassification errors in labor force status in the Current Population Survey. Our closed-form identification of the misclassification probabilities relies on the key assumptions that misreporting behavior depends only on the true values and that the true labor force status dynamics satisfy a Markov-type property. During the period from 1996 to 2009, the corrected monthly unemployment rates are 1 to 4.6 percentage points (25% to 45%) higher than the official rates, and are more sensitive to changes in business cycles. Labor force participation rates, however, are not affected by this correction. We also provide results for various subgroups of the U.S. population defined by gender, race and age. Keywords: unemployment rate, labor force participation rate, misclassification, measurement error, Current Population Survey
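
The logic of correcting reported shares for misclassification can be sketched in a much simpler setting than the paper's. The sketch assumes the misclassification matrix is already known, whereas the paper identifies it from the data. The matrix entries, the true shares, and the function names are hypothetical values chosen for illustration, not estimates from the Current Population Survey.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def unemployment_rate(shares):
    """Unemployed share divided by labor force share (employed + unemployed)."""
    employed, unemployed, _not_in_lf = shares
    return unemployed / (employed + unemployed)


# Hypothetical matrix: entry [i][j] = P(report status i | true status j),
# with statuses ordered employed, unemployed, not in labor force.
# Each column sums to one. These numbers are made up for illustration.
misclass = [
    [0.980, 0.03, 0.01],
    [0.005, 0.80, 0.01],
    [0.015, 0.17, 0.98],
]

# Hypothetical true shares, used only to generate the observed shares.
true_shares = [0.60, 0.06, 0.34]
observed = [sum(m_ij * p_j for m_ij, p_j in zip(row, true_shares))
            for row in misclass]

# Reported shares satisfy observed = misclass @ true, so invert the system.
recovered = solve_linear(misclass, observed)
print("reported unemployment rate: ", round(unemployment_rate(observed), 4))
print("corrected unemployment rate:", round(unemployment_rate(recovered), 4))
```

With these made-up probabilities, unemployed respondents are the group most often misreported, so the reported rate falls below the corrected one. The size of the gap depends entirely on the assumed matrix.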

    Estimating Production Functions with Robustness Against Errors in the Proxy Variables

    This paper proposes a new semi-nonparametric maximum likelihood estimation method for estimating production functions. The method extends the literature on structural estimation of production functions, started by the seminal work of Olley and Pakes (1996), by relaxing the scalar-unobservable assumption about the proxy variables. The key additional assumption needed in the identification argument is the existence of two conditionally independent proxy variables. The assumption seems reasonable in many important cases. The new method is straightforward to apply, and a consistent estimate of the asymptotic covariance matrix of the structural parameters can be easily computed.

    Identification and estimation of nonclassical nonlinear errors-in-variables models with continuous distributions using instruments

    While the literature on nonclassical measurement error traditionally relies on the availability of an auxiliary dataset containing correctly measured observations, this paper establishes that the availability of instruments enables the identification of a large class of nonclassical nonlinear errors-in-variables models with continuously distributed variables. The main identifying assumption is that, conditional on the value of the true regressors, some "measure of location" of the distribution of the measurement error (e.g. its mean, mode or median) is equal to zero. The proposed approach relies on the eigenvalue-eigenfunction decomposition of an integral operator associated with specific joint probability densities. The main identifying assumption is used to order the eigenfunctions so that the decomposition is unique. The authors propose a convenient sieve-based estimator, derive its asymptotic properties and investigate its finite-sample behavior through Monte Carlo simulations. An example of application to the relationship between earnings and divorce rates is also provided.

