
    Latent Thresholds Analysis of Choice Data with Multiple Bids and Response Options

    In many stated preference settings, stakeholders will be uncertain as to their exact willingness-to-pay for a proposed environmental amenity. To accommodate this possibility, analysts have designed elicitation formats with multiple bids and response options that allow for the expression of uncertainty. We argue that the information content flowing from such elicitation has not yet been fully and efficiently exploited in existing contributions. We introduce a Latent Thresholds Estimator that focuses on the simultaneous identification of the full set of thresholds that delineate an individual's value space in accordance with observed response categories. Our framework provides a more complete picture of the underlying value distribution, the marginal effects of regressors, and the impact of bid designs on estimation efficiency. We show that the common practice of re-coding responses to derive point estimates of willingness-to-pay leaves useful information untapped and can produce misleading results if thresholds are highly correlated.
    Keywords: Stated Preference; Multiple Bounded Elicitation; Polychotomous Choice; Bayesian Estimation; Value Uncertainty
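
    The estimator described above is Bayesian and identifies all thresholds jointly; as a rough, purely illustrative sketch of how multiple bids and response options pin down intervals of an individual's value space, the snippet below assumes a single normal latent willingness-to-pay value and a symmetric uncertainty band. The model, names, and maximum-likelihood setup are ours, not the paper's.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical illustration, not the paper's Latent Thresholds Estimator:
# a single latent WTP value v ~ N(mu, sigma^2) plus a symmetric uncertainty
# band of half-width c yields three ordered thresholds (v - c, v, v + c)
# that map a bid to one of four response categories.
CATS = ["definitely yes", "probably yes", "probably no", "definitely no"]

def category_probs(bid, mu, sigma, c):
    """P(each response category | bid) implied by the thresholds v - c < v < v + c."""
    hi = norm.cdf((bid + c - mu) / sigma)   # P(v <= bid + c)
    mid = norm.cdf((bid - mu) / sigma)      # P(v <= bid)
    lo = norm.cdf((bid - c - mu) / sigma)   # P(v <= bid - c)
    return np.array([1.0 - hi, hi - mid, mid - lo, lo])

def neg_log_lik(params, bids, responses):
    """Negative log-likelihood over (bid, response) pairs.
    Can be minimized with scipy.optimize.minimize over (mu, log_sigma, log_c)."""
    mu, log_sigma, log_c = params
    sigma, c = np.exp(log_sigma), np.exp(log_c)
    probs = [category_probs(b, mu, sigma, c)[CATS.index(r)]
             for b, r in zip(bids, responses)]
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))
```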

    Approximate Bayesian approaches and semiparametric methods for handling missing data

    This thesis consists of four research papers focusing on estimation and inference with missing data. In the first paper (Chapter 2), an approximate Bayesian approach is developed to handle unit nonresponse with parametric model assumptions on the response probability, but without model assumptions for the outcome variable. The proposed Bayesian method is also extended to incorporate auxiliary information from the full sample. In the second paper (Chapter 3), a new Bayesian method using the spike-and-slab prior is proposed to handle sparse propensity score estimation. The proposed method is not based on any model assumption on the outcome variable and is computationally efficient. In the third paper (Chapter 4), we develop a robust semiparametric method based on the profile likelihood obtained from a semiparametric response model. The proposed method uses the observed regression model and the semiparametric response model to achieve robustness. An efficient algorithm using fractional imputation is developed. A bootstrap testing procedure is also proposed to test the ignorability assumption. In the last paper (Chapter 5), we propose a novel semiparametric fractional imputation method using a Gaussian mixture model for handling multivariate missingness. The proposed method is computationally efficient and leads to robust estimation, and it is further extended to incorporate categorical auxiliary information. Asymptotic properties are developed for each proposed method. Both simulation studies and real data applications are conducted to assess the performance of the proposed methods in this thesis.
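
    The thesis's fractional imputation methods are semiparametric (Gaussian mixture based); as a minimal, purely illustrative sketch of the basic fractional-imputation idea under a simple parametric normal outcome model, each missing outcome below receives M imputed draws carrying weight 1/M each, and the estimator is a weighted mean over observed and imputed values. Names and the model are ours, not the thesis's.

```python
import numpy as np

rng = np.random.default_rng(0)

def fractional_imputation_mean(x, y, observed, M=10):
    """Fractional-imputation estimate of E[y] under MAR.
    x, y: 1-D arrays; observed: boolean mask of observed y values."""
    # Fit a simple linear outcome model on the observed cases.
    slope, intercept = np.polyfit(x[observed], y[observed], 1)
    resid = y[observed] - (intercept + slope * x[observed])
    sigma = resid.std(ddof=2)

    values, weights = [y[observed]], [np.ones(observed.sum())]
    miss = ~observed
    for _ in range(M):
        # Each missing unit gets one draw from the fitted predictive model,
        # with fractional weight 1/M, so its total weight is 1.
        draw = intercept + slope * x[miss] + rng.normal(0.0, sigma, miss.sum())
        values.append(draw)
        weights.append(np.full(miss.sum(), 1.0 / M))
    values, weights = np.concatenate(values), np.concatenate(weights)
    return np.sum(weights * values) / np.sum(weights)

# Example: y depends on x, and y is missing more often for large x (MAR).
x = rng.normal(size=500)
y = 2.0 + 1.5 * x + rng.normal(size=500)
observed = rng.random(500) > 1 / (1 + np.exp(-x))
print(fractional_imputation_mean(x, y, observed))  # close to E[y] = 2.0
```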

    Nonlinear Factor Models for Network and Panel Data

    Factor structures or interactive effects are convenient devices to incorporate latent variables in panel data models. We consider fixed effect estimation of nonlinear panel single-index models with factor structures in the unobservables, which include logit, probit, ordered probit and Poisson specifications. We establish that fixed effect estimators of model parameters and average partial effects have normal distributions when the two dimensions of the panel grow large, but might suffer from incidental parameter bias. We show how models with factor structures can also be applied to capture important features of network data such as reciprocity, degree heterogeneity, homophily in latent variables and clustering. We illustrate this applicability with an empirical application: the estimation of a gravity equation of international trade between countries using a Poisson model with multiple factors.
    Comment: 49 pages, 6 tables; the changes in v4 include numerical results with more simulations and minor edits in the main text and appendix.
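
    As a purely illustrative sketch (not the authors' code), the probit member of this class of single-index models with an R-dimensional factor structure has the log-likelihood below; fixed effect estimation maximizes it jointly over the common coefficients, the unit loadings, and the time factors.

```python
import numpy as np
from scipy.stats import norm

def probit_factor_loglik(beta, lam, f, X, Y):
    """Log-likelihood of a probit panel model with interactive effects.
    X: (N, T, K) regressors, Y: (N, T) binary outcomes,
    beta: (K,) common coefficients, lam: (N, R) loadings, f: (T, R) factors."""
    index = X @ beta + lam @ f.T            # single index x_it' beta + lambda_i' f_t
    p = np.clip(norm.cdf(index), 1e-12, 1 - 1e-12)
    return np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))
```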

    Statistical guarantees for the EM algorithm: From population to sample-based analysis

    We develop a general framework for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM. Our analysis is divided into two parts: a treatment of these algorithms at the population level (in the limit of infinite data), followed by results that apply to updates based on a finite set of samples. First, we characterize the domain of attraction of any global maximizer of the population likelihood. This characterization is based on a novel view of the EM updates as a perturbed form of likelihood ascent, or in parallel, of the gradient EM updates as a perturbed form of standard gradient ascent. Leveraging this characterization, we then provide non-asymptotic guarantees on the EM and gradient EM algorithms when applied to a finite set of samples. We develop consequences of our general theory for three canonical examples of incomplete-data problems: mixture of Gaussians, mixture of regressions, and linear regression with covariates missing completely at random. In each case, our theory guarantees that with a suitable initialization, a relatively small number of EM (or gradient EM) steps will yield (with high probability) an estimate that is within statistical error of the MLE. We provide simulations to confirm this theoretically predicted behavior.
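
    For concreteness, a minimal sketch of EM for the simplest of these canonical examples, a balanced two-component Gaussian mixture with means +theta and -theta and known unit variance, is given below. It is illustrative only, not the authors' code; in this model the M-step has a closed form.

```python
import numpy as np
from scipy.stats import norm

def em_update(theta, x):
    """One EM step for the symmetric two-component Gaussian mixture."""
    # E-step: posterior probability that each point came from the +theta component.
    w = norm.pdf(x, theta, 1.0)
    w = w / (w + norm.pdf(x, -theta, 1.0))
    # M-step: the weighted mean is the next iterate.
    return np.mean((2 * w - 1) * x)

def run_em(x, theta0=1.0, steps=50):
    theta = theta0
    for _ in range(steps):
        theta = em_update(theta, x)
    return theta

# Example: data from the mixture with true theta = 2.
rng = np.random.default_rng(0)
x = rng.normal(rng.choice([-2.0, 2.0], size=500), 1.0)
print(run_em(x))  # close to 2 (or -2, by symmetry) from a suitable start
```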

    Topics in generalized linear mixed models and spatial subgroup analysis

    In this thesis, two topics are studied: generalized linear mixed models and spatial subgroup analysis. Within the topic of generalized linear mixed models, this thesis focuses on three aspects. First, estimation of the link function in generalized linear models is studied. We propose a new algorithm that uses P-splines to nonparametrically estimate the link function and guarantees that the estimate is monotone. We also conduct extensive simulation studies to compare our nonparametric approach with various parametric approaches. Second, a spatial hierarchical model based on the generalized Dirichlet distribution is developed to construct small area estimators of compositional proportions in the National Resources Inventory survey. At the observation level, the standard design-based estimators of the proportions are assumed to follow the generalized Dirichlet distribution. After a proper transformation of the design-based estimators, beta regression is applicable. We consider a logit mixed model for the expectation of the beta distribution, which incorporates covariates through fixed effects and a spatial effect through a conditionally autoregressive process. Finally, convergence rates of Markov chain Monte Carlo algorithms for Bayesian generalized linear mixed models are studied. For Bayesian probit linear mixed models, we construct two-block Gibbs samplers using data augmentation (DA) techniques and prove the geometric ergodicity of the Gibbs samplers under both proper and improper priors. We also provide conditions for posterior propriety when the design matrices take commonly observed forms. For Bayesian logistic regression models, we establish that the Markov chain underlying Polson et al.'s (2013) DA algorithm is geometrically ergodic under a flat prior. For Bayesian logistic linear mixed models, we construct a two-block Gibbs sampler using Polson et al.'s (2013) DA technique under proper priors and prove the uniform ergodicity of this Gibbs sampler. The other topic is spatial subgroup analysis with repeated measures. We use pairwise concave penalties on the differences among group regression coefficients based on the smoothly clipped absolute deviation penalty. We also consider pairwise weights associated with each paired penalty based on spatial information. We show that the oracle estimator based on weighted least squares is a local minimizer of the objective function with probability approaching 1 under some conditions. In the simulation study, we compare the performance of spatially informed weights with equal weights, which shows that the spatial information helps when the minimal group difference is small or the number of repeated measures is small.
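
    As a purely illustrative sketch of the data augmentation (DA) idea behind the Gibbs samplers discussed above, the snippet below implements the simplest case, an Albert-Chib-style two-block sampler for probit regression with a flat prior on the coefficients (not the thesis's mixed-model samplers): it alternates truncated-normal draws of latent variables with a multivariate normal draw of the coefficients.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_da_gibbs(X, y, n_iter=1000, seed=0):
    """Two-block DA Gibbs sampler for probit regression with a flat prior on beta.
    X: (n, p) design matrix of full column rank, y: (n,) binary responses."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        # Block 1: z_i | beta, y_i ~ N(x_i' beta, 1) truncated to (0, inf)
        # when y_i = 1 and to (-inf, 0) when y_i = 0.
        mean = X @ beta
        lo = np.where(y == 1, -mean, -np.inf)
        hi = np.where(y == 1, np.inf, -mean)
        z = mean + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # Block 2: beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}).
        beta = XtX_inv @ (X.T @ z) + chol @ rng.standard_normal(p)
        draws[it] = beta
    return draws
```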

    Algorithmic Decision-Making Safeguarded by Human Knowledge

    Commercial AI solutions provide analysts and managers with data-driven business intelligence for a wide range of decisions, such as demand forecasting and pricing. However, human analysts may have their own insights and experiences about the decision-making that are at odds with the algorithmic recommendation. In view of such a conflict, we provide a general analytical framework to study the augmentation of algorithmic decisions with human knowledge: the analyst uses this knowledge to set a guardrail by which the algorithmic decision is clipped whenever the algorithmic output falls out of bounds and seems unreasonable. We study the conditions under which the augmentation is beneficial relative to the raw algorithmic decision. We show that when the algorithmic decision is asymptotically optimal with large data, the non-data-driven human guardrail usually provides no benefit. However, we point out three common pitfalls of the algorithmic decision: (1) lack of domain knowledge, such as of the market competition, (2) model misspecification, and (3) data contamination. In these cases, even with sufficient data, the augmentation from human knowledge can still improve the performance of the algorithmic decision.
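
    A minimal sketch of the guardrail mechanism described above: the analyst supplies a plausible range from domain knowledge, and the algorithmic output is clipped to that range whenever it falls outside it. The function name, bounds, and demand-forecast framing are illustrative, not from the paper.

```python
import numpy as np

def guardrailed_decision(algorithmic_output, lo, hi):
    """Return the algorithmic output unless it leaves the human-set bounds [lo, hi]."""
    return float(np.clip(algorithmic_output, lo, hi))

# Example: a contaminated forecast of 10,000 units is pulled back to the
# analyst's upper bound of 1,200.
print(guardrailed_decision(10_000, lo=200, hi=1_200))  # 1200.0
```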