2,116 research outputs found
Latent Thresholds Analysis of Choice Data with Multiple Bids and Response Options
In many stated preference settings, stakeholders will be uncertain about their exact willingness-to-pay for a proposed environmental amenity. To accommodate this possibility, analysts have designed elicitation formats with multiple bids and response options that allow for the expression of uncertainty. We argue that the information content flowing from such elicitation has not yet been fully and efficiently exploited in existing contributions. We introduce a Latent Thresholds Estimator that focuses on the simultaneous identification of the full set of thresholds that delineate an individual's value space in accordance with observed response categories. Our framework provides a more complete picture of the underlying value distribution, the marginal effects of regressors, and the impact of bid designs on estimation efficiency. We show that the common practice of re-coding responses to derive point estimates of willingness-to-pay leaves useful information untapped and can produce misleading results if thresholds are highly correlated.
Keywords: Stated Preference; Multiple Bounded Elicitation; Polychotomous Choice; Bayesian Estimation; Value Uncertainty
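The re-coding practice the abstract criticizes can be made concrete with a small sketch: each bid-and-category response is collapsed into a simple yes/no and used only to bound willingness-to-pay (WTP). The category labels, bids, and the "probably yes or stronger counts as yes" cutoff are all illustrative, not the paper's:

```python
import math

# Illustrative re-coding of multiple-bid, polychotomous responses into
# WTP bounds. The paper's Latent Thresholds Estimator instead models the
# full set of category thresholds jointly; this sketch shows the cruder
# practice it argues leaves information untapped.

CATEGORIES = ["definitely no", "probably no", "not sure",
              "probably yes", "definitely yes"]

def wtp_bounds(bids, responses):
    """Lower/upper WTP bounds from re-coding: treat 'probably yes'
    or stronger at a bid as 'WTP >= bid', anything weaker as 'WTP < bid'."""
    lower, upper = 0.0, math.inf
    for bid, resp in zip(bids, responses):
        if CATEGORIES.index(resp) >= CATEGORIES.index("probably yes"):
            lower = max(lower, bid)
        else:
            upper = min(upper, bid)
    return lower, upper

print(wtp_bounds([10, 25, 50],
                 ["definitely yes", "probably yes", "probably no"]))
# → (25, 50)
```

Note how the re-coding discards the distinction between "probably" and "definitely" entirely, which is exactly the information the latent-thresholds framework retains.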
Approximate Bayesian approaches and semiparametric methods for handling missing data
This thesis consists of four research papers focusing on estimation and inference with missing data. In the first paper (Chapter 2), an approximate Bayesian approach is developed to handle unit nonresponse with parametric model assumptions on the response probability, but without model assumptions on the outcome variable. The proposed Bayesian method is also extended to incorporate auxiliary information from the full sample. In the second paper (Chapter 3), a new Bayesian method using the spike-and-slab prior is proposed to handle sparse propensity score estimation. The proposed method is not based on any model assumption on the outcome variable and is computationally efficient. In the third paper (Chapter 4), we develop a robust semiparametric method based on the profile likelihood obtained from a semiparametric response model. The proposed method uses the observed regression model and the semiparametric response model to achieve robustness. An efficient algorithm using fractional imputation is developed. A bootstrap testing procedure is also proposed to test the ignorability assumption. In the last paper (Chapter 5), we propose a novel semiparametric fractional imputation method using a Gaussian mixture model for handling multivariate missingness. The proposed method is computationally efficient and leads to robust estimation. The proposed method is further extended to incorporate categorical auxiliary information. Asymptotic properties are developed for each proposed method. Both simulation studies and real data applications are conducted to check the performance of the proposed methods in this thesis.
Nonlinear Factor Models for Network and Panel Data
Factor structures or interactive effects are convenient devices to
incorporate latent variables in panel data models. We consider fixed effect
estimation of nonlinear panel single-index models with factor structures in the
unobservables, which include logit, probit, ordered probit and Poisson
specifications. We establish that fixed effect estimators of model parameters
and average partial effects have normal distributions when the two dimensions
of the panel grow large, but may suffer from incidental parameter bias. We show
how models with factor structures can also be applied to capture important
features of network data such as reciprocity, degree heterogeneity, homophily
in latent variables and clustering. We illustrate this applicability with an
empirical example to the estimation of a gravity equation of international
trade between countries using a Poisson model with multiple factors.
Comment: 49 pages, 6 tables; the changes in v4 include numerical results with more simulations and minor edits in the main text and appendix.
Statistical guarantees for the EM algorithm: From population to sample-based analysis
We develop a general framework for proving rigorous guarantees on the
performance of the EM algorithm and a variant known as gradient EM. Our
analysis is divided into two parts: a treatment of these algorithms at the
population level (in the limit of infinite data), followed by results that
apply to updates based on a finite set of samples. First, we characterize the
domain of attraction of any global maximizer of the population likelihood. This
characterization is based on a novel view of the EM updates as a perturbed form
of likelihood ascent, or in parallel, of the gradient EM updates as a perturbed
form of standard gradient ascent. Leveraging this characterization, we then
provide non-asymptotic guarantees on the EM and gradient EM algorithms when
applied to a finite set of samples. We develop consequences of our general
theory for three canonical examples of incomplete-data problems: mixture of
Gaussians, mixture of regressions, and linear regression with covariates
missing completely at random. In each case, our theory guarantees that with a
suitable initialization, a relatively small number of EM (or gradient EM) steps
will yield (with high probability) an estimate that is within statistical error
of the MLE. We provide simulations to confirm this theoretically predicted
behavior.
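The first canonical example, a mixture of Gaussians, can be sketched in a few lines. This is a minimal illustration, assuming two components with known unit variances and equal mixing weights; all names and the specific means are illustrative:

```python
import numpy as np

def em_step(x, mu):
    """One EM update for the two component means (known unit variances,
    equal mixing weights)."""
    # E-step: posterior responsibility of component 1 for each observation
    d0 = np.exp(-0.5 * (x - mu[0]) ** 2)
    d1 = np.exp(-0.5 * (x - mu[1]) ** 2)
    r1 = d1 / (d0 + d1)
    # M-step: responsibility-weighted sample means
    return np.array([np.sum((1 - r1) * x) / np.sum(1 - r1),
                     np.sum(r1 * x) / np.sum(r1)])

rng = np.random.default_rng(0)
n = 2000
z = rng.integers(0, 2, size=n)                    # latent component labels
x = rng.normal(np.where(z == 0, -2.0, 2.0), 1.0)  # true means -2 and 2

mu = np.array([-0.5, 0.5])    # a suitable initialization near the truth
for _ in range(50):           # a relatively small number of EM steps
    mu = em_step(x, mu)
# mu is now close to the true means (-2, 2), up to statistical error
```

This matches the paper's message for this example: from a suitable initialization, a modest number of EM steps lands within statistical error of the target.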
Topics in generalized linear mixed models and spatial subgroup analysis
In this thesis, two topics are studied, generalized linear mixed models and spatial subgroup analysis.
Within the topic of generalized linear mixed models, this thesis focuses on three aspects. First, estimation of the link function in generalized linear models is studied. We propose a new algorithm that uses P-splines to nonparametrically estimate the link function, which is guaranteed to be monotone. We also conduct extensive simulation studies to compare our nonparametric approach with various parametric approaches. Second, a spatial hierarchical model based on the generalized Dirichlet distribution is developed to construct small area estimators of compositional proportions in the National Resources Inventory survey. At the observation level, the standard design-based estimators of the proportions are assumed to follow the generalized Dirichlet distribution. After a proper transformation of the design-based estimators, beta regression is applicable. We consider a logit mixed model for the expectation of the beta distribution, which incorporates covariates through fixed effects and a spatial effect through a conditionally autoregressive process. Finally, convergence rates of Markov chain Monte Carlo algorithms for Bayesian generalized linear mixed models are studied. For Bayesian probit linear mixed models, we construct two-block Gibbs samplers using data augmentation (DA) techniques and prove the geometric ergodicity of the Gibbs samplers under both proper and improper priors. We also provide conditions for posterior propriety when the design matrices take commonly observed forms. For Bayesian logistic regression models, we establish that the Markov chain underlying Polson et al.'s (2013) DA algorithm is geometrically ergodic under a flat prior. For Bayesian logistic linear mixed models, we construct a two-block Gibbs sampler using Polson et al.'s (2013) DA technique under proper priors and prove the uniform ergodicity of this Gibbs sampler.
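The two-block DA Gibbs sampler whose convergence this part of the thesis studies can be illustrated with a bare-bones version for a plain probit regression (Albert-Chib style) with one covariate and a flat prior, rather than the mixed models treated in the thesis; all names and numbers are illustrative:

```python
import numpy as np
from statistics import NormalDist

# Sketch of a two-block DA Gibbs sampler for Bayesian probit regression.
# Block 1 draws the latent Gaussian utilities z; block 2 draws the
# coefficient beta given z. Flat prior, single covariate.

nd = NormalDist()
rng = np.random.default_rng(1)

n = 300
x = rng.normal(size=n)                      # design (single column)
y = (x * 1.0 + rng.normal(size=n) > 0)      # true coefficient beta = 1

def draw_latent(beta):
    """Block 1: z_i | y, beta ~ N(x_i beta, 1) truncated by the sign of y_i."""
    m = x * beta
    u = rng.uniform(size=n)
    z = np.empty(n)
    for i in range(n):
        c = nd.cdf(-m[i])                   # P(z_i < 0) before truncation
        p = c + u[i] * (1 - c) if y[i] else u[i] * c
        z[i] = m[i] + nd.inv_cdf(min(max(p, 1e-12), 1 - 1e-12))
    return z

xtx = float(x @ x)
draws, beta = [], 0.0
for it in range(600):
    z = draw_latent(beta)
    # Block 2: beta | z ~ N((x'x)^{-1} x'z, (x'x)^{-1}) under a flat prior
    beta = rng.normal(float(x @ z) / xtx, 1.0 / np.sqrt(xtx))
    if it >= 100:                           # discard burn-in
        draws.append(beta)

print(round(float(np.mean(draws)), 2))      # posterior mean of beta, near the true 1
```

Geometric ergodicity of chains like this one is what licenses the usual Monte Carlo error estimates for such posterior means.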
The other topic is spatial subgroup analysis with repeated measures. We use pairwise concave penalties on the differences among group regression coefficients, based on the smoothly clipped absolute deviation penalty. We also consider pairwise weights associated with each paired penalty based on spatial information. We show that the oracle estimator based on weighted least squares is a local minimizer of the objective function with probability approaching 1 under some conditions. In the simulation study, we compare the performance of different weights as well as equal weights, which shows that spatial information helps when the minimal group difference is small or the number of repeated measures is small.
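The pairwise concave penalty referred to here is SCAD (smoothly clipped absolute deviation, Fan and Li); a direct transcription of its standard three-piece form, with the conventional tuning constant a = 3.7, is:

```python
# Standard SCAD penalty, here applied to a scalar t, e.g. a pairwise
# difference of group regression coefficients as in the abstract.

def scad(t, lam, a=3.7):
    """SCAD penalty p_lam(|t|) with tuning constant a > 2."""
    t = abs(t)
    if t <= lam:                       # L1-like behavior near zero
        return lam * t
    if t <= a * lam:                   # quadratic transition piece
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2      # constant tail: no further penalty
```

The constant tail is what makes the penalty concave and unbiased for large differences: pairs of groups whose coefficients are clearly distinct are not shrunk toward each other, while near-equal pairs are fused, which is how the oracle (true-grouping) estimator can emerge as a local minimizer.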
Algorithmic Decision-Making Safeguarded by Human Knowledge
Commercial AI solutions provide analysts and managers with data-driven
business intelligence for a wide range of decisions, such as demand forecasting
and pricing. However, human analysts may have their own insights and
experiences about the decision that are at odds with the algorithmic
recommendation. In view of such a conflict, we provide a general analytical
framework to study the augmentation of algorithmic decisions with human
knowledge: the analyst uses the knowledge to set a guardrail by which the
algorithmic decision is clipped if the algorithmic output falls out of bounds
and seems unreasonable. We study the conditions under which the augmentation is
beneficial relative to the raw algorithmic decision. We show that when the
algorithmic decision is asymptotically optimal with large data, the
non-data-driven human guardrail usually provides no benefit. However, we point
out three common pitfalls of the algorithmic decision: (1) lack of domain
knowledge, such as the market competition, (2) model misspecification, and (3)
data contamination. In these cases, even with sufficient data, the augmentation
from human knowledge can still improve the performance of the algorithmic
decision.
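The guardrail operation described in the abstract is simply clipping the algorithmic output to a human-specified interval. A minimal sketch, with purely illustrative numbers standing in for a price recommendation distorted by contaminated data:

```python
# Guardrail augmentation: the algorithmic decision passes through
# unchanged when it is inside the analyst's bounds, and is clipped
# to the nearer bound otherwise.

def safeguard(algorithmic_decision, lower, upper):
    """Clip the algorithmic output to the human-knowledge guardrail."""
    return min(max(algorithmic_decision, lower), upper)

raw_price = 412.0                              # implausible algorithmic output
guarded = safeguard(raw_price, lower=20.0, upper=80.0)
print(guarded)                                 # 80.0
```

When the algorithm is asymptotically optimal, the clip almost never binds and so adds nothing; under the three pitfalls listed in the abstract, the bound is what protects the decision.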