16 research outputs found

    A Flexible Zero-Inflated Poisson Regression Model

    A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the count data. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are, perhaps, the most commonly used. However, the fully parametric ZIP regression model can sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from some of the recent literature on semiparametric mixtures-of-regressions models for flexible mixture modeling, we propose a semiparametric ZIP regression model. We present an EM-like algorithm for estimation and a summary of asymptotic properties of the estimators. The proposed semiparametric models are then applied to a data set involving clandestine methamphetamine laboratories and Alzheimer's disease.
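The two-component mixture described in this abstract can be illustrated with a minimal sketch. It assumes the simplest fully parametric case with no predictors: a constant mixing proportion `pi` and a constant Poisson rate `lam`. It is not the semiparametric estimator proposed in the paper, and the function names `zip_pmf` and `fit_zip_em` are hypothetical, chosen only for illustration.

```python
import math


def zip_pmf(y, pi, lam):
    """P(Y = y) for a zero-inflated Poisson: a point mass at zero with
    weight pi mixed with a Poisson(lam) count with weight 1 - pi."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (1.0 if y == 0 else 0.0) + (1.0 - pi) * poisson


def fit_zip_em(counts, n_iter=200):
    """EM for a covariate-free ZIP model (assumption: constant pi and lam)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    # Crude starting values: half of the observed zeros are structural.
    pi = 0.5 * n_zero / n
    lam = max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero came from
        # the point mass rather than from the Poisson component.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing proportion and the Poisson rate using
        # the expected number of structural zeros.
        structural = w * n_zero
        pi = structural / n
        lam = total / (n - structural)
    return pi, lam
```

For example, `fit_zip_em([0] * 60 + [1] * 10 + [2] * 15 + [3] * 10 + [4] * 5)` returns estimates of `pi` and `lam` whose implied zero probability, `pi + (1 - pi) * exp(-lam)`, matches the observed share of zeros. A regression version would replace the constants with functions of the predictors.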

    A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data

    In many semiparametric models that are parameterized by two types of parameters (a Euclidean parameter of interest and an infinite-dimensional nuisance parameter), the two parameters are bundled together; that is, the nuisance parameter is an unknown function that contains the parameter of interest as part of its argument. For example, in a linear regression model for censored survival data, the unspecified error distribution function involves the regression coefficients. Motivated by the need for an efficient estimation method for the regression parameters, we propose a general sieve M-theorem for bundled parameters and apply the theorem to derive the asymptotic theory for sieve maximum likelihood estimation in the linear regression model for censored survival data. The proposed estimation method can be implemented numerically with conventional gradient-based search algorithms such as the Newton-Raphson algorithm. We show that the proposed estimator is consistent and asymptotically normal and achieves the semiparametric efficiency bound. Simulation studies demonstrate that the proposed method performs well in practical settings and yields more efficient estimates than existing estimating-equation-based methods. An illustration with a real data example is also provided. Comment: Published at http://dx.doi.org/10.1214/11-AOS934 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review

    BACKGROUND: The Targeted Maximum Likelihood Estimation (TMLE) statistical data analysis framework integrates machine learning, statistical theory, and statistical inference to provide a least biased, efficient and robust strategy for estimation and inference of a variety of statistical and causal parameters. We describe and evaluate the epidemiological applications that have benefited from recent methodological developments. METHODS: We conducted a systematic literature review in PubMed for articles that applied any form of TMLE in observational studies. We summarised the epidemiological discipline, geographical location, expertise of the authors, and TMLE methods over time. We used the Roadmap of Targeted Learning and Causal Inference to extract key methodological aspects of the publications. We showcase the contributions to the literature of these TMLE results. RESULTS: Of the 89 publications included, 33% originated from the University of California at Berkeley, where the framework was first developed by Professor Mark van der Laan. By 2022, 59% of the publications originated from outside the United States and explored up to 7 different epidemiological disciplines in 2021-22. Double-robustness, bias reduction and model misspecification were the main motivations that drew researchers towards the TMLE framework. Through time, a wide variety of methodological, tutorial and software-specific articles were cited, owing to the constant growth of methodological developments around TMLE. CONCLUSIONS: There is a clear dissemination trend of the TMLE framework to various epidemiological disciplines and to increasing numbers of geographical areas. The availability of R packages, publication of tutorial papers, and involvement of methodological experts in applied publications have contributed to an exponential increase in the number of studies that recognised the benefits of TMLE and adopted it.

    Bayesian Semiparametric Quantile Regression for Clustered Data

    Traditional frequentist quantile regression makes few assumptions about the form of the error distribution and is thus able to accommodate non-normal errors. However, inference for quantile regression models can be challenging because the error distribution is unknown, although asymptotic and resampling methods have been developed. The Bayesian literature on quantile regression with random effects is relatively limited. The quantile regression approach proposed in this dissertation is founded on Bayesian probabilistic modeling of the underlying unknown distributions. By modeling the error density with a nonparametric scale mixture model, we developed Bayesian semiparametric models that permit inference on any quantile of interest and allow for flexible shapes of the error densities. In this dissertation, we aimed to develop Bayesian semiparametric quantile regressions for both longitudinal data and clustered interval-censored data. We first proposed a semiparametric quantile mixed-effects regression for clustered data, which relaxes the normality assumption for both the random effects and the error term. We then developed a semiparametric accelerated failure time quantile regression for clustered interval-censored data. Both methods allow for estimation of subgroup-specific parameters and detection of heterogeneity in the random-effects population under nonparametric settings. Markov chain Monte Carlo (MCMC) methods provide computationally feasible implementations of Bayesian inference and learning. However, the speed of convergence can be a challenge for highly complex and nonconjugate models. Specifically, a Gibbs sampling algorithm that introduces auxiliary parameters was used to speed up posterior sampling in our study. Several variations of the proposed model were considered and compared via the deviance information criterion. The performance of the proposed methods was evaluated by extensive simulation studies, and examples using data from orthodontic clinics and lymphatic filariasis drug studies were presented as illustrations.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, and so to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.

    Extremal quantile treatment effects
