1,039 research outputs found

    Non-parametric maximum likelihood estimation of interval-censored failure time data subject to misclassification

    The paper considers non-parametric maximum likelihood estimation of the failure time distribution for interval-censored data subject to misclassification. Such data can arise from two types of observation scheme: either observations continue until the first positive test result, or tests continue regardless of the test results. In the former case, the misclassification probabilities must be known, whereas in the latter case joint estimation of the event-time distribution and misclassification probabilities is possible. The regions to which the support of the maximum likelihood estimate is restricted are derived. Algorithms for computing the maximum likelihood estimate are investigated, and it is shown that algorithms appropriate for computing non-parametric mixing distributions perform better than an iterative convex minorant algorithm in terms of time to absolute convergence. A profile likelihood approach is proposed for joint estimation. The methods are illustrated on a data set relating to the onset of cardiac allograft vasculopathy in post-heart-transplantation patients.

    Statistical methods in modeling disease surveillance data with misclassification

    This thesis focuses on constructing appropriate statistical models to monitor the dynamics of disease transmission in animal disease surveillance systems. One major challenge in analyzing such surveillance data is that the diagnostic tests are usually known to have imperfect sensitivity and specificity, so the observations are often misclassified, which introduces uncertainty in determining and modeling the true disease status of the animals. The thesis consists of three projects, each developing a different model and inference procedure for a different disease surveillance dataset. In the first project (Chapter 2), we propose a latent spatial piecewise exponential model for misclassified disease surveillance data and apply the model to data on porcine reproductive and respiratory syndrome virus (PRRSV). The misclassification of test outcomes is accounted for by using a two-level survival model. Spatial distance and time-varying covariates are incorporated to account for disease transmission. We show that our model is efficient in capturing the data features and easy to implement. In the second project (Chapter 3), we are motivated by parameter estimation in hidden Markov models (HMM) and mixed HMM (MHMM). The HMM can be applied to animal disease surveillance data where the outcomes are subject to misclassification and, with a group-level random effect added, the MHMM can model the correlation structure. However, parameter estimation in these models is challenging because of the latent variables and the random effect. We propose a pairwise fractional imputation method that uses the idea of parametric fractional imputation together with the Markov property. The proposed estimation method is shown to provide efficient parameter estimates and to be computationally efficient. 
In the third project (Chapter 4), we further investigate the piecewise exponential model and consider estimation of the hazard function under a monotonicity restriction on the hazard. When observations are subject to misclassification, the estimation involves the EM algorithm, and the principle of isotonic regression is used for constrained optimization of the model parameters. Details of the estimation algorithm are developed in this chapter, and a bootstrap confidence interval is constructed to measure the variability of the estimates. The proposed model is then applied to another PRRSV surveillance study in the swine population.
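
The isotonic regression principle mentioned in this abstract can be illustrated with the pool-adjacent-violators algorithm (PAVA). The sketch below is a generic weighted PAVA for a non-decreasing fit, not the thesis's estimation algorithm; the function name `pava_nondecreasing` and the example numbers are hypothetical, and the weights are assumed to be strictly positive.

```python
from typing import List, Optional, Sequence


def pava_nondecreasing(values: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares fit of a non-decreasing sequence to `values`.

    Hypothetical helper for illustration; weights must be strictly positive.
    """
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")

    # Each block stores [weighted mean, total weight, number of pooled points].
    blocks: List[List[float]] = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])

    # Expand each pooled block back to one fitted value per input point.
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * int(count))
    return fitted


if __name__ == "__main__":
    # Made-up raw interval hazards that break monotonicity at the third value.
    print(pava_nondecreasing([1.0, 3.0, 2.0, 4.0]))
```

In a monotone piecewise-constant hazard setting, one plausible use is to pass raw per-interval hazard estimates as `values` and the corresponding exposure times as `weights`, though the abstract does not say how the thesis sets this up.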

    Customer lifetime value: an integrated data mining approach

    Customer Lifetime Value (CLV), a measure of the profit-generating potential, or value, of a customer, is increasingly considered a touchstone for customer relationship management. As the guide and benchmark for Customer Relationship Management (CRM) applications, CLV analysis has received increasing attention from both marketing practitioners and researchers in different domains. The central challenge in predicting CLV is the precise calculation of a customer’s length of service (LOS). There are several statistical approaches to this problem, and several researchers have used them to perform survival analysis in different domains. However, classical survival analysis techniques such as the Kaplan-Meier approach, which offers a fully non-parametric estimate, ignore the covariates completely and assume stationarity of churn behavior over time, which makes them less practical. Further, segments of customers, whose lifetimes and covariate effects can vary widely, are not necessarily easy to detect. As in many other applications, data mining has recently emerged as a compelling analysis tool for CLV. Data mining methods offer an interesting alternative because they are less restrictive than the conventional statistical approaches. Customer databases contain histories of vital events such as the acquisition and cancellation of products and services. The historical data are used to build predictive models for customer retention, cross-selling, and other database marketing endeavors. In this research project we investigate the possibility of combining these statistical approaches with data mining methods to improve performance on the CLV problem in a real business context. Part of the research effort is devoted to the precise prediction of customers’ LOS in a real-world business setting. 
Using the conventional statistical approaches and data mining methods in tandem, we demonstrate how data mining tools can be apt complements to the classical statistical models, resulting in a CLV prediction model that is both accurate and understandable. We also evaluate the proposed integrated method for extracting interesting business domain knowledge within the scope of the CLV problem. In particular, several data mining methods are discussed and evaluated according to their prediction accuracy and the interpretability of their results. The research findings lead us to a data mining method combined with survival analysis approaches as a robust tool for modeling CLV and for assisting management decision-making. A calling plan strategy for the telecommunication industry is designed based on the predicted survival time and the calculated CLV. The calling plan strategy further investigates potential business knowledge assisted by the calculated CLV.
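
For readers unfamiliar with the Kaplan-Meier approach named in this abstract, the following is a minimal sketch of the product-limit estimator applied to length-of-service data. It is an illustration only, not the authors' implementation; the function name `kaplan_meier` and the sample data are hypothetical, with `events[i] == 1` assumed to mean that customer `i` churned and `0` that the customer was still active (censored) at the last observation.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (event time, estimated survival probability) pairs.

    Hypothetical helper: events[i] is 1 for an observed churn, 0 for censoring.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    churned = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Churned and censored subjects both drop out of the risk set after t.
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Made-up lengths of service in months; 0 marks a still-active customer.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t={t}: S(t)={s:.3f}")
```

The estimator takes no covariates as input, which is the limitation the abstract points to.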

    Nonparametric and Semiparametric Analysis of Current Status Data Subject to Outcome Misclassification

    In this article, we present nonparametric and semiparametric methods to analyze current status data subject to outcome misclassification. Our methods use nonparametric maximum likelihood estimation (NPMLE) to estimate the distribution function of the failure time when sensitivity and specificity may vary among subgroups. A nonparametric test is proposed for two-sample hypothesis testing. For regression analysis, we apply the Cox proportional hazards model and propose likelihood-ratio-based confidence intervals for the regression coefficients. Our methods are motivated and demonstrated by data collected from an infectious disease study in Seattle, WA.
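
The role of sensitivity and specificity in misclassified current status data can be sketched with the standard single-test relationship: if F(c) is the probability that the event has occurred by inspection time c, a positive result is observed with probability se * F(c) + (1 - sp) * (1 - F(c)). The code below only illustrates this relationship and its pointwise inversion; it is not the article's NPMLE, and the function names are hypothetical.

```python
def prob_positive(cdf_at_c: float, sensitivity: float, specificity: float) -> float:
    """Probability of a positive test at inspection time c.

    cdf_at_c is F(c), the probability that the event occurred by time c.
    """
    false_positive_rate = 1.0 - specificity
    return sensitivity * cdf_at_c + false_positive_rate * (1.0 - cdf_at_c)


def corrected_cdf(p_positive: float, sensitivity: float, specificity: float) -> float:
    """Invert prob_positive to recover F(c) from the positive-test probability.

    Assumes sensitivity + specificity > 1 (the test is better than chance);
    the result is clipped to [0, 1] because sampling noise can push it outside.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("requires sensitivity + specificity > 1")
    raw = (p_positive - (1.0 - specificity)) / denom
    return min(1.0, max(0.0, raw))


if __name__ == "__main__":
    p = prob_positive(0.5, sensitivity=0.9, specificity=0.8)
    print(p, corrected_cdf(p, sensitivity=0.9, specificity=0.8))
```

Applying the inversion pointwise to raw positive-test proportions does not guarantee a monotone estimate of F, which is one reason a constrained likelihood approach such as NPMLE is used in this setting.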

    Measurement Error and Misclassification in Interval-Censored Life History Data

    In practice, data are frequently incomplete in one way or another. It can be a significant challenge to make valid inferences about the parameters of interest in this situation. In this thesis, three problems involving such data are addressed. The first two problems involve interval-censored life history data with mismeasured covariates. Data of this type are incomplete in two ways. First, the exact event times are unknown due to censoring. Second, the true covariate is missing for most, if not all, individuals. This work focuses primarily on the impact of covariate measurement error in progressive multi-state models with data arising from panel (i.e., interval-censored) observation. These types of problems arise frequently in clinical settings (e.g., when disease progression is of interest and patient information is collected during irregularly spaced clinic visits). Two- and three-state models are considered in this thesis. This work is motivated by a research program on psoriatic arthritis (PsA) where the effects of error-prone covariates on rates of disease progression are of interest and patient information is collected at clinic visits (Gladman et al. 1995; Bond et al. 2006). Information regarding the error distributions was available from a separate study conducted to evaluate the reliability of clinical measurements that are used in PsA treatment and follow-up (Gladman et al. 2004). The asymptotic bias of covariate effects obtained ignoring error in covariates is investigated and shown to be substantial in some settings. In a series of simulation studies, the performance of corrected likelihood methods and of methods based on a simulation-extrapolation (SIMEX) algorithm (Cook & Stefanski 1994) was investigated as ways to address covariate measurement error. The methods implemented were shown to result in much smaller empirical biases and in empirical coverage probabilities closer to the nominal levels. 
The third problem considered involves an extreme case of interval censoring known as current status data. Current status data arise when individuals are observed only at a single point in time and it is then determined whether they have experienced the event of interest. To complicate matters, in the problem considered here, an unknown proportion of the population will never experience the event of interest. Again, this type of data is incomplete in two ways. One assessment is made on each individual to determine whether or not an event has occurred. Therefore, the exact event times are unknown for those who will eventually experience the event. In addition, whether or not the individuals will ever experience the event is unknown for those who have not experienced the event by the assessment time. This problem was motivated by a series of orthopedic trials looking at the effect of blood thinners in hip and knee replacement surgeries. These blood thinners can cause a negative serological response in some patients. This response was the outcome of interest and the only available information regarding it was the seroconversion time under current status observation. In this thesis, latent class models with parametric, nonparametric and piecewise constant forms of the seroconversion time distribution are described. They account for the fact that only a proportion of the population will experience the event of interest. Estimators based on an EM algorithm were evaluated via simulation and the orthopedic surgery data were analyzed based on this methodology

    Bayesian correction for covariate measurement error: a frequentist evaluation and comparison with regression calibration

    Bayesian approaches for handling covariate measurement error are well established, and yet arguably are still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper we first give an overview of the Bayesian approach to handling covariate measurement error, and contrast it with regression calibration (RC), arguably the most commonly adopted approach. We then argue why the Bayesian approach has a number of statistical advantages compared to RC, and demonstrate that implementing the Bayesian approach is usually quite feasible for the analyst. Next we describe the closely related maximum likelihood and multiple imputation approaches, and explain why we believe the Bayesian approach to generally be preferable. We then empirically compare the frequentist properties of RC and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach to handle both measurement error and missing data is then illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey

    Bayesian semiparametric inference for multivariate doubly-interval-censored data

    Based on a data set obtained in a dental longitudinal study conducted in Flanders (Belgium), the joint time-to-caries distribution of permanent first molars was modeled as a function of covariates. This involves an analysis of multivariate continuous doubly-interval-censored data since: (i) the emergence time of a tooth and the time it experiences caries were recorded yearly, and (ii) events on teeth of the same child are dependent. To model the joint distribution of the emergence times and the times to caries, we propose a dependent Bayesian semiparametric model. A major feature of the proposed approach is that survival curves can be estimated without imposing assumptions such as proportional hazards, additive hazards, proportional odds or accelerated failure time. Comment: Published at http://dx.doi.org/10.1214/10-AOAS368 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Application of the Misclassification Simulation Extrapolation (MC-SIMEX) Procedure to Log-Logistic Accelerated Failure Time (AFT) Models in Survival Analysis

    Survival analysis is the study of time-to-event outcomes. Accelerated Failure Time (AFT) models serve as a useful tool in survival analysis to study the time of occurrence of an event and its relation to the covariates of interest. The accuracy of parameter estimation in a model depends upon the correct measurement of covariates. Considering that perfect measurement of covariates is highly unlikely, it is imperative that the performance of the existing bias-correction methods be analyzed in AFT models. However, certain areas of bias correction in AFT models remain unexplored. One of these unexplored areas is the situation where the survival times follow a log-logistic distribution. In this dissertation, we evaluate the performance of the misclassification simulation extrapolation (MC-SIMEX) procedure, a well-known procedure for correcting bias due to misclassification, in AFT models where the survival times follow a standard log-logistic distribution. In addition, a modified version of the MC-SIMEX procedure is proposed that provides an advantage in situations where the sensitivity and specificity of classification are unknown. Lastly, the performance of the original MC-SIMEX procedure is evaluated on lung cancer data provided by the North Central Cancer Treatment Group (NCCTG).

    Nonparametric inference for Markov processes with missing absorbing state

    This study examines nonparametric estimation of the transition probability matrix of a nonhomogeneous Markov process with a finite state space and a partially observed absorbing state. We impose a missing-at-random assumption and propose a computationally efficient nonparametric maximum pseudolikelihood estimator (NPMPLE). The estimator depends on a parametric model that is used to estimate the probability of each absorbing state for the missing observations based, potentially, on auxiliary data. For the latter model, we propose a formal goodness-of-fit test based on a residual process. Using modern empirical process theory, we show that the estimator is uniformly consistent and converges weakly to a tight mean-zero Gaussian random field. We also provide a methodology for constructing simultaneous confidence bands. Simulation studies show that the NPMPLE works well with small sample sizes and that it is robust against some degree of misspecification of the parametric model for the missing absorbing states. The method is illustrated using HIV data from sub-Saharan Africa to estimate the transition probabilities of death and disengagement from HIV care.