74 research outputs found

    Estimating Discrete Markov Models From Various Incomplete Data Schemes

    The parameters of a discrete stationary Markov model are the transition probabilities between states. Traditionally, data consist of sequences of observed states for a given number of individuals over the whole observation period. In such a case, the transition probabilities are estimated straightforwardly by counting one-step moves from a given state to another. In many real-life problems, however, inference is much more difficult because state sequences are not fully observed: the state of each individual is known only for some given values of the time variable. A review of the problem is given, focusing on Markov chain Monte Carlo (MCMC) algorithms to perform Bayesian inference and evaluate posterior distributions of the transition probabilities in this missing-data framework. Leaning on the dependence between the rows of the transition matrix, an adaptive MCMC mechanism accelerating the classical Metropolis-Hastings algorithm is then proposed and empirically studied. Comment: 26 pages; preprint accepted on 20 February 2012 for publication in Computational Statistics and Data Analysis (please cite the journal's paper)
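
The counting estimator that the abstract describes for fully observed sequences can be sketched as follows. This is a minimal illustration and not the paper's method: it covers only the complete-data case, not the incomplete-data schemes or the MCMC algorithms. The function name `estimate_transition_matrix`, the integer state coding, and the uniform fallback for states that are never left are all assumptions made for the example.

```python
def estimate_transition_matrix(sequences, n_states):
    """Estimate transition probabilities by counting one-step moves.

    `sequences` is a list of fully observed state sequences, with states
    coded as integers 0..n_states-1 (an assumption of this sketch).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Each adjacent pair (current, next) is one observed one-step move.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a state with no observed departures gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
        else:
            # Row-normalise the counts to obtain relative frequencies.
            matrix.append([c / total for c in row])
    return matrix


# Two individuals, each observed at four consecutive time points.
sequences = [[0, 0, 0, 1], [1, 1, 0, 0]]
P = estimate_transition_matrix(sequences, n_states=2)
print(P)  # [[0.75, 0.25], [0.5, 0.5]]
```

Each row of the result is the relative frequency of moves out of one state. When some time points are unobserved, adjacent pairs are no longer all available, which is the difficulty the abstract goes on to address.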

    Stochastic Decision Modeling to Improve Breast Cancer Preventive Care

    Breast cancer is a leading cause of premature mortality among women in the United States. Breast cancer screening tests can help detect breast cancer in its early stages and thereby reduce the risk of breast cancer mortality. However, because screening tests are imperfect, there is always some associated risk of overdiagnosis, false positives, and false negatives. Therefore, to improve breast cancer preventive care, this dissertation focuses on modeling breast cancer screening decisions. Breast cancer overdiagnosis is the first issue addressed in this dissertation. Although overdiagnosis is known to be the major risk inherent in mammography screening, there is currently no way to distinguish between overdiagnosed cancers and those that would cause problems over a patient’s lifetime. Overdiagnosis risk depends significantly on a patient’s compliance with screening recommendations. In Chapter 2, we use a stochastic framework to perform a harm-benefit analysis comparing the overdiagnosis risk with the benefits that breast cancer screening provides. In addition, we estimate the lifetime mortality risk of breast cancer while considering the overdiagnosis risk and the uncertainty in a patient’s adherence behavior. Our results show that, although the overdiagnosis rate is relatively high in breast cancer screening, the benefits of mammography screening outweigh the overdiagnosis risk. The second issue addressed in this dissertation is false negative results caused by the density of breast tissue. Breast density is known to increase breast cancer risk and decrease mammography screening sensitivity. Breast density notification laws require physicians to inform women with high breast density of these potential risks. The laws usually require healthcare providers to notify patients of the possibility of using more sensitive supplemental screening tests (e.g., ultrasound).
Since the enactment of the laws, there have been controversial debates over i) their implementation, due to the potential bias of radiologists in classifying breast density from mammogram images, and ii) the necessity of supplemental screening for all patients with high breast density. Breast density is a dynamic risk factor. Therefore, in the third chapter, we apply a hidden Markov model (HMM) to sparse, unbalanced longitudinal data to quantify the yearly progression of breast density based on Breast Imaging Reporting and Data System (BI-RADS) classifications. In Chapter 4, we use the results from the previous chapter to investigate the effectiveness of supplemental screening and the impact of radiologists’ bias on patients’ outcomes under the breast density notification law. As patient outcomes, we consider the conditional probability of eventually detecting breast cancer in early states given that the patient develops breast cancer in her lifetime, and the expected number of supplemental tests. Our results indicate that referring patients to a supplemental test solely on the basis of their breast density may not necessarily improve their health outcomes, and that other risk factors need to be considered when making such referrals. Additionally, the performance of average-skilled radiologists is shown to be comparable with that of a perfect radiologist.

    Simultaneous evaluation of abstinence and relapse using a Markov chain model in smokers enrolled in a two-year randomized trial

    Background: GEE and mixed models are powerful tools to compare treatment effects in longitudinal smoking cessation trials. However, they are not capable of assessing relapse (from abstinent back to smoking) simultaneously with cessation, which can be studied by transition models. Methods: We apply a first-order Markov chain model to analyze the transition of smoking status measured every 6 months in a 2-year randomized smoking cessation trial, and to identify which factors are associated with the transition from smoking to abstinent and from abstinent to smoking. Missing values due to non-response are assumed non-ignorable and handled by the selection modeling approach. Results: Smokers receiving high-intensity disease management (HDM), of male gender, with lower daily cigarette consumption, higher motivation and confidence to quit, and having made serious attempts to quit were more likely to become abstinent (OR = 1.48, 1.66, 1.03, 1.15, 1.09 and 1.34, respectively) in the next 6 months. Among those who were abstinent, participants with lower income and stronger nicotine dependence (OR = 1.72 for ≤ vs. > 40 K and OR = 1.75 for first cigarette ≤ vs. > 5 min) were more likely to relapse in the next 6 months. Conclusions: Markov chain models allow investigation of dynamic smoking-abstinence behavior and suggest that relapse is influenced by different factors than cessation. Knowledge of the effects of treatments and covariates on transitions in both directions may provide guidance for designing more effective interventions for smoking cessation and relapse prevention. Trial Registration: clinicaltrials.gov identifier: NCT00440115

    The existence and persistence of household financial hardship

    We investigate the existence and persistence of financial hardship at the household level using data from the British Household Panel Survey. Our modelling strategy makes three important contributions to the existing literature on household finances. Firstly, we model nine different types of household financial problems within a joint framework, allowing for correlation in the random effects across the nine equations. Secondly, we develop a dynamic framework to model the persistence of financial problems over time, extending our multi-equation framework so that the presence or absence of different types of financial problems in the previous time period can influence the probability that the household currently experiences such problems. Our third contribution relates to the possibility that experiencing financial problems may be correlated with sample attrition. We model missing observations in the panel in order to allow for such attrition. Our findings reveal interesting variations in the determinants of experiencing different types of financial problems, including demographic and regional differences. Our findings also highlight persistence in experiencing financial problems over time, as well as the role that saving on a regular basis in previous time periods can play in mitigating current financial problems.

    Statistical Methods for Non-Ignorable Missing Data With Applications to Quality-of-Life Data.

    Researchers increasingly use survey studies and design medical studies to better understand the relationships among patients, physicians, their health care system utilization, and their decision-making processes in disease prevention and management. Longitudinal data are widely used to capture trends occurring over time. Each subject is observed as time progresses, but a common problem is that repeated measurements are not fully observed because of missing responses or loss to follow-up. An individual can move in and out of the observed data set during a study, giving rise to a large class of distinct non-monotone missingness patterns. In such medical studies, sample sizes are often limited by restrictions on disease type, study design and the availability of medical information. Small sample sizes with large proportions of missing information are problematic for researchers trying to understand the experience of the total population. The data collected may produce biased estimators if, for example, the patients who don't respond have worse outcomes, or the patients who answered "unknown" are those without access to medical or non-medical information or care. Modeling the data without considering this missing information may produce biased results. A first-order Markov dependence structure is a natural way to model the tendency of changes. In my first project, we developed a Markov transition model using a full-likelihood based algorithm to provide robust estimation accounting for "non-ignorable" missingness information, and applied it to data from the Penn Center of Excellence in Cancer Communication Research. In my second project, we extended the method to a pseudo-likelihood based approach that considers only pairs of adjacent observations, which significantly eases the computational complexity of the full-likelihood based method proposed in the first project.
In my third project, we proposed a two-stage pseudo hidden Markov model to analyze the association between quality-of-life measurements and cancer treatments from a randomized phase III trial (RTOG 9402) in brain cancer patients. By incorporating selection models and shared parameter models with a hidden Markov model, this approach provides targeted identification of treatment effects.

    A multilevel latent Markov model for the evaluation of nursing homes' performance

    The periodic evaluation of health care services is a primary concern for many institutions. In this work, we focus on nursing home services with the aim of producing a ranking of a set of nursing homes based on their capability to improve, or at least to keep unchanged, the health status of the patients they host. As the overall health status is not directly observable, latent variable models represent a suitable approach. Moreover, given the longitudinal and multilevel structure of the available data, we rely on a multilevel latent Markov model where patients and nursing homes are the first and the second level units, respectively. The model includes individual covariates to account for the patient case-mix, and the impact of nursing home membership is modeled through a pair of correlated random effects affecting the initial distribution and the transition probabilities between different levels of health status. Through the prediction of these random effects we obtain a ranking of the nursing homes. Furthermore, the proposed model is designed to address non-ignorable dropout, which typically occurs in these contexts because some elderly patients die before completing the survey. We apply our model to the Long Term Care Facilities dataset, a longitudinal dataset gathered from Regione Umbria (Italy). Our results are robust to the sensitivity parameter involved (the number of latent states) and show that differences in nursing homes' performances are statistically significant.

    Statistical methods for handling incomplete longitudinal data with emphasis on discrete outcomes with application.

    Doctor of Philosophy in Statistics. University of KwaZulu-Natal, Pietermaritzburg 2017. In longitudinal studies, measurements are taken repeatedly over time on the same experimental unit. These measurements are thus correlated. The variances in repeated measures change with respect to time. Therefore, the variations together with the potential correlation patterns produce a complicated variance structure for the measures. Standard regression and analysis of variance techniques may result in invalid inference because they entail some mathematical assumptions that do not hold for repeated measures data. Coupled with the repeated nature of the measurements, these datasets are often imbalanced due to missing data. Methods used should be capable of handling the incomplete nature of the data, with the ability to capture the reasons for missingness in the analysis. This thesis seeks to investigate and compare analysis methods for incomplete correlated data, with primary emphasis on discrete longitudinal data. The thesis adopts the general taxonomy of longitudinal models, including marginal, random effects, and transitional models. Although the objective is to deal with discrete data, the thesis starts with one continuous data case. Chapter 2 presents a comparative analysis of how to handle longitudinal continuous outcomes with dropouts missing at random. Inverse probability weighted generalized estimating equations (GEEs) and multiple imputation (MI) are compared. In Chapter 3, the weighted GEE is compared to GEE after MI (MI-GEE) in the analysis of correlated count outcome data in a simulation study. Chapter 4 deals with MI in the handling of ordinal longitudinal data with dropouts on the outcome. MI strategies, namely multivariate normal imputation (MNI) and fully conditional specification (FCS), are compared both in a simulation study and a real data application.
In Chapter 5, still focussing on ordinal outcomes, the thesis presents a simulation and a real data application to compare complete case analysis with advanced methods: direct likelihood analysis, MNI, FCS and an ordinal imputation method. Finally, in Chapter 6, cumulative logit ordinal transition models are utilized to investigate the influence of dependency of current incomplete responses on past responses. Transitions from one response state to another over time are of interest.

    Bayesian nonparametric models for biomedical data analysis

    In this dissertation, we develop nonparametric Bayesian models for biomedical data analysis. In particular, we focus on inference for tumor heterogeneity and inference for missing data. First, we present a Bayesian feature allocation model for tumor subclone reconstruction using mutation pairs. The key innovation lies in the use of short reads mapped to pairs of proximal single nucleotide variants (SNVs). In contrast, most existing methods use only marginal reads for unpaired SNVs. In the same context of using mutation pairs, in order to recover the phylogenetic relationship of subclones, we then develop a Bayesian treed feature allocation model. In contrast to commonly used feature allocation models, we allow the latent features to be dependent, using a tree structure to introduce dependence. Finally, we propose a nonparametric Bayesian approach to monotone missing data in longitudinal studies with non-ignorable missingness. In contrast to most existing methods, our method allows for incorporating information from auxiliary covariates and is able to capture complex structures among the response, missingness and auxiliary covariates. Our models are validated through simulation studies and are applied to real-world biomedical datasets.

    Analysis of multivariate longitudinal categorical data subject to nonrandom missingness: a latent variable approach

    Longitudinal data are collected for studying changes across time. In social sciences, interest is often in theoretical constructs, such as attitudes, behaviour or abilities, which cannot be directly measured. In that case, multiple related manifest (observed) variables, for example survey questions or items in an ability test, are used as indicators for the constructs, which are themselves treated as latent (unobserved) variables. In this thesis, multivariate longitudinal data are considered where multiple observed variables, measured at each time point, are used as indicators for theoretical constructs (latent variables) of interest. The observed items and the latent variables are linked together via statistical latent variable models. A common problem in longitudinal studies is missing data, where missingness can be classified into one of two forms. Dropout occurs when subjects exit the study prematurely, while intermittent missingness takes place when subjects miss one or more occasions but show up on a subsequent wave of the study. Ignoring the missingness mechanism can lead to biased estimates, especially when the missingness is nonrandom. The approach proposed in this thesis uses latent variable models to capture the evolution of a latent phenomenon over time, while incorporating a missingness mechanism to account for possibly nonrandom forms of missingness. Two model specifications are presented: the first incorporates dropout only in the missingness mechanism, while the second accounts for both dropout and intermittent missingness, allowing them to be informative by modelling them as functions of the latent variables and possibly observed covariates. Models developed in this thesis consider ordinal and binary observed items, because such variables are often met in social surveys, while the underlying latent variables are assumed to be continuous.
The proposed models are illustrated by analysing people's perceptions of women's work using three questions from five waves of the British Household Panel Survey.