
    Causal Inference using Deep Bayesian Dynamic Survival Model (CDS)

    Causal inference in longitudinal observational health data often requires accurate estimation of treatment effects on time-to-event outcomes in the presence of time-varying covariates. To tackle this sequential treatment effect estimation problem, we developed a causal dynamic survival (CDS) model that combines the potential outcomes framework with recurrent sub-networks and random seed ensembles to estimate the difference in survival curves together with its confidence interval. On simulated survival datasets, the CDS model showed good causal effect estimation performance across scenarios of sample size, event rate, confounding and overlap. Increasing the sample size, however, did not alleviate the adverse impact of a high level of confounding. In two large clinical cohort studies, our model identified the expected conditional average treatment effect and detected individual effect heterogeneity over time and across patient subgroups. CDS provides individualised absolute treatment effect estimates to improve clinical decisions.
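The ensemble idea in this abstract, estimating a treatment effect as a difference between two survival curves with an uncertainty band taken from the spread of randomly seeded model replicates, can be sketched in miniature. Everything below is an illustrative assumption, not the authors' architecture: `ensemble_effect` is a hypothetical name, the margins are simple exponential survival curves, and a small Gaussian perturbation stands in for seed-to-seed network variation.

```python
import math
import random

def survival_curve(rate, times):
    """Exponential survival S(t) = exp(-rate * t) at the given times."""
    return [math.exp(-rate * t) for t in times]

def ensemble_effect(times, rate_treated=0.5, rate_control=1.0,
                    n_members=50, noise_sd=0.02, seed=0):
    """Ensemble of perturbed survival-difference estimates.

    Each 'member' stands in for one randomly seeded sub-network; the
    spread across members yields a percentile confidence band for the
    treatment effect S1(t) - S0(t).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_members):
        s1 = [s + rng.gauss(0, noise_sd) for s in survival_curve(rate_treated, times)]
        s0 = [s + rng.gauss(0, noise_sd) for s in survival_curve(rate_control, times)]
        diffs.append([a - b for a, b in zip(s1, s0)])
    mean = [sum(d[i] for d in diffs) / n_members for i in range(len(times))]
    lower, upper = [], []
    for i in range(len(times)):
        col = sorted(d[i] for d in diffs)
        lower.append(col[int(0.025 * n_members)])
        upper.append(col[int(0.975 * n_members) - 1])
    return mean, lower, upper
```

With a lower hazard under treatment, the estimated difference in survival is positive at every time point, and the band width reflects only the seed-to-seed variability in this toy setup.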

    Mixed-effects models for health care longitudinal data with an informative visiting process: A Monte Carlo simulation study.

    Electronic health records are being increasingly used in medical research to answer more relevant and detailed clinical questions; however, they pose new and significant methodological challenges. For instance, observation times are likely correlated with the underlying disease severity: patients with worse conditions utilise health care more and may have worse biomarker values recorded. Traditional methods for analysing longitudinal data assume independence between observation times and disease severity; yet, with health care data, such assumptions are unlikely to hold. Through Monte Carlo simulation, we compare different analytical approaches proposed to account for an informative visiting process to assess whether they lead to unbiased results. Furthermore, we formalise a joint model for the observation process and the longitudinal outcome within an extended joint modelling framework. We illustrate our results using data from a pragmatic trial on enhanced care for individuals with chronic kidney disease, and we introduce user-friendly software that can be used to fit the joint model for the observation process and a longitudinal outcome.
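The bias mechanism this abstract describes is easy to reproduce with a toy Monte Carlo simulation. The sketch below is a hypothetical illustration, not the paper's simulation design: severity is a latent Gaussian, the biomarker is severity plus noise, and sicker patients contribute more visits, so the naive mean over recorded values overstates the population mean.

```python
import random

def simulate_visits(n_patients=2000, seed=1):
    """Toy informative-visiting simulation.

    Each patient has a latent severity; the biomarker equals the
    severity plus noise, and sicker patients contribute more visits.
    The naive mean over recorded values therefore overstates the
    population mean severity.
    """
    rng = random.Random(seed)
    recorded, truth = [], []
    for _ in range(n_patients):
        severity = rng.gauss(0, 1)
        truth.append(severity)
        # visit count increases with severity (informative observation)
        n_visits = 1 + max(0, round(2 * severity))
        for _ in range(n_visits):
            recorded.append(severity + rng.gauss(0, 0.2))
    naive_mean = sum(recorded) / len(recorded)
    true_mean = sum(truth) / len(truth)
    return naive_mean, true_mean
```

Because visits and severity share a common cause here, averaging over visits (rather than over patients) is exactly the independence violation the compared methods must correct.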

    Assessing the accuracy of predictive models with interval-censored data

    This is a pre-copyedited, author-produced PDF of an article accepted for publication in Biostatistics following peer review. The version of record, “Wu Y and Cook RJ (2022), Assessing the accuracy of predictive models with interval-censored data, Biostatistics, 23 (1): 18–33”, DOI: 10.1093/biostatistics/kxaa011, is available online at https://doi.org/10.1093/biostatistics/kxaa011.
    We develop methods for assessing the predictive accuracy of a given event time model when the validation sample comprises case K interval-censored data. An imputation-based, an inverse probability weighted (IPW), and an augmented inverse probability weighted (AIPW) estimator are developed and evaluated for the mean prediction error and the area under the receiver operating characteristic curve when the goal is to predict event status at a landmark time. The weights used for the IPW and AIPW estimators are obtained by fitting a multistate model which jointly considers the event process, the recurrent assessment process, and loss to follow-up. We empirically investigate the performance of the proposed methods and illustrate their application in the context of a motivating rheumatology study in which human leukocyte antigen markers are used to predict disease progression status in patients with psoriatic arthritis.
    Funding: National Natural Science Foundation of China, Grant 11701295 (to YW); Discovery Grants from the Natural Science and Engineering Research Council of Canada, RGPIN 155849 (to RJC); Canadian Institutes of Health Research, FRN 13887 (to RJC).
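An inverse probability weighted estimator of mean prediction error can be sketched as follows. This is a simplified stand-in for the paper's estimator: `ipw_prediction_error` and its inputs are hypothetical names, and the observation probabilities are treated as known, whereas the paper fits them from a multistate model for the event, assessment, and loss-to-follow-up processes.

```python
def ipw_prediction_error(data):
    """Inverse-probability-weighted mean squared prediction error.

    data: list of (observed, prob_observed, prediction, outcome) tuples.
    Only subjects whose event status at the landmark time is known
    contribute, each weighted by 1 / P(status observed) to undo the
    selective observation process.
    """
    num = den = 0.0
    for observed, p_obs, pred, y in data:
        if observed:
            w = 1.0 / p_obs
            num += w * (pred - y) ** 2
            den += w
    return num / den
```

In a population where subjects who experience the event are also seen more often, the unweighted complete-case average is badly biased while the weighted version recovers the population-level error.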

    Feedback Effect in User Interaction with Intelligent Assistants: Delayed Engagement, Adaption and Drop-out

    With the growing popularity of intelligent assistants (IAs), evaluating IA quality has become an increasingly active field of research. This paper identifies and quantifies the feedback effect, a novel component in IA-user interactions: how the capabilities and limitations of the IA influence user behavior over time. First, through an observational study we demonstrate that unhelpful responses from the IA cause users to delay or reduce subsequent interactions in the short term. Next, we expand the time horizon to examine behavior changes and show that as users discover the limitations of the IA's understanding and functional capabilities, they learn to adjust the scope and wording of their requests to increase the likelihood of receiving a helpful response from the IA. Our findings highlight the impact of the feedback effect at both the micro and meso levels. We further discuss its macro-level consequences: unsatisfactory interactions continuously reduce the likelihood and diversity of future user engagements in a feedback loop. Comment: PAKDD 202

    Modeling and Prediction of Disease Processes Subject to Intermittent Observation

    This thesis is concerned with statistical modeling and prediction of disease processes subject to intermittent observation. Times of disease progression are interval-censored when progression status is only known at a series of assessment times. This situation arises routinely in clinical trials and cohort studies when events of interest are only detectable upon imaging, based on blood tests, or upon careful clinical examination. The work that follows is motivated by the study of demographic, genetic and clinical data available from the University of Toronto Psoriasis Registry and the University of Toronto Psoriatic Arthritis Registry, each involving cohorts of several hundred patients with the respective diseases. Chapter 2 deals with the problem of selecting important prognostic biomarkers from a large set of candidate biomarkers when the status with respect to an event of interest (e.g. disease progression) is only known at irregularly spaced, individual-specific assessment times. Penalized regression techniques (e.g. LASSO, adaptive LASSO and SCAD) are adapted to deal with the interval-censored event times arising from this observation scheme. An expectation-maximization algorithm is developed and demonstrated to perform well in extensive simulation studies involving independent and correlated continuous and binary covariates. Application to the motivating study of the development of arthritis mutilans in patients with psoriatic arthritis is given, and several important human leukocyte antigen (HLA) variables are identified for further investigation. Extensions of this algorithm are developed for settings in which data from different sources with distinct disease-related entry conditions are to be synthesized. The extended Turnbull-type expectation-maximization algorithm is based on a complete data likelihood which incorporates missing information from individuals not meeting the entry criteria of the respective registries. 
Simulation studies demonstrate good empirical performance, and an application to the motivating study identifies HLA markers associated with the onset of psoriatic arthritis among individuals with psoriasis. This analysis is carried out using data from a psoriasis registry in which the times to psoriatic arthritis are left-truncated, and a psoriatic arthritis registry in which the onset times are right-truncated. Chapter 3 deals with the challenge of assessing the accuracy of a predictive model when response times are interval-censored. Inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimators of predictive accuracy are developed and evaluated based on the mean prediction error and the area under the receiver operating characteristic curve. The weights are estimated from a multistate model which jointly considers the event process, the inspection process, and the right-censoring processes. We investigate the performance of the proposed methods by simulation and illustrate their application in the context of a motivating rheumatology study in which HLA markers are used for predicting disease progression in psoriatic arthritis. A two-phase model is developed in Chapter 4 for chronic diseases which feature an indolent phase followed by a phase with more active disease resulting in progression and damage. The time-scales for the intensity functions for the active phase are more naturally based on the time since the start of the active phase, corresponding to a semi-Markov formulation. In cohort studies for which the disease status is only known at a series of clinical assessment times, transition times are interval-censored, which means the time origin for phase II is interval-censored. Weakly parametric models with piecewise constant baseline hazard and rate functions are specified, and an expectation-maximization algorithm is described for model fitting. 
A computationally faster two-stage estimation procedure is also developed, and the asymptotic variances of the resulting estimators are derived. Simulation studies show good performance of the proposed model under both maximum likelihood and two-stage estimation. An application to data from the motivating study of disease progression in psoriatic arthritis illustrates the procedure, and identifies new human leukocyte antigens associated with the duration of the indolent phase, and others associated with disease progression in the active phase. Open problems and topics for ongoing and future research are discussed in Chapter 5.
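The weakly parametric piecewise-constant hazard specification used in Chapter 4 admits a closed-form survival function. The sketch below is illustrative only (the function name and argument layout are assumptions, not the thesis code): the hazard is constant on each interval between cut points, and survival is the exponential of minus the accumulated hazard.

```python
import math

def piecewise_survival(t, cuts, rates):
    """Survival function for a piecewise-constant hazard.

    cuts:  interval endpoints [c1, ..., ck] with 0 < c1 < ... < ck;
    rates: hazard on [0, c1), [c1, c2), ..., [ck, inf), so len(rates)
           must equal len(cuts) + 1.
    Returns S(t) = exp(-integral of the hazard from 0 to t).
    """
    cum = 0.0   # accumulated hazard
    prev = 0.0  # left endpoint of the current interval
    for c, r in zip(cuts, rates):
        if t <= c:
            cum += r * (t - prev)
            return math.exp(-cum)
        cum += r * (c - prev)
        prev = c
    cum += rates[-1] * (t - prev)  # beyond the last cut point
    return math.exp(-cum)
```

With no cut points this reduces to an exponential survival function, which is a useful sanity check when fitting by EM or two-stage estimation.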

    A Deep Recurrent Survival Model for Unbiased Ranking

    Position bias is a critical problem in information retrieval when dealing with implicit yet biased user feedback data. Unbiased ranking methods typically rely on causality models and debias the user feedback through inverse propensity weighting. While practical, these methods still suffer from two major problems. First, when inferring a user click, the impact of contextual information, such as the documents already examined, is often ignored. Second, only position bias is considered, while other issues arising from user browsing behaviors are overlooked. In this paper, we propose the end-to-end Deep Recurrent Survival Ranking (DRSR) model, a unified framework that jointly models a user's various behaviors, to (i) exploit the rich contextual information in the ranking list; and (ii) address the hidden issues underlying user behaviors, i.e., mine the observation pattern in queries without any click (non-click queries) and model tracking logs that cannot truly reflect user browsing intents (untrusted observations). Specifically, we adopt a recurrent neural network to model the contextual information and estimate the conditional likelihood of user feedback at each position. We then incorporate survival analysis techniques with the probability chain rule to mathematically recover the unbiased joint probability of a user's various behaviors. DRSR can easily be combined with both point-wise and pair-wise learning objectives. Extensive experiments on two large-scale industrial datasets demonstrate the significant performance gains of our model compared with the state of the art.
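The probability chain rule that lets survival-style techniques assign likelihood mass to non-click queries can be illustrated with a first-click decomposition over a ranked list. The names below are hypothetical and this is not the DRSR implementation; it only shows the hazard/survival factorization the abstract refers to.

```python
def first_click_distribution(hazards):
    """Chain-rule decomposition used in survival-style ranking.

    hazards[j] is the conditional probability of a click at position j
    given no click at earlier positions (the 'hazard'). Returns the
    per-position first-click probabilities plus the probability of a
    non-click session; together these sum to 1, which is how sessions
    with no click still contribute to the likelihood.
    """
    probs = []
    survive = 1.0  # probability of no click so far
    for h in hazards:
        probs.append(survive * h)
        survive *= (1.0 - h)
    return probs, survive
```

In DRSR the per-position hazards would come from a recurrent network conditioned on the documents examined so far; here they are plain inputs.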

    Strongly Constrained Discrete Hashing

    Learning to hash is a fundamental technique widely used in large-scale image retrieval. Most existing methods for learning to hash address the involved discrete optimization problem through continuous relaxation of the binary constraint, which usually leads to large quantization errors and consequently suboptimal binary codes. A few discrete hashing methods have emerged recently. However, they either completely ignore some useful constraints (specifically the balance and decorrelation of hash bits) or turn those constraints into regularizers that make the optimization easier but less accurate. In this paper, we propose a novel supervised hashing method named Strongly Constrained Discrete Hashing (SCDH) which overcomes such limitations. It can learn the binary codes for all examples in the training set and meanwhile obtain a hash function for unseen samples with the above-mentioned constraints preserved. Although the model of SCDH is fairly sophisticated, we are able to find closed-form solutions to all of its optimization subproblems and thus design an efficient algorithm that converges quickly. In addition, we extend SCDH to a kernelized version, SCDH_K. Our experiments on three large benchmark datasets demonstrate that SCDH and SCDH_K not only achieve substantially higher MAP scores than state-of-the-art baselines but also train much faster than the other supervised methods.
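The balance and decorrelation constraints on hash bits can be stated concretely. The checker below is a hedged illustration of what the constraints require, not part of SCDH itself: balance asks each bit to be half +1 and half -1 across the training set (zero column sums), and decorrelation asks distinct bits to have zero inner product.

```python
def balance_and_decorrelation(codes):
    """Measure violation of the two hash-bit constraints.

    codes: list of +/-1 code vectors, one per sample (rows = samples,
    columns = bits). Returns (max |column sum|, max |off-diagonal
    entry of B^T B|); both are 0 exactly when the balance and
    decorrelation constraints hold.
    """
    n_bits = len(codes[0])
    col_sums = [sum(c[j] for c in codes) for j in range(n_bits)]
    max_imbalance = max(abs(s) for s in col_sums)
    max_corr = 0
    for j in range(n_bits):
        for k in range(j + 1, n_bits):
            dot = sum(c[j] * c[k] for c in codes)
            max_corr = max(max_corr, abs(dot))
    return max_imbalance, max_corr
```

Relaxation-based methods typically penalize these two quantities; SCDH's claim is that it enforces them while still solving each subproblem in closed form.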

    Life History Analysis with Response-Dependent Observation

    This thesis deals with statistical issues in the analysis of dependent failure time data under complex observation schemes. These observation schemes may yield right-censored, interval-censored and current status data and may also involve response-dependent selection of individuals. The contexts in which these complications arise include family studies, clinical trials, and population studies. Chapter 2 is devoted to the development and study of statistical methods for family studies, motivated by work conducted in the Centre for Prognosis Studies in the Rheumatic Diseases at the University of Toronto. Rheumatologists at this centre are interested in studying the nature of within-family dependence in the occurrence of psoriatic arthritis (PsA) to gain insight into the genetic basis for this disease. Families are sampled by selecting members from a clinical registry of PsA patients maintained at the centre and recruiting their respective consenting family members; the member of the registry leading to the sampling of the family is called the proband. Information on the disease onset time for non-probands may be collected by recall or a review of medical records, but some non-probands simply provide their disease status at the time of assessment. As a result, family members may provide a combination of observed or right-censored onset times and current status information. Gaussian copula-based models are studied as a means of flexibly characterizing the within-family association in disease onset times. Likelihood and composite likelihood procedures are investigated; the latter, like the estimating function approach, reduces both the need to specify high-order dependencies and the computational burden. Valid analysis of this type of data must address the response-biased sampling scheme, which renders at least one affected family member (the proband) with a right-truncated onset time. 
This right-truncation scheme, combined with the low incidence of disease among non-probands, means there is little information about the marginal onset time distribution in the family data alone, so we exploit auxiliary data from an independent sample of individuals to enhance the information on the parameters of the marginal age-of-onset distribution. For composite likelihood approaches, we consider simultaneous and two-stage estimation procedures; the latter greatly reduces the computational burden, especially when weakly parametric, semi-parametric or non-parametric marginal models are adopted. The proposed models and methods are examined in simulation studies and are applied to data from the PsA family study, yielding important insight regarding the parent-of-origin hypothesis. Cluster-randomized trials are employed when it is appropriate on ethical, practical, or contextual grounds to assign groups of individuals to receive one of two or more interventions to be compared. This design also offers a way of minimizing contamination across treatment groups and enhancing compliance. Although considerable attention has been directed at the development of sample size formulae for cluster-randomized trials with continuous or discrete outcomes, relatively little work has been done for trials involving censored event times. In Chapter 3, asymptotic theory for sample size calculations for correlated failure time data arising in cluster-randomized trials is explored. When the intervention effect is specified through a semi-parametric proportional hazards model fitted under a working independence assumption, robust variance estimates are routinely used. At the design stage, however, some model specification is required for the marginal distributions, and copula models are utilized to accommodate the within-cluster dependence. 
This method is appealing since the intervention effects are specified in terms of the marginal proportional hazards formulation while the within-cluster dependence is modeled by a separate association parameter. The resulting joint model enables one to evaluate the robust sandwich variance, from which sample size criteria for right-censored event times are developed. This approach is also extended to deal with interval-censored event times and within-cluster dependence in the random right-censoring times. The validity of the sample size formula in finite samples is investigated via simulation for a range of cluster sizes, censoring rates and degrees of within-cluster association among event times. The power and efficiency implications of copula misspecification are studied, along with the effect of within-cluster dependence in the censoring times. The proposed sample size formula can be applied in a broad range of practical settings, and an application to a study of otitis media is given for illustration. Chapter 4 considers dependent failure time data in a slightly different context where the events correspond to transitions in a multistate model. A central goal in oncology is the reduction of mortality due to cancer. The therapeutic advances in the treatment of many cancers, and the increasing pressure to ensure experimental treatments are evaluated in a timely and cost-effective manner, have made it challenging to design feasible trials with adequate power to detect clinically important effects based on the time from randomization to death. This has led to increased use of the composite endpoint of progression-free survival, defined as the time from randomization to the first of progression or death. While trials may be designed with progression or progression-free survival as the primary endpoint, regulators are interested in statements about the effect of treatment on survival following progression. 
One approach to investigate this is to estimate the treatment effect on the time from progression to death, but this analysis does not benefit from randomization since the only individuals who contribute to it are those who experienced progression. Moreover, assessing the treatment effect on marginal features can induce dependent censoring of the survival time following progression when variables that affect both progression and post-progression survival are omitted from the model. In Chapter 4 we consider a classical illness-death model which can be used to characterize the joint distribution of progression and death in this setting. Inverse probability weighting can then be used to account for the observational nature of this improper subgroup analysis and for dependent censoring. Such inverse weighted equations yield consistent estimates of the causal treatment effect by accounting for the effect of treatment and any prognostic factors that may be shared between the model for the sojourn time distribution in the progression state and the transition intensity for progression. Due to the non-collapsibility of the Cox regression model, we focus here on additive regression models. Chapter 5 discusses prevalent cohort studies and the problem of measurement error in the reported disease onset time, along with other topics for further research.
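The copula device used at the design stage in Chapter 3, marginal survival models joined by a separate association parameter, can be illustrated by sampling a two-member cluster of correlated event times. The choices below are assumptions for illustration (a Clayton copula via conditional-inversion sampling, exponential margins, and a hypothetical function name); the chapter itself studies several copula families and the effect of misspecifying them.

```python
import math
import random

def clayton_pair(theta, rate, rng):
    """One cluster of two correlated exponential event times.

    Uniforms are drawn from a Clayton copula with dependence parameter
    theta > 0 (conditional-inversion sampling), then pushed through
    the inverse exponential CDF, separating the marginal model from
    the within-cluster association exactly as in the copula design.
    """
    u1 = rng.random()
    v = rng.random()
    # conditional quantile of U2 given U1 = u1 under the Clayton copula
    u2 = (u1 ** (-theta) * (v ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    # inverse exponential CDF: T = -log(1 - u) / rate
    t1 = -math.log(1 - u1) / rate
    t2 = -math.log(1 - u2) / rate
    return t1, t2
```

Sample-size simulations then generate many such clusters, fit the working-independence marginal model, and evaluate the robust variance across replicates.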

    Event History Analysis in Longitudinal Cohort Studies with Intermittent Inspection Times

    Event history studies based on disease clinic data often face several complications. Specifically, patients visit the clinic irregularly, and the intermittent inspection times depend on the history of disease-related variables; this can cause event or failure times to be dependently interval-censored. Furthermore, failure times could be truncated, treatment assignment is non-randomized and can be confounded, and there are competing risks for the failure time outcomes under study. I propose a class of inverse probability weights applied to estimating functions so that the informative inspection scheme and confounded treatment are appropriately dealt with. As a result, the distribution of failure time outcomes can be consistently estimated. I consider parametric, non-parametric and semi-parametric estimation. Monotone smoothing techniques are employed in a two-stage estimation procedure for the non- or semi-parametric estimation. Simulations for a variety of failure time models are conducted to examine the finite-sample performance of the proposed estimators. This research was initially motivated by the Psoriatic Arthritis (PsA) Toronto Cohort Study at the Toronto Western Hospital, and the proposed methodologies are applied to this cohort study as an illustration.
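The inverse-probability-of-inspection idea can be sketched for a single landmark time. The function below is a hypothetical illustration, not the thesis estimator: visit probabilities are taken as known, whereas the thesis estimates them from the inspection history, and the full method handles truncation, confounded treatment and competing risks as well.

```python
def ipcw_event_probability(records):
    """IPW estimate of P(event by time t) under informative inspection.

    records: (inspected, visit_prob, event_indicator) per subject.
    Subjects seen at the clinic by time t are weighted by the inverse
    of their visit probability, so subjects whose inspection was
    unlikely (e.g. those with milder disease) count more, removing the
    dependent interval-censoring bias described above.
    """
    num = den = 0.0
    for inspected, p_visit, event in records:
        if inspected:
            w = 1.0 / p_visit
            num += w * event
            den += w
    return num / den
```

When sicker patients (who are more likely to have had the event) also visit more often, the unweighted proportion among those inspected overstates the event probability; the weighted estimate corrects this.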