
    On near-redundancy and identifiability of parametric hazard regression models under censoring

    We study parametric inference on a rich class of hazard regression models in the presence of right-censoring. Previous literature has reported some inferential challenges, such as multimodal or flat likelihood surfaces, in this class of models for some particular data sets. We formalize the study of these inferential problems by linking them to the concepts of near-redundancy and practical nonidentifiability of parameters. We show that the maximum likelihood estimators of the parameters in this class of models are consistent and asymptotically normal. Thus, the inferential problems in this class of models are related to the finite-sample scenario, where it is difficult to distinguish between the fitted model and a nested nonidentifiable (i.e., parameter-redundant) model. We propose a method for detecting near-redundancy, based on distances between probability distributions. We also employ methods used in other areas for detecting practical nonidentifiability and near-redundancy, including the inspection of the profile likelihood function and the Hessian method. For cases where inferential problems are detected, we discuss alternatives such as using model selection tools to identify simpler models that do not exhibit these inferential problems, increasing the sample size, or extending the follow-up time. We illustrate the performance of the proposed methods through a simulation study, which reveals a link between the presence of near-redundancy and practical nonidentifiability. Two illustrative applications using real data, with and without inferential problems, are presented.
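The Hessian diagnostic mentioned in the abstract can be pictured with a small sketch: at the maximum likelihood estimate, a (near-)zero eigenvalue of the observed information matrix, i.e. the Hessian of the negative log-likelihood, signals (near-)redundant parameters. The toy model and helper below are hypothetical illustrations, not the paper's models: the two parameters enter the likelihood only through their sum, so the pair is exactly parameter-redundant.

```python
import numpy as np

def neg_log_lik(theta, x):
    """Toy exponential model in which parameters a and b enter only
    through their sum, so the pair (a, b) is parameter-redundant."""
    a, b = theta
    rate = np.exp(a + b)
    return rate * np.sum(x) - len(x) * np.log(rate)

def numerical_hessian(f, theta, eps=1e-5):
    """Central finite-difference Hessian of f at theta."""
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            t = np.array(theta, dtype=float)
            t[i] += eps; t[j] += eps; fpp = f(t)
            t[j] -= 2 * eps; fpm = f(t)
            t[i] -= 2 * eps; fmm = f(t)
            t[j] += 2 * eps; fmp = f(t)
            H[i, j] = (fpp - fpm - fmp + fmm) / (4 * eps ** 2)
    return H

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200)
H = numerical_hessian(lambda t: neg_log_lik(t, x), [0.0, 0.0])
eigvals = np.linalg.eigvalsh(H)   # ascending order
print(eigvals)                    # smallest eigenvalue is numerically zero
```

In a near-redundant (rather than exactly redundant) model the smallest eigenvalue is small but nonzero, which is what makes finite-sample detection delicate.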

    Analysis of Interval Censored Data Using a Longitudinal Biomarker

    In many medical studies, interest focuses on studying the effects of potential risk factors on some disease events, where the occurrence time of disease events may be defined in terms of the behavior of a biomarker. For example, in diabetic studies, diabetes is defined in terms of fasting plasma glucose being 126 mg/dl or higher. In practice, several issues complicate determining the exact time-to-disease occurrence. First, due to discrete study follow-up times, the exact time when a biomarker crosses a given threshold is unobservable, yielding so-called interval censored events. Second, most biomarker values are subject to measurement error due to imperfect technologies, so the observed biomarker values may not reflect the actual underlying biomarker levels. Third, using a common threshold for defining a disease event may not be appropriate due to patient heterogeneity. Finally, informative diagnosis and subsequent treatment outside of observational studies may alter observations after the diagnosis. It is well known that the complete case analysis excluding the externally diagnosed subjects can be biased when diagnosis does not occur completely at random. To resolve these four issues, we consider a semiparametric model for analyzing threshold-dependent time-to-event defined by extreme-value-distributed biomarkers. First, we propose a semiparametric marginal model based on a generalized extreme value distribution. By assuming the latent error-free biomarkers to be non-decreasing, the proposed model implies a class of proportional hazards models for the time-to-event defined for any given threshold value. Second, we extend the marginal likelihood to a pseudo-likelihood by multiplying the likelihoods over all observation times. 
Finally, to adjust for externally diagnosed cases, we consider a weighted pseudo-likelihood estimator, incorporating inverse probability weights into the pseudo-likelihood under the assumption that external diagnosis depends on observed rather than unobserved data. We estimate the parameters of the three models using the nonparametric EM, pseudo-EM and weighted-pseudo-EM algorithms, respectively. We investigate the models and estimation methods theoretically: consistency, convergence rates, and asymptotic distributions of the estimators are established using empirical process techniques. We also provide a series of simulations testing each model and estimation method against alternatives. As a practical illustration, we apply each model to data from the ARIC study and the diabetes ancillary study of the ARIC study.
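The first complication above, interval censoring from discrete follow-up, is easy to picture in code: the true threshold-crossing time of the biomarker is never seen, only the pair of visits that brackets it. A minimal simulation sketch (the visit schedule and event times are made up, not from the ARIC study):

```python
import numpy as np

rng = np.random.default_rng(1)

# Scheduled follow-up visits: the latent biomarker's threshold-crossing
# time T is unobserved; only the pair of visits bracketing T is recorded.
visits = np.array([0.0, 2.0, 4.0, 6.0, 8.0])

def observe_interval(true_crossing_time, visits):
    """Map a true event time to its censoring interval (L, R]."""
    before = visits[visits < true_crossing_time]
    after = visits[visits >= true_crossing_time]
    left = before[-1] if before.size else 0.0
    right = after[0] if after.size else np.inf  # right-censored past last visit
    return left, right

true_times = rng.uniform(0.5, 9.5, size=5)      # latent crossing times
intervals = [observe_interval(t, visits) for t in true_times]
for t, (l, r) in zip(true_times, intervals):
    print(f"true T = {t:.2f}  observed interval = ({l}, {r}]")
```

Subjects whose crossing time falls after the last visit yield a right-censored record, the special case of interval censoring with an infinite right endpoint.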

    Change-point Problem and Regression: An Annotated Bibliography

    The problems of identifying changes at unknown times and of estimating the location of changes in stochastic processes are referred to as the change-point problem or, in the Eastern literature, as "disorder". The change-point problem, first introduced in the quality control context, has since developed into a fundamental problem in the areas of statistical control theory, stationarity of a stochastic process, estimation of the current position of a time series, testing and estimation of change in the patterns of a regression model, and most recently in the comparison and matching of DNA sequences in microarray data analysis. Numerous methodological approaches have been implemented in examining change-point models. Maximum-likelihood estimation, Bayesian estimation, isotonic regression, piecewise regression, quasi-likelihood and non-parametric regression are among the methods which have been applied to resolving challenges in change-point problems. Grid-searching approaches have also been used to examine the change-point problem. Statistical analysis of change-point problems depends on the method of data collection. If the data collection is ongoing until some random time, then the appropriate statistical procedure is called sequential. If, however, a large finite set of data is collected with the purpose of determining if at least one change-point occurred, then this may be referred to as non-sequential. Not surprisingly, both the former and the latter have a rich literature, with much of the earlier work focusing on sequential methods inspired by applications in quality control for industrial processes. In the regression literature, the change-point model is also referred to as two- or multiple-phase regression, switching regression, segmented regression, two-stage least squares (Shaban, 1980), or broken-line regression. The area of the change-point problem has been the subject of intensive research in the past half-century.
The subject has evolved considerably and found applications in many different areas. It seems rather impossible to summarize all of the research carried out over the past 50 years on the change-point problem. We have therefore confined ourselves to those articles on change-point problems which pertain to regression. The important branch of sequential procedures in change-point problems has been left out entirely; we refer the reader to the seminal review papers by Lai (1995, 2001). The so-called structural change models, which occupy a considerable portion of the research in the area of change-point problems, particularly among econometricians, have not been fully considered; we refer the reader to Perron (2005) for an updated review in this area. Articles on change-point in time series are considered only if the methodologies presented pertain to regression analysis.
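The grid-searching approach mentioned above can be sketched for the simplest setting, a single change point in two-phase (broken-line) regression: fit separate least-squares lines on each side of every candidate split and keep the split minimising the total residual sum of squares. A toy illustration on simulated data, not tied to any cited paper:

```python
import numpy as np

def fit_change_point(x, y):
    """Grid search for a single change point in two-phase regression:
    for each candidate split, fit a least-squares line to each side
    and keep the split with the smallest total residual sum of squares.
    Assumes x is sorted in increasing order."""
    best_rss, best_cp = np.inf, None
    for k in range(2, len(x) - 2):               # at least 2 points per segment
        rss = 0.0
        for xs, ys in ((x[:k], y[:k]), (x[k:], y[k:])):
            A = np.column_stack([np.ones_like(xs), xs])
            coef, res, *_ = np.linalg.lstsq(A, ys, rcond=None)
            rss += res[0] if res.size else np.sum((ys - A @ coef) ** 2)
        if rss < best_rss:
            best_rss, best_cp = rss, x[k]
    return best_cp

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 80)
# Broken line: slope 0.5 up to x = 4, then flat, plus noise.
y = np.where(x < 4.0, 1.0 + 0.5 * x, 3.0) + rng.normal(0.0, 0.1, x.size)
cp = fit_change_point(x, y)
print(cp)   # estimated change point, close to the true value 4
```

The same exhaustive-search idea extends to multiple change points, at a combinatorial cost that motivates the dynamic-programming and sequential methods surveyed in the bibliography.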

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. Anatomical and functional structure varies across subjects, so image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution on the orthogonal transformation parameter, namely the matrix von Mises-Fisher distribution, anatomical information is embedded in the estimation of the parameters, i.e., combinations of spatially distant voxels are penalized. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.
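Setting the anatomical prior aside for a moment, estimating an orthogonal transformation that aligns one subject's functional data to another's is the classical orthogonal Procrustes problem, solved in closed form by an SVD. The sketch below shows only this unpenalised building block, not the paper's method; the matrix von Mises-Fisher prior would additionally shrink the estimate toward anatomically plausible transformations.

```python
import numpy as np

def orthogonal_procrustes(source, target):
    """Orthogonal matrix R minimising ||source @ R - target||_F,
    obtained from the SVD of source^T @ target."""
    U, _, Vt = np.linalg.svd(source.T @ target)
    return U @ Vt

rng = np.random.default_rng(3)
target = rng.normal(size=(50, 4))        # time points x voxels (toy sizes)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
source = target @ Q.T                    # same signal under a rotated basis
R = orthogonal_procrustes(source, target)
print(np.allclose(source @ R, target))   # True: the alignment is recovered
```

In the noiseless toy case above the rotation is recovered exactly; with real fMRI data the prior matters precisely because the unpenalised solution can mix spatially distant voxels.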

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
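The interpretability point can be checked numerically: in a proper CAR model with precision Q = τ(D − ρW), the correlation induced between neighbouring sites generally differs from ρ, whereas DAGAR's ρ is directly the average neighbour pair correlation. A toy check on a four-site chain graph (values made up for illustration, not the Belgium data):

```python
import numpy as np

# Proper CAR specification: precision Q = tau * (D - rho * W), where W is
# the adjacency matrix of the sites and D = diag(row sums of W).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # a chain of 4 sites
D = np.diag(W.sum(axis=1))
rho, tau = 0.5, 1.0
Q = tau * (D - rho * W)                      # CAR precision matrix
Sigma = np.linalg.inv(Q)                     # implied covariance

# Induced correlation between neighbouring sites 0 and 1.
corr01 = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print(round(corr01, 3))                      # about 0.367, not rho = 0.5
```

Here ρ = 0.5 yet the induced neighbour correlation is roughly 0.37, which is exactly the gap that makes DAGAR's directly interpretable ρ attractive.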