1,073 research outputs found

    Comparing Trends in Cancer Rates Across Overlapping Regions

    Monitoring and comparing trends in cancer rates across geographic regions or over different time periods has been a main task of the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program as it profiles health care quality and decides health care resource allocations within a spatial-temporal framework. A fundamental difficulty, however, arises when such comparisons have to be made for regions or time intervals that overlap, e.g. comparing the change in trends of mortality rates in a local area (such as the mortality rate of breast cancer in California) with that at a more global level (such as the national mortality rate of breast cancer). In view of the sparsity of available methodologies, this paper develops a simple corrected Z-test that accounts for such overlapping. The performance of the proposed test relative to the two-sample “pooled” t-test that assumes independence across comparison groups is assessed via the Pitman asymptotic relative efficiency as well as Monte Carlo simulations and applications to the SEER cancer data. The proposed test will be important for the SEER*STAT software, maintained by the NCI, for the analysis of the SEER data.
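
The abstract does not give the corrected test statistic, so the following is only a generic sketch of why overlap matters, not the paper's method. It relies on the standard identity Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B): when a local region is part of the national total, the two trend estimates are positively correlated, and a test that assumes independence overstates the variance of their difference. The function name, the inputs, and the numbers are hypothetical.

```python
import math

def overlap_adjusted_z(est_a, est_b, var_a, var_b, cov_ab):
    """Z statistic for est_a - est_b when the two estimates are correlated.

    cov_ab is the covariance induced by the overlap (assumed known here);
    cov_ab = 0 reproduces the usual independent-groups Z statistic.
    """
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    z = (est_a - est_b) / math.sqrt(var_diff)
    # Two-sided p-value under a standard normal reference distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Hypothetical trend estimates: local (a) versus national (b).
z_naive, p_naive = overlap_adjusted_z(1.2, 0.8, 0.04, 0.01, 0.0)
z_adj, p_adj = overlap_adjusted_z(1.2, 0.8, 0.04, 0.01, 0.015)
print(f"ignoring overlap: z={z_naive:.3f}, p={p_naive:.4f}")
print(f"with covariance:  z={z_adj:.3f}, p={p_adj:.4f}")
```

With a positive covariance the adjusted statistic is larger in magnitude than the naive one, which is the direction of correction one would expect for overlapping regions.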

    Bayes Estimation of a Distribution Function Using Ranked Set Samples

    A ranked set sample (RSS), if not balanced, is simply a sample of independent order statistics generated from the same underlying distribution F. Kvam and Samaniego (1994) derived maximum likelihood estimates of F for a general RSS. In many applications, including some in the environmental sciences, prior information about F is available to supplement the data-based inference. In such cases, Bayes estimators should be considered for improved estimation. Bayes estimation (using the squared error loss function) of the unknown distribution function F is investigated with such samples. Additionally, the Bayes generalized maximum likelihood estimator (GMLE) is derived. An iterative scheme based on the EM algorithm is used to produce the GMLE of F. For the case of squared error loss, simple solutions are uncommon, and a procedure to find the solution to the Bayes estimate using the Gibbs sampler is illustrated. The methods are illustrated with data from the Natural Environmental Research Council of Great Britain (1975), representing water discharge of floods on the Nidd River in Yorkshire, England.

    Ranked set sampling from location-scale families of symmetric distributions

    Statistical inference based on ranked set sampling has primarily been motivated by nonparametric problems. However, the sampling procedure can provide an improved estimator of the population mean when the population is partially known. In this article, we consider estimation of the population mean and variance for the location-scale families of distributions. We derive and compare different unbiased estimators of these parameters based on independent replications of a ranked set sample of size n. Large sample properties, along with asymptotic relative efficiencies, help identify which estimators are best suited for different location-scale distributions.
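
For readers unfamiliar with the sampling scheme, here is a minimal simulation sketch of balanced ranked set sampling with perfect ranking, using only the Python standard library. It illustrates the generic procedure (draw a set, rank it, measure one ranked unit), not the specific estimators derived in the article. The function name, the normal population, and the set size are assumptions chosen for illustration.

```python
import random
import statistics

def ranked_set_sample(draw, k, cycles, rng):
    """Balanced RSS: in each cycle, for each rank r, draw k units,
    rank them (perfectly, in this sketch), and keep only the r-th smallest."""
    sample = []
    for _ in range(cycles):
        row = []
        for r in range(k):
            candidates = sorted(draw(rng) for _ in range(k))
            row.append(candidates[r])
        sample.append(row)
    return sample

rng = random.Random(0)
# Hypothetical population: normal with mean 10 and standard deviation 2.
draw_normal = lambda rng: rng.gauss(10.0, 2.0)

rss = ranked_set_sample(draw_normal, k=3, cycles=200, rng=rng)
rss_mean = statistics.fmean(x for row in rss for x in row)
rank_means = [statistics.fmean(row[r] for row in rss) for r in range(3)]
print("RSS estimate of the mean:", round(rss_mean, 3))
print("mean of each rank stratum:", [round(m, 3) for m in rank_means])
```

Each column of `rss` is a sample of one order statistic, so the stratum means are ordered, while their overall average estimates the population mean.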

    Mixture Cure Survival Models with Dependent Censoring

    A number of authors have studied the mixture survival model to analyze survival data with nonnegligible cure fractions. A key assumption made by these authors is the independence between the survival time and the censoring time. To our knowledge, no one has studied the mixture cure model in the presence of dependent censoring. To account for such dependence, we propose a more general cure model which allows for dependent censoring. In particular, we derive the cure models from the perspective of competing risks and model the dependence between the censoring time and the survival time using a class of Archimedean copula models. Within this framework, we consider the parameter estimation, the cure detection, and the two-sample comparison of latency distributions in the presence of dependent censoring when a proportion of patients is deemed cured. Large sample results using martingale theory are obtained. We applied the proposed methodologies to the SEER prostate cancer data.

    Nonparametric Estimation of the Survival Function Based on Censored Data with Additional Observations from the Residual Distribution

    We derive the nonparametric maximum likelihood estimator (NPMLE) of the distribution of the test items using a random, right-censored sample combined with an additional right-censored, residual-lifetime sample in which only lifetimes past a known, fixed time are collected. This framework is suited for samples for which individual test data are combined with left-truncated and randomly censored data from an operating environment. The NPMLE of the survival function using the combined sample is identical to the Kaplan-Meier product-limit estimator only up to the time at which the test items corresponding to the residual sample were known to survive. The limiting distribution for the NPMLE, discussed in detail, leads to confidence bounds for the survival function. For the uncensored case, we study the relative efficiency of the estimator based on the combined sample with respect to the analogous estimator based only on the simple random sample.
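
As a point of reference for the comparison above, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator for a single right-censored sample. It is not the combined-sample NPMLE derived in the paper. The function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  observed times (failure or censoring)
    events: True if the time is an observed failure, False if right-censored
    Returns a list of (t, S(t)) pairs at each distinct failure time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Failures and censored units both leave the risk set after t.
        n_at_risk -= removed
    return curve

# Toy data: five units, two of them right-censored.
km_curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
for t, s in km_curve:
    print(f"t={t}: S(t)={s:.3f}")
```

The estimate only drops at observed failure times. Censored observations contribute by staying in the risk set until their censoring time.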

    Survival Analysis with Change Point Hazard Functions


    A modified version of Moran's I

    Background: Investigation of global clustering patterns across regions is very important in spatial data analysis. Moran's I is a widely used spatial statistic for detecting global spatial patterns such as an east-west trend or an unusually large cluster. Here, we intend to improve Moran's I for evaluating global clustering patterns by including the weight function in the variance, introducing a population density (PD) weight function in the statistics, and conducting Monte Carlo simulation for testing. We compare our modified Moran's I with Oden's I*pop for simulated data with homogeneous populations. The proposed method is applied to a census tract data set.
    Methods: We present a modified version of Moran's I which includes information about the strength of the neighboring association when estimating the variance for the statistic. We provide a power analysis on Moran's I, a modified version of Moran's I, and I*pop in a simulation study. Data were simulated under two common spatial correlation scenarios of local and global clustering.
    Results: For simulated data with a large cluster pattern, the modified Moran's I has the highest power (43.4%) compared to Moran's I (39.9%) and I*pop (12.4%) when the adjacent weight function is used with 5%, 10%, 15%, 20%, or 30% of the total population as the geographic range for the cluster. For two global clustering patterns, the modified Moran's I (power > 25.3%) performed better than both Moran's I (> 24.6%) and I*pop (> 7.9%) with the adjacent weight function. With the population density weight function, all methods performed equally well. In the real data example, all statistics indicate the existence of a global clustering pattern in a leukemia data set. The modified Moran's I has the lowest p-value (.0014) followed by Moran's I (.0156) and I*pop (.011).
    Conclusions: Our power analysis and simulation study show that the modified Moran's I achieved higher power than Moran's I and I*pop for evaluating global and local clustering patterns on geographic data with homogeneous populations. The inclusion of the PD weight function, which in turn redefines the neighbors, seems to have a large impact on the power of detecting global clustering patterns. Our methods to improve the original version of Moran's I for homogeneous populations can also be extended to some alternative versions of Moran's I methods developed for heterogeneous populations.
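
As background for the statistic being modified, here is a minimal sketch of the classical Moran's I with a Monte Carlo permutation test, using only the Python standard library. It does not implement the authors' modified variance or their population density weight function. The function names, the chain-shaped adjacency matrix, and the values are assumptions for illustration.

```python
import random

def morans_i(values, weights):
    """Classical Moran's I: (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2,
    where d_i is the deviation of region i from the overall mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value for positive spatial autocorrelation,
    obtained by randomly reassigning the values to regions."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical example: four regions in a row, neighbors share a border.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend_i = morans_i([1, 2, 3, 4], adjacency)        # smooth east-west trend
alternating_i = morans_i([1, 4, 1, 4], adjacency)  # checkerboard-like pattern
p_value = permutation_p_value([1, 2, 3, 4], adjacency)
print(trend_i, alternating_i, p_value)
```

A smooth trend gives a positive statistic and an alternating pattern a negative one. The binary adjacency matrix plays the role of the "adjacent weight function" mentioned in the abstract.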

    Nonparametric Bayes Estimation of Contamination Levels using Observations from the Residual Distribution

    A nonparametric Bayes estimator of the survival function is derived for right-censored data where additional observations from the residual distribution are available. The estimation is motivated by data on contamination concentrations for chromium from one of the EPA's toxic waste sites. The residual sample can be produced by hot spot sampling, where only samples above a given threshold value are collected. The Dirichlet process is used to formulate prior information about the chromium contamination, and we compare the Bayes estimator of the mean concentration level to other estimators currently considered by the EPA and other sources. The Bayes estimator generally outperforms the other estimators under various cost functions. The limiting distribution is the nonparametric maximum likelihood estimator, which is identical to the Kaplan-Meier estimator for concentration values observed below the residual sample threshold. Robustness of the Bayes estimate is examined with respect to misspecification of the prior and its sensitivity to the censoring distribution.

    Spatial Cluster Detection for Weighted Outcomes Using Cumulative Geographic Residuals

    Spatial cluster detection is an important methodology for identifying regions with excessive numbers of adverse health events without making strong model assumptions on the underlying spatial dependence structure. Previous work has focused on point or individual-level outcome data, and few advances have been made when the outcome data are reported at an aggregated level, e.g. at the county or census tract level. This paper proposes a new class of spatial cluster detection methods for point or aggregate data, comprising continuous, binary, and count data. Compared with the existing spatial cluster detection methods, it has the following advantages. First, it readily incorporates region-specific weights, for example, based on a region’s population or a region’s outcome variance, which is key for aggregate data. Second, the established general framework allows for area-level and individual-level covariate adjustment. A simulation study is conducted to evaluate the performance of the method. The proposed method is then applied to assess spatial clustering of high Body Mass Index in an HMO population in the Seattle, Washington, USA area.