166 research outputs found

    Survey data integration using mass imputation

    Survey sampling has been considered a scientific method of collecting data that represent the target population. Statistical inference using survey data can be improved by incorporating information from existing external data sources. The auxiliary information from other sources can be incorporated into either the design or the estimation stage. In some cases, the original survey data can be augmented with extra data. The data integration can be viewed as a missing data problem and a mass imputation approach can be used for data integration. By filling in the missing values for the study variable in one sample with imputed values incorporating information from the other sample, we can obtain an improved estimator integrating information from two samples. This dissertation addresses the development of procedures that incorporate auxiliary information or data for three different situations. Three corresponding papers constitute the dissertation, and each paper deals with some aspect of incorporating auxiliary information with survey data to gain efficiency in inference. The first paper considers the propensity score weighting method that incorporates auxiliary information from paradata. Paradata are automatically obtainable data about a survey process, which are generated as a by-product, and they can be used to handle nonresponse biases. Conditions that are necessary to obtain efficiency gain by incorporating auxiliary information from paradata into the propensity score are considered. The second paper introduces a new approach to combine two independent probability samples that are selected from the same target population. Augmenting two surveys increases the amount of information about the quantities of interest and enhances precision in estimation. We introduce the survey data integration method using the measurement error model approach. The third paper deals with the integration of a two-phase sample where the two samples can be nested or non-nested. We first present the two-phase sampling using the mass imputation method, which can provide an efficient method to combine two samples where one is nested within the other. A special case of non-nested two-phase sampling where the second-phase sample is a non-probability sample is also investigated.
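
The mass imputation idea described in this abstract can be illustrated with a toy sketch. This is not the dissertation's estimator: it assumes a simple linear working model fitted without design weights in the smaller sample, and every name here (`fit_line`, `mass_imputation_mean`, the toy data) is hypothetical. Sample B observes both the auxiliary variable x and the study variable y; sample A observes only x, so y is imputed for every unit in A and then averaged with A's weights.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x (unweighted, an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mass_imputation_mean(x_a, w_a, x_b, y_b):
    """Impute y for every unit of sample A from a model fitted on sample B,
    then return the weighted mean of the imputed values using A's weights."""
    intercept, slope = fit_line(x_b, y_b)
    y_imputed = [intercept + slope * x for x in x_a]
    return sum(w * y for w, y in zip(w_a, y_imputed)) / sum(w_a)


# Toy data (hypothetical): sample B observes (x, y); sample A observes x only.
x_b = [1.0, 2.0, 3.0, 4.0]
y_b = [3.0, 5.0, 7.0, 9.0]
x_a = [0.0, 2.0, 4.0, 6.0]
w_a = [10.0, 10.0, 20.0, 20.0]  # sampling weights for sample A

estimate = mass_imputation_mean(x_a, w_a, x_b, y_b)
print(estimate)
```

In practice the imputation model, the treatment of sample B's design, and variance estimation all need more care than this sketch shows.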

    Pseudo-clustering for combining data sets with multiple hierarchies

    Multi-level modeling is an important approach for analyzing complex survey data using multi-stage sampling. However, estimation of multi-level models can be challenging when we combine several datasets with distinct hierarchies with sampling weights. This paper presents a method for combining multiple datasets with different hierarchical structures due to distinct informative sampling designs for the same survey. To develop an approach with complete generality, we propose to define a pseudo-cluster, a cluster containing only a singleton observation, to unify the data structure and thereby enable estimation of multi-level models incorporating sampling weights across the combined sample. We justify incorporating sampling weights at each level of the hierarchical model and in doing so define a pseudo-likelihood estimation procedure. Simulation studies are used to illustrate the effect of incorporating sampling designs in this challenging multi-level modeling scenario. We demonstrate in the simulation studies that considering a linear mixed model with sampling weights provides unbiased estimates of model parameters and enhances the estimation of the variance components of the random effects. The proposed method is illustrated through a novel application from the National Survey of Healthcare Organizations and Systems that sought to determine which organizational characteristics or traits, as measured in the surveys, have the strongest average relationship to the percentage of depression and anxiety diagnoses in physician practices in the US. Comment: 37 pages, 6 tables, 1 figure

    Clinicopathological features of infiltrating lobular carcinomas comparing with infiltrating ductal carcinomas: a case control study

    BACKGROUND: Infiltrating lobular carcinoma (ILC) is the second most common type of invasive breast cancer and it has been reported to have some unique biologic and epidemiologic characteristics. METHODS: Clinicopathological features of 95 patients with ILC, their relapse free survival (RFS) and overall survival (OS) were retrospectively investigated and compared with those of 3,621 patients with infiltrating ductal carcinoma-not otherwise specified (IDC-NOS) between January 1984 and December 2005. RESULTS: ILC constitutes 2.3% of all invasive breast cancers. There was no difference between the ILC and the IDC-NOS groups regarding age at diagnosis, tumor size, nodal status, and treatment modalities except hormone therapy. The ILC group showed more estrogen receptor expression, less HER-2 expression and higher bilaterality. RFS and OS of the ILC patients were similar to those of the IDC-NOS patients. IDC-NOS metastasized more frequently to the lung and bone, whereas ILC metastasized more frequently to the bone and ovary. CONCLUSIONS: The incidence of ILC was relatively low in Korean breast cancer patients. Compared with IDC-NOS, ILC showed some different features such as higher estrogen receptor expression, less HER-2 expression, higher bilaterality and preferred metastatic sites of bone and ovary. Contralateral cancers and bone and ovary evaluation should be considered when monitoring ILC patients.

    Comparing weighting and imputation methods for enhancing statistical inference of health surveys given administrative claims data

    National surveys of the healthcare system in the United States were conducted to characterize the structure of the healthcare system and investigate the impact of evidence-based innovations in healthcare systems on healthcare services. Administrative data are additionally available to researchers, raising the question of whether inferences about healthcare organizations based on the survey data can be enhanced by incorporating information from auxiliary data. Administrative data can provide information for dealing with under-coverage bias and non-response in surveys and for capturing more sub-populations. In this study, we focus on the use of administrative claims data to improve estimates about means of survey items for the finite population. Auxiliary information from the claims data is incorporated using multiple imputation to impute values of non-responding or non-surveyed organizations. We derive multiple versions of the imputation strategy, and the logical development of the methodology is compared to two incumbent approaches: a naïve analysis that ignores the sampling probabilities and a traditional survey analysis weighting by the inverses of the sampling probabilities. We illustrate the methods using data from The National Survey of Healthcare Organizations and Systems and The Centers for Medicare & Medicaid Services Medicare claims data to make inferences about relationships of characteristics of healthcare organizations and healthcare services they provide. Comment: 16 pages, 9 tables, 2 figures
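
The two incumbent approaches named in this abstract can be contrasted with a small sketch. It is illustrative only: the function names and toy numbers are hypothetical, and the weighted version uses the ratio (Hájek) form of the inverse-probability-weighted mean, which is one common choice and not necessarily the one used in the paper.

```python
def naive_mean(ys):
    """Unweighted sample mean; ignores the sampling probabilities."""
    return sum(ys) / len(ys)


def inverse_probability_weighted_mean(ys, probs):
    """Weighted mean with weights 1 / (sampling probability), in ratio form."""
    weights = [1.0 / p for p in probs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Toy sample (hypothetical): units with smaller inclusion probability
# represent more population units and so receive larger weights.
ys = [10.0, 20.0, 30.0]
probs = [0.5, 0.25, 0.25]

print(naive_mean(ys))
print(inverse_probability_weighted_mean(ys, probs))
```

The two estimates differ whenever the study variable is related to the sampling probabilities, which is the situation where ignoring the design is a concern.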

    Mass imputation for two-phase sampling

    Two-phase sampling is a cost-effective method of data collection using outcome-dependent sampling for the second-phase sample. In order to make efficient use of auxiliary information and to improve domain estimation, mass imputation can be used in two-phase sampling. Rao and Sitter (1995) introduce mass imputation for two-phase sampling and its variance estimation under simple random sampling in both phases. In this paper, we extend the Rao–Sitter method to general sampling design. The proposed method is further extended to mass imputation for categorical data. A limited simulation study is performed to examine the performance of the proposed methods.

    Adjuvant chemotherapy and survival among patients 70 years of age and younger with node-negative breast cancer and the 21-gene recurrence score of 26-30

    BACKGROUND: The benefits of chemotherapy in node-negative, hormone receptor-positive, and human epidermal growth factor receptor 2 (HER2)-negative breast cancer patients with the 21-gene recurrence score (RS) of 18-30, particularly those with RS 26-30, are not known. METHODS: Using the Surveillance, Epidemiology, and End Results (SEER) data, we retrospectively identified 29,137 breast cancer patients with the 21-gene RS of 18-30 diagnosed between 2004 and 2015. Mortality risks according to the RS and chemotherapy use were compared by the Kaplan-Meier method and Cox's proportional hazards model. RESULTS: Among the breast cancer patients with the RS 18-30, 21% had RS 26-30. Compared to breast cancer patients with RS 18-25, patients with RS 26-30 had more aggressive tumor characteristics and chemotherapy use and increased risk of breast cancer-specific mortality and overall mortality. In breast cancer patients who were aged ≤ 70 years and had RS of 26-30, chemotherapy administration was associated with a 32% lower risk of breast cancer-specific mortality (hazard ratio [HR], 0.68; 95% confidence interval [CI], 0.47-0.99) and a 42% lower risk of overall mortality (HR, 0.58; 95% CI, 0.44-0.76). Survival benefits were most pronounced in breast cancer patients who were younger or had grade III tumor. CONCLUSIONS: The 21-gene RS of 18-30 showed heterogeneous outcomes, and the RS 26-30 was a significant prognostic factor for an increased risk of mortality. Adjuvant chemotherapy could improve the survival of node-negative, hormone receptor-positive, and HER2-negative breast cancer patients with the 21-gene RS 26-30 and should be considered for patients, especially younger patients or patients with high-grade tumors.

    Clinicopathological Characteristics of Male Breast Cancer

    PURPOSE: To investigate clinicopathological characteristics and outcomes of male breast cancer (MBC). PATIENTS AND METHODS: We retrospectively analyzed the data of 20 MBC patients in comparison with female ductal carcinoma treated at Yonsei University Severance Hospital from July 1985 to May 2007. Clinicopathological features, treatment patterns, and survival were investigated. RESULTS: MBC accounted for 0.38% of all breast cancers. The median age was 56 years. The median symptom duration was 10 months. The median tumor size was 1.7 cm, 27.8% showed node metastasis, and 71.4% were estrogen receptor positive. All 20 cancers arose from ductal cells. No lobular carcinoma was found. The numbers of patients with stage 0, I, II, and III disease were 2, 10, 4, and 3, respectively. All patients underwent mastectomy. One patient with invasive cancer did not receive axillary node dissection, so the stage could not be determined exactly. Adjuvant treatments were determined by pathologic parameters and stage. Clinicopathological parameters and survival rates of MBC were comparable to those of female ductal carcinoma. CONCLUSION: The onset age of MBC was 10 years older and symptom duration was longer than in female patients. No difference in outcomes between MBC and female ductal carcinoma suggests that the biology of MBC is not different from that of females. Therefore, education, an appropriate system for early detection, and adequate treatment are necessary for improving outcomes.