Simulation Approach to Assess the Precision of Estimates Derived from Linking Survey and Administrative Records
Probabilistic record linkage implies some level of uncertainty in the classification of pairs as links or non-links vis-à-vis their true match status. Because record linkage is usually performed as a preliminary step to developing statistical estimates, the question is how this linkage uncertainty propagates to those estimates. In this paper, we develop a re-sampling approach to estimate the impact of linkage uncertainty on derived estimates. In each re-sampling iteration, pairs are classified as links or non-links by Monte Carlo assignment based on model-estimated true match probabilities. By examining the range of estimates produced across a series of re-samples, we can estimate the distribution of derived statistics under the prevailing level of linkage uncertainty. For this analysis, we use the results of linking the 2014 National Hospital Care Survey to the National Death Index, performed at the National Center for Health Statistics. We assess the precision of hospital-level death rate estimates.
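As an illustration only, the re-sampling idea can be sketched in Python. The sketch assumes, hypothetically, that each sampled patient has at most one candidate pair with a National Death Index record, carrying a model-estimated match probability; the function names `simulate_death_rates` and `summarize` and any toy inputs are invented for the example and are not the NCHS implementation.

```python
import random
import statistics
from collections import defaultdict


def simulate_death_rates(pairs, n_patients, n_reps=1000, seed=0):
    """Monte Carlo re-sampling of link status (illustrative sketch).

    pairs: list of (hospital_id, match_prob) tuples; assumes one candidate
        patient-to-death-record pair per patient (a simplifying assumption).
    n_patients: dict hospital_id -> number of sampled patients.
    Returns dict hospital_id -> list of simulated hospital death rates.
    """
    rng = random.Random(seed)
    draws = defaultdict(list)
    for _ in range(n_reps):
        deaths = defaultdict(int)
        for hospital, p in pairs:
            # Classify the pair as a link with probability equal to the
            # model-estimated probability that it is a true match.
            if rng.random() < p:
                deaths[hospital] += 1
        for hospital, n in n_patients.items():
            draws[hospital].append(deaths[hospital] / n)
    return dict(draws)


def summarize(rates):
    """Mean and approximate 2.5th/97.5th percentiles of simulated estimates."""
    ordered = sorted(rates)
    last = len(ordered) - 1
    lo = ordered[int(0.025 * last)]
    hi = ordered[int(0.975 * last)]
    return statistics.mean(ordered), lo, hi
```

For example, `summarize(simulate_death_rates([("A", 0.99), ("A", 0.6), ("B", 0.5)], {"A": 20, "B": 10})["A"])` returns the mean simulated death rate for hospital A together with an approximate 95% interval. In this sketch the interval reflects linkage uncertainty only, not survey sampling error.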
Nonsampling errors and their implication for estimates of current cancer treatment using the Medical Expenditure Panel Survey
Survey nonsampling errors refer to the components of total survey error (TSE) that result from failures in data collection and processing procedures. Evaluating nonsampling errors can lead to a better understanding of their sources, which, in turn, can inform survey inference and assist in the design of future surveys. Data collected via supplemental questionnaires can provide a means for evaluating nonsampling errors because they may provide additional information on survey nonrespondents and/or measurements of the same concept over repeated trials on the same sampling unit. We used a supplemental questionnaire administered to cancer survivors to explore potential nonsampling errors, focusing primarily on nonresponse and measurement/specification errors. We discuss the implications of our findings in the context of the TSE paradigm and identify areas for future research.
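The abstract does not say which measures were used. As a hypothetical sketch of how repeated measurements of the same concept could be used to assess measurement error, the Python function below computes two classic reinterview statistics, the gross difference rate and the index of inconsistency, for a binary item assumed to be asked in both the core survey and the supplement. The function name and inputs are illustrative.

```python
def consistency_measures(first, second):
    """Agreement between two binary measurements of the same item.

    first, second: sequences of 0/1 responses for the same units, e.g.
    (hypothetically) the core survey and the supplemental questionnaire.
    Returns (gross_difference_rate, index_of_inconsistency).
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    # Gross difference rate: share of units whose two responses disagree.
    gdr = sum(a != b for a, b in zip(first, second)) / n
    p1 = sum(first) / n   # proportion "yes" in the first measurement
    p2 = sum(second) / n  # proportion "yes" in the second measurement
    # Disagreement expected if the two responses were independent.
    expected = p1 * (1 - p2) + p2 * (1 - p1)
    if expected == 0:
        raise ValueError("index undefined when both measurements are the same constant")
    return gdr, gdr / expected
```

The index of inconsistency equals 1 minus Cohen's kappa for the same 2×2 table; values near 0 indicate consistent reporting, and values near 1 indicate disagreement about as large as would be expected by chance.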
Housing and Health: Linking Population Health Survey Data to Housing Assistance Data
Introduction
The linkage of survey data with administrative data enhances the scientific value and analytic potential of both sources of information. Combining multiple data sources facilitates richer analyses and allows data users to answer research questions that cannot be addressed easily using a single data source.
Objectives and Approach
Recently, the United States National Center for Health Statistics (NCHS) and Department of Housing and Urban Development (HUD) collaborated to link two population health surveys conducted by NCHS with housing assistance program data maintained by HUD. The resulting linked data files enable researchers to examine relationships between the receipt of federal housing assistance and health. In this talk, we will describe some of the challenges faced when initiating a data sharing agreement between two federal agencies governed by distinct legislative authorities, particularly issues related to legal requirements and data access.
Results
We will describe each of the data sources used in the linkage as well as the methodology used to combine the data. Lastly, the discussion will focus on the inter-agency collaboration that led to the production of the supporting technical documentation developed to assist researchers using the linked data files. The linkage of NCHS survey data and HUD administrative data serves as an example of how two agencies were able to overcome challenges to successfully form a data sharing partnership as a cost-effective means to develop a robust data source that benefits the collaborating agencies as well as policy makers and outside researchers.
Conclusion/Implications
Both agencies anticipate that this partnership will continue as additional survey and administrative data are collected.
Quality of linked data: Linking the National Hospital Care Survey Data to the National Death Index
Introduction
Data linkages can produce rich data resources to address a variety of research topics. However, assessing linkage quality can be challenging given that there are many approaches and no clear best practices.
Objectives and Approach
Through its Data Linkage Program, the National Center for Health Statistics (NCHS) links national survey data with vital and administrative records. A recent linkage of the National Hospital Care Survey data with the National Death Index employed a new linkage methodology, which included a first-time approach for validating the results within the linkage algorithm.
Results
The new methodology includes two passes: a deterministic linkage, followed by a probabilistic linkage based on the Fellegi-Sunter methodology. In the second pass, a key identifier, the Social Security Number (SSN), was not used as a linkage variable; instead, when it was available on the patient record, it was used to determine link accuracy. A model was then built to predict link accuracy from the computed Fellegi-Sunter total pair weight, and this model was used to estimate link accuracy for patient records without an SSN. Compared with prior linkage methodologies, this new approach generated higher match rates and lower error rates. The linkage methodology designed for this study is now being tested on other types of input data, such as data from household surveys.
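The two-pass design can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the NCHS algorithm: the comparison fields, the m- and u-probabilities, the all-fields-agree deterministic rule, and the choice of a one-covariate logistic model for link accuracy are all hypothetical, since the abstract does not specify them.

```python
import math

# Hypothetical (m, u) probabilities per comparison field; illustrative values,
# not those used by NCHS.
M_U = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.5),
}


def deterministic_link(rec, cand):
    """Pass 1 (hypothetical rule): exact agreement on every comparison field."""
    return all(rec[f] == cand[f] for f in M_U)


def fs_weight(rec, cand):
    """Pass 2: Fellegi-Sunter total pair weight (sum of log2 likelihood ratios)."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if rec[field] == cand[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(weights, correct, iters=25):
    """Fit P(link correct) = sigmoid(b0 + b1 * weight) by Newton-Raphson.

    `correct` holds 0/1 accuracy labels, assumed here to come from SSN
    agreement on pairs where both records carry an SSN.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(weights, correct):
            p = _sigmoid(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} U for the 2x2 information matrix I.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_accuracy(b0, b1, weight):
    """Estimated probability that a link is correct, for pairs lacking an SSN."""
    return _sigmoid(b0 + b1 * weight)
```

In use, `fit_logistic` would be trained on the total weights of pairs whose accuracy could be checked by SSN, and `predict_accuracy` would then be applied to the weights of pairs without an SSN.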
Conclusion/Implications
The linkage approach may be incorporated into additional linkages conducted by NCHS. This talk will describe the input sources for this linkage, the methodology used, and the error rate assessment, and will then discuss conclusions and implications for precision and efficiency.
Blood Lead Levels and Death from All Causes, Cardiovascular Disease, and Cancer: Results from the NHANES III Mortality Study
BACKGROUND: Analyses of mortality data for participants examined in 1976–1980 in the second National Health and Nutrition Examination Survey (NHANES II) suggested an increased risk of mortality at blood lead levels > 20 μg/dL. Blood lead levels have decreased markedly since the late 1970s. In NHANES III, conducted during 1988–1994, few adults had levels > 20 μg/dL. OBJECTIVE: Our objective in this study was to determine the risk of mortality in relation to the lower blood lead levels observed for adult participants of NHANES III. METHODS: We analyzed mortality information for 9,757 participants who had a blood lead measurement and who were ≥ 40 years of age at the baseline examination. Using blood lead levels categorized as < 5, 5 to < 10, and ≥ 10 μg/dL, we determined the relative risk of mortality from all causes, cancer, and cardiovascular disease through Cox proportional hazards regression analysis. RESULTS: Using blood lead levels < 5 μg/dL as the referent, we determined that the relative risk of mortality from all causes was 1.24 [95% confidence interval (CI), 1.05–1.48] for those with blood lead levels of 5–9 μg/dL and 1.59 (95% CI, 1.28–1.98) for those with blood lead levels ≥ 10 μg/dL (p for trend < 0.001). The magnitude of risk was similar for deaths due to cardiovascular disease and cancer, and tests for trend were statistically significant (p < 0.01) for both causes of death. CONCLUSION: In a nationally representative sample of the U.S. population, blood lead levels as low as 5–9 μg/dL were associated with an increased risk of death from all causes, cardiovascular disease, and cancer.
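A minimal Python sketch of the exposure coding described in METHODS, assuming the categories enter the Cox model as indicator variables with < 5 μg/dL as the omitted referent. The abstract does not state the coding, and the function and variable names are hypothetical; the "5–9 μg/dL" category in RESULTS appears to correspond to the 5 to < 10 interval.

```python
def blood_lead_category(level_ug_dl):
    """Assign a blood lead level (μg/dL) to one of the three study categories."""
    if level_ug_dl < 0:
        raise ValueError("blood lead level cannot be negative")
    if level_ug_dl < 5:
        return "<5"      # referent category
    if level_ug_dl < 10:
        return "5-<10"
    return ">=10"


def indicator_coding(level_ug_dl):
    """Hypothetical dummy variables for a Cox model; <5 μg/dL is omitted."""
    category = blood_lead_category(level_ug_dl)
    return {
        "bll_5_to_10": int(category == "5-<10"),
        "bll_ge_10": int(category == ">=10"),
    }


for level in (2.1, 5.0, 9.9, 10.0, 23.4):
    print(level, blood_lead_category(level), indicator_coding(level))
```

With this coding, the exponentiated coefficients on `bll_5_to_10` and `bll_ge_10` are hazard ratios relative to the < 5 μg/dL group, which is how relative risks "using < 5 μg/dL as the referent" are typically reported.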
Recommended from our members
Cancer Informatics for Cancer Centers (CI4CC): Building a Community Focused on Sharing Ideas and Best Practices to Improve Cancer Care and Patient Outcomes.
Cancer Informatics for Cancer Centers (CI4CC) is a grassroots, nonprofit 501(c)(3) organization intended to provide a focused national forum for engagement of senior cancer informatics leaders, primarily aimed at academic cancer centers anywhere in the world but with a special emphasis on the 70 National Cancer Institute-funded cancer centers. Although each of the participating cancer centers is structured differently, and leaders' titles vary, we know firsthand there are similarities in both the issues we face and the solutions we achieve. As a consortium, we have initiated a dedicated listserv, an open-initiatives program, and targeted biannual face-to-face meetings. These meetings are a place to review our priorities and initiatives, providing a forum for discussion of the strategic and pragmatic issues we, as informatics leaders, individually face at our respective institutions and cancer centers. Here we provide a brief history of the CI4CC organization and meeting highlights from the latest CI4CC meeting that took place in Napa, California from October 14-16, 2019. The focus of this meeting was "intersections between informatics, data science, and population science." We conclude with a discussion of "hot topics" on the horizon for cancer informatics.
Detectable Clonal Mosaicism from Birth to Old Age and its Relationship to Cancer
Clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) was detected using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells (>5–10%) with the same abnormal karyotype (presumably of clonal origin) in the presence of normal cells. The frequency of detectable clonal mosaicism in peripheral blood is low (<0.5%) from birth until 50 years of age, after which it rises rapidly to 2–3% in the elderly. Many of the mosaic anomalies are characteristic of those found in hematological cancers and identify common deleted regions that pinpoint the locations of genes previously associated with hematological cancers. Although only 3% of subjects with detectable clonal mosaicism had any record of hematological cancer prior to DNA sampling, those without a prior diagnosis have an estimated 10-fold higher risk of a subsequent hematological cancer (95% confidence interval = 6–18).
Genome-wide association study of Tourette Syndrome
Tourette Syndrome (TS) is a developmental disorder that has one of the highest familial recurrence rates among neuropsychiatric diseases with complex inheritance. However, the identification of definitive TS susceptibility genes remains elusive. Here, we report the first genome-wide association study (GWAS) of TS in 1285 cases and 4964 ancestry-matched controls of European ancestry, including two European-derived population isolates, Ashkenazi Jews from North America and Israel, and French Canadians from Quebec, Canada. In a primary meta-analysis of GWAS data from these European ancestry samples, no markers achieved a genome-wide threshold of significance (p < 5 × 10⁻⁸); the top signal was found at rs7868992 on chromosome 9q32 within COL27A1 (p = 1.85 × 10⁻⁶). A secondary analysis including an additional 211 cases and 285 controls from two closely related Latin American population isolates from the Central Valley of Costa Rica and Antioquia, Colombia, also identified rs7868992 as the top signal (p = 3.6 × 10⁻⁷ for the combined sample of 1496 cases and 5249 controls following imputation with 1000 Genomes data). This study lays the groundwork for the eventual identification of common TS susceptibility variants in larger cohorts and helps to provide a more complete understanding of the full genetic architecture of this disorder.
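The conventional genome-wide threshold of 5 × 10⁻⁸ is commonly motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common variants. The short Python check below, with hypothetical helper names, applies that threshold to the two top-signal p-values reported in the abstract.

```python
import math

# Bonferroni correction of alpha = 0.05 for ~1 million independent tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def is_genome_wide_significant(p, threshold=GENOME_WIDE_ALPHA):
    """True if a p-value falls below the genome-wide threshold."""
    return p < threshold


def neg_log10(p):
    """-log10(p), the scale used on Manhattan plots."""
    return -math.log10(p)


reported = {"primary meta-analysis": 1.85e-6, "combined sample": 3.6e-7}
for label, p in reported.items():
    print(f"{label}: p={p:.2e}, -log10 p={neg_log10(p):.2f}, "
          f"significant={is_genome_wide_significant(p)}")
```

Neither p-value falls below the threshold, consistent with no marker reaching genome-wide significance in either analysis.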
Violence and Substance Use among an Injured Emergency Department Population
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/74228/1/aemj.10.7.764.pd