What's the big idea? Data linkage
Katie Harron explains the statistical concepts that underpin the complex but crucial process of linking records stored in different databases
Which people are most affected by changes to data linkage methodology? An exploration of patient, organisational and spatiotemporal characteristics in administrative hospital data in England
Objectives: In 2021, NHS Digital changed the process used to link records belonging to the same person across and within data collections. Our objectives were to identify patient, organisational and spatiotemporal characteristics associated with records impacted by this change and the implications for researchers using this data.
Methods: We used an observational cohort study of patients, aged 55 or less, with a secondary care contact recorded in any of the NHS Digital (now part of NHS England) curated Hospital Episode Statistics (HES) datasets between April 1997 and March 2021. We compared clusters of records assigned to each patient using the HES ID (old methodology using a three-step deterministic algorithm) and the Person ID (new methodology using a master patient spine). We used multivariable logistic regression to identify patient, organisational and spatiotemporal (such as area-level deprivation and year of first contact) characteristics associated with patients whose cluster had changed.
Results: Of 88 million hospital records in 2019, there were 18,968,711 distinct HES IDs and 18,717,142 distinct TPIs. Of the 12,701,169 HES IDs with more than one record, 145,948 (1.1%) were split into multiple Person IDs. Of the 12,999,671 Person IDs with more than one record, 483,091 (3.7%) were associated with two or more merged HES IDs. We will present an analysis using data covering the period April 1997 to March 2021 (1.25 billion records) and present the characteristics associated with changes between linkage methods.
Conclusion: Our findings indicate that this change consolidated clusters, resulting in fewer distinct individuals in the data. Our findings will inform researchers about which groups of individuals are most likely to be affected by changes to linkage methodology. This is vital for understanding potential sources of bias due to linkage error.
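The comparison of clusters under the two identifiers can be illustrated with a minimal sketch. It assumes each record carries both an old and a new identifier; the function name `compare_clusters` and the toy records are hypothetical and are not taken from the study. The last two lines only re-derive the percentages quoted in the abstract from its own counts.

```python
from collections import defaultdict

def compare_clusters(records):
    """Count split and merged clusters.

    records: iterable of (old_id, new_id) pairs, one pair per record.
    An old cluster is "split" if its records carry more than one new ID;
    a new cluster is "merged" if its records carry more than one old ID.
    Only clusters with more than one record are considered.
    """
    old_to_new = defaultdict(set)
    new_to_old = defaultdict(set)
    old_counts = defaultdict(int)
    new_counts = defaultdict(int)
    for old_id, new_id in records:
        old_to_new[old_id].add(new_id)
        new_to_old[new_id].add(old_id)
        old_counts[old_id] += 1
        new_counts[new_id] += 1

    multi_old = [k for k, n in old_counts.items() if n > 1]
    multi_new = [k for k, n in new_counts.items() if n > 1]
    return {
        "old_multi": len(multi_old),
        "split": sum(1 for k in multi_old if len(old_to_new[k]) > 1),
        "new_multi": len(multi_new),
        "merged": sum(1 for k in multi_new if len(new_to_old[k]) > 1),
    }

# Toy data (hypothetical identifiers).
records = [
    ("H1", "P1"), ("H1", "P1"),  # cluster unchanged
    ("H2", "P2"), ("H2", "P3"),  # old cluster H2 split into P2 and P3
    ("H3", "P4"), ("H4", "P4"),  # old clusters H3 and H4 merged into P4
]
summary = compare_clusters(records)

# Percentages recomputed from the counts reported in the abstract.
pct_split = round(100 * 145_948 / 12_701_169, 1)
pct_merged = round(100 * 483_091 / 12_999_671, 1)
```

On the toy data, one of the two multi-record old clusters is split and one of the two multi-record new clusters is a merge of several old clusters.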
Which children in England see the health visiting team and how often?
In June 2021, Public Health England published its first set of statistics describing additional health visiting contacts. What do these statistics tell us about patterns of health visiting contacts at a national level?
How is ethnicity reported, described, and analysed in health research in the UK? A bibliographical review and focus group discussions with young refugees
Background: The ethnicity data gap pertains to 3 major challenges to address ethnic health inequality: 1) Under-representation of ethnic minorities in research; 2) Poor data quality on ethnicity; 3) Ethnicity data not being meaningfully analysed. These challenges are especially relevant for research involving under-served migrant populations in the UK. We aimed to review how ethnicity is captured, reported, analysed and theorised within policy-relevant research on ethnic health inequities. Methods: We reviewed a selection of the 1% most highly cited population health papers that reported UK data on ethnicity, and extracted how ethnicity was recorded and analysed in relation to health outcomes. We focused on how ethnicity was obtained (i.e. self-reported or not), how ethnic groups were categorised, whether justification was provided for any categorisation, and how ethnicity was theorised to be related to health. We held three 1-h-long guided focus groups with 10 young people from Nigeria, Turkistan, Syria, Yemen and Iran. This engagement helped us shape and interpret our findings, and reflect on two questions: 1) How should ethnicity be asked inclusively, and better recorded? 2) Does self-defined ethnicity change over time or context? If so, why? Results: Of the 44 included papers, most (19; 43%) used self-reported ethnicity, categorised in a variety of ways. Of the 27 papers that aggregated ethnicity, 13 (48%) provided justification. Only 8 of 33 papers explicitly theorised how ethnicity related to health. The focus groups agreed that 1) Ethnicity should not be prescribed by others; individuals could be asked to describe their ethnicity in free-text which researchers could synthesise to extract relevant dimensions of ethnicity for their research; 2) Ethnicity changes over time and context according to personal experience, social pressure, and nationality change; 3) Migrants and non-migrants’ lived experience of ethnicity is not fully inter-changeable, even if they share the same ethnic category. Conclusions: Ethnicity is a multi-dimensional construct, but this is not currently reflected in UK health research studies, where ethnicity is often aggregated and analysed without justification. Researchers should communicate clearly how ethnicity is operationalised for their study, with appropriate justification for clustering and analysis that is meaningfully theorised. We can only start to tackle ethnic health inequity by treating ethnicity as rigorously as any other variable in our research.
The Impact of Uncertainty Shocks under Measurement Error: A Proxy SVAR approach
A growing literature considers the impact of uncertainty using SVAR models that include proxies for uncertainty shocks as endogenous variables. In this paper we consider the impact of measurement error in these proxies on the estimated impulse responses. We show via a Monte-Carlo experiment that measurement error can result in attenuation bias in impulse responses. In contrast, the proxy SVAR that uses the uncertainty shock proxy as an instrument does not suffer from this bias. Applying this latter method to the Bloom (2009) data-set results in impulse responses to uncertainty shocks that are larger in magnitude and more persistent than those obtained from a recursive SVAR.
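The attenuation argument can be illustrated with a deliberately simplified static simulation. This is a toy analogue under stated assumptions, not the paper's Monte-Carlo design or an SVAR: `s` is an unobserved shock, `z` is a proxy measured with error, and `u1` and `u2` stand in for two reduced-form residuals that respond to the shock with impacts 1 and `b`. All names and parameter values are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, b=0.8, noise_sd=1.0, seed=0):
    rng = random.Random(seed)
    s = [rng.gauss(0, 1) for _ in range(n)]            # true shock, unobserved
    z = [si + rng.gauss(0, noise_sd) for si in s]      # proxy = shock + measurement error
    u1 = [si + rng.gauss(0, 1) for si in s]            # residual with impact 1
    u2 = [b * si + rng.gauss(0, 1) for si in s]        # residual with impact b

    # Treating the noisy proxy as if it were the shock: the slope shrinks
    # towards zero by var(s) / (var(s) + noise_sd**2).
    proxy_as_shock = cov(z, u2) / cov(z, z)

    # Using the proxy as an instrument: the ratio of covariances recovers
    # the relative impact b, because the measurement error cancels out.
    proxy_as_instrument = cov(z, u2) / cov(z, u1)
    return proxy_as_shock, proxy_as_instrument

attenuated, instrumented = simulate()
```

With `b = 0.8` and unit measurement-error variance, the first estimate should be close to 0.4 and the second close to 0.8, up to sampling noise.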
Making co-enrolment feasible for randomised controlled trials in paediatric intensive care.
Enrolling children into several trials could increase recruitment and lead to quicker delivery of optimal care in paediatric intensive care units (PICU). We evaluated decisions taken by clinicians and parents in PICU on co-enrolment for two large pragmatic trials: the CATCH trial (CATheters in CHildren) comparing impregnated with standard central venous catheters (CVCs) for reducing bloodstream infection in PICU and the CHIP trial comparing tight versus standard control of hyperglycaemia.
Demystifying probabilistic linkage: Common myths and misconceptions
Many of the distinctions made between probabilistic and deterministic linkage are misleading. While these two approaches to record linkage operate in different ways and can produce different outputs, the distinctions between them are more a result of how they are implemented than of any intrinsic differences. In the way they are generally applied, probabilistic and deterministic procedures can be little more than alternative means to similar ends, or they can arrive at very different ends depending on choices that are made during implementation. Misconceptions about probabilistic linkage contribute to reluctance to implement it and mistrust of its outputs. By examining some common misconceptions about probabilistic linkage and its difference from deterministic linkage, we highlight the potential impact of design choices on the outputs of either approach. We hope that better understanding of linkage designs will help to allay some concerns about probabilistic linkage, and will help data linkers to tailor either procedure to produce outputs that are appropriate for their intended use.
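The point that implementation choices drive the outputs can be sketched with Fellegi-Sunter style match weights, the standard framework for probabilistic linkage. This is a minimal illustration, not the authors' procedure: the field names, the m- and u-probabilities and the thresholds are all hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field probabilities:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS = {
    "dob": (0.95, 0.01),
    "sex": (0.98, 0.5),
    "postcode": (0.90, 0.001),
}

def match_weight(rec_a, rec_b):
    """Sum of log2 agreement or disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def deterministic_match(rec_a, rec_b):
    """Example deterministic rule: exact agreement on every field."""
    return all(rec_a[f] == rec_b[f] for f in FIELDS)

def probabilistic_match(rec_a, rec_b, threshold=10.0):
    """Link when the total match weight reaches the chosen threshold."""
    return match_weight(rec_a, rec_b) >= threshold

a = {"dob": "1980-01-01", "sex": "F", "postcode": "AB1 2CD"}
same = dict(a)
sex_differs = {**a, "sex": "M"}            # e.g. a recording error in one field
two_differ = {**a, "dob": "1979-01-01", "postcode": "ZZ9 9ZZ"}
```

With these assumed values, `sex_differs` is rejected by the deterministic rule but accepted by the probabilistic rule at the default threshold, while raising the threshold to 17.0 makes the probabilistic rule accept only full agreement, so the two approaches coincide. The outcome depends on the thresholds and rules chosen rather than on the label attached to the method.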
Probabilistic linkage without personal information successfully linked national clinical datasets: Linkage of national clinical datasets without patient identifiers using probabilistic methods.
BACKGROUND: Probabilistic linkage can link patients from different clinical databases without the need for personal information. If accurate linkage can be achieved, it would accelerate the use of linked datasets to address important clinical and public health questions. OBJECTIVE: We developed a step-by-step process for probabilistic linkage of national clinical and administrative datasets without personal information, and validated it against deterministic linkage using patient identifiers. STUDY DESIGN AND SETTING: We used electronic health records from the National Bowel Cancer Audit (NBOCA) and Hospital Episode Statistics (HES) databases for 10,566 bowel cancer patients undergoing emergency surgery in the English National Health Service. RESULTS: Probabilistic linkage linked 81.4% of NBOCA records to HES, versus 82.8% using deterministic linkage. No systematic differences were seen between patients who were and were not linked, and regression models for mortality and length of hospital stay according to patient and tumour characteristics were not sensitive to the linkage approach. CONCLUSION: Probabilistic linkage was successful in linking national clinical and administrative datasets for patients undergoing a major surgical procedure. It allows analysts outside highly secure data environments to undertake linkage while minimising costs and delays, protecting data security, and maintaining linkage quality.
‘What about the dads?’ Linking fathers and children in administrative data: A systematic scoping review
Research has shown that paternal involvement has a positive impact on child health and development. We aimed to develop a conceptual model of dimensions of fatherhood, identify and categorise methods used for linking fathers with their children in administrative data, and map these methods onto the dimensions of fatherhood. We carried out a systematic scoping review to create a conceptual framework of paternal involvement and identify studies exploring the impact of paternal exposures on child health and development outcomes using administrative data. We identified four methods that have been used globally to link fathers and children in administrative data, based on: family or household identifiers using address data; identifiable information about the father on the child's birth registration; health claims data; and Personal Identification Numbers. We did not identify direct measures of paternal involvement, but mapping linkage methods to the framework highlighted possible proxies. The addition of paternal National Health Service numbers to birth notifications presents a way forward in the advancement of fatherhood research using administrative data sources.