
    Quality and complexity measures for data linkage and deduplication

    Summary. Deduplicating one data set or linking several data sets are increasingly important tasks in the data preparation steps of many data mining projects. The aim of such linkages is to match all records relating to the same entity. Research interest in this area has increased in recent years, with techniques originating from statistics, machine learning, information retrieval, and database research being combined and applied to improve the linkage quality, as well as to increase performance and efficiency when linking or deduplicating very large data sets. Different measures have been used to characterise the quality and complexity of data linkage algorithms, and several new metrics have been proposed. An overview of the issues involved in measuring data linkage and deduplication quality and complexity is presented in this chapter. It is shown that measures in the space of record pair comparisons can produce deceptive quality results. Various measures are discussed and recommendations are given on how to assess data linkage and deduplication quality and complexity.
    Key words: data or record linkage, data integration and matching, deduplication, data mining pre-processing, quality and complexity measures
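
    As a hedged illustration of why measures computed over the full space of record pair comparisons can be deceptive, the short Python sketch below uses invented figures: when two data sets of n and m records are compared, the n x m candidate pairs are dominated by true non-matches, so accuracy looks excellent even when half of the true matches are missed, whereas precision, recall and the f-measure expose the poor result. All numbers are hypothetical.

# Hypothetical illustration (all figures invented): quality measures for a
# linkage of two data sets, evaluated in the record-pair comparison space.

n_a, n_b = 100_000, 100_000           # sizes of the two data sets
total_pairs = n_a * n_b               # full comparison space: 10^10 pairs
true_matches = 80_000                 # at most min(n_a, n_b) pairs can be matches

# Suppose a poor classifier finds only half of the true matches and also
# wrongly declares 40,000 non-matching pairs to be matches.
tp = 40_000                           # true positives (correctly matched pairs)
fn = true_matches - tp                # false negatives (missed matches)
fp = 40_000                           # false positives (wrong matches)
tn = total_pairs - tp - fn - fp       # true negatives dominate the space

accuracy = (tp + tn) / total_pairs                         # ~0.999992, looks excellent
precision = tp / (tp + fp)                                 # 0.50
recall = tp / (tp + fn)                                    # 0.50
f_measure = 2 * precision * recall / (precision + recall)  # 0.50

print(f"accuracy={accuracy:.6f} precision={precision:.2f} "
      f"recall={recall:.2f} f-measure={f_measure:.2f}")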

    Sociodemographic differences in linkage error: An examination of four large-scale datasets

    © 2018 The Author(s). Background: Record linkage is an important tool for epidemiologists and health planners. Record linkage studies will generally contain some level of residual record linkage error, where individual records are either incorrectly marked as belonging to the same individual, or incorrectly marked as belonging to separate individuals. A key question is whether errors in linkage quality are distributed evenly throughout the population, or whether certain subgroups will exhibit higher rates of error. Previous investigations of this issue have typically compared linked and un-linked records, which can conflate bias caused by record linkage error with bias caused by missing records (data capture errors). Methods: Four large administrative datasets were individually de-duplicated, with results compared to an available 'gold-standard' benchmark, allowing us to avoid methodological issues with comparing linked and un-linked records. Results were compared by gender, age, geographic remoteness (major cities, regional or remote) and socioeconomic status. Results: Results varied between datasets and by sociodemographic characteristic. The most consistent findings were worse linkage quality for younger individuals (seen in all four datasets) and worse linkage quality for those living in remote areas (seen in three of four datasets). The linkage quality within sociodemographic categories varied between datasets, with the associations with linkage error reversed across different datasets due to quirks of the specific data collection mechanisms and data sharing practices. Conclusions: These results suggest caution should be taken both when linking younger individuals and those in remote areas, and when analysing linked data from these subgroups. Further research is required to determine the ramifications of worse linkage quality in these subpopulations on research outcomes.
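
    A minimal sketch of the kind of subgroup breakdown described above, assuming de-duplicated records have been aligned with a gold-standard benchmark. The column names ('true_entity', 'linked_entity', 'age_group') and the simplified record-level error definitions are illustrative assumptions, not the study's actual method.

import pandas as pd

def linkage_errors_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Report, per subgroup, the share of records involved in linkage errors."""
    df = df.copy()
    # A record is involved in a false match if its assigned cluster mixes
    # different gold-standard entities, and in a missed match if its
    # gold-standard entity is split across several assigned clusters.
    df["false_match"] = df.groupby("linked_entity")["true_entity"].transform("nunique") > 1
    df["missed_match"] = df.groupby("true_entity")["linked_entity"].transform("nunique") > 1
    return (df.groupby(group_col)[["false_match", "missed_match"]]
              .mean()
              .rename(columns=lambda c: c + "_rate"))

# Toy example: entity "c" is wrongly split across two clusters.
records = pd.DataFrame({
    "true_entity":   ["a", "a", "b", "c", "c", "d"],
    "linked_entity": [1, 1, 2, 3, 4, 5],
    "age_group":     ["<25", "<25", "25-64", "65+", "<25", "65+"],
})
print(linkage_errors_by_group(records, "age_group"))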

    Updated fracture incidence rates for the US version of FRAX®

    © The Author(s) 2009. This article is published with open access at Springerlink.com. Summary: On the basis of updated fracture and mortality data, we recommend that the base population values used in the US version of FRAX® be revised. The impact of suggested changes is likely to be a lowering of 10-year fracture probabilities. Introduction: Evaluation of results produced by the US version of FRAX® indicates that this tool overestimates the likelihood of major osteoporotic fracture. In an attempt to correct this, we updated underlying fracture and mortality rates for the model. Methods: We used US hospital discharge data from 2006 t

    Data Linkage: A powerful research tool with potential problems

    Background: Policy makers, clinicians and researchers are demonstrating increasing interest in using data linked from multiple sources to support measurement of clinical performance and patient health outcomes. However, the utility of data linkage may be compromised by sub-optimal or incomplete linkage, leading to systematic bias. In this study, we synthesize the evidence identifying participant or population characteristics that can influence the validity and completeness of data linkage and may be associated with systematic bias in reported outcomes

    Automation of a problem list using natural language processing

    BACKGROUND: The medical problem list is an important part of the electronic medical record in development in our institution. To serve the functions it is designed for, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often totally unused. To alleviate this issue, we are building an environment where the problem list can be easily and effectively maintained. METHODS: For this project, 80 medical problems were selected for their frequency of use in our future clinical field of evaluation (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background and a foreground application. The background application uses Natural Language Processing (NLP) to harvest potential problem list entries from the list of 80 targeted problems detected in the multiple free-text electronic documents available in our electronic medical record. These proposed medical problems drive the foreground application designed for management of the problem list. Within this application, the extracted problems are proposed to the physicians for addition to the official problem list. RESULTS: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms to detect first document sections, then sentences within these sections, and finally potential problems within the sentences. The initial evaluation of the section and sentence detection algorithms demonstrated a sensitivity and positive predictive value of 100% when detecting sections, and a sensitivity of 89% and a positive predictive value of 94% when detecting sentences. CONCLUSION: The global aim of our project is to automate the process of creating and maintaining a problem list for hospitalized patients and thereby help to guarantee the timeliness, accuracy and completeness of this information
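
    The pipeline described above (section detection, then sentence detection, then problem spotting against the list of targeted problems) can be outlined with a deliberately simplified Python sketch. The section headers, the tiny problem list and the keyword matching below are illustrative assumptions; the actual system relies on a full NLP engine rather than regular expressions.

import re

# Assumed, illustrative list of targeted problems (the real system uses 80).
TARGET_PROBLEMS = ["atrial fibrillation", "congestive heart failure",
                   "hypertension", "myocardial infarction"]

# Assumed section headers; real clinical notes use many more.
SECTION_HEADER = re.compile(r"^(HISTORY|ASSESSMENT|IMPRESSION|PLAN)\s*:", re.M)

def split_sections(note: str) -> list[str]:
    """Split a free-text note on recognised section headers."""
    starts = [m.start() for m in SECTION_HEADER.finditer(note)] + [len(note)]
    return [note[s:e].strip() for s, e in zip(starts, starts[1:])]

def split_sentences(section: str) -> list[str]:
    """Very rough sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", section) if s.strip()]

def propose_problems(note: str) -> list[tuple[str, str]]:
    """Return (problem, sentence) pairs to propose for the problem list."""
    proposals = []
    for section in split_sections(note):
        for sentence in split_sentences(section):
            for problem in TARGET_PROBLEMS:
                if problem in sentence.lower():
                    proposals.append((problem, sentence))
    return proposals

note = "IMPRESSION: Known hypertension. New-onset atrial fibrillation noted today."
print(propose_problems(note))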