Search CORE

23 research outputs found

Automatic de-identification of textual documents in the electronic health record: a review of recent research

Author: B Wellner
BA Beckwith
Brett R South
C Friedman
D Gupta
DA Dorr
E Aramaki
EM Fielstein
F Jeffrey Friedlin
FJ Friedlin
FP Morrison
G Szarvas
G Szarvas
GPO U.S
GPO U.S
H Cunningham
I Neamatullah
J Gardner
JJ Berman
K Atkinson
K Hara
L Sweeney
Matthew H Samore
NCI
NLM
NLM
NLM
O Uzuner
O Uzuner
O Uzuner
O Uzuner
P Ruch
RK Taira
Shuying Shen
SM Meystre
SM Thomas
SM Thomas
Stephane M Meystre
Y Guo
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Internal Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually, and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. Methods This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. Results The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. Conclusions In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.</p

Crossref

IUPUIScholarWorks

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Analysis and design of randomised clinical trials involving competing risks endpoints

Author: A Latouche
A Latouche
A Latouche
AWM Lee
B Friedlin
BC Tai
BC Tai
BC Tai
BC Tai
Bee-Choo Tai
BR Logan
D Machin
D Machin
DA Schoenfeld
David Machin
G Schulgen
HT Kim
J Beyersmann
J Beyersmann
J Wee
JD Kalbfleisch
JJ Dignam
JJ Gaynor
Joseph Wee
JP Fine
M Pintilie
PA Poole-Wilson
PR Williamson
R Bajorunaite
R Gray
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background In randomised clinical trials involving time-to-event outcomes, the failures concerned may be events of an entirely different nature and as such define a classical competing risks framework. In designing and analysing clinical trials involving such endpoints, it is important to account for the competing events, and evaluate how each contributes to the overall failure. An appropriate choice of statistical model is important for adequate determination of sample size. Methods We describe how competing events may be summarised in such trials using cumulative incidence functions and Gray's test. The statistical modelling of competing events using proportional cause-specific and subdistribution hazard functions, and the corresponding procedures for sample size estimation are outlined. These are illustrated using data from a randomised clinical trial (SQNP01) of patients with advanced (non-metastatic) nasopharyngeal cancer. Results In this trial, treatment has no effect on the competing event of loco-regional recurrence. Thus the effects of treatment on the hazard of distant metastasis were similar via both the cause-specific (unadjusted <it>csHR </it>= 0.43, 95% CI 0.25 - 0.72) and subdistribution (unadjusted <it>subHR </it>0.43; 95% CI 0.25 - 0.76) hazard analyses, in favour of concurrent chemo-radiotherapy followed by adjuvant chemotherapy. Adjusting for nodal status and tumour size did not alter the results. The results of the logrank test (<it>p </it>= 0.002) comparing the cause-specific hazards and the Gray's test (<it>p </it>= 0.003) comparing the cumulative incidences also led to the same conclusion. However, the subdistribution hazard analysis requires many more subjects than the cause-specific hazard analysis to detect the same magnitude of effect. Conclusions The cause-specific hazard analysis is appropriate for analysing competing risks outcomes when treatment has no effect on the cause-specific hazard of the competing event. It requires fewer subjects than the subdistribution hazard analysis for a similar effect size. However, if the main and competing events are influenced in opposing directions by an intervention, a subdistribution hazard analysis may be warranted.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Leicester Research Archive