4 research outputs found

Development and evaluation of an open source software tool for deidentification of pathology reports

Author: AH Namini
Bruce A Beckwith
D Gupta
Frank Kuo
JJ Berman
L Sweeney
R Miller
Rajeshwarri Mahaadevan
RK Taira
SM Thomas
Ulysses J Balis
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Electronic medical records, including pathology reports, are often used for research purposes. Currently, few freely available programs remove identifiers while leaving the remainder of the pathology report text intact. Our goal was to produce an open-source, Health Insurance Portability and Accountability Act (HIPAA)-compliant deidentification tool tailored for pathology reports. We designed a three-step process for removing potential identifiers. The first step looks for identifiers known to be associated with the patient, such as name, medical record number, and pathology accession number. Next, a series of pattern matches looks for predictable patterns likely to represent identifying data, such as dates, accession numbers, and addresses, as well as patient, institution, and physician names. Finally, individual words are compared with a database of proper names and geographic locations. Pathology reports from three institutions were used to design and test the algorithms. The software was improved iteratively on training sets until it exhibited good performance. A set of 1800 new pathology reports was then processed. Each report was reviewed manually before and after deidentification to catalog all identifiers and note those that were not removed. RESULTS: Of the 1800 pathology reports, 1254 (69.7%) contained identifiers in the body of the report. Of the 3499 unique identifiers in the test set, 3439 (98.3%) were removed. Only 19 HIPAA-specified identifiers (mainly consult accession numbers and misspelled names) were missed. Of the 41 non-HIPAA identifiers missed, the majority were partial institutional addresses and ages. Outside consultation case reports typically contain numerous identifiers and were the most challenging to deidentify comprehensively. Performance varied among reports from the three institutions, highlighting the need for site-specific customization, which is easily accomplished with our tool.
CONCLUSION: We have demonstrated that it is possible to create an open-source deidentification program that performs well on free-text pathology reports.
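The three-step process described in this abstract (known identifiers, then pattern matches, then a dictionary of proper names and places) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the published tool's code: the `deidentify` function, the `[REMOVED]` tag, the three regular expressions, and the one-word dictionary are all hypothetical stand-ins for the site-specific patterns and name/location database the authors describe.

```python
import re

# Step 2: hypothetical, deliberately simple patterns for a few identifier shapes.
IDENTIFIER_PATTERNS = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),   # dates such as 3/14/2005
    re.compile(r"\b[A-Z]{1,2}\d{2}-\d{3,6}\b"),    # accession-style numbers such as S05-12345
    re.compile(r"\bDr\.\s+[A-Z][a-z]+\b"),         # "Dr. Surname" physician mentions
]


def deidentify(text, known_identifiers, name_dictionary, tag="[REMOVED]"):
    """Apply the three passes in order and return the scrubbed text."""
    # Step 1: remove identifiers already known for this patient/report,
    # longest first so a full name is replaced before any shorter fragment.
    for ident in sorted(known_identifiers, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(ident) + r"(?!\w)"
        text = re.sub(pattern, tag, text, flags=re.IGNORECASE)

    # Step 2: remove strings that match predictable identifier patterns.
    for pattern in IDENTIFIER_PATTERNS:
        text = pattern.sub(tag, text)

    # Step 3: compare each remaining word against a lowercased dictionary
    # of proper names and geographic locations (hypothetical contents).
    def check_word(match):
        word = match.group(0)
        return tag if word.lower() in name_dictionary else word

    return re.sub(r"[A-Za-z]+", check_word, text)


report = ("Patient John Doe, MRN 123456, seen 3/14/2005. Specimen S05-12345 "
          "reviewed by Dr. Kelly. Consult from Boston.")
print(deidentify(report, ["John Doe", "123456"], {"boston"}))
# Patient [REMOVED], MRN [REMOVED], seen [REMOVED]. Specimen [REMOVED] reviewed by [REMOVED]. Consult from [REMOVED].
```

Running the passes in this order means the cheap, high-confidence matches (step 1) are handled before the broader heuristics. A plain dictionary lookup like step 3 will also flag ordinary words that happen to be names, so a practical version would need site-specific tuning, in line with the customization the abstract mentions.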

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Availability and quality of paraffin blocks identified in pathology archives: A multi-institutional study by the Shared Pathology Informatics Network (SPIN)

Author: A Abati
AA Patel
AG Glass
AH Namini
AM Holzbach
Anil V Parwani
Ashokkumar A Patel
BA Beckwith
CJ McDonald
D Gupta
David Seligson
DC Wertz
Dilipkumar Gupta
DP Lubeck
E Marshall
Eyas M Hattab
G Schadow
Hong Yu
Isaac S Kohane
J Melamed
John R Gilbertson
JR Gilbertson
Jules J Berman
KJ Mitchell
Michael J Becich
MJ Becich
MR Cooperberg
Osvaldo Schirripa
PA Fetsch
Sarah Dry
SJ Qualman
T Mizuno
Thomas M Ulbright
Ulysses J Balis
W Grizzle
WE Grizzle
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The Shared Pathology Informatics Network (SPIN) is a tissue resource initiative that draws on clinical reports describing the vast archives of paraffin-embedded tissue routinely stored by medical centers. SPIN has an informatics component (sending tissue-related queries to multiple institutions via the internet) and a service component (providing histopathologically annotated tissue specimens for medical research). This paper examines whether tissue blocks, identified by localized computer searches at participating institutions, can be retrieved in adequate quantity and quality to support medical researchers. METHODS: Four centers evaluated pathology reports (1990–2005) for common and rare tumors to determine the percentage of cases for which suitable tissue blocks containing tumor were available. Each site generated a list of 100 common tumor cases (25 cases each of breast adenocarcinoma, colonic adenocarcinoma, lung squamous carcinoma, and prostate adenocarcinoma) and 100 rare tumor cases (25 cases each of adrenal cortical carcinoma, gastro-intestinal stromal tumor [GIST], adenoid cystic carcinoma, and mycosis fungoides) using a combination of Tumor Registry, laboratory information system (LIS), and/or SPIN-related tools. Pathologists identified the slides/blocks containing tumor, noted the first 3 slides with the largest tumor, and recorded the availability of the corresponding blocks. RESULTS: For common tumor cases (n = 400), the institutional retrieval rates (all blocks) were 83% (A), 95% (B), 80% (C), and 98% (D). The retrieval rate (tumor blocks) across all centers for common tumors was 73%, with a mean largest tumor size of 1.49 cm; tumor-block retrieval was highest for lung (84%) and lowest for prostate (54%). For rare tumor cases (n = 400), the institutional retrieval rates (all blocks) were 78% (A), 73% (B), 67% (C), and 84% (D).
The retrieval rate (tumor blocks) across all centers for rare tumors was 66%, with a mean largest tumor size of 1.56 cm; tumor-block retrieval was highest for GIST (72%) and lowest for adenoid cystic carcinoma (58%). CONCLUSION: This assessment documents the availability and quality of retrievable archival tissue blocks and their associated electronic data, which can be of value to researchers. This study complements the data against which uniform use of the SPIN query tools by all four centers will be measured, to confirm and highlight the usefulness of archival material for obtaining tumor tissue for research.
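Because each site listed 100 cases per category (25 cases for each of 4 tumor types), the per-institution "all blocks" rates can be pooled by weighting each site by its case count. The short Python sketch below does that arithmetic. The pooled figures are derived here, not reported in the abstract, and they assume each reported percentage applies to exactly the 100 listed cases at that site (no exclusions); `pooled_retrieval_rate` is a hypothetical helper name.

```python
# Institutional "all blocks" retrieval rates (%) as reported in the SPIN abstract.
COMMON_RATES = {"A": 83, "B": 95, "C": 80, "D": 98}
RARE_RATES = {"A": 78, "B": 73, "C": 67, "D": 84}

# Assumption: each site contributes 100 cases per category (25 cases x 4 tumor types).
CASES_PER_SITE = 100


def pooled_retrieval_rate(rates, cases_per_site=CASES_PER_SITE):
    """Pooled retrieval rate (%) across sites, weighting each site by its case count."""
    # Convert each site's percentage back into a count of retrieved cases.
    retrieved = sum(rate * cases_per_site for rate in rates.values()) / 100
    total = cases_per_site * len(rates)
    return 100 * retrieved / total


print(pooled_retrieval_rate(COMMON_RATES))  # 89.0
print(pooled_retrieval_rate(RARE_RATES))    # 75.5
```

With equal case counts per site, the pooled rate reduces to the simple mean of the four site rates; the weighting only matters if sites contribute different numbers of cases.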

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Deep Blue Documents at the University of Michigan

Primary intraspinal dumbbell-shaped mesenchymal chondrosarcoma with massive calcifications: a case report and review of the literature

Author: A Turpin
AG Huvos
AH Salvador
AM Frezza
AR Harwood
BG Oh
Bolai Chen
BW Scheithauer
CM Malhotra
Dingkun Lin
DM Fletcher
E Danse
E Lee
EA Dowling
EF Vencio
EJ Rushing
FT Namini
G Daita
GS Bae
Guoyi Su
GW Herget
H Ozawa
HC Anderson
HC Selye
HS Chan
J Reif
K Takahashi
L Kreel
L Lightenstein
LA Littrell
LN Di
M Cesari
M Dabska
N Hashimoto
P Bergh
PD Knott
R Raskind
S Kawaguchi
S Tuncer
Shudong Chen
T Iida
TM Dantonello
VV Joshi
Y Nakashima
Yufeng Wang
Publication venue: Springer Science and Business Media LLC

Crossref

Automated de-identification of free-text medical records

Author: A Goldberger
AH Namini
Andrew Reisner
B Beckwith
B Wellner
D Gupta
G Szarvas
Gari D Clifford
George B Moody
Ishna Neamatullah
JJ Berman
L Sweeney
Li-wei H Lehman
M Saeed
Margaret M Douglass
Mauricio Villarroel
MM Douglass
P Ruch
Peter Szolovits
R Miller
RK Taira
Roger G Mark
SM Thomas
T Sibanda
William J Long
Ö Uzuner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Background: Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming, and prone to error, so large-scale de-identification requires automated methods. Methods: We describe an automated Perl-based de-identification software package that can be applied to most free-text medical records, such as nursing notes, discharge summaries, and X-ray reports. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI and an extended PHI set that includes doctors' names and the years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes in which real PHI was replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus. Results: Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, a precision of 0.749, and a fallout of approximately 0.002. On the test corpus, a total of 90 false negatives were found, or 27 per 100,000 words, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus.
Conclusion: We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software outperforms a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently general that it can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient for the public dissemination of medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm.

National Institute of Biomedical Imaging and Bioengineering (U.S.); National Institutes of Health (U.S.) (Grant Number R01-EB001659)
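The recall, precision, and fallout values reported in this abstract follow the standard confusion-matrix definitions, sketched below in Python. The `Counts` class and the example numbers are hypothetical illustrations, not data from the study, and the sketch assumes PHI is counted per instance.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # PHI instances correctly removed
    fp: int  # non-PHI text wrongly removed
    fn: int  # PHI instances missed
    tn: int  # non-PHI text correctly left alone


def recall(c: Counts) -> float:
    """Share of true PHI that was caught: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def precision(c: Counts) -> float:
    """Share of removals that were really PHI: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp)


def fallout(c: Counts) -> float:
    """Share of non-PHI that was wrongly removed: FP / (FP + TN)."""
    return c.fp / (c.fp + c.tn)


# Hypothetical counts for illustration only (not from the study).
example = Counts(tp=90, fp=30, fn=10, tn=9870)
print(recall(example), precision(example), round(fallout(example), 4))
# 0.9 0.75 0.003
```

Because non-PHI text vastly outnumbers PHI in clinical notes, the true-negative count dominates the fallout denominator, so fallout can stay small even when precision is modest. That pattern is consistent with the reported precision of 0.749 alongside a fallout of about 0.002.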

CiteSeerX

DSpace@MIT

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive