HIPAAway: developing software for de-identification and exploring bias in name detection

Lim, Shulammite


Authors: Shulammite Lim
Publication date: 6 June 2023
Publisher: Massachusetts Institute of Technology

Abstract

De-identification, the process of removing identifiers, is a crucial step in preparing clinical data for use in biomedical research. Advances in natural language processing have increased interest in developing an accurate and adaptable automatic de-identification system for clinical text. Existing de-identification models have proven successful, but most are unavailable for public use because their code is not released or because commercial models carry a usage cost. A lack of transparency in how de-identification models are trained may also bias them against certain demographic groups. Such biases can be hidden by overall performance metrics, and they need to be evaluated because of the disproportionate potential harm to marginalized communities. In this thesis, we review current de-identification methods, present a new de-identification dataset, audit demographic biases in existing de-identification approaches, and develop an easy-to-use, open-source de-identification software package. This package aims to make clinical text de-identification more accessible to researchers and clinicians, easing the de-identification bottleneck and freeing up more data for biomedical research. This would help make future research more robust and beneficial not only to the medical community but also to people around the world.

M.Eng
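The abstract's point that demographic biases can be hidden by overall performance metrics can be illustrated with a small calculation. The sketch below is not the thesis's audit method; it uses entirely hypothetical records, and the group labels `group_a` and `group_b` and the helper `recall_by_group` are illustrative names. It compares overall recall on name detection with recall broken down by group.

```python
from collections import defaultdict

# Hypothetical audit records: (demographic group associated with a true name,
# whether the de-identification model detected that name). Illustrative only.
records = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", True),
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", True),
    ("group_b", True), ("group_b", False),
]


def recall_by_group(records):
    """Return overall recall and a dict of per-group recall for name detection."""
    hits = defaultdict(int)    # detected names per group
    totals = defaultdict(int)  # all true names per group
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(detected)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group


overall, per_group = recall_by_group(records)
print(f"overall recall: {overall:.2f}")
for group, r in sorted(per_group.items()):
    print(f"{group}: {r:.2f}")
```

With these toy records, overall recall is 0.90 even though names from `group_b` are detected only half the time. Surfacing that kind of gap is what a per-group audit is for.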

Available Versions

DSpace@MIT

oai:dspace.mit.edu:1721.1/1513...

Last time updated on 23/01/2024