Diagnostic free text analysis in biobanks with CRIP.CodEx: Automated matching of classifications
Abstract
Biobanks are key resources for biomedical research. To make the stored human biospecimens accessible, e.g. through web-based query tools or trans-institutional metabiobanks, they have to be annotated with clinical data in a harmonized, structured form such as ICD codes, whereas these data are currently available only as free-text records. The biobank administered by the Human Tissue and Cell Research Foundation (HTCR) at the University of Munich Medical Centre routinely collects remnant tissue and blood samples from patient treatments. For the diagnostic classification of the corresponding cases, a biobank-specific classification was developed but has not yet been matched to ICD codes. Whereas this matching has so far been done manually, we used the automated knowledge extraction software CRIP.CodEx, which needs neither a training set nor access to external resources, to recode the textual descriptions of the specialized HTCR biobank classification into ICD. We show that the information contained in the nomenclature of this biobank-specific catalogue of diagnoses is sufficient for a mapping to both the ICD-10 and ICD-O-3 catalogues, and we deliver an automated matching of two different classification systems using CRIP.CodEx.
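To illustrate what matching one classification to another purely from the wording of its labels can look like, the following is a minimal sketch. It is not the CRIP.CodEx algorithm, which the abstract does not describe; it substitutes a plain token-overlap (Jaccard similarity) approach. The small ICD-10 excerpt, the stopword list, the synonym table and the function names (`tokens`, `jaccard`, `best_icd_match`) are all hypothetical illustrations, and a real mapping would need far richer medical vocabulary.

```python
import re

# Hypothetical excerpt of ICD-10 three-character categories and their titles.
ICD10_TITLES = {
    "C18": "Malignant neoplasm of colon",
    "C34": "Malignant neoplasm of bronchus and lung",
    "C50": "Malignant neoplasm of breast",
    "C61": "Malignant neoplasm of prostate",
}

# Function words ignored when comparing labels (illustrative, not exhaustive).
STOPWORDS = {"of", "the", "and", "with", "in"}

# Hypothetical hand-written synonym expansions standing in for domain knowledge.
SYNONYMS = {
    "carcinoma": {"malignant", "neoplasm"},
    "adenocarcinoma": {"malignant", "neoplasm"},
    "pulmonary": {"lung"},
}


def tokens(text: str) -> set[str]:
    """Lowercase, split into words, drop stopwords, and add synonym expansions."""
    words = {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}
    expanded = set(words)
    for w in words:
        expanded |= SYNONYMS.get(w, set())
    return expanded


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_icd_match(label: str) -> tuple[str, float]:
    """Return the ICD-10 code whose title overlaps most with a biobank diagnosis label."""
    label_tokens = tokens(label)
    scored = [
        (code, jaccard(label_tokens, tokens(title)))
        for code, title in ICD10_TITLES.items()
    ]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for label in ["Adenocarcinoma of the lung", "Colon carcinoma", "Prostate carcinoma"]:
        print(label, "->", best_icd_match(label))
```

For example, "Colon carcinoma" expands to {colon, carcinoma, malignant, neoplasm} and shares three of four distinct tokens with the title of C18, giving a score of 0.75. Without the synonym expansion the label and the ICD title would share only "colon", which shows why label-only matching depends heavily on the vocabulary knowledge built into the matcher.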