100 research outputs found

    Automatic de-identification of textual documents in the electronic health record: a review of recent research

    Background: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Internal Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually, and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. Methods: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. Results: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. Conclusions: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.
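
The pattern-matching and dictionary approach described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two regular expressions, the tiny `KNOWN_NAMES` list, and the `scrub` function are hypothetical and do not come from any of the reviewed systems, which use far larger rule sets and dictionaries.

```python
import re

# Hypothetical patterns and name dictionary, for illustration only.
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
KNOWN_NAMES = {"smith", "jones"}


def scrub(text: str) -> str:
    """Replace matched PHI with category placeholders."""
    # Rule-based step: fixed surface patterns such as phone numbers and dates.
    text = PHONE.sub("[PHONE]", text)
    text = DATE.sub("[DATE]", text)

    # Dictionary step: replace any word found in the name list.
    def replace_name(match: re.Match) -> str:
        word = match.group(0)
        return "[NAME]" if word.lower() in KNOWN_NAMES else word

    return re.sub(r"\b[A-Za-z]+\b", replace_name, text)


print(scrub("Mr. Smith called 555-123-4567 on 3/14/2009."))
# prints: Mr. [NAME] called [PHONE] on [DATE].
```

A name that is absent from the dictionary, for example `scrub("Dr. Nguyen")`, passes through unchanged. This is the limitation the abstract points to when it notes that machine learning tends to do better with PHI not mentioned in the dictionaries used.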

    De-identification of primary care electronic medical records free-text data in Ontario, Canada

    Background: Electronic medical records (EMRs) represent a potentially rich source of health information for research but the free-text in EMRs often contains identifying information. While de-identification tools have been developed for free-text, none have been developed or tested for the full range of primary care EMR data. Methods: We used deid open source de-identification software and modified it for an Ontario context for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 different physicians from 7 different practices in 5 different cities, and 500 free-text records from a group practice in a different city than the group practice used for the training set. We measured the sensitivity/recall, precision, specificity, accuracy and F-measure of the modified tool against manually tagged free-text records to remove patient and physician names, locations, addresses, medical record, health card and telephone numbers. Results: We found that the modified training program performed with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84 for the first and second validation sets respectively. Conclusion: The deid program can be modified to reasonably accurately de-identify free-text primary care EMR records while preserving clinical content.
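
The reported F-measures can be cross-checked against the reported precision and sensitivity (recall) values. The sketch below assumes the abstract uses the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state the weighting, and the `f_measure` helper is written for this illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity/recall) pairs as reported in the abstract.
reported = {
    "training": (0.913, 0.883),
    "validation 1": (0.911, 0.867),
    "validation 2": (0.874, 0.802),
}

for name, (p, r) in reported.items():
    print(f"{name}: F = {f_measure(p, r):.2f}")
```

Under that assumption the script prints 0.90, 0.89 and 0.84, which agrees with the F-measures given in the abstract.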

    EnzyMiner: automatic identification of protein level mutations and their impact on target enzymes from PubMed abstracts

    BACKGROUND: A better understanding of the mechanisms of an enzyme's functionality and stability, as well as knowledge of mutations and their impact, is crucial for researchers working with enzymes. Although several enzyme databases are currently available, the scientific literature largely remains the up-to-date source for learning the effects of a mutation on an enzyme. However, going through vast amounts of scientific documents to extract information on a desired mutation has always been a time-consuming process. In this paper, therefore, we describe a unique method, termed EnzyMiner, which automatically identifies the PubMed abstracts that contain information on the impact of a protein level mutation on the stability and/or the activity of a given enzyme. RESULTS: We present an automated system which identifies the abstracts that contain an amino-acid-level mutation and then classifies them according to the mutation's effect on the enzyme. In the case of mutation identification, MuGeX, an automated mutation-gene extraction system, has an accuracy of 93.1% with a 91.5 F-measure. For impact analysis, document classification is performed to identify the abstracts that contain a change in the enzyme's stability or activity resulting from the mutation. The system was trained on lipases and tested on amylases with an accuracy of 85%. CONCLUSION: EnzyMiner identifies the abstracts that contain a protein mutation for a given enzyme and checks whether the abstract is related to a disease with the help of information extraction and machine learning techniques. For disease related abstracts, the mutation list and direct links to the abstracts are retrieved from the system and displayed on the Web. For those abstracts that are not disease related, in addition to having the mutation list, the abstracts are also categorized into two groups. These two groups indicate whether the mutation has an effect on the enzyme's stability or on its functionality, and are likewise displayed on the Web.
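
As a rough illustration of what identifying an amino-acid-level mutation in text involves, the sketch below matches the common single-letter substitution notation (wild-type residue, position, mutant residue, e.g. `D153G`). It is not MuGeX and does not reproduce EnzyMiner; the pattern, the `find_point_mutations` helper, and the example sentence are assumptions made for this illustration.

```python
import re

# The 20 standard amino acids in single-letter code.
AA1 = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical pattern: wild-type residue, position, mutant residue.
POINT_MUTATION = re.compile(rf"\b([{AA1}])(\d+)([{AA1}])\b")


def find_point_mutations(text: str) -> list[tuple[str, int, str]]:
    """Return (wild_type, position, mutant) for each mention in the text."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in POINT_MUTATION.finditer(text)
    ]


sentence = "The D153G variant reduced lipase activity, unlike S82A."
print(find_point_mutations(sentence))
# prints: [('D', 153, 'G'), ('S', 82, 'A')]
```

A bare pattern like this also matches unrelated tokens of the same shape, which is one reason a full system pairs mutation extraction with gene or enzyme context and a downstream classifier.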

    First-line therapy in atypical hemolytic uremic syndrome: consideration on infants with a poor prognosis.

    Background: Atypical hemolytic uremic syndrome (aHUS) is a rare and heterogeneous disorder. The first line treatment of aHUS is plasma therapy, but in the past few years, the recommendations have changed greatly with the advent of eculizumab, a humanized monoclonal anti C5-antibody. Although recent recommendations suggest using it as a primary treatment for aHUS, important questions have arisen about the necessity of immediate use of eculizumab in all cases. We aimed to draw attention to a specific subgroup of aHUS patients with rapid disease progression and high mortality, in whom plasma therapy may not be feasible. Methods: We present three pediatric patients of acute complement-mediated HUS with a fatal outcome. Classical and alternative complement pathway activity, levels of complement factors C3, C4, H, B and I, as well as of anti-factor H autoantibody and of ADAMTS13 activity were determined. The coding regions of CFH, CFI, CD46, THBD, CFB and C3 genes were sequenced and the copy number of CFI, CD46, CFH and related genes were analyzed. Results: We found severe activation and consumption of complement components in these patients, furthermore, in one patient we identified a previously not reported mutation in CFH (Ser722Stop), supporting the diagnosis of complement-mediated HUS. These patients were not responsive to the FFP therapy, and all cases had fatal outcome. Conclusion: Taking the heterogeneity and the variable prognosis of atypical HUS into account, we suggest that the immediate use of eculizumab should be considered as first-line therapy in certain small children with complement dysregulation.

    Prognostic Impacts of Angiopoietins in NSCLC Tumor Cells and Stroma: VEGF-A Impact Is Strongly Associated with Ang-2

    INTRODUCTION: Angiopoietins and their receptor Tie-2 are, in concert with VEGF-A, key mediators in angiogenesis. This study evaluates the prognostic impact of all known human angiopoietins (Ang-1, Ang-2 and Ang-4) and their receptor Tie-2, as well as their relation to the prognostic expression of VEGF-A. METHODS: 335 unselected stage I-IIIA NSCLC-patients were included and tissue samples of respective tumor cells and stroma were collected in tissue microarrays (TMAs). Immunohistochemistry (IHC) was used to semiquantitatively evaluate the expression of markers in duplicate tumor and stroma cores. PRINCIPAL FINDINGS: In univariate analyses, low tumor cell expression of Ang-4 (P = 0.046) and low stromal expressions of Ang-4 (P = 0.009) and Ang-2 (P = 0.017) were individually associated with a poor survival. In the multivariate analysis, low stromal Ang-2 (HR 1.88; CI 95% 1.15-3.08) and Ang-4 (HR 1.47, CI 95% 1.02-2.11, P = 0.04) expressions were independently associated with a poor prognosis. In patients with high tumor cell expression of Ang-2, a concomitantly high tumor VEGF-A expression mediated a dramatic survival reduction (P<0.001). In the multivariate analysis of patients with high Ang-2 expression, high tumor VEGF-A expression appeared an independent poor prognosticator (HR 6.43; CI 95% 2.46-16.8; P<0.001). CONCLUSIONS: In tumor cells, only Ang-4 expression has prognostic impact in NSCLC. In tumor stroma, Ang-4 and Ang-2 are independently associated with survival. The prognostic impact of tumor cell VEGF-A in NSCLC appears strongly associated with a concomitantly high tumor cell expression of Ang-2

    Angiogenesis Markers Quantification in Breast Cancer and Their Correlation with Clinicopathological Prognostic Variables

    Tumoural angiogenesis is essential for the growth and spread of breast cancer cells. Therefore the aim of this study was to assess the diagnostic performance of angiogenesis markers in tumours and their corresponding levels in the serum of breast cancer patients. Angiogenin, Ang2, fibroblast growth factor basic, intercellular adhesion molecule (ICAM)-1, keratinocyte growth factor (KGF), platelet-derived growth factor-BB, and VEGF-A were measured using a FASTQuant angiogenic growth factor multiplex protein assay. We observed that breast cancer tumours exhibited high levels of PDGF-BB, bFGF and VEGF, and extremely high levels of TIMP-1 and Ang-2, whereas in serum we found significantly higher levels of Ang-2, PDGF-BB, bFGF, ICAM-1 and VEGF in patients with breast cancer compared with patients with benign breast disease. Moreover, some of these angiogenesis markers evaluated in the tumour and serum of breast cancer patients were associated with standard clinical parameters, ER status and the MVD of tumours. Angiogenesis markers play important roles in tumour growth, invasion and metastasis. Our results suggest that analysis of angiogenesis markers in the tumour and serum of breast cancer patients using a multiplex protein assay can improve diagnosis and prognosis in this disease.