
    Automatic end-to-end De-identification: Is high accuracy the only metric?

    De-identification of electronic health records (EHR) is a vital step towards advancing health informatics research and maximising the use of available data. It is a two-step process: step one is the identification of protected health information (PHI), and step two is the replacement of such PHI with surrogates. Despite recent advances in automatic de-identification of EHR, significant obstacles remain if the abundant health data available are to be used to their full potential. Accuracy in de-identification could be considered a necessary, but not sufficient, condition for the use of EHR without individual patient consent. We present here a comprehensive review of the progress to date, covering both the impressive successes in achieving high accuracy and the significant risks and challenges that remain. To the best of our knowledge, this is the first paper to present a complete picture of end-to-end automatic de-identification. We review 18 recently published automatic de-identification systems, designed to de-identify EHR in the form of free text, to show the advancements made in improving the overall accuracy of the system and in identifying individual PHI. We argue that despite the improvements in accuracy, challenges remain in surrogate generation and replacement of identified PHI, along with risks to patient protection and privacy.

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text, and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and enabling future benchmarks. Comment: Proceedings of the 1st ACM WSDM Health Search and Data Mining Workshop (HSDM2020), 202

    Template Mining for Information Extraction from Digital Documents

    published or submitted for publication

    Information representation in Displaced Archives: a meta-synthesis

    The phenomenon of displaced archives emerges in the scientific literature as a kind of wicked problem. In addition to the conceptual diversity associated with this phenomenon present in the scientific literature from various fields (e.g., removed archives, migrated archives, seized archives, alienated archives, captured archives, diasporic archives, expatriated/repatriated archives, estray archival, disputed archival claims, replevin, etc.), there is a tendency in the scientific community to adopt the concept of displaced archives as a possible hypernym. The most recent definition of displaced archive is found in the report issued by the International Council on Archives through the Expert Group on Shared Archival Heritage, understood as "archives removed from the place of their creation, where the ownership of the archives is disputed by two or more parties" (Lowry 2020, 5). Nevertheless, one of the problems that makes this phenomenon a particular case is whether we can identify archives as displaced without there necessarily being a claimant party or parties. Although the most recurrent focus in addressing this phenomenon has been on the problems of restitution, repatriation, return or relocation, the identification and, more incisively, the representation of these documentary sets have remained obscured in scientific discourse. According to Winn (2015), one of the limiting factors in the identification of displaced archives is the absence of information access tools. For Lowry, "the catalogue is the key" (2017a, 8), not only as an instrument of access to information where the processes of organization and description are materialized with a view to its retrieval and access, but also as a mechanism of information representation that derives from the powers of archival mediation.
Studies on information representation suggest that, in the postmodern archival stream, it is not possible to ensure neutrality or impartiality in the representation of the content and structure of a fonds (MacNeil 2012) in finding aids. Such archival descriptions are supported by interpretative approaches that depend on the description and access policies adopted by custodians, which are not unrelated to the political, historical, socio-cultural, and institutional contexts of the environment where they were produced. Considering that finding aids can be genologically diverse (e.g., catalogues, inventories, guides, scripts, directories, indexes and databases), it is considered more productive to focus on the representation of archival information, from the perspective of how a "fluid, evolving, and socially constructed practice" (Yakel 2003, 2) is constituted as "the core of archival description produced to facilitate access to archival materials in the background of their creation and custodial history" (Zhang 2012, 49). Although studies on archival information representation have been problematized with greater, if still incipient, incidence in Knowledge Organization and Information Science (Barros and Sousa 2020; Aguiar and Kobashi 2013; Tognoli 2013; Vital, Medeiros, and Brascher 2017; Corujo and Freitas 2021; Tognoli and Guimarães 2011; 2012; Troitiño Rodriguez 2018; Hjørland 2002), these studies have largely confined themselves to material and technical processes, physical (e.g., arrangement) and intellectual (e.g., classification and description), of concepts that conform to the bureaucratic dimension of the producers and/or custodial entities.
As concerns displaced archives, the phenomenon itself challenges the core concepts of Archival Science, especially how these disputed documentary sets are represented from the point of view of provenance, integrity and organicity, and how these representations are (re)constructed or destroyed in the process of archival mediation. Based on these aspects, and given the scarcity of studies on this topic, this article focuses on how the representation of information about archives removed from their original social and territorial contexts has been addressed in the scientific literature. We therefore intend to survey the scientific literature that traces the trajectory of the information representation process, from the removal process to the claim by the dispossessed communities, and that can be theoretically relevant to the scope of Knowledge Organization. On this basis, a synthesis of knowledge from the scientific literature, called a meta-synthesis (Sandelowski and Barroso 2010; Grant and Booth 2009; Finfgeld-Connett 2018), is justified. This article is structured as follows: section 2.0 formulates the starting question and research objectives; section 3.0 outlines the methodological assumptions for this type of qualitative literature synthesis; section 4.0 presents the results of the empirical investigation; section 5.0 makes concluding remarks on limitations and implications, as well as future lines of research.

    Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress

    Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potential for high-quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.

    Effect of Metal Speciation in Fly Ash on Environmentally Persistent Free Radical Formation

    Waste incineration and Superfund sites lead to the formation of pollutants harmful to both the environment and human health. Environmentally Persistent Free Radicals (EPFRs) are a class of newly discovered radical pollutants known to form on combustion by-products such as fly ash and particulate matter. EPFRs are rapidly gaining attention for their harmful effects on the environment and human health. Previous research has shown the formation of EPFRs through surface-mediated reactions with transition metal oxides on particulates. The work presented in this dissertation explores the relationship between fly ash composition and EPFR formation. Fly ash is produced by combustion systems, namely waste incinerators. Waste composition varies widely across the globe, which changes the composition of fly ash and its associated pollutants. Chapter 4 presents a thorough characterization of real-world fly ashes from municipal and medical waste incinerators, including EPFR concentrations, elemental composition and particle characterization. EPFR variability found among real-world fly ashes is heavily influenced by fly ash composition. Based on the findings from the real-world fly ashes, a model system was developed to further understand the role of sulfur in the formation of EPFRs and is described in Chapter 5. Sulfur species, in the form of ammonium sulfate and sulfur dioxide, have a major influence on EPFR formation. In Chapter 6, a remediation method for mining-influenced water using chitin and sulfate-reducing bacteria is presented. The work done at LSU confirmed sulfur reduction, which indicated a successful remediation of heavy metals and sulfates from the water.