A scalable and transparent data pipeline for AI-enabled health data ecosystems
Introduction: Transparency and traceability are essential for establishing trustworthy artificial intelligence (AI). A lack of transparency in the data preparation process is a significant obstacle to developing reliable AI systems and can lead to issues related to reproducibility, debugging AI models, bias and fairness, and compliance and regulation. We introduce a formal data preparation pipeline specification to improve upon the manual and error-prone data extraction processes used in AI and data analytics applications, with a focus on traceability. Methods: We propose a declarative language to define the extraction of AI-ready datasets from health data adhering to a common data model, particularly those conforming to HL7 Fast Healthcare Interoperability Resources (FHIR). We use FHIR profiling to develop a common data model tailored to an AI use case, enabling the explicit declaration of the needed information such as phenotype and AI feature definitions. In our pipeline model, we convert complex, high-dimensional electronic health record data represented with irregular time series sampling into a flat structure by defining a target population, feature groups and final datasets. Our design considers the requirements of various AI use cases from different projects, which led to the implementation of many feature types exhibiting intricate temporal relations. Results: We implement a scalable, high-performance feature repository to execute the data preparation pipeline definitions. This software not only ensures reliable, fault-tolerant distributed processing to produce AI-ready datasets together with their metadata, including many statistics, but also serves as a pluggable component of a decision support application based on a trained AI model, automatically preparing the feature values of individual entities during online prediction. We deployed and tested the proposed methodology and its implementation in three different research projects.
We present the developed FHIR profiles as a common data model, together with the feature group definitions and feature definitions within a data preparation pipeline, for training an AI model for “predicting complications after cardiac surgeries”. Discussion: Implementation across various pilot use cases has demonstrated that our framework possesses the necessary breadth and flexibility to define a diverse array of features, each tailored to specific temporal and contextual criteria.
Enhancing Mobile Spontaneous Adverse Drug Event Reporting through Electronic Health Records
In this study, we address two major problems of spontaneous reporting systems for adverse drug events (ADEs): underreporting and low report content quality. Within the scope of the WEB-RADR project, we make use of relevant patient information available in electronic health record (EHR) systems to facilitate the ADE reporting process and promote spontaneous reporting on mobile devices. By semi-automatically extracting patient context (e.g. medical history, tests, allergies, drug therapies) from EHRs, we are able to provide a less time-consuming reporting experience and much richer report content. Our tests show that 60% of the patient context residing in E2B reports can easily be extracted using epSOS Patient Summary and Consolidated CDA templates as the EHR source. The proposed implementation can also be extended to integrate further EHR profiles without much effort.
Electronic Health Record Standards - A Brief Overview
Most medical information systems store clinical information about patients in proprietary formats. To address the resulting interoperability problems, several Electronic Health Record (EHR) standards that enable structured clinical content for the purpose of exchange are currently under development. In this article, we present a brief overview of the most relevant EHR standards, examine the level of interoperability they provide and assess their functionality in terms of content structure, access services, multimedia support and security.
A semantic backend for content management systems
The users of a content repository express the semantics they have in mind while defining the content items and their properties, and forming them into a particular hierarchy. However, these valuable semantics are not formally expressed, and hence cannot be used to discover meaningful relationships among the content items in an automated way. Although the need is apparent, there are several challenges in explicating these semantics in a fully automated way: first, it is difficult to distinguish between the data and the metadata in the repository, and second, not all of the metadata defined, such as the file size or encoding type, contributes to the meaning. More importantly, for the developed solution to have practical value, it must address the constraints of the content management system (CMS) industry: CMS vendors cannot change their repositories in production use, and they need a generic solution not limited to a specific repository architecture. In this article, we address all these challenges through a set of tools which first semi-automatically explicate the content repository semantics to a knowledge-base and then establish semantic bridges between this backend knowledge-base and the content repository. The repository content is dynamic; to maintain the content repository semantics while new content is created, the changes in the repository semantics are reflected onto the knowledge-base through the semantic bridges. The tool set is complemented with a search engine that makes use of the explicated semantics.