264 research outputs found
Recommended from our members
Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.
There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained with too small medical text corpora. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and develop a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods
Design and implementation of serverless architecture for i2b2 on AWS cloud and Snowflake data warehouse
Informatics for Integrating Biology and the Beside (i2b2) is an open-source medical tool for cohort discovery that allows researchers to explore and query clinical data. The i2b2 platform is designed to adopt any patient-centric data models and used at over 400 healthcare institutions worldwide for querying patient data. The platform consists of a webclient, core servers and database. Despite having installation guidelines, the complex architecture of the system with numerous dependencies and configuration parameters makes it difficult to install a functional i2b2 platform. On the other hand, maintaining the scalability, security, availability of the application is also challenging and requires lot of resources. Our aim was to deploy the i2b2 for University of Missouri (UM) System in the cloud as well as reduce the complexity and effort of the installation and maintenance process. Our solution encapsulated the complete installation process of each component using docker and deployed the container in the AWS Virtual Private Cloud (VPC) using several AWS PaaS (Platform as a Service), IaaS (Infrastructure as a Service) services. We deployed the application as a service in the AWS FARGATE, an on-demand, serverless, auto scalable compute engine. We also enhanced the functionality of i2b2 services and developed Snowflake JDBC driver support for i2b2 backend services. It enabled i2b2 services to query directly from Snowflake analytical database. In addition, we also created i2b2-data-installer package to load PCORnet CDM and ACT ontology data into i2b2 database. The i2b2 platform in University of Missouri holds 1.26B facts of 2.2M patients of UM Cerner Millennium data.Includes bibliographical references
Evaluating the informatics for integrating biology and the bedside system for clinical research
pre-printBackground: Selecting patient cohorts is a critical, iterative, and often time-consuming aspect of studies involving human subjects; informatics tools for helping streamline the process have been identified as important infrastructure components for enabling clinical and translational research. We describe the evaluation of a free and open source cohort selection tool from the Informatics for Integrating Biology and the Bedside (i2b2) group: the i2b2 hive. Methods: Our evaluation included the usability and functionality of the i2b2 hive using several real world examples of research data requests received electronically at the University of Utah Health Sciences Center between 2006 - 2008. The hive server component and the visual query tool application were evaluated for their suitability as a cohort selection tool on the basis of the types of data elements requested, as well as the effort required to fulfill each research data request using the i2b2 hive alone. Results: We found the i2b2 hive to be suitable for obtaining estimates of cohort sizes and generating research cohorts based on simple inclusion/exclusion criteria, which consisted of about 44% of the clinical research data requests sampled at our institution. Data requests that relied on post-coordinated clinical concepts, aggregate values of clinical findings, or temporal conditions in their inclusion/exclusion criteria could not be fulfilled using the i2b2 hive alone, and required one or more intermediate data steps in the form of pre-or post-processing, modifications to the hive metadata, etc. Conclusion: The i2b2 hive was found to be a useful cohort-selection tool for fulfilling common types of requests for research data, and especially in the estimation of initial cohort sizes. For another institution that might want to use the i2b2 hive for clinical research, we recommend that the institution would need to have structured, coded clinical data and metadata available that can be transformed to fit the logical data models of the i2b2 hive, strategies for extracting relevant clinical data from source systems, and the ability to perform substantial pre- and post-processing of these data
The Role of Free/Libre and Open Source Software in Learning Health Systems
OBJECTIVE: To give an overview of the role of Free/Libre and Open Source Software (FLOSS) in the context of secondary use of patient data to enable Learning Health Systems (LHSs). METHODS: We conducted an environmental scan of the academic and grey literature utilising the MedFLOSS database of open source systems in healthcare to inform a discussion of the role of open source in developing LHSs that reuse patient data for research and quality improvement. RESULTS: A wide range of FLOSS is identified that contributes to the information technology (IT) infrastructure of LHSs including operating systems, databases, frameworks, interoperability software, and mobile and web apps. The recent literature around the development and use of key clinical data management tools is also reviewed. CONCLUSIONS: FLOSS already plays a critical role in modern health IT infrastructure for the collection, storage, and analysis of patient data. The nature of FLOSS systems to be collaborative, modular, and modifiable may make open source approaches appropriate for building the digital infrastructure for a LHS.</p
Evaluating the informatics for integrating biology and the bedside system for clinical research
<p>Abstract</p> <p>Background</p> <p>Selecting patient cohorts is a critical, iterative, and often time-consuming aspect of studies involving human subjects; informatics tools for helping streamline the process have been identified as important infrastructure components for enabling clinical and translational research. We describe the evaluation of a free and open source cohort selection tool from the Informatics for Integrating Biology and the Bedside (i2b2) group: the i2b2 hive.</p> <p>Methods</p> <p>Our evaluation included the usability and functionality of the i2b2 hive using several real world examples of research data requests received electronically at the University of Utah Health Sciences Center between 2006 - 2008. The hive server component and the visual query tool application were evaluated for their suitability as a cohort selection tool on the basis of the types of data elements requested, as well as the effort required to fulfill each research data request using the i2b2 hive alone.</p> <p>Results</p> <p>We found the i2b2 hive to be suitable for obtaining estimates of cohort sizes and generating research cohorts based on simple inclusion/exclusion criteria, which consisted of about 44% of the clinical research data requests sampled at our institution. Data requests that relied on post-coordinated clinical concepts, aggregate values of clinical findings, or temporal conditions in their inclusion/exclusion criteria could not be fulfilled using the i2b2 hive alone, and required one or more intermediate data steps in the form of pre- or post-processing, modifications to the hive metadata, etc.</p> <p>Conclusion</p> <p>The i2b2 hive was found to be a useful cohort-selection tool for fulfilling common types of requests for research data, and especially in the estimation of initial cohort sizes. For another institution that might want to use the i2b2 hive for clinical research, we recommend that the institution would need to have structured, coded clinical data and metadata available that can be transformed to fit the logical data models of the i2b2 hive, strategies for extracting relevant clinical data from source systems, and the ability to perform substantial pre- and post-processing of these data.</p
Recommended from our members
SHRINE: Enabling Nationally Scalable Multi-Site Disease Studies
Results of medical research studies are often contradictory or cannot be reproduced. One reason is that there may not be enough patient subjects available for observation for a long enough time period. Another reason is that patient populations may vary considerably with respect to geographic and demographic boundaries thus limiting how broadly the results apply. Even when similar patient populations are pooled together from multiple locations, differences in medical treatment and record systems can limit which outcome measures can be commonly analyzed. In total, these differences in medical research settings can lead to differing conclusions or can even prevent some studies from starting. We thus sought to create a patient research system that could aggregate as many patient observations as possible from a large number of hospitals in a uniform way. We call this system the ‘Shared Health Research Information Network’, with the following properties: (1) reuse electronic health data from everyday clinical care for research purposes, (2) respect patient privacy and hospital autonomy, (3) aggregate patient populations across many hospitals to achieve statistically significant sample sizes that can be validated independently of a single research setting, (4) harmonize the observation facts recorded at each institution such that queries can be made across many hospitals in parallel, (5) scale to regional and national collaborations. The purpose of this report is to provide open source software for multi-site clinical studies and to report on early uses of this application. At this time SHRINE implementations have been used for multi-site studies of autism co-morbidity, juvenile idiopathic arthritis, peripartum cardiomyopathy, colorectal cancer, diabetes, and others. The wide range of study objectives and growing adoption suggest that SHRINE may be applicable beyond the research uses and participating hospitals named in this report
An Evaluation of the Use of a Clinical Research Data Warehouse and I2b2 Infrastructure to Facilitate Replication of Research
Replication of clinical research is requisite for forming effective clinical decisions and guidelines. While rerunning a clinical trial may be unethical and prohibitively expensive, the adoption of EHRs and the infrastructure for distributed research networks provide access to clinical data for observational and retrospective studies. Herein I demonstrate a means of using these tools to validate existing results and extend the findings to novel populations. I describe the process of evaluating published risk models as well as local data and infrastructure to assess the replicability of the study. I use an example of a risk model unable to be replicated as well as a study of in-hospital mortality risk I replicated using UNMC’s clinical research data warehouse.
In these examples and other studies we have participated in, some elements are commonly missing or under-developed. One such missing element is a consistent and computable phenotype for pregnancy status based on data recorded in the EHR. I survey local clinical data and identify a number of variables correlated with pregnancy as well as demonstrate the data required to identify the temporal bounds of a pregnancy episode. Next, another common obstacle to replicating risk models is the necessity of linking to alternative data sources while maintaining data in a de-identified database. I demonstrate a pipeline for linking clinical data to socioeconomic variables and indices obtained from the American Community Survey (ACS). While these data are location-based, I provide a method for storing them in a HIPAA compliant fashion so as not to identify a patient’s location.
While full and efficient replication of all clinical studies is still a future goal, the demonstration of replication as well as beginning the development of a computable phenotype for pregnancy and the incorporation of location based data in a de-identified data warehouse demonstrate how the EHR data and a research infrastructure may be used to facilitate this effort
Recommended from our members
Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS): Architecture
We describe the architecture of the Patient Centered Outcomes Research Institute (PCORI) funded Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS, http://www.SCILHS.org) clinical data research network, which leverages the $48 billion dollar federal investment in health information technology (IT) to enable a queryable semantic data model across 10 health systems covering more than 8 million patients, plugging universally into the point of care, generating evidence and discovery, and thereby enabling clinician and patient participation in research during the patient encounter. Central to the success of SCILHS is development of innovative ‘apps’ to improve PCOR research methods and capacitate point of care functions such as consent, enrollment, randomization, and outreach for patient-reported outcomes. SCILHS adapts and extends an existing national research network formed on an advanced IT infrastructure built with open source, free, modular components
- …