A Multidimensional Data Warehouse for Community Health Centers
Community health centers (CHCs) play a pivotal role in healthcare delivery to vulnerable populations, but have not yet benefited from a data warehouse that can support improvements in clinical and financial outcomes across the practice. We have developed a multidimensional clinic data warehouse (CDW) by working with 7 CHCs across the state of Indiana and integrating their operational, financial and electronic patient records to support ongoing delivery of care. We describe in detail the rationale for the project, the data architecture employed, and the content of the data warehouse, along with the challenges experienced and the strategies used in developing this repository, which may help other researchers, managers and leaders in health informatics. The resulting multidimensional data warehouse is highly practical and is designed to provide a foundation for wide-ranging healthcare data analytics over time and across the community health research enterprise.
Doctor of Philosophy
Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve usefulness of free text query and text processing and demonstrate advantages to using these methods for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified, or the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. The addition of free text search compared to structured data search alone demonstrated increased cohort size in all cases, with dramatic increases in some. Representation of patients in subpopulations that may have been underrepresented otherwise is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free text search. A novel information extraction algorithm is developed and evaluated (Regular Expression Discovery for Extraction, or REDEx) for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted. Additional patients and additional measurement occurrences are identified using these extracted values that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts.
We developed automated query expansion methods that greatly improve performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions can be detected that are often missed otherwise. We found that a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
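As a rough illustration of the synonym-based query expansion mentioned in this abstract, the following minimal Python sketch expands a search term with a small synonym table before matching free-text notes. It is not the dissertation's method: the `SYNONYMS` table, the `expand_query` and `find_cohort` helpers, and the sample notes are all hypothetical and invented for illustration.

```python
import re

# Hypothetical synonym table (an assumption). A real system might draw on a
# clinical vocabulary, which the abstract does not specify.
SYNONYMS = {
    "heart attack": ["myocardial infarction", "mi"],
    "high blood pressure": ["hypertension", "htn"],
}


def expand_query(term):
    """Return the lowercased term followed by any known synonyms."""
    key = term.lower()
    return [key] + SYNONYMS.get(key, [])


def find_cohort(notes, term, expand=True):
    """Return sorted patient ids whose note contains the term as whole words.

    When expand is True, synonyms of the term are searched as well.
    """
    terms = expand_query(term) if expand else [term.lower()]
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b") for t in terms]
    return sorted(
        pid
        for pid, text in notes.items()
        if any(p.search(text.lower()) for p in patterns)
    )


# Invented sample notes, keyed by a made-up patient id.
NOTES = {
    "p1": "History of heart attack in 2010.",
    "p2": "Admitted with acute myocardial infarction.",
    "p3": "No cardiac history; seasonal allergies.",
}

if __name__ == "__main__":
    print(find_cohort(NOTES, "heart attack", expand=False))
    print(find_cohort(NOTES, "heart attack"))
```

In this toy data the unexpanded search misses the note that uses the clinical term, while the expanded search picks it up. This mirrors, in miniature, the kind of cohort-size increase the abstract reports.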
Toward a Generalized Model of Biomedical Query Mediation to Improve Electronic Health Record Data Retrieval
The electronic health record (EHR) is an invaluable resource for medical knowledge discovery. EHR data interrogation requires significant medical and technical knowledge. To access EHR data, medical researchers often rely on query analysts to translate their EHR information needs into EHR database queries. The conversation between the medical researcher and the query analyst is an information needs negotiation; I have named this process biomedical query mediation (BQM). There exists no BQM standard to guide medical researchers and query analysts to effectively bridge the communication gap between these medical and technical experts. The current practice of BQM likely varies among query analysts. This variation may contribute to the delivery of EHR data sets with varying degrees of accuracy. For example, a query analyst may return an EHR dataset that misrepresents the medical researcher’s information need or another query analyst may return a different EHR dataset to the medical researcher for the same information need. The process used to formulate the medical researcher’s information need and translate that need into an executable EHR database query may have severe downstream consequences affecting the reliability and quality of EHR datasets for medical research. This dissertation contributes early understandings of the BQM process and thereby improves the transparency and highlights the complexity of BQM by completing five studies: 1) survey the literature from other information intensive scientific disciplines to identify knowledge and methods potentially useful for BQM, 2) perform a review of existing tools and forms for assisting researchers in BQM, 3) perform a content analysis of the BQM process, 4) conduct a cognitive task analysis to detail a generalized workflow, and 5) develop an enriched concept schema to capture comprehensive EHR data needs. 
This dissertation employs extensive qualitative methods using grounded theory, expert interviews, and cognitive task analysis to produce a deep understanding of BQM. Additionally, I contribute a promising concept class schema to represent medical researchers’ EHR data needs to help standardize the BQM process.
Clinical foundations and information architecture for the implementation of a federated health record service
Clinical care increasingly requires healthcare professionals to access patient record information that
may be distributed across multiple sites, held in a variety of paper and electronic formats, and
represented as mixtures of narrative, structured, coded and multi-media entries. A longitudinal
person-centred electronic health record (EHR) is a much-anticipated solution to this problem, but
its realisation is proving to be a long and complex journey.
This thesis explores the history and evolution of clinical information systems, and establishes a set
of clinical and ethico-legal requirements for a generic EHR server. A federation approach (FHR) to
harmonising distributed heterogeneous electronic clinical databases is advocated as the basis for
meeting these requirements.
A set of information models and middleware services, needed to implement a Federated Health
Record server, are then described, thereby supporting access by clinical applications to a distributed
set of feeder systems holding patient record information. The overall information architecture thus
defined provides a generic means of combining such feeder system data to create a virtual
electronic health record. Active collaboration in a wide range of clinical contexts, across the whole
of Europe, has been central to the evolution of the approach taken.
A federated health record server based on this architecture has been implemented by the author
and colleagues and deployed in a live clinical environment in the Department of Cardiovascular
Medicine at the Whittington Hospital in North London. This implementation experience has fed
back into the conceptual development of the approach and has provided "proof-of-concept"
verification of its completeness and practical utility.
This research has benefited from collaboration with a wide range of healthcare sites, informatics
organisations and industry across Europe through several EU Health Telematics projects: GEHR,
Synapses, EHCR-SupA, SynEx, Medicate and 6WINIT.
The information models published here have been placed in the public domain and have
substantially contributed to two generations of CEN health informatics standards, including CEN
TC/251 ENV 13606.