Machine Learning and Clinical Text. Supporting Health Information Flow
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to use it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging.
The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development.
Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow.
Altogether, five machine learning applications for three practical cases are described. The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports. The performance of all these applications is promising.
Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to the evaluation diversity and quality.
The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
Transferred from Doria.
Clinical foundations and information architecture for the implementation of a federated health record service
Clinical care increasingly requires healthcare professionals to access patient record information that
may be distributed across multiple sites, held in a variety of paper and electronic formats, and
represented as mixtures of narrative, structured, coded and multi-media entries. A longitudinal
person-centred electronic health record (EHR) is a much-anticipated solution to this problem, but
its realisation is proving to be a long and complex journey.
This Thesis explores the history and evolution of clinical information systems, and establishes a set
of clinical and ethico-legal requirements for a generic EHR server. A federation approach (FHR) to
harmonising distributed heterogeneous electronic clinical databases is advocated as the basis for
meeting these requirements.
A set of information models and middleware services, needed to implement a Federated Health
Record server, are then described, thereby supporting access by clinical applications to a distributed
set of feeder systems holding patient record information. The overall information architecture thus
defined provides a generic means of combining such feeder system data to create a virtual
electronic health record. Active collaboration in a wide range of clinical contexts, across the whole
of Europe, has been central to the evolution of the approach taken.
A federated health record server based on this architecture has been implemented by the author
and colleagues and deployed in a live clinical environment in the Department of Cardiovascular
Medicine at the Whittington Hospital in North London. This implementation experience has fed
back into the conceptual development of the approach and has provided "proof-of-concept"
verification of its completeness and practical utility.
This research has benefited from collaboration with a wide range of healthcare sites, informatics
organisations and industry across Europe through several EU Health Telematics projects: GEHR,
Synapses, EHCR-SupA, SynEx, Medicate and 6WINIT.
The information models published here have been placed in the public domain and have
substantially contributed to two generations of CEN health informatics standards, including CEN
TC/251 ENV 13606.
Extracção de informação médica em português europeu (Medical information extraction in European Portuguese)
Doctorate in Computer Engineering
The electronic storage of medical patient data is becoming a daily experience
in most of the practices and hospitals worldwide. However, much of the data
available is in free-form text, a convenient way of expressing concepts and
events, but especially challenging if one wants to perform automatic searches,
summarization or statistical analysis. Information Extraction can relieve some of
these problems by offering a semantically informed interpretation and
abstraction of the texts.
MedInX, the Medical Information eXtraction system presented in this document,
is the first information extraction system developed to process textual clinical
discharge records written in Portuguese. The main goal of the system is to
improve access to the information locked up in unstructured text, and,
consequently, the efficiency of the health care process, by allowing faster and
more reliable access to quality information on health, for both patients and
health professionals.
MedInX components are based on Natural Language Processing principles,
and provide several mechanisms to read, process and utilize external
resources, such as terminologies and ontologies, in the process of automatic
mapping of free text reports onto a structured representation.
The flexible and scalable architecture of the system also allowed its
application to the task of Named Entity Recognition in a shared evaluation
contest focused on Portuguese general-domain free-form texts.
The evaluation of the system on a set of authentic hospital discharge letters
indicates that the system achieves a 95% F-measure on the task of entity
recognition and 95% precision on the task of relation extraction.
Example applications, demonstrating the use of MedInX capabilities in real
applications in the hospital setting, are also presented in this document. These
applications were designed to answer common clinical problems related to
the automatic coding of diagnoses and other health-related conditions
described in the documents, according to the international classification
systems ICD-9-CM and ICF. The automatic review of the content and
completeness of the documents is an example of another developed
application, called the MedInX Clinical Audit system.
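The F-measure and precision figures quoted in this abstract are standard evaluation measures. As a minimal sketch of how they are computed from raw counts, assuming the balanced F1 form of the F-measure (the abstract does not say which variant was used), the function name and the counts below are hypothetical and are not MedInX results:

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) computed from raw counts."""
    predicted = true_positives + false_positives   # everything the system extracted
    actual = true_positives + false_negatives      # everything in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts chosen only for illustration; not taken from the thesis.
p, r, f = precision_recall_f1(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

A high precision with a lower recall pulls F1 down, which is why a single F-measure is usually reported alongside, not instead of, its components.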
Real-time classifiers from free-text for continuous surveillance of small animal disease
A wealth of information of epidemiological importance is held within unstructured narrative clinical records. Text mining provides computational techniques for extracting usable information from the language used to communicate between humans, including the spoken and written word. The aim of this work was to develop text-mining methodologies capable of rendering the large volume of information within veterinary clinical narratives accessible for research and surveillance purposes. The free-text records collated within the dataset of the Small Animal Veterinary Surveillance Network formed the development material and target of this work. The efficacy of pre-existent clinician-assigned coding applied to the dataset was evaluated, and the nature of notation and vocabulary used in documenting consultations was explored and described. Consultation records were pre-processed to improve human and software readability, and software was developed to redact incidental identifiers present within the free text. An automated system able to classify for the presence of clinical signs, utilising only information present within the free-text record, was developed with the aim that it would facilitate timely detection of spatio-temporal trends in clinical signs. Clinician-assigned main-reason-for-visit coding provided a poor summary of the large quantity of information exchanged during a veterinary consultation, and the nature of the coding and questionnaire triggering further obfuscated information. Delineation of the previously undocumented veterinary clinical sublanguage identified common themes and their manner of documentation; this was key to the development of programmatic methods. A rule-based classifier using logically chosen dictionaries, sequential processing and data-masking redacted identifiers while maintaining research usability of records.
Highly sensitive and specific free-text classification was achieved by applying classifiers for individual clinical signs within a context-sensitive scaffold; this permitted or prohibited matching depending on the clinical context in which a clinical sign was documented. The mean sensitivity achieved within an unseen test dataset was 98.17% (74.47, 99.9) and mean specificity 99.94% (77.1, 100.0). When used in combination to identify animals with any of a combination of gastrointestinal clinical signs, the sensitivity achieved was 99.44% (95% CI: 98.57, 99.78) and specificity 99.74% (95% CI: 99.62, 99.83). This work illustrates the importance, utility and promise of free-text classification of clinical records and provides a framework within which this is possible whilst respecting the confidentiality of client and clinician.
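The idea of permitting or prohibiting a dictionary match depending on its context can be illustrated with a minimal sketch. It is not the thesis's classifier: the term list, the negation cues, the sentence-splitting rule and the example records are all hypothetical, and a real veterinary sublanguage would need far richer context handling. The sketch also shows how sensitivity and specificity are derived from predictions and reference labels.

```python
import re

# Hypothetical dictionary for one clinical sign (assumption, not the thesis vocabulary).
SIGN_TERMS = ["vomiting", "vomited", "emesis"]
# Hypothetical context cues that prohibit a match when they precede the term.
NEGATION_CUES = ["no", "not", "denies", "without"]

SIGN_RE = re.compile(r"\b(" + "|".join(SIGN_TERMS) + r")\b", re.IGNORECASE)
NEG_RE = re.compile(r"\b(" + "|".join(NEGATION_CUES) + r")\b", re.IGNORECASE)


def sign_present(text):
    """True if some sentence mentions the sign with no negation cue before it."""
    for sentence in re.split(r"[.;\n]+", text):
        for match in SIGN_RE.finditer(sentence):
            if not NEG_RE.search(sentence[:match.start()]):
                return True
    return False


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Invented example records with reference labels, for illustration only.
records = [
    ("Vomiting since yesterday. Eating well.", True),
    ("No vomiting or diarrhoea.", False),
    ("Owner reports emesis twice overnight", True),
    ("Booster vaccination; healthy.", False),
    ("Not eating. Vomited once.", True),
]
predictions = [sign_present(text) for text, _ in records]
labels = [label for _, label in records]
print(sensitivity_specificity(predictions, labels))
```

Restricting the negation check to the text before the term within the same sentence is a deliberate simplification; it keeps "Not eating. Vomited once." positive while rejecting "No vomiting or diarrhoea."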
A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives
In this thesis, a synthesis is presented of the knowledge models required by clinical information systems that provide decision support for longitudinal processes of care. Qualitative research techniques and thematic analysis are novelly applied to a systematic review of the literature on the challenges in implementing such systems, leading to the development of an original conceptual framework. The thesis demonstrates how these process-oriented systems make use of a knowledge base derived from workflow models and clinical guidelines, and argues that one of the major barriers to implementation is the need to extract explicit and implicit information from diverse resources in order to construct the knowledge base. Moreover, concepts in both the knowledge base and in the electronic health record (EHR) must be mapped to a common ontological model. However, the majority of clinical guideline information remains in text form, and much of the useful clinical information residing in the EHR resides in the free text fields of progress notes and laboratory reports. In this thesis, it is shown how natural language processing and information extraction techniques provide a means to identify and formalise the knowledge components required by the knowledge base. Original contributions are made in the development of lexico-syntactic patterns and the use of external domain knowledge resources to tackle a variety of information extraction tasks in the clinical domain, such as recognition of clinical concepts, events, temporal relations, term disambiguation and abbreviation expansion. Methods are developed for adapting existing tools and resources in the biomedical domain to the processing of clinical texts, and approaches to improving the scalability of these tools are proposed and evaluated. These tools and techniques are then combined in the creation of a novel approach to identifying processes of care in the clinical narrative.
It is demonstrated that resolution of coreferential and anaphoric relations as narratively and temporally ordered chains provides a means to extract linked narrative events and processes of care from clinical notes. Coreference performance in discharge summaries and progress notes is largely dependent on correct identification of protagonist chains (patient, clinician, family relation), pronominal resolution, and string matching that takes account of experiencer, temporal, spatial, and anatomical context; whereas for laboratory reports additional, external domain knowledge is required. The types of external knowledge and their effects on system performance are identified and evaluated. Results are compared against existing systems for solving these tasks and are found to improve on them, or to approach the performance of recently reported, state-of-the-art systems. Software artefacts developed in this research have been made available as open-source components within the General Architecture for Text Engineering framework.