Text mining patient experiences from online health communities
Social media has had an impact on how patients experience healthcare. Through online channels, patients share information and their experiences with potentially large audiences all over the world. While sharing in this way may offer immediate benefits to themselves and their readership (e.g. other patients), these unprompted, self-authored accounts of illness are also an important resource for healthcare researchers. They offer unprecedented insight into patients' experience of illness. Qualitative analysis has been used to explore this source of data and to utilise the information expressed through these media. However, the manual nature of the analysis means that its scope is limited to a small proportion of the hundreds of thousands of authors who are creating content.
In our research, we explore the use of text mining to support traditional qualitative analysis of this data. Text mining uses a number of processes to extract useful facts from text and to analyse patterns within it; the ultimate aim is to generate new knowledge by analysing textual data en masse.
We developed QuTiP, a text mining framework that can enable large-scale qualitative analyses of patient narratives shared over social media. In this thesis, we describe QuTiP and our application of the framework to analyse the accounts of patients living with chronic lung disease. As well as a qualitative analysis, we describe our approaches to automated information extraction, term recognition and text classification, which automatically extract relevant information from blog post data. Within the QuTiP framework, these individual automated approaches can be brought together to support further analyses of large social media datasets.
A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives
In this thesis, a synthesis is presented of the knowledge models required by clinical information systems that provide decision support for longitudinal processes of care. Qualitative research techniques and thematic analysis are applied in a novel way to a systematic review of the literature on the challenges in implementing such systems, leading to the development of an original conceptual framework. The thesis demonstrates how these process-oriented systems make use of a knowledge base derived from workflow models and clinical guidelines, and argues that one of the major barriers to implementation is the need to extract explicit and implicit information from diverse resources in order to construct the knowledge base. Moreover, concepts in both the knowledge base and in the electronic health record (EHR) must be mapped to a common ontological model. However, the majority of clinical guideline information remains in text form, and much of the useful clinical information in the EHR resides in the free text fields of progress notes and laboratory reports. In this thesis, it is shown how natural language processing and information extraction techniques provide a means to identify and formalise the knowledge components required by the knowledge base. Original contributions are made in the development of lexico-syntactic patterns and the use of external domain knowledge resources to tackle a variety of information extraction tasks in the clinical domain, such as recognition of clinical concepts, events, temporal relations, term disambiguation and abbreviation expansion. Methods are developed for adapting existing tools and resources in the biomedical domain to the processing of clinical texts, and approaches to improving the scalability of these tools are proposed and evaluated. These tools and techniques are then combined in the creation of a novel approach to identifying processes of care in the clinical narrative.
It is demonstrated that resolution of coreferential and anaphoric relations as narratively and temporally ordered chains provides a means to extract linked narrative events and processes of care from clinical notes. Coreference performance in discharge summaries and progress notes is largely dependent on correct identification of protagonist chains (patient, clinician, family relation), pronominal resolution, and string matching that takes account of experiencer, temporal, spatial, and anatomical context; whereas for laboratory reports additional, external domain knowledge is required. The types of external knowledge and their effects on system performance are identified and evaluated. Results are compared against existing systems for solving these tasks and are found to improve on them, or to approach the performance of recently reported, state-of-the-art systems. Software artefacts developed in this research have been made available as open-source components within the General Architecture for Text Engineering framework.