Doctor of Philosophy dissertation

The primary objective of cancer registries is to capture clinical care data of cancer populations and to aid in prevention, enable early detection, determine prognosis, and assess the quality of various treatments and interventions. Cancer registries also play a paramount role in supporting cancer epidemiological studies and medical research. Existing cancer registries depend mostly on humans, known as Cancer Tumor Registrars (CTRs), to conduct manual abstraction of electronic health records, finding reportable cancer cases and extracting the other data elements required for regulatory reporting. This is a time-consuming and laborious task prone to human error, affecting the quality, completeness, and timeliness of cancer registries. Central state cancer registries are responsible for consolidating the data received from multiple sources for each cancer case and for assigning the most accurate information. The Utah Cancer Registry (UCR) at the University of Utah, for instance, leads and oversees more than 70 cancer treatment facilities in the state of Utah, collecting data for each diagnosed cancer case and consolidating multiple sources of information.

Although software tools that assist the manual abstraction process exist, they mainly focus on cancer case finding from pathology reports and do not support automatic extraction of other data elements such as TNM cancer stage information, an important prognostic factor required before initiating clinical treatment. In this study, I present novel applications of natural language processing (NLP) and machine learning (ML) to automatically extract clinical and pathological TNM stage information from the unconsolidated clinical records of cancer patients available at the central Utah Cancer Registry. To further support CTRs in their manual efforts, I demonstrate a new machine learning approach to consolidate TNM stages from multiple records at the patient level.
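To make the task concrete, a minimal sketch of TNM extraction and patient-level consolidation follows. The pattern, record texts, and majority-vote rule are illustrative assumptions, not the dissertation's actual method, which is ML-based.

```python
import re
from collections import Counter

# Illustrative only: TNM mentions such as "pT2N0M0" or "cT2 N0 M0" can be
# captured with a regular expression before any ML-based normalisation.
TNM_PATTERN = re.compile(
    r"\b([cp])?T([0-4x])\s*N([0-3x])\s*M([01x])\b", re.IGNORECASE
)

def extract_tnm(text):
    """Return all (prefix, T, N, M) tuples found in a clinical note."""
    return [m.groups() for m in TNM_PATTERN.finditer(text)]

def consolidate(records):
    """Naive patient-level consolidation: majority vote across records."""
    mentions = [tnm for rec in records for tnm in extract_tnm(rec)]
    return Counter(mentions).most_common(1)[0][0] if mentions else None

# Hypothetical unconsolidated records for one patient.
records = [
    "Pathology: pT2 N0 M0 invasive ductal carcinoma.",
    "Clinical staging cT2N0M0 prior to surgery.",
    "Final stage pT2N0M0.",
]
```

A rule-based baseline like this misses stage information that is implied rather than stated, which is one motivation for the ML approach the abstract describes.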
Computational acquisition of knowledge in small-data environments: a case study in the field of energetics
The UK’s defence industry is accelerating its implementation of artificial intelligence, including
expert systems and natural language processing (NLP) tools designed to supplement human
analysis. This thesis examines the limitations of NLP tools in small-data environments, which are
common in defence, through the energetic-materials domain. A literature review identifies
the domain-specific challenges of developing an expert system (specifically an ontology). The
absence of domain resources such as labelled datasets and, most significantly, the preprocessing
of text resources are identified as challenges. To address the latter, a novel preprocessing
pipeline tailored to the energetic-materials domain is developed, and its effectiveness is
evaluated.
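A domain-tailored preprocessing pipeline of the kind described might look like the sketch below. The protected-term list and the three steps are assumptions for illustration, not the thesis's actual pipeline.

```python
import re

# Common energetic-material names that should survive lowercasing intact
# (an assumed list, for illustration).
PROTECTED = {"RDX", "HMX", "TNT", "PETN"}

def preprocess(text):
    # 1. Normalise whitespace.
    text = re.sub(r"\s+", " ", text).strip()
    # 2. Tokenise, keeping hyphenated compounds intact.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", text)
    # 3. Lowercase everything except protected domain terms.
    return [t if t.upper() in PROTECTED else t.lower() for t in tokens]
```

The design point is that generic pipelines lowercase and split domain terms such as compound names, destroying exactly the vocabulary a small-data domain cannot afford to lose.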
The interface between using NLP tools in data-limited environments to supplement human
analysis and using them to replace it completely is examined in a study of the subjective
concept of importance. A methodology for directly comparing the ability of NLP tools
and experts to identify important points in a text is presented. Results show that the study
participants exhibit little agreement, even on which points in the text are important; the NLP
tool, the expert (the author of the text being examined) and the participants agree only on
general statements. As a group, however, the participants agreed with the expert. In
data-limited environments, the extractive-summarisation tools examined cannot identify the
important points in a technical document as effectively as an expert.
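One simple way to quantify the low agreement reported above is pairwise Jaccard overlap between the sets of points each participant marked as important; the metric and the participant selections below are illustrative assumptions, not the study's actual methodology.

```python
def jaccard(a, b):
    """Overlap between two sets of selected sentence indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_pairwise_agreement(selections):
    """Average Jaccard overlap over all pairs of annotators."""
    pairs = [(x, y) for i, x in enumerate(selections)
                    for y in selections[i + 1:]]
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)

# Sentence indices chosen by three hypothetical participants.
participants = [{1, 4, 7}, {1, 2, 9}, {4, 5, 7}]
```

A mean overlap well below 1.0, as in this toy data, is the pattern the study reports: annotators rarely pick the same points.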
A methodology for the classification of journal articles by the technology readiness level (TRL)
of the described technologies in a data-limited environment is proposed. Techniques to overcome
challenges of real-world data, such as class imbalance, are investigated. A methodology
to evaluate the reliability of human annotations is presented. Analysis identifies a lack of
agreement and consistency in expert evaluation of document TRL.
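Random oversampling is one common remedy for the class imbalance mentioned above; the sketch below illustrates the idea on invented TRL labels and is not the thesis's actual technique.

```python
import random

def oversample(examples, seed=0):
    """Duplicate minority-class examples until all classes are equal size.

    `examples` is a list of (text, label) pairs; duplicates are drawn
    with replacement using a fixed seed for reproducibility.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(v) for v in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced
```

Oversampling only rebalances the training signal; it cannot fix the annotation inconsistency the analysis identifies, which is why the reliability methodology is needed alongside it.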