An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences
Indiana University-Purdue University Indianapolis (IUPUI)

The amount of information produced in the form of electronic free text in healthcare is increasing to levels that humans can no longer process for the advancement of their professional practice. Information extraction (IE) is a sub-field of natural language processing with the goal of reducing unstructured free text to structured data. Pertinent to IE is an annotated corpus that frames how IE methods should create a logical expression necessary for processing the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none has addressed the issue of semantic similarity (or synonymy) to achieve data reduction. A successful methodology for data reduction depends on a framework that can represent currently popular phrasal methods of IE while also fully representing the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument structure (PAS) as that framework. The text from which PAS structures are formed is a convenience sample from a prior study: ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing.
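The kind of PAS-based comparison the abstract describes can be illustrated with a toy sketch. The `PAS` class, the role labels, and the equal weighting of predicate and argument overlap below are illustrative assumptions for the sketch, not the thesis's actual representation or scoring method:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PAS:
    """Toy predicate-argument structure: a predicate plus labeled arguments."""
    predicate: str
    args: tuple  # ((role, filler), ...), e.g. (("ARG1", "opacity"), ("ARGM-LOC", "left lung"))

def pas_similarity(a: PAS, b: PAS) -> float:
    """Score two sentences by predicate match plus Jaccard overlap of (role, filler) pairs."""
    sa, sb = set(a.args), set(b.args)
    arg_sim = len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0
    pred_sim = 1.0 if a.predicate == b.predicate else 0.0
    return 0.5 * pred_sim + 0.5 * arg_sim  # equal weighting is an arbitrary choice

# Two radiology-style sentences sharing a predicate and one argument:
s1 = PAS("observe", (("ARG1", "opacity"), ("ARGM-LOC", "left lower lobe")))
s2 = PAS("observe", (("ARG1", "opacity"), ("ARGM-LOC", "right lung")))
print(round(pas_similarity(s1, s2), 2))  # → 0.67
```

A full sentence-level method would normalize fillers against a knowledge source before comparing, which is where synonymy enters.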
Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems
This paper provides an overview of current linguistic and ontological challenges which have to be met in order to support the transformation of health ecosystems toward 5P medicine (5PM) standards. It highlights standardization and interoperability aspects of formal, controlled representations of clinical and research data, as well as requirements for smart support to produce and encode content in a way that both humans and machines can understand and process. Starting from the current text-centered communication practices in healthcare and biomedical research, it addresses the state of the art in information extraction using natural language processing (NLP). An important aspect of the language-centered perspective on managing health data is the integration of heterogeneous data sources employing different natural languages and different terminologies. This is where biomedical ontologies, in the sense of formal, interchangeable representations of types of domain entities, come into play. The paper discusses the state of the art of biomedical ontologies, addresses their importance for standardization and interoperability, and sheds light on current misconceptions and shortcomings. Finally, the paper points out next steps and possible synergies between the field of NLP and the area of Applied Ontology and the Semantic Web to foster data interoperability for 5PM.
Advanced machine learning methods for oncological image analysis
Cancer is a major public health problem, accounting for an estimated 10 million deaths worldwide in 2020 alone. Rapid advances in the field of image acquisition and hardware development over the past three decades have resulted in the development of modern medical imaging modalities that can capture high-resolution anatomical, physiological, functional, and metabolic quantitative information from cancerous organs. Therefore, the applications of medical imaging have become increasingly crucial in the clinical routines of oncology, providing screening, diagnosis, treatment monitoring, and non/minimally-invasive evaluation of disease prognosis. The essential need for medical images, however, has resulted in the acquisition of a tremendous number of imaging scans. Considering the growing role of medical imaging data on one side and the challenges of manually examining such an abundance of data on the other side, the development of computerized tools to automatically or semi-automatically examine the image data has attracted considerable interest. Hence, a variety of machine learning tools have been developed for oncological image analysis, aiming to assist clinicians with repetitive tasks in their workflow.
This thesis aims to contribute to the field of oncological image analysis by proposing new ways of quantifying tumor characteristics from medical image data. Specifically, this thesis consists of six studies, the first two of which focus on introducing novel methods for tumor segmentation. The last four studies aim to develop quantitative imaging biomarkers for cancer diagnosis and prognosis.
The main objective of Study I is to develop a deep learning pipeline capable of capturing the appearance of lung pathologies, including lung tumors, and integrating this pipeline into segmentation networks to improve segmentation accuracy. The proposed pipeline was tested on several comprehensive datasets, and the numerical quantifications show the superiority of the proposed prior-aware DL framework compared to the state of the art. Study II aims to address a crucial challenge faced by supervised segmentation models: dependence on large-scale labeled datasets. In this study, an unsupervised segmentation approach is proposed based on the concept of image inpainting to segment lung and head-neck tumors in images from single and multiple modalities. The proposed autoinpainting pipeline shows great potential in synthesizing high-quality tumor-free images and outperforms a family of well-established unsupervised models in terms of segmentation accuracy.
Studies III and IV aim to automatically discriminate benign from malignant pulmonary nodules by analyzing low-dose computed tomography (LDCT) scans. In Study III, a dual-pathway deep classification framework is proposed to simultaneously take into account the local intra-nodule heterogeneities and the global contextual information. Study IV seeks to compare the discriminative power of a series of carefully selected conventional radiomics methods, end-to-end Deep Learning (DL) models, and deep-feature-based radiomics analysis on the same dataset. The numerical analyses show the potential of fusing the learned deep features into radiomic features for boosting the classification power.
Study V focuses on the early assessment of lung tumor response to the applied treatments by proposing a novel feature set that can be interpreted physiologically. This feature set was employed to quantify the changes in tumor characteristics from longitudinal PET-CT scans in order to predict the overall survival status of patients two years after the last session of treatments. The discriminative power of the introduced imaging biomarkers was compared against conventional radiomics, and the quantitative evaluations verified the superiority of the proposed feature set. Whereas Study V focuses on a binary survival prediction task, Study VI addresses the prediction of survival rate in patients diagnosed with lung and head-neck cancer by investigating the potential of spherical convolutional neural networks and comparing their performance against other types of features, including radiomics. While comparable results were achieved in intra-dataset analyses, the proposed spherical-based features show more predictive power in inter-dataset analyses.
In summary, the six studies incorporate different imaging modalities and a wide range of image processing and machine-learning techniques in the methods developed for the quantitative assessment of tumor characteristics and contribute to the essential procedures of cancer diagnosis and prognosis.
Contextual Analysis of Large-Scale Biomedical Associations for the Elucidation and Prioritization of Genes and their Roles in Complex Disease
Vast amounts of biomedical associations are easily accessible in public resources, spanning gene-disease associations, tissue-specific gene expression, gene function and pathway annotations, and many other data types. Despite this mass of data, information most relevant to the study of a particular disease remains loosely coupled and difficult to incorporate into ongoing research. Current public databases are difficult to navigate and do not interoperate well due to the plethora of interfaces and varying biomedical concept identifiers used. Because no coherent display of data within a specific problem domain is available, finding the latent relationships associated with a disease of interest is impractical.
This research describes a method for extracting the contextual relationships embedded within associations relevant to a disease of interest. After applying the method to a small test data set, a large-scale integrated association network is constructed for application of a network propagation technique that helps uncover more distant latent relationships. Together these methods are adept at uncovering highly relevant relationships without any a priori knowledge of the disease of interest.
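"Network propagation" over an association network is often realized as a random walk with restart from seed nodes; the abstract does not specify the exact algorithm, so the sketch below, including its toy nodes and edges, is an illustrative assumption rather than the thesis's method:

```python
# Minimal random-walk-with-restart propagation on a toy association
# network. Node names, edges, and the restart probability are
# illustrative, not taken from the thesis.
def propagate(adj, seeds, restart=0.3, iters=100):
    nodes = sorted(adj)
    score = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p0 = dict(score)  # restart distribution concentrated on the seeds
    for _ in range(iters):
        new = {}
        for n in nodes:
            # each neighbor spreads its score evenly along its edges
            incoming = sum(score[m] / len(adj[m]) for m in nodes if n in adj[m])
            new[n] = (1 - restart) * incoming + restart * p0[n]
        score = new
    return score

adj = {
    "disease": {"geneA", "geneB"},
    "geneA": {"disease", "pathway"},
    "geneB": {"disease"},
    "pathway": {"geneA", "geneC"},
    "geneC": {"pathway"},
}
scores = propagate(adj, {"disease"})
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # → disease (the seed retains the highest score)
```

Nodes reachable only through longer paths (here `geneC`) still receive nonzero relevance, which is how more distant latent relationships surface.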
The combined contextual search and relevance methods power a tool which makes pertinent biomedical associations easier to find, easier to assimilate into ongoing work, and more prominent than currently available databases. Increasing the accessibility of current information is an important component of understanding high-throughput experimental results and surviving the data deluge.
Sentence Simplification for Text Processing
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.

Propositional density and syntactic complexity are two features of sentences which
affect the ability of humans and machines to process them effectively. In this
thesis, I present a new approach to automatic sentence simplification which processes
sentences containing compound clauses and complex noun phrases (NPs)
and converts them into sequences of simple sentences which contain fewer of these
constituents and have reduced per sentence propositional density and syntactic
complexity.
My overall approach is iterative and relies on both machine learning and handcrafted
rules. It implements a small set of sentence transformation schemes, each
of which takes one sentence containing compound clauses or complex NPs and
converts it into one or two simplified sentences containing fewer of these constituents
(Chapter 5). The iterative algorithm applies the schemes repeatedly and is able
to simplify sentences which contain arbitrary numbers of compound clauses and
complex NPs. The transformation schemes rely on automatic detection of these
constituents, which may take a variety of forms in input sentences. In the thesis, I
present two new shallow syntactic analysis methods which facilitate the detection
process.
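The scheme-at-a-time iteration described above can be caricatured in a few lines. The clause-splitting "scheme" here is a naive comma-plus-conjunction rule, far simpler than the thesis's sign tagger and learned span detection; it only illustrates the iterative control flow:

```python
import re

def apply_scheme(sentence):
    """Toy transformation scheme: split the first ', and'/', but'
    coordination into two simple sentences; None if it does not apply."""
    m = re.search(r",\s+(and|but)\s+", sentence)
    if m is None:
        return None
    left = sentence[:m.start()].rstrip(".") + "."
    right = sentence[m.end():]
    return [left, right[0].upper() + right[1:]]

def simplify(sentence):
    """Apply the scheme repeatedly until no sentence on the agenda
    contains a detectable compound clause."""
    agenda, done = [sentence], []
    while agenda:
        s = agenda.pop(0)
        result = apply_scheme(s)
        if result is None:
            done.append(s)          # fully simplified
        else:
            agenda = result + agenda  # re-queue outputs for further passes
    return done

out = simplify("The scan was clear, and the nodule shrank, but symptoms persisted.")
print(out)  # → ['The scan was clear.', 'The nodule shrank.', 'Symptoms persisted.']
```

Because outputs are re-queued, a sentence with arbitrarily many coordinations is reduced one scheme application at a time, mirroring the iterative algorithm the thesis describes.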
The first of these identifies various explicit signs of syntactic complexity in
input sentences and classifies them according to their specific syntactic linking and bounding functions. I present the annotated resources used to train and
evaluate this sign tagger (Chapter 2) and the machine learning method used to
implement it (Chapter 3). The second syntactic analysis method exploits the sign
tagger and identifies the spans of compound clauses and complex NPs in input
sentences. In Chapter 4 of the thesis, I describe the development and evaluation
of a machine learning approach performing this task. This chapter also presents
a new annotated dataset supporting this activity.
In the thesis, I present two implementations of my approach to sentence simplification.
One of these exploits handcrafted rule activation patterns to detect
different parts of input sentences which are relevant to the simplification process.
The other implementation uses my machine learning method to identify
compound clauses and complex NPs for this purpose.
Intrinsic evaluation of the two implementations is presented in Chapter 6 together
with a comparison of their performance with several baseline systems. The
evaluation includes comparisons of system output with human-produced simplifications,
automated estimations of the readability of system output, and surveys
of human opinions on the grammaticality, accessibility, and meaning of automatically
produced simplifications.
Chapter 7 presents extrinsic evaluation of the sentence simplification method
exploiting handcrafted rule activation patterns. The extrinsic evaluation involves
three NLP tasks: multidocument summarisation, semantic role labelling, and information
extraction. Finally, in Chapter 8, conclusions are drawn and directions
for future research considered.
Seeing affect: knowledge infrastructures in facial expression recognition systems
Efforts to process and simulate human affect have come to occupy a prominent role in
Human-Computer Interaction as well as developments in machine learning systems.
Affective computing applications promise to decode human affective experience and
provide objective insights into usersʼ affective behaviors, ranging from frustration and
boredom to states of clinical relevance such as depression and anxiety. While these
projects are often grounded in psychological theories that have been contested both
within scholarly and public domains, practitioners have remained largely agnostic to
this debate, focusing instead on the development of either applicable technical systems
or advancements of the fieldʼs state of the art. I take this controversy as an entry point
to investigate the tensions related to the classification of affective behaviors and how
practitioners validate these classification choices.
This work offers an empirical examination of the discursive and material
repertoires ‒ the infrastructures of knowledge ‒ that affective computing practitioners
mobilize to legitimize and validate their practice. I build on feminist studies of science
and technology to interrogate and challenge the claims of objectivity on which affective
computing applications rest. By looking at research practices and commercial
developments of Facial Expression Recognition (FER) systems, the findings unpack
the interplay of knowledge, vision, and power underpinning the development of
machine learning applications of affective computing.
The thesis begins with an analysis of historical efforts to quantify affective
behaviors and how these are reflected in modern affective computing practice. Here,
three main themes emerge that will guide and orient the empirical findings: 1) the role
that framings of science and scientific practice play in constructing affective behaviors
as “objective” scientific facts, 2) the role of human interpretation and mediation
required to make sense of affective data, and 3) the prescriptive and performative
dimensions of these quantification efforts. This analysis forms the historical backdrop
for the empirical core of the thesis: semi-structured interviews with affective
computing practitioners across the academic and industry sectors, including the data
annotators labelling the modelsʼ training datasets.
My findings reveal the discursive and material strategies that participants adopt
to validate affective classification, including forms of boundary work to establish
credibility as well as the local and contingent work of human interpretation and
standardization involved in the process of making sense of affective data. Here, I show
how, despite their professed agnosticism, practitioners must make normative choices
in order to ʻseeʼ (and teach machines how to see) affect. I apply the notion of knowledge
infrastructures to conceptualize the scaffolding of data practices, norms and routines,
psychological theories, and historical and epistemological assumptions that shape
practitionersʼ vision and inform FER design.
Finally, I return to the problem of agnosticism and its socio-ethical relevance to
the broader field of machine learning. Here, I argue that agnosticism can make it
difficult to locate the technologyʼs historical and epistemological lineages and,
therefore, obscure accountability. I conclude by arguing that both policy and practice
would benefit from a nuanced examination of the plurality of visions and forms of
knowledge involved in the automation of affect.
An ontological framework for the formal representation and management of human stress knowledge
There is a great deal of information on the topic of human stress which is embedded within numerous papers across various databases. However, this information is stored, retrieved, and used often discretely and dispersedly. As a result, discovery and identification of the links and interrelatedness between different aspects of knowledge on stress is difficult. This restricts the effective search and retrieval of desired information. There is a need to organize this knowledge under a unifying framework, linking and analysing it in mutual combinations so that we can obtain an inclusive view of the related phenomena and new knowledge can emerge. Furthermore, there is a need to establish evidence-based and evolving relationships between the ontology concepts. Previous efforts to classify and organize stress-related phenomena have not been sufficiently inclusive and none of them has considered the use of ontology as an effective facilitating tool for the abovementioned issues. There have also been some research works on the evolution and refinement of ontology concepts and relationships. However, these fail to provide any proposals for an automatic and systematic methodology with the capacity to establish evidence-based/evolving ontology relationships. In response to these needs, we have developed the Human Stress Ontology (HSO), a formal framework which specifies, organizes, and represents the domain knowledge of human stress. This machine-readable knowledge model is likely to help researchers and clinicians find theoretical relationships between different concepts, resulting in a better understanding of the human stress domain and its related areas.
The HSO is formalized using the OWL language and the Protégé tool. With respect to the evolution and evidentiality of ontology relationships in the HSO and other scientific ontologies, we have proposed the Evidence-Based Evolving Ontology (EBEO), a methodology for the refinement and evolution of ontology relationships based on the evidence gleaned from scientific literature. The EBEO is based on the implementation of a Fuzzy Inference System (FIS). Our evaluation results showed that almost all stress-related concepts of the sample articles can be placed under one or more categories of the HSO. Nevertheless, there were a number of limitations in this work which need to be addressed in future undertakings. The developed ontology has the potential to be used for different data integration and interoperation purposes in the domain of human stress. It can also be regarded as a foundation for the future development of semantic search engines in the stress domain.
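The fuzzy step of an evidence-based scoring scheme can be sketched in miniature. The membership shapes, rule outputs, and the support/contradiction ratio used below are illustrative assumptions in the spirit of a Fuzzy Inference System, not the EBEO's actual rule base:

```python
# Toy fuzzy scoring of an ontology relationship from literature evidence:
# counts of supporting vs. contradicting articles are fuzzified into
# weak/moderate/strong memberships, then defuzzified into one strength.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def relation_strength(supporting, contradicting):
    ratio = supporting / max(1, supporting + contradicting)
    weak = tri(ratio, -0.5, 0.0, 0.5)      # mostly contradicted
    moderate = tri(ratio, 0.0, 0.5, 1.0)   # mixed evidence
    strong = tri(ratio, 0.5, 1.0, 1.5)     # mostly supported
    # defuzzify: weighted average of illustrative rule outputs
    num = 0.1 * weak + 0.5 * moderate + 0.9 * strong
    den = weak + moderate + strong
    return num / den if den else 0.0

print(round(relation_strength(9, 1), 2))  # → 0.82
```

As new articles are ingested, re-running the inference updates the strength, which is the sense in which such a relationship "evolves" with the evidence.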
An Informatics Roadmap Toward a FAIR Understanding of Mitochondrial Biology and Rare Mitochondrial Disease
Mitochondrial biology is integral to our fundamental understanding of human health and many diseases. Mitochondria exist in every human cell type except red blood cells and have critical functions in metabolism, oxidative phosphorylation, oxidation-reduction, and as signaling hubs responsible for mediating protective mechanisms. Rare mitochondrial diseases (RMDs) are devastating and complex, affect multiple organ systems, and disproportionately impact young children. Despite copious existing knowledge and increased public interest, this knowledge is fragmented and difficult to access. Clinical case reports (CCRs) on RMDs contain valuable clinical insights, but they are scarce and lack the metadata necessary to facilitate their discovery among the two million CCRs on PubMed. The unstructured text data of CCRs is also ill-suited to computational approaches, limiting our ability to derive the knowledge contained within. To address these issues, I assembled all available informatics tools and resources with mitochondrial components and used them to contribute to Gene Wiki pages that enable easy access to mitochondrial knowledge for researchers, students, clinicians, and patients. Through these efforts, I made mitochondrial gene, protein, and disease knowledge widely accessible with contributions of over 4MB of content across 541 Gene Wiki pages. Concurrently, I used Gene Wiki as an educational platform to train over 50 students in the biosciences and pre-medical studies in mitochondrial biology and disease, as well as to instill effective research and writing methods in biomedicine. To impose structure on CCRs and render them FAIR (Findable, Accessible, Interoperable, Reusable), I developed and applied a standardized metadata template to RMD CCRs and codified patient symptomology with the International Statistical Classification of Diseases and Related Health Problems (ICD) system.
I created the open-source, cloud-based MitoCases RMD Knowledge Platform (http://mitocases.org/) to house data on 384 RMD CCRs, including 4,561 instances of 952 unique ICD codes. Supplementing CCRs with structured metadata amplifies machine-readable information content and provides a distinct improvement in searching for CCRs as compared to indexing by title and abstract. Finally, I employed these resources to conduct a thorough review of Barth syndrome and characterized the diversity of presentations, range of genetic etiologies, and treatment paradigms.
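A standardized metadata template of the kind described above amounts to a fixed record schema per case report. The field names and example values below are hypothetical illustrations, not the actual MitoCases schema:

```python
from dataclasses import dataclass, field

@dataclass
class CaseReportRecord:
    """Hypothetical structured metadata record for one RMD clinical case report."""
    pmid: str                                       # PubMed identifier of the CCR
    disease: str                                    # rare mitochondrial disease name
    gene: str                                       # implicated gene, if reported
    icd_codes: list = field(default_factory=list)   # symptoms codified as ICD codes

    def has_symptom(self, icd_code: str) -> bool:
        """Structured lookup that full-text title/abstract search cannot offer."""
        return icd_code in self.icd_codes

record = CaseReportRecord(
    pmid="12345678",                # placeholder identifier, not a real citation
    disease="Barth syndrome",
    gene="TAFAZZIN",
    icd_codes=["I42.8", "D70.9"],   # example cardiomyopathy / neutropenia codes
)
print(record.has_symptom("I42.8"))  # → True
```

Once every CCR carries such a record, queries like "all cases coding for cardiomyopathy" become exact set operations over ICD codes rather than keyword searches.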