
    An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences

    Indiana University-Purdue University Indianapolis (IUPUI). The amount of information produced as electronic free text in healthcare is increasing to levels that humans cannot process for the advancement of their professional practice. Information extraction (IE) is a sub-field of natural language processing with the goal of data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create the logical expression necessary for processing the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none has addressed the issue of semantic similarity (or synonymy) to achieve data reduction. A successful methodology for data reduction depends on a framework that can represent currently popular phrasal methods of IE and also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument statement (PAS) as the framework. A convenience sample from a prior study with ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing will be the text from which PAS structures are formed
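
    As a rough illustration of the idea above, a predicate-argument structure can be held as a predicate plus a set of role/filler pairs. The sketch below is not the study's method: the `PAS` class, the `make_pas` and `similarity` helpers, the role names, and the example sentences are all hypothetical, and Jaccard overlap is swapped in as a stand-in similarity measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PAS:
    """Hypothetical predicate-argument structure for one sentence."""
    predicate: str
    arguments: frozenset  # set of (role, filler) pairs


def make_pas(predicate, **roles):
    # Lowercase everything so trivial case differences do not matter.
    pairs = frozenset((role, filler.lower()) for role, filler in roles.items())
    return PAS(predicate.lower(), pairs)


def similarity(a, b):
    """Jaccard overlap of arguments; 0.0 when the predicates differ."""
    if a.predicate != b.predicate:
        return 0.0
    union = a.arguments | b.arguments
    if not union:
        return 1.0
    return len(a.arguments & b.arguments) / len(union)


# Invented example sentences, expressed as structures.
s1 = make_pas("show", agent="chest radiograph", theme="no pneumothorax")
s2 = make_pas("show", agent="chest radiograph", theme="no pneumothorax",
              location="left lung")
s3 = make_pas("reveal", agent="chest radiograph", theme="no pneumothorax")

print(similarity(s1, s2))  # two shared pairs out of three
print(similarity(s1, s3))  # predicates differ, so 0.0
```

    The zero score for `s1` against `s3` shows why exact matching is not enough: "show" and "reveal" are near-synonyms here, which is the synonymy problem the abstract raises.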

    Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems

    This paper provides an overview of current linguistic and ontological challenges that must be met to fully support the transformation of health ecosystems toward precision medicine (5 PM) standards. It highlights both standardization and interoperability aspects regarding formal, controlled representations of clinical and research data, as well as requirements for smart support to produce and encode content in a way that both humans and machines can understand and process. Starting from the current text-centered communication practices in healthcare and biomedical research, it addresses the state of the art in information extraction using natural language processing (NLP). An important aspect of the language-centered perspective of managing health data is the integration of heterogeneous data sources that employ different natural languages and different terminologies. This is where biomedical ontologies, in the sense of formal, interchangeable representations of types of domain entities, come into play. The paper discusses the state of the art of biomedical ontologies, addresses their importance for standardization and interoperability, and sheds light on current misconceptions and shortcomings. Finally, the paper points out next steps and possible synergies of both the field of NLP and the area of Applied Ontology and Semantic Web to foster data interoperability for 5 P

    Advanced machine learning methods for oncological image analysis

    Cancer is a major public health problem, accounting for an estimated 10 million deaths worldwide in 2020 alone. Rapid advances in the field of image acquisition and hardware development over the past three decades have resulted in the development of modern medical imaging modalities that can capture high-resolution anatomical, physiological, functional, and metabolic quantitative information from cancerous organs. Therefore, the applications of medical imaging have become increasingly crucial in the clinical routines of oncology, providing screening, diagnosis, treatment monitoring, and non/minimally-invasive evaluation of disease prognosis. The essential need for medical images, however, has resulted in the acquisition of a tremendous number of imaging scans. Considering the growing role of medical imaging data on one side and the challenges of manually examining such an abundance of data on the other side, the development of computerized tools to automatically or semi-automatically examine the image data has attracted considerable interest. Hence, a variety of machine learning tools have been developed for oncological image analysis, aiming to assist clinicians with repetitive tasks in their workflow. This thesis aims to contribute to the field of oncological image analysis by proposing new ways of quantifying tumor characteristics from medical image data. Specifically, this thesis consists of six studies, the first two of which focus on introducing novel methods for tumor segmentation. The last four studies aim to develop quantitative imaging biomarkers for cancer diagnosis and prognosis. The main objective of Study I is to develop a deep learning pipeline capable of capturing the appearance of lung pathologies, including lung tumors, and integrating this pipeline into the segmentation networks to improve segmentation accuracy.
The proposed pipeline was tested on several comprehensive datasets, and the numerical quantifications show the superiority of the proposed prior-aware DL framework compared to the state of the art. Study II aims to address a crucial challenge faced by supervised segmentation models: dependence on large-scale labeled datasets. In this study, an unsupervised segmentation approach is proposed based on the concept of image inpainting to segment lung and head-neck tumors in images from single and multiple modalities. The proposed autoinpainting pipeline shows great potential in synthesizing high-quality tumor-free images and outperforms a family of well-established unsupervised models in terms of segmentation accuracy. Studies III and IV aim to automatically discriminate benign from malignant pulmonary nodules by analyzing low-dose computed tomography (LDCT) scans. In Study III, a dual-pathway deep classification framework is proposed to simultaneously take into account the local intra-nodule heterogeneities and the global contextual information. Study IV seeks to compare the discriminative power of a series of carefully selected conventional radiomics methods, end-to-end Deep Learning (DL) models, and deep features-based radiomics analysis on the same dataset. The numerical analyses show the potential of fusing the learned deep features into radiomic features for boosting the classification power. Study V focuses on the early assessment of lung tumor response to the applied treatments by proposing a novel feature set that can be interpreted physiologically. This feature set was employed to quantify the changes in the tumor characteristics from longitudinal PET-CT scans in order to predict the overall survival status of the patients two years after the last session of treatments.
The discriminative power of the introduced imaging biomarkers was compared against the conventional radiomics, and the quantitative evaluations verified the superiority of the proposed feature set. Whereas Study V focuses on a binary survival prediction task, Study VI addresses the prediction of survival rate in patients diagnosed with lung and head-neck cancer by investigating the potential of spherical convolutional neural networks and comparing their performance against other types of features, including radiomics. While comparable results were achieved in intra-dataset analyses, the proposed spherical-based features show more predictive power in inter-dataset analyses. In summary, the six studies incorporate different imaging modalities and a wide range of image processing and machine-learning techniques in the methods developed for the quantitative assessment of tumor characteristics and contribute to the essential procedures of cancer diagnosis and prognosis

    Contextual Analysis of Large-Scale Biomedical Associations for the Elucidation and Prioritization of Genes and their Roles in Complex Disease

    Vast amounts of biomedical associations are easily accessible in public resources, spanning gene-disease associations, tissue-specific gene expression, gene function and pathway annotations, and many other data types. Despite this mass of data, information most relevant to the study of a particular disease remains loosely coupled and difficult to incorporate into ongoing research. Current public databases are difficult to navigate and do not interoperate well due to the plethora of interfaces and varying biomedical concept identifiers used. Because no coherent display of data within a specific problem domain is available, finding the latent relationships associated with a disease of interest is impractical. This research describes a method for extracting the contextual relationships embedded within associations relevant to a disease of interest. After applying the method to a small test data set, a large-scale integrated association network is constructed for application of a network propagation technique that helps uncover more distant latent relationships. Together these methods are adept at uncovering highly relevant relationships without any a priori knowledge of the disease of interest. The combined contextual search and relevance methods power a tool which makes pertinent biomedical associations easier to find, easier to assimilate into ongoing work, and more prominent than currently available databases. Increasing the accessibility of current information is an important component to understanding high-throughput experimental results and surviving the data deluge
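
    The abstract does not say which network propagation technique is used. As one common choice, a random walk with restart can be sketched as follows; the `propagate` function, the `restart` and `iterations` parameters, and the toy gene and pathway names are assumptions made for illustration only, and every seed is assumed to appear in at least one edge.

```python
from collections import defaultdict


def propagate(edges, seeds, restart=0.5, iterations=50):
    """Random walk with restart on an undirected association network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # All starting probability mass sits on the seed nodes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)

    for _ in range(iterations):
        # A fraction of the mass jumps back to the seeds each step...
        nxt = {n: restart * start[n] for n in nodes}
        # ...and the rest spreads evenly along each node's edges.
        for n in nodes:
            share = (1.0 - restart) * score[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        score = nxt
    return score


# Toy network: geneB is only indirectly linked to the disease,
# while geneC and geneD form a separate, unrelated component.
edges = [
    ("disease", "geneA"),
    ("geneA", "pathwayX"),
    ("pathwayX", "geneB"),
    ("geneC", "geneD"),
]
scores = propagate(edges, seeds={"disease"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: {scores[node]:.3f}")
```

    In this toy run `geneB` receives a small nonzero score despite having no direct edge to the disease, while the disconnected `geneC` stays at zero, which is the kind of distant latent relationship such a technique is meant to surface.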

    Sentence Simplification for Text Processing

    A thesis submitted in partial fulfilment of the requirement of the University of Wolverhampton for the degree of Doctor of Philosophy. Propositional density and syntactic complexity are two features of sentences which affect the ability of humans and machines to process them effectively. In this thesis, I present a new approach to automatic sentence simplification which processes sentences containing compound clauses and complex noun phrases (NPs) and converts them into sequences of simple sentences which contain fewer of these constituents and have reduced per-sentence propositional density and syntactic complexity. My overall approach is iterative and relies on both machine learning and handcrafted rules. It implements a small set of sentence transformation schemes, each of which takes one sentence containing compound clauses or complex NPs and converts it into one or two simplified sentences containing fewer of these constituents (Chapter 5). The iterative algorithm applies the schemes repeatedly and is able to simplify sentences which contain arbitrary numbers of compound clauses and complex NPs. The transformation schemes rely on automatic detection of these constituents, which may take a variety of forms in input sentences. In the thesis, I present two new shallow syntactic analysis methods which facilitate the detection process. The first of these identifies various explicit signs of syntactic complexity in input sentences and classifies them according to their specific syntactic linking and bounding functions. I present the annotated resources used to train and evaluate this sign tagger (Chapter 2) and the machine learning method used to implement it (Chapter 3). The second syntactic analysis method exploits the sign tagger and identifies the spans of compound clauses and complex NPs in input sentences. In Chapter 4 of the thesis, I describe the development and evaluation of a machine learning approach performing this task.
This chapter also presents a new annotated dataset supporting this activity. In the thesis, I present two implementations of my approach to sentence simplification. One of these exploits handcrafted rule activation patterns to detect different parts of input sentences which are relevant to the simplification process. The other implementation uses my machine learning method to identify compound clauses and complex NPs for this purpose. Intrinsic evaluation of the two implementations is presented in Chapter 6 together with a comparison of their performance with several baseline systems. The evaluation includes comparisons of system output with human-produced simplifications, automated estimations of the readability of system output, and surveys of human opinions on the grammaticality, accessibility, and meaning of automatically produced simplifications. Chapter 7 presents extrinsic evaluation of the sentence simplification method exploiting handcrafted rule activation patterns. The extrinsic evaluation involves three NLP tasks: multidocument summarisation, semantic role labelling, and information extraction. Finally, in Chapter 8, conclusions are drawn and directions for future research considered

    Seeing affect: knowledge infrastructures in facial expression recognition systems

    Efforts to process and simulate human affect have come to occupy a prominent role in Human-Computer Interaction as well as developments in machine learning systems. Affective computing applications promise to decode human affective experience and provide objective insights into usersʼ affective behaviors, ranging from frustration and boredom to states of clinical relevance such as depression and anxiety. While these projects are often grounded in psychological theories that have been contested both within scholarly and public domains, practitioners have remained largely agnostic to this debate, focusing instead on the development of either applicable technical systems or advancements of the fieldʼs state of the art. I take this controversy as an entry point to investigate the tensions related to the classification of affective behaviors and how practitioners validate these classification choices. This work offers an empirical examination of the discursive and material repertoires ‒ the infrastructures of knowledge ‒ that affective computing practitioners mobilize to legitimize and validate their practice. I build on feminist studies of science and technology to interrogate and challenge the claims of objectivity on which affective computing applications rest. By looking at research practices and commercial developments of Facial Expression Recognition (FER) systems, the findings unpack the interplay of knowledge, vision, and power underpinning the development of machine learning applications of affective computing. The thesis begins with an analysis of historical efforts to quantify affective behaviors and how these are reflected in modern affective computing practice. 
Here, three main themes emerge that will guide and orient the empirical findings: 1) the role that framings of science and scientific practice play in constructing affective behaviors as “objective” scientific facts, 2) the role of human interpretation and mediation required to make sense of affective data, and 3) the prescriptive and performative dimensions of these quantification efforts. This analysis forms the historical backdrop for the empirical core of the thesis: semi-structured interviews with affective computing practitioners across the academic and industry sectors, including the data annotators labelling the modelsʼ training datasets. My findings reveal the discursive and material strategies that participants adopt to validate affective classification, including forms of boundary work to establish credibility as well as the local and contingent work of human interpretation and standardization involved in the process of making sense of affective data. Here, I show how, despite their professed agnosticism, practitioners must make normative choices in order to ʻseeʼ (and teach machines how to see) affect. I apply the notion of knowledge infrastructures to conceptualize the scaffolding of data practices, norms and routines, psychological theories, and historical and epistemological assumptions that shape practitionersʼ vision and inform FER design. Finally, I return to the problem of agnosticism and its socio-ethical relevance to the broader field of machine learning. Here, I argue that agnosticism can make it difficult to locate the technologyʼs historical and epistemological lineages and, therefore, obscure accountability. I conclude by arguing that both policy and practice would benefit from a nuanced examination of the plurality of visions and forms of knowledge involved in the automation of affect

    An ontological framework for the formal representation and management of human stress knowledge

    There is a great deal of information on the topic of human stress which is embedded within numerous papers across various databases. However, this information is often stored, retrieved, and used in a discrete and dispersed manner. As a result, discovery and identification of the links and interrelatedness between different aspects of knowledge on stress is difficult. This restricts the effective search and retrieval of desired information. There is a need to organize this knowledge under a unifying framework, linking and analysing it in mutual combinations so that we can obtain an inclusive view of the related phenomena and new knowledge can emerge. Furthermore, there is a need to establish evidence-based and evolving relationships between the ontology concepts. Previous efforts to classify and organize stress-related phenomena have not been sufficiently inclusive and none of them has considered the use of ontology as an effective facilitating tool for the abovementioned issues. There have also been some research works on the evolution and refinement of ontology concepts and relationships. However, these fail to provide any proposals for an automatic and systematic methodology with the capacity to establish evidence-based/evolving ontology relationships. In response to these needs, we have developed the Human Stress Ontology (HSO), a formal framework which specifies, organizes, and represents the domain knowledge of human stress. This machine-readable knowledge model is likely to help researchers and clinicians find theoretical relationships between different concepts, resulting in a better understanding of the human stress domain and its related areas.
The HSO is formalized using the OWL language and the Protégé tool. With respect to the evolution and evidentiality of ontology relationships in the HSO and other scientific ontologies, we have proposed the Evidence-Based Evolving Ontology (EBEO), a methodology for the refinement and evolution of ontology relationships based on the evidence gleaned from scientific literature. The EBEO is based on the implementation of a Fuzzy Inference System (FIS). Our evaluation results showed that almost all stress-related concepts of the sample articles can be placed under one or more categories of the HSO. Nevertheless, there were a number of limitations in this work which need to be addressed in future undertakings. The developed ontology has the potential to be used for different data integration and interoperation purposes in the domain of human stress. It can also be regarded as a foundation for the future development of semantic search engines in the stress domain