
    Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

    While the biomedical informatics community widely acknowledges the utility of domain ontologies, many barriers to their effective use remain. One important requirement is that a domain ontology achieve high coverage of the domain's concepts and concept relationships. However, ontology development is typically a manual, time-consuming, and often error-prone process. Limited resources lead to missing concepts and relationships, and make it difficult to update the ontology as domain knowledge changes. Methodologies from Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automatically enriching ontologies from free-text documents. In this dissertation, I extended these methodologies to biomedical ontology development. First, I reviewed existing methodologies and systems from NLP, IR, and IE, and discussed how they can benefit the development of biomedical ontologies. This review, the first of its kind, was published in the Journal of Biomedical Informatics. Second, using clinical free-text documents, I compared the effectiveness of three methods representing two approaches: the symbolic (the Hearst method) and the statistical (the Church and Lin methods). Third, I developed a methodological framework for evaluating and comparing Ontology Learning (OL) methods; it supports evaluation of both types of OL approach and the three OL methods they include. The significance of this work is as follows: 1) The comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method yielded average new-concept acceptance rates of 21% and 11%, respectively. For NCIT, the Lin method produced a 74% acceptance rate and the Church method 53%.
As a result of this study (published in Methods of Information in Medicine), many of the suggested candidates have been incorporated into NCIT. 2) The evaluation framework is flexible and general enough to analyze the performance of ontology enrichment methods in many domains, thus expediting automation and reducing the likelihood that key concepts and relationships will be missed as domain knowledge evolves.
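The three methods compared above can be illustrated with a small Python sketch. It is not the dissertation's implementation and rests on stated assumptions: that the Hearst method refers to Hearst's lexico-syntactic hyponym patterns (only the "X such as A, B and C" pattern is shown, with single-word noun phrases), that the Church method refers to Church and Hanks' association ratio (pointwise mutual information), and that the Lin method refers to Lin's information-theoretic distributional similarity. The function names are hypothetical, and both statistical functions count co-occurrence per sentence rather than using a word window or a dependency parse.

```python
import math
import re
from collections import Counter
from itertools import combinations

# --- Symbolic: one Hearst-style pattern, "X such as A, B and C" ---
# Assumption: noun phrases are single words; a real system would use an NP chunker.
_NP = r"\w+"
_SEP = r"(?:,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+)"
HEARST_SUCH_AS = re.compile(rf"({_NP}),? such as ({_NP}(?:{_SEP}{_NP})*)")


def hearst_pairs(sentence):
    """Return candidate (hypernym, hyponym) pairs matched by the pattern."""
    pairs = []
    for match in HEARST_SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        # Split the enumeration on commas and the final "and"/"or".
        for hyponym in re.split(_SEP, match.group(2)):
            pairs.append((hypernym, hyponym))
    return pairs


# --- Statistical: association ratio, log2 P(x, y) / (P(x) P(y)) ---
# Simplification: probabilities are sentence-level frequencies, not a word window.
def association_ratios(sentences):
    """Map each co-occurring word pair (alphabetical order) to its association ratio."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        word_counts.update(words)
        pair_counts.update(combinations(sorted(words), 2))
    n = len(sentences)
    return {
        (x, y): math.log2((c / n) / ((word_counts[x] / n) * (word_counts[y] / n)))
        for (x, y), c in pair_counts.items()
    }


# --- Statistical: Lin-style similarity over positive association ratios ---
# Simplification: features are co-occurring words rather than dependency triples.
def lin_similarity(w1, w2, ratios):
    """Information in shared features divided by the total information of both words."""
    def features(word):
        # Each positively associated neighbour is a feature weighted by its ratio.
        return {
            (y if x == word else x): score
            for (x, y), score in ratios.items()
            if score > 0 and word in (x, y)
        }

    f1, f2 = features(w1), features(w2)
    total = sum(f1.values()) + sum(f2.values())
    if total == 0:
        return 0.0
    shared = f1.keys() & f2.keys()
    return sum(f1[f] + f2[f] for f in shared) / total
```

For example, `hearst_pairs("Tumors such as melanoma, lymphoma and carcinoma.")` returns `("tumors", "melanoma")`, `("tumors", "lymphoma")` and `("tumors", "carcinoma")`. In an enrichment workflow, candidates like these would go to domain experts for acceptance or rejection, which is what the acceptance rates above measure.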

    Detection of Intestinal Bleeding in Wireless Capsule Endoscopy using Machine Learning Techniques

    Gastrointestinal (GI) bleeding is common and can be fatal. It is usually identified using a flexible wired endoscope. In 2001, a newer diagnostic tool, wireless capsule endoscopy (WCE), was introduced: a swallowable capsule-shaped device with a camera that captures thousands of color images and transmits them wirelessly to a data recorder. Physicians then review these images to identify GI abnormalities. However, this screening is time-consuming, which can put patients at risk in emergencies, so a real-time tool for detecting bleeding in the GI tract is needed. Each material has its own spectral ‘signature’, with distinct characteristics at specific wavelengths of light [33], so the presence of blood can be detected by evaluating optical characteristics. The study presented three main hardware designs for measuring the optical characteristics of blood and non-blood samples: one using a two-wavelength optical sensor, and two using six-wavelength spectral sensors based on the AS7262 and AS7263 chips, respectively. The goal of the research is to develop a machine learning model that differentiates blood samples (BS) from non-blood samples (NBS) based on their optical properties. In this experiment, crystallized bovine hemoglobin solutions at 10 levels were used as BS, and 5 food colors (red, yellow, orange, tan and pink) at different concentrations, 25 samples in total, were used as NBS. These blood and non-blood samples were also combined with pig intestine to mimic an in vivo environment. The collected samples were split into fully separate training and testing sets. Different spectral features were analyzed to obtain optical information about the samples.
Based on performance with the most significant spectral-wavelength features, the k-nearest neighbors (k-NN) algorithm was chosen for automated bleeding detection. The proposed k-NN classifier distinguished BS from NBS with an accuracy of 91.54% using features from two wavelengths, and about 89% using features from three combined wavelengths in the visible and near-infrared spectral regions. The research also indicates that tiny optical detectors could be deployed in a WCE system to detect GI bleeding, which could eliminate the need for time-consuming image post-processing.
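The k-NN classification step can be illustrated with a minimal plain-Python sketch. It is not the study's model: the feature values are made-up placeholders standing in for normalized sensor readings at two wavelengths (they are not measurements from the study), the labels follow the abstract's BS/NBS naming, and `k=3` with Euclidean distance is an assumed configuration.

```python
import math
from collections import Counter

# Hypothetical placeholder data: (features, label) pairs, where features are
# normalized readings at two wavelengths. These are NOT measurements from the study.
TRAIN = [
    ((0.20, 0.80), "BS"), ((0.25, 0.75), "BS"), ((0.22, 0.85), "BS"),
    ((0.70, 0.30), "NBS"), ((0.65, 0.35), "NBS"), ((0.75, 0.25), "NBS"),
]
TEST = [((0.23, 0.78), "BS"), ((0.68, 0.32), "NBS")]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples nearest to `query`."""
    # Sort training samples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, features, k) == label for features, label in test)
    return correct / len(test)


print(f"accuracy: {accuracy(TRAIN, TEST):.2%}")
```

An odd `k` avoids ties between the two classes. Because Euclidean distance is sensitive to scale, features measured on different scales would typically be standardized before classification.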

    Automatic Population of Structured Reports from Narrative Pathology Reports

    Structured pathology reports have several advantages: they help ensure the accuracy and completeness of pathology reporting, and referring doctors can glean pertinent information from them more easily. The goal of this thesis is to extract pertinent information from free-text pathology reports, automatically populate structured reports for cancers, and identify the commonalities and differences in processing principles needed to obtain maximum accuracy. Three pathology corpora were annotated with entities and the relationships between them: a melanoma corpus, a colorectal cancer corpus and a lymphoma corpus. A supervised machine-learning approach using conditional random field (CRF) learners was developed to recognise medical entities in the corpora. Feature engineering identified the best feature configurations, which improved the F-scores significantly, by 4.2% to 6.8%, on the training sets. Because the quality of structured reports diminishes without proper negation and uncertainty detection, negation and uncertainty detection modules were built; they obtained overall F-scores ranging from 76.6% to 91.0% on the test sets. A relation extraction system was developed to extract four relations from the lymphoma corpus. It performed very well on the training set: the rule-based module achieved a 100% F-score and the support vector machine (SVM) classifier 97.2%. Rule-based approaches were used to generate the structured outputs and populate predefined templates with them, attaining F-scores above 97% on the training sets. Finally, all of these components were assembled into a pipeline system, which achieved promising results in end-to-end evaluations, with F-scores of 86.5%, 84.2% and 78.9% on the melanoma, colorectal cancer and lymphoma test sets, respectively.
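A trigger-and-window heuristic in the style of NegEx gives a rough idea of what a negation and uncertainty module does. This is a swapped-in technique for illustration only, since the abstract does not say how the thesis's modules work. The trigger lists, the five-token window and the function name `classify_entity` are hypothetical, and a practical module would need a much larger lexicon and handling of triggers that follow the entity.

```python
import re

# Hypothetical trigger lexicons; a real module would use a much larger curated list.
NEGATION_TRIGGERS = ("no", "not", "without", "negative for", "absence of")
UNCERTAINTY_TRIGGERS = ("possible", "suspicious for", "cannot exclude", "may represent")
SCOPE_TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # number of tokens after a trigger that fall within its scope


def _find(tokens, phrase):
    """Yield every start index at which the token list `phrase` occurs in `tokens`."""
    for i in range(len(tokens) - len(phrase) + 1):
        if tokens[i:i + len(phrase)] == phrase:
            yield i


def _in_scope(tokens, triggers, target):
    """True if token index `target` lies in the forward window of any trigger."""
    for trigger in triggers:
        trigger_tokens = trigger.split()
        for start in _find(tokens, trigger_tokens):
            end = start + len(trigger_tokens)
            for j in range(end, min(end + WINDOW, len(tokens))):
                if tokens[j] in SCOPE_TERMINATORS:
                    break  # the trigger's scope stops at a terminator
                if j == target:
                    return True
    return False


def classify_entity(sentence, entity):
    """Label `entity` in `sentence` as 'negated', 'uncertain' or 'affirmed'."""
    tokens = re.findall(r"\w+", sentence.lower())
    matches = list(_find(tokens, entity.lower().split()))
    if not matches:
        raise ValueError(f"{entity!r} not found in sentence")
    target = matches[0]  # only the first mention is classified
    if _in_scope(tokens, NEGATION_TRIGGERS, target):
        return "negated"
    if _in_scope(tokens, UNCERTAINTY_TRIGGERS, target):
        return "uncertain"
    return "affirmed"
```

For example, `classify_entity("No evidence of melanoma in the margins.", "melanoma")` returns `"negated"`, while `"lymphoma"` in `"Findings are suspicious for lymphoma."` is labelled `"uncertain"`. Scope terminators such as "but" stop a trigger from reaching entities later in the sentence.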