9 research outputs found

    Design and evaluation of an ontology based information extraction system for radiological reports

    This paper describes an information extraction system that extracts the available information in free-text Turkish radiology reports and converts it into a structured information model using manually created extraction rules and a domain ontology. The ontology provides flexibility in the design of the extraction rules and determines the information model for the extracted semantic information. Although the system mainly concentrates on abdominal radiology reports, it can be used in other fields of medicine by adapting its ontology and extraction rule set. The system achieved very high precision and recall when evaluated on unseen radiology reports. © 2010 Elsevier Ltd
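
    The abstract names the system's two moving parts, manually created extraction rules and a domain ontology that shapes the output model, without showing either. The following is a minimal sketch of that rule-plus-ontology pattern, assuming a hypothetical rule format and a toy ontology; the paper's actual rule syntax and ontology are not reproduced here:

        import re

        # Hypothetical mini-ontology: each class lists the properties an
        # extracted instance must carry (the paper's ontology is far richer).
        ONTOLOGY = {
            "Observation": ["anatomical_entity", "dimension_cm"],
        }

        # Hypothetical extraction rule bound to an ontology class.
        RULES = [
            (re.compile(r"(?P<anatomical_entity>\w+) measures "
                        r"(?P<dimension_cm>\d+(?:\.\d+)?) cm"),
             "Observation"),
        ]

        def extract(text):
            """Apply each rule; shape matches by the target class's properties."""
            instances = []
            for pattern, cls in RULES:
                for match in pattern.finditer(text):
                    instance = {"class": cls}
                    for prop in ONTOLOGY[cls]:
                        instance[prop] = match.group(prop)
                    instances.append(instance)
            return instances

        print(extract("The liver measures 16.5 cm in the craniocaudal plane."))
        # [{'class': 'Observation', 'anatomical_entity': 'liver', 'dimension_cm': '16.5'}]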

    Improving knowledge management through the support of image examination and data annotation using DICOM structured reporting

    Considerable effort has been invested in improving the image diagnosis process in different medical areas using information technologies. The field of medical imaging involves two main data types: medical images and reports. Developments based on the DICOM standard have proven to be a convenient and widespread solution among the medical community. The main objective of this work is to design a Web application prototype able to improve the diagnosis and follow-up of breast cancer patients. It is based on the TRENCADIS middleware, which provides a knowledge-oriented storage model composed of federated repositories of DICOM image studies and DICOM-SR medical reports. The full structure and contents of the diagnosis reports are used as metadata for indexing images. The TRENCADIS infrastructure takes full advantage of Grid technologies by deploying multi-resource grid services that enable multiple views (report schemes) of the knowledge database. The paper presents a real deployment of this Web application prototype in the Dr. Peset Hospital, providing radiologists with a tool to create, store and search diagnostic reports based on breast cancer explorations (mammography, magnetic resonance, ultrasound, pre-surgery biopsy and post-surgery biopsy), improving support for diagnostic decisions. Technical details for the use cases (outlining the enhanced multi-resource grid services communication and processing steps) and the interactions between actors and the deployed prototype are described. As a result, information is more structured, the logic is clearer, network messages have been reduced and, in general, the system is more resistant to failures. The authors acknowledge financial support from the Spanish Ministry of Education and Science for the project "CodeCloud" (TIN2010-17804). Salavert Torres, J.; Segrelles Quilis, J. D.; Blanquer Espert, I.; Hernández García, V. (2012). Improving knowledge management through the support of image examination and data annotation using DICOM structured reporting. Journal of Biomedical Informatics, 45(6), 1066-1074. https://doi.org/10.1016/j.jbi.2012.07.004
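
    The entry's key idea is that the full structure of a DICOM-SR report can serve as indexing metadata. As a rough illustration of what that structure looks like, here is a sketch that walks a structured report's content tree with pydicom (a library choice of this summary, not of the paper) and prints each concept name with its value; "report.dcm" is a hypothetical file:

        import pydicom  # pip install pydicom

        def walk_sr(items, depth=0):
            """Recursively print concept-name/value pairs of an SR content tree."""
            for item in items:
                name = (item.ConceptNameCodeSequence[0].CodeMeaning
                        if "ConceptNameCodeSequence" in item else "(unnamed)")
                if item.ValueType == "TEXT":
                    print("  " * depth + f"{name}: {item.TextValue}")
                elif item.ValueType == "CODE":
                    print("  " * depth + f"{name}: "
                          f"{item.ConceptCodeSequence[0].CodeMeaning}")
                else:  # CONTAINER, NUM, ... shown by name only in this sketch
                    print("  " * depth + name)
                walk_sr(getattr(item, "ContentSequence", []), depth + 1)

        ds = pydicom.dcmread("report.dcm")  # hypothetical DICOM-SR report
        walk_sr(ds.ContentSequence)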

    Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation

    Background: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing work has employed machine learning techniques to construct a knowledge base, but such approaches often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model, combining natural language processing (NLP) and semantic network analysis in our model generation pipeline. Methods: As a use case of our pipeline, we utilized data from an open source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the "Subject:Relationship:Object" syntactic data schema. The identified noun phrases were tagged with Unified Medical Language System (UMLS) semantic types. An evaluation was done on a dataset comprising 83 image notes from four data sources. Results: A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model covered 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%. Conclusion: The results indicate that our pipeline was able to produce a comprehensive content-based knowledge model that can represent context from various sources in the same domain
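
    The pipeline's extraction step pairs a syntactic parse with the "Subject:Relationship:Object" schema. A minimal sketch of that idea, using spaCy's dependency parse as a stand-in for the Stanford parser used in the paper (and omitting the UMLS tagging step):

        import spacy  # pip install spacy; python -m spacy download en_core_web_sm

        nlp = spacy.load("en_core_web_sm")

        def svo_triples(text):
            """Collect Subject:Relationship:Object triples from the parse."""
            triples = []
            for token in nlp(text):
                if token.dep_ == "nsubj" and token.head.pos_ == "VERB":
                    verb = token.head
                    for child in verb.children:
                        if child.dep_ == "dobj":
                            triples.append((token.text, verb.lemma_, child.text))
            return triples

        print(svo_triples("The right kidney contains a simple cyst."))
        # [('kidney', 'contain', 'cyst')]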

    Ontology evaluation approach for semantic web documents

    Ontology is a conceptual tool for capturing and managing information related to domain knowledge in areas such as travel, education and medicine. Publicly available ontology repositories such as Falcons and SWOOGLE foster the growth of ontologies on the Web by providing a medium for ontology developers to publish their work. To promote ontology reuse, a suitable ontology evaluation approach is required that addresses ontology coverage of a domain, including validation of the ontology against a corpus of terms related to the domain knowledge. Since contributions to ontology evaluation address different aspects, it is important to conceptualise the related information in order to build an evaluation approach that helps users select an ontology. This work proposes OntoUji, an ontology that conceptualises information related to ontology evaluation. Building on the OntoUji conceptualisation, evaluation steps were developed and converted into ontology evaluation algorithms that assess ontology documents retrieved from selected repositories following a data-driven evaluation approach. The data-driven approach focuses on evaluating the coverage of an ontology using a provided set of keywords, and also compares the ontological vocabulary against a pre-defined corpus, WordNet, drawn from the information retrieval approach. The Letters Pair Similarity algorithm is used as the similarity measure to compute the ontology coverage result. The findings show that the OntoUji conceptualisation helps define ontology evaluation steps that yield similarity results for ontology selection
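
    The coverage computation hinges on the Letters Pair Similarity measure. Assuming this refers to the common adjacent-letter-pair (bigram) Dice similarity, a sketch:

        def letter_pairs(word):
            """Adjacent letter pairs of one word: 'travel' -> ['tr','ra','av','ve','el']."""
            return [word[i:i + 2] for i in range(len(word) - 1)]

        def word_pairs(text):
            pairs = []
            for word in text.lower().split():
                pairs.extend(letter_pairs(word))
            return pairs

        def similarity(a, b):
            """Dice coefficient over shared adjacent letter pairs, in [0, 1]."""
            pa, pb = word_pairs(a), word_pairs(b)
            remaining, shared = list(pb), 0
            for pair in pa:
                if pair in remaining:
                    shared += 1
                    remaining.remove(pair)  # count each pair at most once
            return 2.0 * shared / (len(pa) + len(pb))

        print(similarity("travel ontology", "travelling ontologies"))  # ~0.73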

    Text clustering model for pattern discovery in short unstructured technical texts

    Outline: information retrieval (IR); indexing; preprocessing (format conversion, structure recognition, spell correction, lexical analysis, negative dictionary, stemming, thesaurus as feature space); feature vector generation; text clustering (WordNet, term selection, clustering, distance measures, evaluation); proposed text clustering model for short unstructured technical texts; thesaurus definition; proposed RD technique: hybrid n-grams; application of the text clustering model; thesaurus definition (WordNet); tests and results; natural language processing (NLP); IR models; information extraction (IE)
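
    The outline follows the standard text clustering pipeline (preprocessing, feature vectors, clustering, distance-based evaluation). A minimal generic sketch of that pipeline with scikit-learn, standing in for the thesis's own hybrid n-gram implementation; the four short "technical texts" are invented for illustration:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans

        docs = [
            "pump bearing vibration above threshold",
            "bearing noise reported in pump assembly",
            "login error after software update",
            "application crashes on the login screen",
        ]

        # Feature vectors over word n-grams (the thesis proposes a hybrid
        # n-gram reduction; plain TF-IDF unigrams/bigrams are used here).
        vectors = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(docs)

        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
        print(km.labels_)  # e.g. [0 0 1 1]: mechanical vs. software reports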

    Development and Evaluation of an Ontology-Based Quality Metrics Extraction System

    The Institute of Medicine reports a growing demand in recent years for quality improvement within the healthcare industry. In response, numerous organizations have been involved in developing and reporting quality measurement metrics. However, disparate data models from such organizations shift the burden of accurate and reliable metrics extraction and reporting onto healthcare providers. Furthermore, manual abstraction of quality metrics and the diverse implementation of Electronic Health Record (EHR) systems deepen the complexity of consistent, valid, explicit, and comparable quality measurement reporting within healthcare provider organizations. The main objective of this research is to evaluate an ontology-based information extraction framework that utilizes unstructured clinical text for defining and reporting quality of care metrics that are interpretable and comparable across different healthcare institutions. All clinical transcribed notes (48,835) from 2,085 patients who had undergone surgery in 2011 at MD Anderson Cancer Center were extracted from the EMR system and pre-processed to identify section headers. Subsequently, all notes were analyzed by MetaMap v2012 and one XML file was generated per note. The XML outputs were converted into Resource Description Framework (RDF) format. We also developed three ontologies: a section header ontology built from the extracted section headers using the RDF standard; a concept ontology comprising entities representing five quality metrics from SNOMED (diabetes, hypertension, cardiac surgery, transient ischemic attack, CNS tumor); and a clinical note ontology representing clinical note elements and their relationships. All ontologies (in Web Ontology Language format) and patient notes (as RDF) were imported into a triple store (AllegroGraph) as classes and instances, respectively. SPARQL queries were used to report the extracted concepts under four settings: the base Natural Language Processing (NLP) output, inclusion of the concept ontology, exclusion of negated concepts, and inclusion of the section header ontology. Existing manual abstraction data from surgical clinical reviewers, on the same set of patients and documents, served as the gold standard. Micro-averaged results of statistical agreement tests on the base NLP output showed an increase from 59%, 81%, and 68% to 74%, 91%, and 82% (precision, recall, and F-measure, respectively) after the incremental addition of ontology layers. Our study introduced a framework that may contribute to advances in "complementary" components for existing information extraction systems. The application of an ontology-based approach to natural language processing in our study provided mechanisms for increasing the performance of such tools. The pivot point for extracting more meaningful quality metrics from clinical narratives is the abstraction of the contextual semantics hidden in the notes. We have defined some of these semantics and quantified them in multiple complementary layers in order to demonstrate the importance and applicability of an ontology-based approach to quality metric extraction. The application of such ontology layers introduces powerful new ways of querying context-dependent entities from clinical texts. Rigorous evaluation is still necessary to ensure the quality of these "complementary" NLP systems. Moreover, research is needed to create and update evaluation guidelines and criteria for assessing the performance and efficiency of ontology-based information extraction in healthcare, and to provide a consistent baseline for comparing alternative approaches
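
    Two of the four reporting settings (excluding negated concepts, scoping by section header) map naturally onto SPARQL graph patterns. A sketch with rdflib, using an invented ex: vocabulary since the thesis's actual ontology IRIs are not given in the abstract:

        from rdflib import Graph  # pip install rdflib

        g = Graph()
        g.parse("patient_notes.rdf")  # hypothetical RDF export of the NLP output

        query = """
        PREFIX ex: <http://example.org/clinical#>
        SELECT ?note ?concept WHERE {
            ?note    ex:hasSection ?section .
            ?section ex:header     "PAST MEDICAL HISTORY" .
            ?section ex:mentions   ?mention .
            ?mention ex:concept    ?concept .
            FILTER NOT EXISTS { ?mention ex:negated true }
        }
        """
        for note, concept in g.query(query):
            print(note, concept)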

    An ontology-based approach toward the configuration of heterogeneous network devices

    Despite numerous standardization efforts, semantic issues remain in effect in many subfields of networking. The inability to exchange data unambiguously between information systems and human resources is an issue that hinders technology implementation, semantic interoperability, service deployment, network management, and technology migration, among many others. In this thesis, we approach the semantic issues in two critical subfields of networking, namely, network configuration management and network addressing architectures. What makes the study of these areas rather appealing is that in both scenarios the semantic issues have been around from the very early days of networking. However, as networks continue to grow in size and complexity, current practices are becoming neither scalable nor practical. One of the most complex and essential tasks in network management is the configuration of network devices. The lack of comprehensive and standard means for modifying and controlling the configuration of network elements has led to the continuous and extended use of proprietary Command Line Interfaces (CLIs). Unfortunately, CLIs are generally both device- and vendor-specific. In the context of heterogeneous network infrastructures---i.e., networks typically composed of multiple devices from different vendors---the use of several CLIs raises serious Operation, Administration and Management (OAM) issues. Accordingly, network administrators are forced to gain specialized expertise and to continuously keep their knowledge and skills up to date as new features, system upgrades or technologies appear. Overall, the utilization of proprietary mechanisms allows neither sharing knowledge consistently across vendors' domains nor reusing configurations to achieve full automation of network configuration tasks---which is typically required in autonomic management. Given this heterogeneity, CLIs typically provide a help feature, which is in turn a useful source of knowledge for enabling semantic interpretation of a vendor's configuration space. The large amount of information a network administrator must learn and manage makes Information Extraction (IE) and other forms of natural language analysis from the Artificial Intelligence (AI) field key enablers for the network device configuration space. This thesis presents the design and implementation specification of the first Ontology-Based Information Extraction (OBIE) system for the CLIs of network devices, aimed at the automation and abstraction of device configurations. Moreover, the so-called semantic overload of IP addresses---wherein an address is both the identifier and the locator of a node at the same time---is one of the main constraints on the mobility of network hosts, multi-homing and the scalability of the routing system. In light of this, numerous approaches have emerged in an effort to decouple the semantics of the network addressing scheme. In this thesis, we approach this issue from two perspectives, namely, a non-disruptive (i.e., evolutionary) solution for the current Internet and a clean-slate approach for the Future Internet. In the first scenario, we analyze the Locator/Identifier Separation Protocol (LISP), as it is currently one of the strongest solutions to the semantic overload issue. However, its adoption is hindered by existing problems in the proposed mapping systems.
    Herein, we propose the LISP Redundancy Protocol (LRP), aimed at complementing the LISP framework and strengthening the feasibility of deployment while minimizing mapping table size and latency and maximizing reachability in the network. In the second scenario, we explore TARIFA, a Next Generation Internet architecture, and introduce a novel service-centric addressing scheme which aims to overcome the issues related to routing and the semantic overload of IP addresses
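
    The OBIE idea rests on mining a CLI's help feature as the knowledge source. A toy sketch of that first step, turning "command description" help lines into candidate ontology triples; the help text, vendor prefix, and cli: vocabulary are all invented here, and real vendor help output is considerably messier:

        import re

        HELP_TEXT = """\
          show interfaces   Display interface status and configuration
          show ip route     Display the IP routing table
          show version      Display system hardware and software status
        """

        def help_to_triples(help_text, vendor="vendorX"):
            """Parse 'command  description' lines into (subject, predicate, object)."""
            triples = []
            for line in help_text.splitlines():
                m = re.match(r"\s*(\S+(?: \S+)*?)\s{2,}(.+)", line)
                if m:
                    command, gloss = m.groups()
                    triples.append((f"{vendor}:{command}", "rdf:type", "cli:Command"))
                    triples.append((f"{vendor}:{command}", "cli:description", gloss))
            return triples

        for triple in help_to_triples(HELP_TEXT):
            print(triple)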

    Automated energy compliance checking in construction

    Automated energy compliance checking aims to automatically check the compliance of a building design – in a building information model (BIM) – with applicable energy requirements. A significant number of efforts in both industry and academia have been undertaken to automate the compliance checking process, achieving various levels of automation, expressivity, representativeness, accuracy, and efficiency. Despite these contributions, there are two main gaps in existing automated compliance checking (ACC) efforts. First, existing methods are not fully automated and/or not generalizable across different types of documents; they require varying degrees of manual effort to extract requirements from text into computer-processable representations and to match the concept representations of the extracted requirements to those of the BIM. Second, existing methods have focused only on code checking; there is still a lack of efforts addressing contract specification checking. To address these gaps, this thesis aims to develop a fully automated ACC method for checking BIM-represented building designs for compliance with energy codes and contract specifications. The research included six primary tasks: (1) conducting a comprehensive literature review; (2) developing a semantic, domain-specific, machine learning-based text classification method and algorithm for classifying energy regulatory documents (including energy codes) and contract specifications to support energy ACC in construction; (3) developing a semantic, natural language processing (NLP)-enabled, rule-based information extraction method and algorithm for automated extraction of energy requirements from energy codes; (4) adapting the information extraction method and algorithm for automated extraction of energy requirements from contract specifications; (5) developing a fully automated, semantic information alignment method and algorithm for aligning the representations used in BIMs with the representations used in energy codes and contract specifications; and (6) implementing the aforementioned methods and algorithms in a fully automated energy compliance checking prototype, called EnergyACC, and using it in a case study to assess the feasibility of, and challenges in, developing an ACC method that is fully automated and generalizable across different types of regulatory documents. Promising noncompliance detection performance was achieved for both energy code checking (95.7% recall and 85.9% precision) and contract specification checking (100% recall and 86.5% precision)
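
    Task (3), rule-based extraction of requirements, can be pictured as pattern rules that map a regulatory sentence onto a structured requirement. A single-rule sketch on an invented sentence (the thesis's actual rule set and target representation are not shown in this abstract):

        import re

        sentence = ("The U-factor of fenestration products shall not exceed "
                    "0.40 in climate zone 4.")

        # One illustrative rule: property, subject, quantitative limit.
        rule = re.compile(
            r"The (?P<property>[\w-]+) of (?P<subject>[\w\s]+?) "
            r"shall not exceed (?P<limit>[\d.]+)")

        m = rule.search(sentence)
        if m:
            requirement = m.groupdict()
            requirement["comparison"] = "<="  # "shall not exceed" maps to <=
            print(requirement)
        # {'property': 'U-factor', 'subject': 'fenestration products',
        #  'limit': '0.40', 'comparison': '<='}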

    Automated code compliance checking in the construction domain using semantic natural language processing and logic-based reasoning

    Construction projects must comply with various regulations. The manual process of checking compliance with regulations is costly, time-consuming, and error-prone. With advances in computing technology, there have been many research efforts to automate the compliance checking process, and many software development efforts, led by industry bodies/associations, software companies, and/or government organizations, to develop automated compliance checking (ACC) systems. However, two main gaps in existing ACC efforts are: (1) manual effort is needed to extract requirements from regulatory documents and encode them in a computer-processable rule format; and (2) there is a lack of a semantic representation for supporting automated compliance reasoning that is non-proprietary, non-hidden, and user-understandable and testable. To address these gaps, this thesis proposes a new ACC method that: (1) utilizes semantic natural language processing (NLP) techniques to automatically extract regulatory information from building codes and design information from building information models (BIMs); and (2) utilizes a semantic logic-based representation to represent and reason about the extracted regulatory and design information for compliance checking. The proposed method is composed of four main methods/algorithms combined in one computational framework: (1) a semantic, rule-based method and algorithm that leverage NLP techniques to automatically extract regulatory information from building codes and represent the extracted information as semantic tuples; (2) a semantic, rule-based method and algorithm that leverage NLP techniques to automatically transform the extracted regulatory information into logic rules to prepare for automated reasoning; (3) a semantic, rule-based information extraction and transformation method and algorithm to automatically extract design information from BIMs and transform the extracted information into logic facts to prepare for automated reasoning; and (4) a logic-based information representation and compliance reasoning schema to represent regulatory and design information, enabling the automated compliance reasoning process. To test the proposed method, a building information model test case was developed based on the Duplex Apartment Project from the buildingSMART alliance of the National Institute of Building Sciences. The test case was checked for compliance with a randomly selected chapter, Chapter 19, of the International Building Code 2009. Compared to a manually developed gold standard, 87.6% precision and 98.7% recall in noncompliance detection were achieved on the testing data
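
    The reasoning step pairs logic rules (from the code) with logic facts (from the BIM). A toy sketch of that pairing in Python, with invented shapes for both; the thesis itself uses a proper logic-based representation rather than dictionaries:

        import operator

        OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}

        # Hypothetical rule extracted from a code sentence, and BIM-derived facts.
        rule = {"subject": "corridor", "attribute": "width_mm",
                "comparison": ">=", "threshold": 1120}
        facts = [
            {"id": "corridor-01", "type": "corridor", "width_mm": 1200},
            {"id": "corridor-02", "type": "corridor", "width_mm": 950},
        ]

        def check(rule, facts):
            """Flag each element of the rule's subject type as (non)compliant."""
            for fact in facts:
                if fact["type"] == rule["subject"]:
                    ok = OPS[rule["comparison"]](fact[rule["attribute"]],
                                                 rule["threshold"])
                    print(fact["id"], "compliant" if ok else "NONCOMPLIANT")

        check(rule, facts)
        # corridor-01 compliant
        # corridor-02 NONCOMPLIANT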