22 research outputs found

    Building Quranic stories ontology using MappingMaster domain-specific language

    The Holy Quran is full of inspiring stories and lessons that require understanding, so it demands additional attention when it comes to search and information retrieval. Many works have addressed the Holy Quran, but some dealt with only part of the Quran or covered it only in general, and some did not support semantic search techniques or make Quranic knowledge understandable to both people and computers. Others adopted data analysis, processing, and ontology techniques, which directed them toward linguistic rather than semantic aspects. Another weakness of previous works is that they populated the ontology manually, which is costly and time-consuming. In this paper, we construct an ontology of Quranic stories. The ontology was built using the MappingMaster domain-specific language (MappingMaster DSL), through which concepts and individuals can be created and linked to the ontology automatically from Excel sheets. The conceptual structure was built using the object role modeling (ORM) language. The SPARQL query language was used to test and evaluate the proposed ontology by posing many competency questions, and the ontology answered all of them well

    Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

    While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontologies from free-text documents. In this dissertation, I extended these methodologies into biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This review, which had not previously been conducted, was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods from two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of the two types of OL approaches that include three OL methods. The significance of this work is as follows: 1) The results from the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method revealed an average of 21% and 11% new concept acceptance rates, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%. As a result of this study (published in the Journal of Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough that it can analyze the performance of ontology enrichment methods for many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships would be missed as domain knowledge evolves

    NOBLE - Flexible concept recognition for large-scale biomedical natural language processing

    Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines

    Extending ontologies by finding siblings using set expansion techniques

    Motivation: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies lack a high degree of coverage in their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level

    Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn

    Bio-ontology development is a resource-consuming task despite the many open source ontologies available for reuse. Various strategies and tools for bottom-up ontology development have been proposed from a computing angle, yet the most obvious one from a domain expert perspective is unexplored: the abundant diagrams in the sciences. To speed up and simplify bio-ontology development, we propose a detailed, micro-level procedure, DiDOn, to formalise such semi-structured biological diagrams, also availing of a foundational ontology for more precise and interoperable subject domain semantics. The approach is illustrated using Pathway Studio as a case study

    Extracting clinical knowledge from electronic medical records

    As the adoption of Electronic Medical Records (EMRs) rises in healthcare institutions, the importance of these resources increases because of all the clinical information they contain about patients. However, the unstructured information present in these records in the form of clinical narratives makes it hard to extract and structure useful clinical knowledge. This unstructured information limits the potential of EMRs, because the clinical information they contain can be used to perform essential tasks inside healthcare institutions, such as searching, summarization, decision support, and statistical analysis, as well as to support management decisions or serve research. These tasks can only be done if the unstructured clinical information from the narratives is appropriately extracted, structured, and processed into clinical knowledge. Usually, this extraction and structuring is performed manually by healthcare practitioners, which is inefficient and error-prone. This research aims to propose a solution to this problem by using Machine Translation (MT) from Portuguese to English, Natural Language Processing (NLP), and Information Extraction (IE) techniques. With the help of these techniques, the goal is to develop a modular prototype pipeline system that can automatically extract clinical knowledge from the unstructured clinical information contained in Portuguese EMRs, in order to help EMRs fulfil their potential and consequently help the Portuguese hospital involved in this research. This research also intends to show that this generic prototype system and approach can potentially be applied to other hospitals, even if they do not use the Portuguese language

    Automated machine learning for healthcare and clinical notes analysis

    Machine learning (ML) has been slowly entering every aspect of our lives, and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has already been applied to easier settings with structured data such as tabular lab data. However, there is still a need to apply AutoML to interpreting medical text, which is being generated at a tremendous rate. A promising method for this is AutoML for clinical notes analysis, an unexplored research area that represents a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study of AutoML for clinical notes. To that end, we first introduce AutoML technology and review its various tools and techniques. We then survey the literature on AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss the challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which, if realized, could revolutionize patient outcomes