Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation
Background: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing works have employed machine learning techniques to construct a knowledge base. However, they often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. We combined natural language processing (NLP) and semantic network analysis in our model generation pipeline.
Methods: As a use case of our pipeline, we utilized data from an open source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the “Subject:Relationship:Object” syntactic data schema. The identified noun phrases were tagged with the Unified Medical Language System (UMLS) semantic types. An evaluation was done on a dataset comprising 83 image notes from four data sources.
Results: A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model was able to cover 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%.
Conclusion: The results indicated that our pipeline was able to produce a comprehensive content-based knowledge model that could represent context from various sources in the same domain.
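The co-occurrence step mentioned in the Results can be illustrated with a minimal sketch. It assumes each report has already been reduced to the list of semantic types tagged in it. The function name `cooccurrence_edges` and the toy input are hypothetical and are not taken from the paper, which does not state how its network was computed.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(reports):
    """Count how often each unordered pair of semantic types shares a report."""
    edges = Counter()
    for types in reports:
        # Deduplicate and sort so each pair is counted once per report,
        # always in the same (alphabetical) order.
        for a, b in combinations(sorted(set(types)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy input: one list of semantic-type labels per report.
reports = [
    ["Body Part", "Finding", "Diagnostic Procedure"],
    ["Body Part", "Finding"],
    ["Finding", "Disease or Syndrome", "Body Part"],
]

edges = cooccurrence_edges(reports)
for (a, b), count in edges.most_common():
    print(f"{a} -- {b}: {count}")
```

Each key in `edges` is one edge of the semantic type network and its count is the edge weight. Regrouping types into broader categories, as the abstract describes, would amount to mapping the labels before counting.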
An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics
Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning and query answering as simple as possible, and its reasoning capabilities exceed those of previously described ontologies.
Ontology driven clinical decision support for early diagnostic recommendations
Diagnostic error is a significant problem in medicine and a major cause of concern for patients and clinicians, and it is associated with moderate to severe harm to patients. Diagnostic errors are a primary cause of clinical negligence and can result in malpractice claims. Cognitive errors caused by biases such as premature closure and confirmation bias have been identified as a major cause of diagnostic error. Researchers have identified several strategies to reduce diagnostic error arising from cognitive factors. These include considering alternatives, reducing reliance on memory, and providing access to clear and well-organized information. Clinical Decision Support Systems (CDSSs) have been shown to reduce diagnostic errors.
Clinical guidelines improve consistency of care and can potentially improve healthcare efficiency. They can alert clinicians to diagnostic tests and procedures that have the greatest evidence and provide the greatest benefit. Clinical guidelines can be used to streamline clinical decision making and provide the knowledge base for guideline based CDSSs and clinical alert systems. Clinical guidelines can potentially improve diagnostic decision making by improving information gathering.
Argumentation is an emerging area for dealing with unstructured evidence in domains such as healthcare that are characterized by uncertainty. The knowledge needed to support decision making is expressed in the form of arguments. Argumentation has certain advantages over other decision support reasoning methods. These include the ability to function with incomplete information, the ability to capture domain knowledge in an easy manner, the use of non-monotonic logic to support defeasible reasoning, and the provision of recommendations in a manner that can be easily explained to clinicians. Argumentation is therefore a suitable method for generating early diagnostic recommendations. Argumentation-based CDSSs have been developed in a wide variety of clinical domains. However, the impact of an argumentation-based diagnostic Clinical Decision Support System (CDSS) has not been evaluated yet.
The first part of this thesis evaluates the impact of guideline recommendations and an argumentation-based diagnostic CDSS on clinician information gathering and diagnostic decision making. In addition, the impact of guideline recommendations on management decision making was evaluated. The study found that argumentation is a viable method for generating diagnostic recommendations that can potentially help reduce diagnostic error. The study showed that guideline recommendations do have a positive impact on information gathering of optometrists and can potentially help optometrists in asking the right questions and performing tests as per current standards of care. Guideline recommendations were found to have a positive impact on management decision making. The CDSS is dependent on quality of data that is entered into the system. Faulty interpretation of data can lead the clinician to enter wrong data and cause the CDSS to provide wrong recommendations.
Current generation argumentation-based CDSSs and other diagnostic decision support systems have problems with semantic interoperability that prevent them from using data from the web. The clinician and the CDSS are limited to information collected during a clinical encounter and cannot access information on the web that could be relevant to a patient. This is due to the distributed nature of medical information and the lack of semantic interoperability between healthcare systems. Current argumentation-based decision support applications require specialized tools for modelling and execution, which prevents widespread use and adoption, especially when these tools require additional training and licensing arrangements.
Semantic web and linked data technologies have been developed to overcome problems with semantic interoperability on the web. Ontology-based diagnostic CDSS applications have been developed using semantic web technology to overcome problems with semantic interoperability of healthcare data in decision support applications. However, these models have problems with expressiveness, requiring specialized software and algorithms for generating diagnostic recommendations.
The second part of this thesis describes the development of an argumentation-based ontology driven diagnostic model and CDSS that can execute this model to generate ranked diagnostic recommendations. This novel model called the Disease-Symptom Model combines strengths of argumentation with strengths of semantic web technology. The model allows the domain expert to model arguments favouring and negating a diagnosis using OWL/RDF language. The model uses a simple weighting scheme that represents the degree of support of each argument within the model. The model uses SPARQL to sum weights and produce a ranked diagnostic recommendation. The model can provide justifications for each recommendation in a manner that clinicians can easily understand. CDSS prototypes that can execute this ontology model to generate diagnostic recommendations were developed. The decision support prototypes demonstrated the ability to use a wide variety of data and access remote data sources using linked data technologies to generate recommendations. The thesis was able to demonstrate the development of an argumentation-based ontology driven diagnostic decision support model and decision support system that can integrate information from a variety of sources to generate diagnostic recommendations. This decision support application was developed without the use of specialized software and tools for modelling and execution, while using a simple modelling method.
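The weighting-and-summation idea described above can be sketched as follows. The thesis performs the summation with SPARQL over an OWL/RDF model. The Python below is only a stand-in for that step, and the function name `rank_diagnoses`, the argument triples, and the weights are hypothetical placeholders rather than values from the thesis.

```python
from collections import defaultdict


def rank_diagnoses(arguments, observed):
    """Sum the weights of arguments whose finding was observed, then rank."""
    scores = defaultdict(float)
    reasons = defaultdict(list)
    for diagnosis, finding, weight in arguments:
        if finding in observed:
            # Positive weights favour a diagnosis, negative weights count against it.
            scores[diagnosis] += weight
            reasons[diagnosis].append((finding, weight))
    ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
    # Keep the contributing arguments so each rank can be justified.
    return [(d, scores[d], reasons[d]) for d in ranked]


# Hypothetical arguments: (diagnosis, finding, weight).
arguments = [
    ("diagnosis_A", "finding_1", 2.0),
    ("diagnosis_A", "finding_2", 1.0),
    ("diagnosis_A", "finding_3", -1.0),
    ("diagnosis_B", "finding_2", 2.0),
    ("diagnosis_B", "finding_3", 1.0),
]
observed = {"finding_2", "finding_3"}

ranking = rank_diagnoses(arguments, observed)
for diagnosis, score, why in ranking:
    print(diagnosis, score, why)
```

Returning the contributing arguments alongside each score mirrors the justification feature the paragraph mentions: the list attached to each diagnosis shows which findings supported or opposed it.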
The third part of this thesis details the evaluation of the Disease-Symptom Model across all stages of a clinical encounter by comparing the performance of the model with that of clinicians. The evaluation showed that the Disease-Symptom Model can provide a ranked diagnostic recommendation in the early stages of the clinical encounter that is comparable to that of clinicians. The diagnostic performance can be improved in the early stages by using linked data technologies to incorporate more information into the decision making. With limited information, the performance of the Disease-Symptom Model will vary depending on the type of case. As more information is collected during the clinical encounter, the decision support application can provide recommendations that are comparable to those of the clinicians recruited for the study. The evaluation showed that, even with the simple weighting and summation method used in the Disease-Symptom Model, the diagnostic ranking was comparable to that of dentists. With limited information in the early stages of the clinical encounter, the Disease-Symptom Model was able to provide an accurately ranked diagnostic recommendation, validating the model and methods used in this thesis.
Semantic resources in pharmacovigilance: a corpus and an ontology for drug-drug interactions
International Mention in the doctoral degree.
Nowadays, with the increasing use of several drugs for the treatment of one or more different diseases (polytherapy) in large populations, the risk of drug combinations that have not been studied in pre-authorization clinical trials has increased. This provides a favourable setting for the occurrence of drug-drug interactions (DDIs), a common adverse drug reaction (ADR) representing an important risk to patient safety and an increase in healthcare costs. Their early detection is, therefore, a main concern in the clinical setting. Although there are different databases supporting healthcare professionals in the detection of DDIs, the quality of these databases is very uneven, and the consistency of their content is limited. Furthermore, these databases do not scale well to the large and growing volume of pharmacovigilance literature published in recent years. In addition, large amounts of current and valuable information are hidden in published articles, scientific journals, books, and technical reports. Thus, the large number of DDI information sources has overwhelmed most healthcare professionals because it is not possible to remain up to date on everything published about DDIs.
Computational methods can play a key role in the identification, explanation, and prediction of DDIs on a large scale, since they can be used to collect, analyze and manipulate large amounts of biological and pharmacological data. Natural language processing (NLP) techniques can be used to retrieve and extract DDI information from pharmacological texts, supporting researchers and healthcare professionals on the challenging task of searching DDI information among different and heterogeneous sources. However, these methods rely on the availability of specific resources providing the domain knowledge, such as databases, terminological vocabularies, corpora, ontologies, and so forth, which are necessary to address the Information Extraction (IE) tasks.
In this thesis, we have developed two semantic resources for the DDI domain that make an important contribution to the research and development of IE systems for DDIs. We have reviewed and analyzed the existing corpora and ontologies relevant to this domain and, based on their strengths and weaknesses, we have developed the DDI corpus and the ontology for drug-drug interactions (named DINTO). The DDI corpus has proven to fulfil the characteristics of a high-quality gold standard, and has demonstrated its usefulness as a benchmark for the training and testing of different IE systems in the SemEval-2013 DDIExtraction shared task. Meanwhile, DINTO has been used and evaluated in two different applications. Firstly, it has been proven that the knowledge represented in the ontology can be used to infer DDIs and their different mechanisms. Secondly, we have provided a proof of concept of the contribution of DINTO to NLP, by providing the domain knowledge to be exploited by an IE pilot prototype. From these results, we believe that these two semantic resources will encourage further research into the application of computational methods to the early detection of DDIs.
This work has been partially supported by the Regional Government of Madrid under the Research Network MA2VICMR [S2009/TIC-1542], by the Spanish Ministry of Education under the project MULTIMEDICA [TIN2010-20644-C03-01] and by the European Commission Seventh Framework Programme under TrendMiner project [FP7-ICT287863].
Official Doctoral Program in Computer Science and Technology (Programa Oficial de Doctorado en Ciencia y Tecnología Informática). Committee: Chair: Asunción Gómez Pérez. Secretary: María Belén Ruiz Mezcua. Member: Mariana Neve