26 research outputs found
Leveraging Knowledge-Based Approaches to Promote Antiretroviral Toxicity Monitoring in Underserved Settings
As access and use of antiretroviral therapy continue to increase, the need to improve antiretroviral toxicity monitoring becomes more critical. This is particularly so in underserved settings, where patterns of antiretroviral toxicities possibly alter the need for and frequency of antiretroviral toxicity monitoring. However, barriers such as few skilled healthcare providers and poor infrastructure make antiretroviral toxicity monitoring in underserved settings difficult. The purpose of this dissertation was to investigate how standard clinical guidelines, knowledge-based clinical decision support, and task delegation could be leveraged to overcome barriers to antiretroviral toxicity monitoring in underserved settings.
The strategy adopted in this dissertation was guided by the Design Science Research Methodology that emphasizes the generation of scientific knowledge through building novel artifacts. Two qualitative descriptive studies were conducted to characterize the contextual factors associated with antiretroviral toxicity monitoring in underserved settings. Supported by the findings from these studies, a knowledge-based software application prototype that implements clinical practice guidelines for antiretroviral toxicity monitoring was developed. Next, a quantitative validation study was used to evaluate the structure and behavior of the prototype’s knowledge base. Lastly, a quantitative usability study was conducted to assess lay health worker perceptions of the satisfaction and mental effort associated with the use of checklists generated by the prototype.
This dissertation research produced empirical evidence about the broad motives and strategies for promoting medication adherence, safety, and effectiveness in underserved settings. It also identified barriers and facilitators of antiretroviral toxicity monitoring within ambulatory HIV care workflows in underserved settings. Additionally, it provided evidence about the extent to which antiretroviral toxicity domain knowledge could be implemented in a knowledge-based application for supporting point-of-care antiretroviral toxicity monitoring. Lastly, the research provided previously unavailable empirical evidence about the perceptions of lay peer health workers on the use of checklists for the documentation of antiretroviral toxicities
Linking patient data to scientific knowledge to support contextualized mining
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2022
ICU readmissions are a critical problem associated with serious conditions, illnesses, or complications, representing a fourfold increase in mortality risk and a financial burden to health institutions. In developed countries, 1 in every 10 patients discharged comes back to the ICU. As hospitals become more data-oriented with the adoption of Electronic Health Records (EHR), there has been a rise in the development of computational approaches to support clinical decisions.
In recent years new efforts have emerged, using machine learning approaches to make ICU readmission predictions directly over EHR data. Despite these growing efforts, machine learning approaches still explore EHR data directly without taking into account its meaning or context. Medical knowledge is not accessible to these methods, which work blindly over the data without considering the meaning of, and relationships between, the data objects. Ontologies and knowledge graphs can help bridge this gap between data and scientific context, since they are computational artefacts that represent the entities in a domain and how they relate to each other in a formalized fashion.
This opportunity motivated the aim of this work: to investigate how enriching EHR data with ontology-based semantic annotations and applying machine learning techniques that explore them can impact the prediction of 30-day ICU readmission risk. To achieve this, a number of contributions were developed, including: (1) an enrichment of the MIMIC-III data set with annotations to several biomedical ontologies; (2) a novel approach to predict ICU readmission risk that explores knowledge graph embeddings to represent patient data taking into account the semantic annotations; (3) a variant of the predictive approach that targets different moments to support risk prediction throughout the ICU stay.
The predictive approaches outperformed both the state of the art and a baseline, achieving a ROC-AUC of 0.815 (an increase of 0.2 over the state of the art). The positive results achieved motivated the development of an entrepreneurial project, which placed in the Top 5 of the H-INNOVA 2021 entrepreneurship award
Identifying and reducing inappropriate use of medications using Electronic Health Records
Inappropriate use of medications (IUM) is a global problem that can lead to unnecessary harm to the patients and unnecessary costs across the health care system. Identifying and reducing IUM has been a long-lasting challenge and currently, no systematic and automated solution exists to address it. IUM can be manually identified by experts using medication appropriateness criteria (MAC).
In this research I first conducted a review of approaches used to identify IUM and reduce IUM. Next, I developed a conceptual model for representing the MAC, and then developed a tool and a workflow for translating the MAC into structured form. Because indications are an important component of the MAC, I conducted a critical appraisal of existing knowledge sources that can be used to that end, namely the medication-indication knowledge-bases. Finally, I demonstrated how these structured MAC can be used to identify patients who are potentially subject to IUM and evaluated the accuracy of this approach.
This research identifies the knowledge gaps and technological challenges in identifying and reducing IUM and addresses some of these gaps through the creation of a representation for MAC, a repository of structured MAC, and a set of tools that can assist in evaluating the impact of interventions aimed to reduce IUM or assess its downstream effects. This research also discusses the limitations of existing methods for executing computable decision support rules and proposes solutions needed to enhance these methods so they can support implementation of the MAC
Natural Language Processing and Graph Representation Learning for Clinical Data
The past decade has witnessed remarkable progress in biomedical informatics and its related fields: the development of high-throughput technologies in genomics, the mass adoption of electronic health records systems, and the AI renaissance largely catalyzed by deep learning. Deep learning has played an undeniably important role in our attempts to reduce the gap between the exponentially growing amount of biomedical data and our ability to make sense of them. In particular, the two main pillars of this dissertation---natural language processing and graph representation learning---have improved our capacity to learn useful representations of language and structured data to an extent previously considered unattainable in such a short time frame. In the context of clinical data, characterized by its notorious heterogeneity and complexity, natural language processing and graph representation learning have begun to enrich our toolkits for making sense and making use of the wealth of biomedical data beyond rule-based systems or traditional regression techniques. This dissertation comes at the cusp of such a paradigm shift, detailing my journey across the fields of biomedical and clinical informatics through the lens of natural language processing and graph representation learning. The takeaway is quite optimistic: despite the many layers of inefficiencies and challenges in the healthcare ecosystem, AI for healthcare is gearing up to transform the world in new and exciting ways
Improving the Quality and Utility of Electronic Health Record Data through Ontologies
The translational research community, in general, and the Clinical and Translational Science Awards (CTSA) community, in particular, share the vision of repurposing EHRs for research that will improve the quality of clinical practice. Many members of these communities are also aware that electronic health records (EHRs) suffer limitations of data becoming poorly structured, biased, and unusable out of original context. This creates obstacles to the continuity of care, utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors’ rationale for the use of ontologies with computable semantics for the improvement of clinical data quality and EHR usability formulated for researchers with a stake in clinical and translational science and who are advocates for the use of information technology in medicine but at the same time are concerned by current major shortfalls. This White Paper outlines pitfalls, opportunities, and solutions and recommends increased investment in research and development of ontologies with computable semantics for a new generation of EHRs
A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration
The Semantic Web and Linked Data movements with the aim of creating, publishing and interconnecting machine readable information have gained traction in the last years.
However, the majority of information still is contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos.
This can also not be expected to change, since text, images and videos are the natural way in which humans interact with information.
Semantic structuring of content on the other hand provides a wide range of advantages compared to unstructured information.
Semantically-enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization.
Looking at the life-cycle of semantic content on the Web of Data, we see quite some progress on the backend side in storing structured content or for linking data and schemata.
Nevertheless, the currently least developed aspect of the semantic content life-cycle is from our point of view the user-friendly manual and semi-automatic creation of rich semantic content.
In this thesis, we propose a semantics-based user interface model, which aims to reduce the complexity of underlying technologies for semantic enrichment of content by Web users.
By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces.
We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean) which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content.
To assess the applicability of our proposed WYSIWYM model, we incorporated the model into four real-world use cases comprising two general and two domain-specific applications.
These use cases address four aspects of the WYSIWYM implementation:
1) Its integration into existing user interfaces,
2) Utilizing it for lightweight text analytics to incentivize users,
3) Dealing with crowdsourcing of semi-structured e-learning content,
4) Incorporating it for authoring of semantic medical prescriptions
Semantic resources in pharmacovigilance: a corpus and an ontology for drug-drug interactions
International Mention in the doctoral degree
Nowadays, with the increasing use of several drugs for the treatment of one or more different diseases (polytherapy) in large populations, the risk of drug combinations that have not been studied in pre-authorization clinical trials has increased. This provides a favourable setting for the occurrence of drug-drug interactions (DDIs), a common adverse drug reaction (ADR) that represents an important risk to patient safety and increases healthcare costs. Their early detection is, therefore, a main concern in the clinical setting. Although there are different databases supporting healthcare professionals in the detection of DDIs, the quality of these databases is very uneven, and the consistency of their content is limited. Furthermore, these databases do not scale well to the large and growing volume of pharmacovigilance literature in recent years. In addition, large amounts of current and valuable information are hidden in published articles, scientific journals, books, and technical reports. Thus, the large number of DDI information sources has overwhelmed most healthcare professionals because it is not possible to remain up to date on everything published about DDIs.
Computational methods can play a key role in the identification, explanation, and prediction of DDIs on a large scale, since they can be used to collect, analyze and manipulate large amounts of biological and pharmacological data. Natural language processing (NLP) techniques can be used to retrieve and extract DDI information from pharmacological texts, supporting researchers and healthcare professionals on the challenging task of searching DDI information among different and heterogeneous sources. However, these methods rely on the availability of specific resources providing the domain knowledge, such as databases, terminological vocabularies, corpora, ontologies, and so forth, which are necessary to address the Information Extraction (IE) tasks.
In this thesis, we have developed two semantic resources for the DDI domain that make an important contribution to the research and development of IE systems for DDIs. We have reviewed and analyzed the existing corpora and ontologies relevant to this domain and, based on their strengths and weaknesses, we have developed the DDI corpus and the ontology for drug-drug interactions (named DINTO). The DDI corpus has proven to fulfil the characteristics of a high-quality gold standard, and has demonstrated its usefulness as a benchmark for the training and testing of different IE systems in the SemEval-2013 DDIExtraction shared task. Meanwhile, DINTO has been used and evaluated in two different applications. Firstly, it has been proven that the knowledge represented in the ontology can be used to infer DDIs and their different mechanisms. Secondly, we have provided a proof of concept of the contribution of DINTO to NLP, by providing the domain knowledge to be exploited by an IE pilot prototype. From these results, we believe that these two semantic resources will encourage further research into the application of computational methods to the early detection of DDIs.
This work has been partially supported by the Regional Government of Madrid under the Research Network MA2VICMR [S2009/TIC-1542], by the Spanish Ministry of Education under the project MULTIMEDICA [TIN2010-20644-C03-01] and by the European Commission Seventh Framework Programme under TrendMiner project [FP7-ICT287863].
Official Doctoral Programme in Computer Science and Technology. Committee: Chair: Asunción Gómez Pérez; Secretary: María Belén Ruiz Mezcua; Member: Mariana Neve
Neural Representations of Concepts and Texts for Biomedical Information Retrieval
Information retrieval (IR) methods are an indispensable tool in the current landscape of exponentially increasing textual data, especially on the Web. A typical IR task involves fetching and ranking a set of documents (from a large corpus) in terms of relevance to a user's query, which is often expressed as a short phrase. IR methods are the backbone of modern search engines where additional system-level aspects including fault tolerance, scale, user interfaces, and session maintenance are also addressed. In addition to fetching documents, modern search systems may also identify snippets within the documents that are potentially most relevant to the input query. Furthermore, current systems may also maintain preprocessed structured knowledge derived from textual data as so-called knowledge graphs, so certain types of queries that are posed as questions can be parsed as such; a response can be an output of one or more named entities instead of a ranked list of documents (e.g., what diseases are associated with EGFR mutations?). This refined setup is often termed question answering (QA) in the IR and natural language processing (NLP) communities.
In biomedicine and healthcare, specialized corpora are often at play including research articles by scientists, clinical notes generated by healthcare professionals, consumer forums for specific conditions (e.g., cancer survivors network), and clinical trial protocols (e.g., www.clinicaltrials.gov). Biomedical IR is specialized given that the types of queries and the variations in the texts differ from those of general Web documents. For example, scientific articles are more formal with longer sentences but clinical notes tend to have less grammatical conformity and are rife with abbreviations. There is also a mismatch between the vocabulary of consumers and the lingo of domain experts and professionals. Queries are also different and can range from simple phrases (e.g., COVID-19 symptoms) to more complex implicitly fielded queries (e.g., chemotherapy regimens for stage IV lung cancer patients with ALK mutations). Hence, developing methods for different configurations (corpus, query type, user type) needs more deliberate attention in biomedical IR.
Representations of documents and queries are at the core of IR methods, and retrieval methodology involves coming up with these representations and matching queries with documents based on them. Traditional IR systems follow the approach of keyword-based indexing of documents (the so-called inverted index) and matching query phrases against the document index. It is not difficult to see that this keyword-based matching ignores the semantics of texts (synonymy at the lexeme level and entailment at phrase/clause/sentence levels) and this has led to dimensionality reduction methods such as latent semantic indexing that generally have scale-related concerns; such methods also do not address similarity at the sentence level. Since the resurgence of neural network methods in NLP, the IR field has also moved to incorporate advances in neural networks into current IR methods.
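As a minimal illustration of the keyword-based indexing described above, the following sketch builds a toy inverted index and runs a boolean AND query over it. The function names, the three sample documents, and the whitespace tokenizer are assumptions made for illustration only; they are not taken from the dissertation. The second query shows the limitation the abstract points to: a spelling variant of an indexed word finds nothing.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query token (boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    postings = [index.get(t, set()) for t in tokens]
    return set.intersection(*postings)

# Hypothetical toy corpus, invented for this example.
docs = {
    1: "EGFR mutations in lung cancer",
    2: "chemotherapy regimens for lung tumours",
    3: "symptoms of viral infection",
}
index = build_inverted_index(docs)
print(search(index, "lung cancer"))  # {1}
print(search(index, "lung tumors"))  # set(): the variant "tumours" is not matched
```

Exact token matching is what makes the index fast to query, and it is also why synonyms and spelling variants are missed unless the representation itself captures meaning.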
This dissertation presents four specific methodological efforts toward improving biomedical IR. Neural methods always begin with dense embeddings for words and concepts to overcome the limitations of one-hot encoding in traditional NLP/IR. In the first effort, we present a new neural pre-training approach to jointly learn word and concept embeddings for downstream use in applications. In the second study, we present a joint neural model for two essential subtasks of information extraction (IE): named entity recognition (NER) and entity normalization (EN). Our method detects biomedical concept phrases in texts and links them to the corresponding semantic types and entity codes. These first two studies provide essential tools to model textual representations as compositions of both surface forms (lexical units) and high-level concepts with potential downstream use in QA. In the third effort, we present a document reranking model that can help surface documents that are likely to contain answers (e.g., factoids, lists) to a question in a QA task. The model is essentially a sentence matching neural network that learns the relevance of a candidate answer sentence to the given question parametrized with a bilinear map. In the fourth effort, we present another document reranking approach that is tailored for precision medicine use-cases. It combines neural query-document matching and faceted text summarization. The main distinction of this effort from previous efforts is to pivot from a query manipulation setup to transforming candidate documents into pseudo-queries via neural text summarization. Overall, our contributions constitute nontrivial advances in biomedical IR using neural representations of concepts and texts
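The bilinear relevance score mentioned in the third effort can be sketched generically as score(q, a) = qᵀ W a, where q and a are vector representations of the question and a candidate sentence. The example below is only a sketch of that general form and not the dissertation's model: the two-dimensional vectors, the identity matrix W, and the function names are assumptions for illustration. In a trained model, W and the sentence encoders would be learned from data.

```python
def bilinear_score(q, W, a):
    """Compute q^T W a for vectors q, a and matrix W given as lists of floats."""
    Wa = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) for row in W]
    return sum(q_i * wa_i for q_i, wa_i in zip(q, Wa))

def rank_candidates(q, W, candidates):
    """Return candidate names sorted by descending bilinear score."""
    scored = [(bilinear_score(q, W, vec), name) for name, vec in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical embeddings; with W set to the identity the score is a plain dot product.
question = [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0]]
candidates = {"s1": [0.9, 0.1], "s2": [0.2, 0.8]}
print(rank_candidates(question, W, candidates))  # ['s1', 's2']
```

A non-identity W lets the score weigh interactions between different dimensions of the question and answer vectors, which a plain dot product cannot do.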