835 research outputs found

    Linking patient data to scientific knowledge to support contextualized mining

    Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2022. ICU readmissions are a critical problem associated with serious conditions, illnesses, or complications, representing a fourfold increase in mortality risk and a financial burden to health institutions. In developed countries, 1 in every 10 patients discharged comes back to the ICU. As hospitals become more and more data-oriented with the adoption of Electronic Health Records (EHR), there has been a rise in the development of computational approaches to support clinical decisions. In recent years new efforts have emerged, using machine learning approaches to make ICU readmission predictions directly over EHR data. Despite these growing efforts, machine learning approaches still explore EHR data directly without taking into account its meaning or context. Medical knowledge is not accessible to these methods, which work blindly over the data without considering the meaning and relationships of the data objects. Ontologies and knowledge graphs can help bridge this gap between data and scientific context, since they are computational artefacts that represent the entities in a domain and how they relate to each other in a formalized fashion. This opportunity motivated the aim of this work: to investigate how enriching EHR data with ontology-based semantic annotations and applying machine learning techniques that explore them can impact the prediction of 30-day ICU readmission risk. To achieve this, a number of contributions were developed, including: (1) An enrichment of the MIMIC-III data set with annotations to several biomedical ontologies; (2) A novel approach to predict ICU readmission risk that explores knowledge graph embeddings to represent patient data taking into account the semantic annotations; (3) A variant of the predictive approach that targets different moments to support risk prediction throughout the ICU stay. 
The predictive approaches outperformed both the state of the art and a baseline, achieving a ROC-AUC of 0.815 (an increase of 0.2 over the state of the art). The positive results achieved motivated the development of an entrepreneurial project, which placed in the Top 5 of the H-INNOVA 2021 entrepreneurship award.

    A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration

    The Semantic Web and Linked Data movements with the aim of creating, publishing and interconnecting machine readable information have gained traction in the last years. However, the majority of information still is contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos. This can also not be expected to change, since text, images and videos are the natural way in which humans interact with information. Semantic structuring of content on the other hand provides a wide range of advantages compared to unstructured information. Semantically-enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization. Looking at the life-cycle of semantic content on the Web of Data, we see quite some progress on the backend side in storing structured content or for linking data and schemata. Nevertheless, the currently least developed aspect of the semantic content life-cycle is from our point of view the user-friendly manual and semi-automatic creation of rich semantic content. In this thesis, we propose a semantics-based user interface model, which aims to reduce the complexity of underlying technologies for semantic enrichment of content by Web users. By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces. We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean) which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content. To assess the applicability of our proposed WYSIWYM model, we incorporated the model into four real-world use cases comprising two general and two domain-specific applications. 
These use cases address four aspects of the WYSIWYM implementation: 1) Its integration into existing user interfaces, 2) Utilizing it for lightweight text analytics to incentivize users, 3) Dealing with crowdsourcing of semi-structured e-learning content, 4) Incorporating it for authoring of semantic medical prescriptions

    Doctor of Philosophy

    Dissertation. Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free text query and text processing, and the advantages of using these methods for clinical research, specifically cohort identification and enhancement, are demonstrated. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified, or the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. The addition of free text search compared to structured data search alone demonstrated increased cohort size in all cases, with dramatic increases in some. Representation of patients in subpopulations that may have been underrepresented otherwise is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free text search. A novel information extraction algorithm is developed and evaluated (Regular Expression Discovery for Extraction, or REDEx) for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted. Additional patients and additional measurement occurrences are identified using these extracted values that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts. 
We developed automated query expansion methods that greatly improve performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions can be detected that are often missed otherwise. We found a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding additional information and observations over what is available in structured text alone.

    Using dates as contextual information for personalised cultural heritage experiences

    We present semantics-based mechanisms that aim to promote reflection on cultural heritage by means of dates (historical events or annual commemorations), owing to their connections to a collection of items and to the visitors’ interests. We argue that links to specific dates can trigger curiosity, increase retention and guide visitors around the venue following new appealing narratives in subsequent visits. The proposal has been evaluated in a pilot study on the collection of the Archaeological Museum of Tripoli (Greece), for which a team of humanities experts wrote a set of diverse narratives about the exhibits. A year-round calendar was crafted so that certain narratives would be more or less relevant on any given day. Expanding on this calendar, personalised recommendations can be made by sorting out those relevant narratives according to personal events and interests recorded in the profiles of the target users. Evaluation of the associations by experts and potential museum visitors shows that the proposed approach can discover meaningful connections, while many others that are more incidental can still contribute to the intended cognitive phenomena

    Theory and Applications for Advanced Text Mining

    Owing to the growth of computer and web technologies, large amounts of text data can now be collected and stored easily, and these data are believed to contain useful knowledge. Text mining techniques for extracting that knowledge have been studied intensively since the late 1990s. Although many important techniques have been developed, the text mining research field continues to expand to meet needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will offer new knowledge in the text mining field and help many readers open up new research fields.

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables

    The Knowledge Grid: A Platform to Increase the Interoperability of Computable Knowledge and Produce Advice for Health

    Here we demonstrate how more highly interoperable computable knowledge enables systems to generate large quantities of evidence-based advice for health. We first provide a thorough analysis of advice. Then, because advice derives from knowledge, we turn our focus to computable, i.e., machine-interpretable, forms for knowledge. We consider how computable knowledge plays dual roles as a resource conveying content and as an advice enabler. In this latter role, computable knowledge is combined with data about a decision situation to generate advice targeted at the pending decision. We distinguish between two types of automated services. When a computer system provides computable knowledge, we say that it provides a knowledge service. When a computer system combines computable knowledge with instance data to provide advice that is specific to an unmade decision, we say that it provides an advice-giving service. The work here aims to increase the interoperability of computable knowledge to bring about better knowledge services and advice-giving services for health. The primary motivation for this research is the problem of missing or inadequate advice about health topics. The global demand for well-informed health advice far exceeds the global supply. In part to overcome this scarcity, the design and development of Learning Health Systems is being pursued at various levels of scale: local, regional, state, national, and international. Learning Health Systems fuse capabilities to generate new computable biomedical knowledge with other capabilities to rapidly and widely use computable biomedical knowledge to inform health practices and behaviors with advice. To support Learning Health Systems, we believe that knowledge services and advice-giving services have to be more highly interoperable. I use examples of knowledge services and advice-giving services which exclusively support medication use. 
This is because I am a pharmacist and pharmacy is the biomedical domain that I know. The examples here address the serious problems of medication adherence and prescribing safety. Two empirical studies are shared that demonstrate the potential to address these problems and make improvements by using advice. But primarily we use these examples to demonstrate general and critical differences between stand-alone, unique approaches to handling computable biomedical knowledge, which make it useful for one system, and common, more highly interoperable approaches, which can make it useful for many heterogeneous systems. Three aspects of computable knowledge interoperability are addressed: modularity, identity, and updateability. We demonstrate that instances of computable knowledge, and related instances of knowledge services and advice-giving services, can be modularized. We also demonstrate the utility of uniquely identifying modular instances of computable knowledge. Finally, we build on the computing concept of pipelining to demonstrate how computable knowledge modules can automatically be updated and rapidly deployed. Our work is supported by a fledgling technical knowledge infrastructure platform called the Knowledge Grid. It includes formally specified compound digital objects called Knowledge Objects, a conventional digital Library that serves as a Knowledge Object repository, and an Activator that provides an application programming interface (API) for computable knowledge. The Library component provides knowledge services. The Activator component provides both knowledge services and advice-giving services. In conclusion, by increasing the interoperability of computable biomedical knowledge using the Knowledge Grid, we demonstrate new capabilities to generate well-informed health advice at scale. 
These new capabilities may ultimately support Learning Health Systems and boost health for large populations of people who would otherwise not receive well-informed health advice. PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/146073/1/ajflynn_1.pd

    Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2

    Background: The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referenced as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a more simplified, lay term. Methods: In this study, we evaluate 1) accuracy of participating systems’ normalizing short forms compared to a majority sense baseline approach, 2) performance of participants’ systems for short forms with variable majority sense distributions, and 3) report the accuracy of participating systems’ normalizing shared normalized concepts between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. Results: The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second best performance. The performance of participating systems for normalizing short forms with two or more senses with low ambiguity (majority sense greater than 80 %) ranged from 52 to 78 % accuracy, with two or more senses with moderate ambiguity (majority sense between 50 and 80 %) ranged from 23 to 57 % accuracy, and with two or more senses with high ambiguity (majority sense less than 50 %) ranged from 2 to 45 % accuracy. With respect to the ShARe test set, 69 % of short form annotations contained common concept unique identifiers with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75 % accuracy. 
Conclusion: Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
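
The majority sense baseline mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the challenge's evaluation code: the function names, the plain-text sense labels (used here in place of UMLS concept identifiers), and the toy annotations are all hypothetical. Only the ambiguity thresholds (majority sense above 80 %, between 50 and 80 %, below 50 %) come from the abstract.

```python
from collections import Counter, defaultdict

def train_majority_sense(annotations):
    """Map each short form to its most frequent sense in the training annotations."""
    counts = defaultdict(Counter)
    for short_form, sense in annotations:
        counts[short_form.lower()][sense] += 1
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}

def majority_share(annotations, short_form):
    """Fraction of a short form's annotations that carry its majority sense."""
    senses = Counter(s for sf, s in annotations if sf.lower() == short_form.lower())
    return senses.most_common(1)[0][1] / sum(senses.values())

def ambiguity_level(share):
    """Bin a majority-sense share using the thresholds quoted in the abstract."""
    if share > 0.8:
        return "low"
    if share >= 0.5:
        return "moderate"
    return "high"

def accuracy(model, test):
    """Share of test annotations whose sense matches the baseline prediction."""
    hits = sum(model.get(sf.lower()) == sense for sf, sense in test)
    return hits / len(test)

# Hypothetical toy data: (short form, sense label) pairs, not taken from the ShARe corpus.
TRAIN = (
    [("pt", "patient")] * 9
    + [("pt", "physical therapy")]
    + [("ra", "rheumatoid arthritis")] * 3
    + [("ra", "right atrium")] * 2
)
TEST = [
    ("pt", "patient"),
    ("PT", "physical therapy"),
    ("ra", "right atrium"),
    ("ra", "rheumatoid arthritis"),
]

model = train_majority_sense(TRAIN)
print(model["pt"], ambiguity_level(majority_share(TRAIN, "pt")))
print(model["ra"], ambiguity_level(majority_share(TRAIN, "ra")))
print(accuracy(model, TEST))
```

Because the baseline always predicts one sense per short form, its accuracy on a given short form is capped by that form's majority-sense share in the test data, which is consistent with the lower accuracy ranges the abstract reports as ambiguity increases.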

    Investigating the translation of metaphors used in diagnosis and treatment in Chinese medicine classics Neijing and Shanghan Lun

    The language used in Traditional Chinese Medicine (TCM) depicts a world of human physiology, pathology, diagnosis and treatment, in which metaphors serve as an essential vehicle for readers to understand fundamental but often abstract concepts in TCM. While previous work has investigated strategies for translating the TCM classics, the metaphors used to describe diagnosis and treatment and their English translations are critical in understanding TCM, and require a more systematic exploration. This study investigates the diagnosis- and treatment-related metaphors selected from two TCM classics, Neijing and Shanghan Lun, and their English renditions by translators from different professional backgrounds. The thesis also focuses on the analysis of the effectiveness of different translation strategies in delivering pertinent health-related information conveyed by the metaphors of the original texts. A multidimensional framework that combines a conceptual approach with linguistic and cultural elements was established to capture the complexity of the metaphors, particularly from the perspective of translation. The linguistic metaphors in this study were first identified from a purpose-built corpus using a CMT-based metaphor identification procedure adapted from Steen (2010). Following the conceptual metaphor inference procedure developed by Steen (2011), various conceptual metaphors were inferred from the linguistic metaphors. Corresponding English translations were also collected to investigate which translation strategies have been used and which strategy can most effectively deliver the health-related information conveyed by the metaphors. 
Four main strategies were employed in the English translations: 1) equivalent mapping, by which the source domain is retained; 2) using a simile to translate a metaphor; 3) direct narrative equivalence, which abandons the metaphor and narrates the medical knowledge directly; and 4) complemented equivalent translation, whereby the metaphor is explained with additional content. From the perspective of conveying health-related knowledge, equivalent mapping was effective for metaphors universally understood by Chinese and English readers. For culturally specific metaphors, especially when the metaphor relates to an important TCM concept, complemented equivalent translation, which can reconfigure the cognitive context for the reader, was most suitable. For metaphors not related to important concepts, direct narrative equivalence was found to be effective