39 research outputs found

    Data governance in the health industry: investigating data quality dimensions within a big data context

    In the health industry, the use of data (including Big Data) is of growing importance. The term ‘Big Data’ characterizes data not only by its volume but also by its velocity, variety, and veracity. Big Data requires effective data governance, which includes measures to manage and control the use of data and to enhance data quality, availability, and integrity. The types of data quality can be described in terms of data quality dimensions; well-known dimensions include accuracy, completeness, and consistency, amongst others. Since data quality depends on how the data is expected to be used, the most important data quality dimensions depend on the context of use and on industry needs. Current research on data quality dimensions for Big Data within the health industry is lacking; this paper, therefore, investigates the most important data quality dimensions for Big Data within this context. An inner hermeneutic cycle research approach was used to systematically review the literature on data quality for big health datasets and to produce a list of the most important data quality dimensions. Based on a hierarchical framework for organizing data quality dimensions, the highest-ranked category of dimensions was determined.
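
As a rough illustration of how two of the dimensions named above can be quantified, the sketch below scores completeness as the share of non-missing cells and consistency as the share of records that pass a domain rule. These operational definitions, the field names, and the discharge-after-admission rule are assumptions made for the example, not the paper's definitions.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], fields: List[str]) -> float:
    """Share of (record, field) cells that are present and not None."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total


def consistency(records: List[Record], rule: Callable[[Record], bool]) -> float:
    """Share of records that satisfy a domain consistency rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


# Hypothetical sample records; ISO dates compare correctly as strings.
sample = [
    {"patient_id": 1, "admitted": "2021-01-02", "discharged": "2021-01-05"},
    {"patient_id": 2, "admitted": "2021-03-10", "discharged": None},
    {"patient_id": 3, "admitted": "2021-04-07", "discharged": "2021-04-01"},
]
FIELDS = ["patient_id", "admitted", "discharged"]


def discharge_not_before_admission(r: Record) -> bool:
    # A missing discharge date counts against completeness, not consistency.
    return r["discharged"] is None or r["discharged"] >= r["admitted"]


if __name__ == "__main__":
    print(completeness(sample, FIELDS))  # 8 of 9 cells filled
    print(consistency(sample, discharge_not_before_admission))  # 2 of 3 records pass
```

Keeping each dimension as a separate score makes it possible to weight them differently per context of use, which is the point the abstract makes about context-dependent importance.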

    A Methodological Framework for the Integrated Design of Decision-Intensive Care Pathways: an Application to the Management of COPD Patients

    Healthcare processes are complex by nature, mostly because their multidisciplinary character requires continuous coordination between care providers. They encompass both organizational and clinical tasks, the latter driven by medical knowledge, which is inherently incomplete and distributed among people with different expertise and roles. Care pathways refer to the planning and coordination of care processes for specific groups of patients in a given setting. The goal of defining and following care pathways is to improve the quality of care in terms of patient satisfaction, cost reduction, and medical outcome. Care pathways are thus a promising methodological tool for standardizing care and decision-making. Business process management techniques can be used successfully to represent the organizational aspects of care pathways in a standard, readable, and accessible way, while supporting process development, analysis, and re-engineering. In this paper, we introduce a methodological framework that fosters the integrated design, implementation, and enactment of care processes and related decisions, while ensuring proper representation and management of organizational and clinical information. Here we focus on, and discuss in detail, the design phase, which encompasses the simulation of care pathways. We show how Business Process Model and Notation (BPMN) and Decision Model and Notation (DMN) can be combined to support the intertwined aspects of decision-intensive care pathways. As a proof of concept, the proposed methodology has been applied to design care pathways for chronic obstructive pulmonary disease (COPD) in the Veneto region of Italy.
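
DMN decision tables choose an output by matching input values against table rows under a hit policy. The sketch below evaluates a toy table with the FIRST hit policy (rows are checked in order and the first match wins). The input names, rows, and outputs are hypothetical placeholders that encode no clinical guideline and are not taken from the paper's COPD models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Inputs = Dict[str, Any]


@dataclass
class Rule:
    # One table row: its input entries combined with AND, plus the output entry.
    condition: Callable[[Inputs], bool]
    output: str


def evaluate_first(rules: List[Rule], inputs: Inputs) -> Optional[str]:
    """FIRST hit policy: return the output of the first matching row, or None."""
    for rule in rules:
        if rule.condition(inputs):
            return rule.output
    return None


# Hypothetical toy table; inputs and outputs are placeholders only.
TABLE = [
    Rule(lambda i: i["severity"] == "high", "specialist_branch"),
    Rule(lambda i: i["severity"] == "moderate" and i["recent_event"], "specialist_branch"),
    Rule(lambda i: True, "routine_branch"),  # "-" (any value) in every input column
]

if __name__ == "__main__":
    print(evaluate_first(TABLE, {"severity": "moderate", "recent_event": False}))
```

In a combined BPMN/DMN model, a business rule task would typically invoke such a decision and the process would branch on the returned value; FIRST is only one of the hit policies DMN defines, and the choice affects how overlapping rows are resolved.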

    The IHI Rochester Report 2022 on Healthcare Informatics Research: Resuming After the CoViD-19

    In 2020, the CoViD-19 pandemic spread worldwide in an unexpected way and suddenly changed many aspects of life, including social habits, social relationships, teaching modalities, and more. Such changes were also observable in many different healthcare and medical contexts. Moreover, the CoViD-19 pandemic acted as a stress test for many research endeavors and revealed some limitations, especially in contexts where research results had an immediate impact on the social and healthcare habits of millions of people. As a result, the research community is called to perform a deep analysis of the steps already taken, and to rethink the steps for the near and far future so as to capitalize on the lessons learned from the pandemic. With this aim, on June 9-11, 2022, a group of twelve healthcare informatics researchers met in Rochester, MN, USA. The meeting was initiated by the Institute for Healthcare Informatics (IHI) and hosted by the Mayo Clinic. Its goal was to discuss and propose a research agenda for biomedical and health informatics for the next decade, in light of the changes and lessons learned from the CoViD-19 pandemic. This article reports the main topics discussed and the conclusions reached. Besides the biomedical and health informatics research community, the intended readers of this paper are all those stakeholders in academia, industry, and government who could benefit from new findings in biomedical and health informatics research. Indeed, the research agenda we propose focuses on research directions and their social and policy implications, at three levels: the care of individuals, the healthcare system, and the population.

    From narrative descriptions to MedDRA: automagically encoding adverse drug reactions

    The collection of narrative spontaneous reports is an irreplaceable source for the prompt detection of suspected adverse drug reactions (ADRs). In this task, qualified domain experts manually review a huge number of narrative descriptions and then encode the texts according to the MedDRA standard terminology. Manually annotating narrative documents with medical terminology is a subtle and expensive task, since the number of reports grows day by day. Natural Language Processing (NLP) applications can support the work of the people responsible for pharmacovigilance. Our objective is to develop NLP algorithms and tools for detecting ADR clinical terminology. Efficient applications can concretely improve the quality of the experts’ revisions: NLP software can quickly analyze narrative texts and offer an encoding (i.e., a list of MedDRA terms) that the expert then revises and validates. We propose MagiCoder, an NLP algorithm for the automatic encoding of free-text descriptions into MedDRA terms. The MagiCoder procedure is efficient in terms of computational complexity. We tested MagiCoder in several experiments. In the first, we tested it on a large dataset of about 4500 manually revised reports, automatically comparing the human and MagiCoder encodings. We also tested MagiCoder on a set of about 1800 reports, revised ex novo by domain experts, who also compared the automatic solutions with the gold reference standard. We also provide two initial experiments with reports written in English, giving first evidence that MagiCoder is robust to a change of language. For the current base version of MagiCoder, we measured an average recall and precision of and , respectively. From a practical point of view, MagiCoder reduces the time required for encoding ADR reports. 
Pharmacologists only have to review and validate the MedDRA terms proposed by the application, instead of choosing the right terms among the 70K low-level terms of MedDRA. This gain in the efficiency of the pharmacologists’ work also has a relevant impact on the quality of the subsequent data analysis. We developed MagiCoder for the Italian pharmacovigilance language; however, our proposal is based on a general approach that depends neither on the language nor on the term dictionary.
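
The average recall and precision mentioned above compare, report by report, the terms proposed automatically with the terms chosen by experts. The sketch assumes per-report scores averaged across reports (macro-averaging); the abstract does not state the exact averaging scheme, and the term identifiers are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of one report's automatic encoding against the gold encoding."""
    if not predicted and not gold:
        return 1.0, 1.0  # nothing to find and nothing proposed
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def average_scores(reports: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Macro-average: compute per-report scores, then average them."""
    scores = [precision_recall(p, g) for p, g in reports]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n


# (predicted, gold) pairs with hypothetical term identifiers, not real MedDRA codes.
REPORTS = [
    ({"T1", "T2"}, {"T1"}),  # one extra term proposed
    ({"T3"}, {"T3", "T4"}),  # one gold term missed
]

if __name__ == "__main__":
    print(average_scores(REPORTS))  # (0.75, 0.75)
```

Micro-averaging (pooling hits over all reports before dividing) is the usual alternative and can give different numbers when reports have very different numbers of terms.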

    Approximate Data Mining Techniques on Clinical Data

    The past two decades have witnessed an explosion in the number of medical and healthcare datasets available to researchers and healthcare professionals. These data collections call for appropriate data mining techniques and tools that can automatically extract relevant information from the data and thereby provide insights into the clinical behaviors or processes the data capture. Since these tools should support the decision-making activities of medical experts, all extracted information must be represented in a human-friendly way, that is, in a concise and easy-to-understand form. To this end, we propose a new framework that collects several newly proposed mining techniques and tools. These techniques focus mainly on two aspects: the temporal one and the predictive one. All of them were applied to clinical data, in particular to ICU data from the MIMIC-III database, showing the flexibility of the framework, which is able to retrieve different outcomes from the overall dataset. The first two techniques rely on the concept of Approximate Temporal Functional Dependencies (ATFDs). Thanks to their suitable treatment of temporal information, ATFDs have been proposed as a methodological tool for mining clinical data. An example of the knowledge derivable through such dependencies is "within 15 days, patients with the same diagnosis and the same therapy usually receive the same daily amount of drug". However, current ATFD models do not analyze the temporal evolution of the data, as in "for most patients with the same diagnosis, the same drug is prescribed after the same symptom". To this end, we propose a new kind of ATFD called Approximate Pure Temporally Evolving Functional Dependencies (APEFDs). Another limitation of such dependencies is that they cannot deal with quantitative data when some tolerance can be allowed for numerical values. 
In particular, this limitation arises in clinical data warehouses, where analysis and mining have to consider one or more measures related to quantitative data (such as lab test results and vital signs), with respect to multiple dimensional (alphanumeric) attributes (such as patient, hospital, physician, and diagnosis) and some time dimensions (such as the day since hospitalization and the calendar date). For this scenario, we introduce a new kind of ATFD, named Multi-Approximate Temporal Functional Dependency (MATFD), which considers dependencies between dimensions and quantitative measures in temporal clinical data. These new dependencies may provide new knowledge such as "within 15 days, patients with the same diagnosis and the same therapy receive a daily amount of drug within a fixed range". The other techniques are based on pattern mining, which has also been proposed as a methodological tool for mining clinical data. However, many methods proposed so far focus on mining temporal rules that describe relationships between data sequences or instantaneous events, without considering more complex temporal patterns in the dataset. Such patterns, for example trends of a particular vital sign, are often very relevant for clinicians. Moreover, it is valuable to discover whether, and how, an event such as a drug administration changes these trends. To this end, we propose a new kind of temporal pattern, called Trend-Event Patterns (TEPs), which focuses on events and their influence on trends that can be retrieved from measures such as vital signs. With TEPs we can express concepts such as "the administration of paracetamol to a patient with an increasing temperature leads to a decreasing trend in temperature after the administration". We also analyze another interesting pattern mining technique that includes prediction. 
This technique discovers a compact set of patterns that describe the condition (or class) of interest. Our framework relies on a classification model that considers and combines various predictive pattern candidates and selects only those that improve the overall class prediction performance. We show that our classification approach significantly reduces the number of extracted patterns compared with state-of-the-art methods based on the minimum predictive pattern mining approach, while preserving the overall classification accuracy of the model. For each technique described above, we developed a tool to retrieve the corresponding kind of rule. All results were obtained by pre-processing and mining clinical data, in particular ICU data from the MIMIC-III database.
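
To make the idea behind ATFDs and MATFDs more concrete, the sketch below measures how far a dependency such as "within 15 days, patients with the same diagnosis and the same therapy receive the same daily amount of drug" is from holding. It is a simplified stand-in, not the authors' definitions: it assumes fixed (non-overlapping) 15-day windows rather than any particular windowing semantics, uses the classic g3 error (the minimum fraction of rows to remove for the dependency to hold) as the approximation measure, and adds a numeric tolerance to mimic the "within a fixed range" reading of MATFDs. Attribute names and values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def max_within_tolerance(values: List[float], tolerance: float) -> int:
    """Largest number of values that fit in one interval [v, v + tolerance]."""
    vals = sorted(values)
    best = start = 0
    for end in range(len(vals)):
        # Shrink the window from the left until it spans at most `tolerance`.
        while vals[end] - vals[start] > tolerance:
            start += 1
        best = max(best, end - start + 1)
    return best


def dependency_error(rows: Sequence[Row], x_attrs: Sequence[str], y_attr: str,
                     time_attr: str, window: int, tolerance: float = 0.0) -> float:
    """g3-style error of X -> Y within fixed time windows.

    Fraction of rows to delete so that, in every window, rows that agree on
    x_attrs have y values within `tolerance` of each other (0 means equal).
    """
    if not rows:
        return 0.0
    groups: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for r in rows:
        key = (int(r[time_attr]) // window,) + tuple(r[a] for a in x_attrs)
        groups[key].append(float(r[y_attr]))
    kept = sum(max_within_tolerance(v, tolerance) for v in groups.values())
    return 1.0 - kept / len(rows)


# Hypothetical rows: day since admission, diagnosis, therapy, daily dose.
ROWS = [
    {"day": 1, "diagnosis": "D1", "therapy": "T1", "dose": 100},
    {"day": 3, "diagnosis": "D1", "therapy": "T1", "dose": 100},
    {"day": 5, "diagnosis": "D1", "therapy": "T1", "dose": 110},
    {"day": 20, "diagnosis": "D1", "therapy": "T1", "dose": 200},
]

if __name__ == "__main__":
    x = ["diagnosis", "therapy"]
    print(dependency_error(ROWS, x, "dose", "day", window=15))                # 0.25
    print(dependency_error(ROWS, x, "dose", "day", window=15, tolerance=10))  # 0.0
```

With exact matching, one of the three first-window doses must be discarded (error 0.25); allowing a tolerance of 10 dose units makes the dependency hold on all rows, which is the kind of extra knowledge a range-based dependency can express.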

    Ontologies Applied in Clinical Decision Support System Rules: Systematic Review

    Background: Clinical decision support systems (CDSSs) are important for the quality and safety of health care delivery. Although CDSS rules guide CDSS behavior, they are not routinely shared and reused. Objective: Ontologies have the potential to promote the reuse of CDSS rules. Therefore, we systematically screened the literature to elaborate on the current status of ontologies applied in CDSS rules, including rule management, which uses captured CDSS rule usage data and user feedback data to tailor CDSS services to be more accurate, and rule maintenance, which updates CDSS rules. Through this systematic literature review, we aim to identify the frontiers of ontologies used in CDSS rules. Methods: The literature search focused on the intersection of ontologies, clinical decision support, and rules in PubMed, the Association for Computing Machinery (ACM) Digital Library, and the Nursing & Allied Health Database. Grounded theory and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines were followed. One author initiated the screening and literature review, while 2 authors independently validated the processes and results. The inclusion and exclusion criteria were developed and refined iteratively. Results: Among the 81 included publications, CDSSs were primarily used for managing chronic conditions, alerts for medication prescriptions, reminders for immunizations and preventive services, diagnoses, and treatment recommendations. The CDSS rules were presented in Semantic Web Rule Language, Jess, or Jena formats. Although ontologies have been used to provide medical knowledge, CDSS rules, and terminologies, they have not been used in CDSS rule management or to facilitate the reuse of CDSS rules. Conclusions: Ontologies have been used to organize and represent medical knowledge, controlled vocabularies, and the content of CDSS rules. So far, there has been little reuse of CDSS rules. 
More work is needed to improve the reusability and interoperability of CDSS rules. This review identified and described the ontologies that, despite their limitations, enable Semantic Web technologies and their applications in CDSS rules.

    Building a Persuasive Virtual Dietitian

    This paper describes the Multimedia Application for Diet Management (MADiMan), a system that supports users in managing their diets while allowing for diet transgressions. MADiMan consists of a numerical reasoner, which takes users’ dietary constraints into account and automatically adapts their diet, and a natural language generation (NLG) system, which automatically creates textual messages that explain the reasoner’s results with the aim of persuading users to stick to a healthy diet. In the first part of the paper, we introduce the MADiMan system and, in particular, the basic mechanisms of reasoning, data interpretation, and content selection for a numeric data-to-text NLG system. We also discuss a number of factors that influence the design of the textual messages produced. In particular, we describe in detail the design of the sentence-aggregation procedure, which determines the compactness of the final message by applying two aggregation strategies. In the second part of the paper, we present the app we developed, CheckYourMeal!, and the results of two human-based quantitative evaluations of the NLG module conducted using CheckYourMeal! in a simulation. The first evaluation, conducted with twenty users, assessed both the perceived usefulness of graphics and text and the appeal, ease, and persuasiveness of the textual messages. The second evaluation, conducted with thirty-nine users, assessed their persuasive power. The evaluations were based on the analysis of questionnaires and of logged data on users’ behaviour. Both evaluations showed significant results.
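
The abstract does not name MADiMan's two aggregation strategies. As a hedged illustration of what sentence aggregation does in a data-to-text NLG pipeline, the sketch below applies one common strategy: merging messages that share the same predicate into a single sentence with a conjoined subject. The message format and wording are hypothetical.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (subject, predicate)


def join_subjects(subjects: List[str]) -> str:
    """['a'] -> 'a'; ['a', 'b', 'c'] -> 'a, b and c'."""
    if len(subjects) == 1:
        return subjects[0]
    return ", ".join(subjects[:-1]) + " and " + subjects[-1]


def aggregate(messages: List[Message]) -> List[str]:
    """Merge messages with an identical predicate, keeping first-seen order."""
    by_predicate: Dict[str, List[str]] = {}
    for subject, predicate in messages:
        by_predicate.setdefault(predicate, []).append(subject)
    sentences = []
    for predicate, subjects in by_predicate.items():
        text = f"{join_subjects(subjects)} {predicate}."
        sentences.append(text[0].upper() + text[1:])
    return sentences


# Hypothetical messages produced by a content-selection step.
MESSAGES = [
    ("pasta", "exceeded your planned portion"),
    ("bread", "exceeded your planned portion"),
    ("vegetables", "fell short of your planned portion"),
]

if __name__ == "__main__":
    for sentence in aggregate(MESSAGES):
        print(sentence)
```

Without aggregation the three messages would become three sentences; with it they become two, which is the compactness trade-off the sentence-aggregation procedure controls.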

    Automagically Encoding Adverse Drug Reactions in MedDRA

    Pharmacovigilance is the field of science devoted to the collection, analysis, and prevention of Adverse Drug Reactions (ADRs). Efficient strategies for extracting information about ADRs from free-text sources are essential to support the important task of detecting and classifying unexpected pathologies, possibly related to (therapy-related) drug use. Narrative ADR descriptions may be collected in different ways, e.g., by monitoring social networks or through so-called "spontaneous reporting", the main method pharmacovigilance adopts to identify ADRs. Encoding free-text ADR descriptions according to the MedDRA standard terminology is central to report analysis. It is complex work that has to be performed manually by pharmacovigilance experts. Manual encoding is expensive in terms of time, and, since the number of reports grows day by day, the accuracy of the encoding may also suffer. In this paper, we propose MagiCoder, an efficient Natural Language Processing algorithm able to automatically derive MedDRA terms from free-text ADR descriptions. MagiCoder is part of VigiWork, a web application for online ADR reporting and analysis. From a practical point of view, MagiCoder reduces the encoding time of ADR reports: pharmacologists simply have to review and validate the MedDRA terms proposed by MagiCoder, instead of choosing the right terms among the 70K terms of MedDRA. This gain in the efficiency of the pharmacologists' work also has a relevant impact on the quality of the subsequent data analysis. Our proposal is based on a general approach that does not depend on the language considered. Indeed, we developed MagiCoder for the Italian pharmacovigilance language, but preliminary analyses show that it is robust to changes of language and dictionary.
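
MagiCoder's actual procedure is not described in this abstract. For orientation only, the sketch below shows a much simpler dictionary-lookup baseline for the same task: a term is proposed when all of its words occur in the description, and a match is dropped when a longer matched term contains all of its words. The tiny dictionary and its identifiers are hypothetical stand-ins, not MedDRA entries or codes.

```python
import re
from typing import Dict, List, Set


def words(text: str) -> List[str]:
    """Lower-cased word tokens (Unicode-aware, so accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def encode(description: str, dictionary: Dict[str, str]) -> Set[str]:
    """Propose term ids whose words all occur in the description (bag-of-words match)."""
    text_words = set(words(description))
    matched = {
        term_id: set(words(term))
        for term_id, term in dictionary.items()
        if set(words(term)) <= text_words
    }
    # Prefer the most specific terms: drop a match whose words are a strict
    # subset of another match's words (e.g. "pain" inside "abdominal pain").
    return {
        term_id
        for term_id, term_words in matched.items()
        if not any(term_words < other for other in matched.values())
    }


# Hypothetical toy dictionary with made-up identifiers.
TERMS = {"T1": "headache", "T2": "nausea", "T3": "pain", "T4": "abdominal pain"}

if __name__ == "__main__":
    report = "Severe abdominal pain and nausea after the second dose."
    print(sorted(encode(report, TERMS)))  # ['T2', 'T4']
```

This baseline ignores word order, word distance, synonyms, inflection, and misspellings, which are exactly the aspects a dedicated encoder has to handle to be useful on real narrative reports.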

    Social Media Analytics of Smoking Cessation Intervention: User Behavior Analysis, Classification, and Prediction

    Tobacco use causes a large number of diseases and deaths in the United States. Traditional intervention programs are based on face-to-face counseling, and social support is offered to help quitters control stress and achieve better intervention outcomes. However, the scalability of these traditional programs is limited by time and location. With the development of Web 2.0, many smoking cessation intervention programs have been developed online to reach a wider population. QuitNet is a popular smoking cessation website that provides different services to help users quit smoking. It builds communities on different social media for people to discuss issues of smoking cessation and to provide social support for each other. In this dissertation, we conduct a comprehensive study to understand user behavior and discussion interactions in online smoking cessation communities. We compare user features and behaviors on different social media channels, analyze user interactions from the perspective of social support exchange, and apply data mining techniques to analyze discussion content and recommend threads to users. Health communities are developed on different types of social media. For example, QuitNet has Web forums on its own Web site, and it also has a presence on Facebook. User participation may vary across social media platforms, and users may behave differently depending on the functions and design of each platform. So, as the first step in this dissertation, we carry out a preliminary study comparing smoking cessation communities on different social media channels. We analyze user characteristics and behaviors in QuitNet Forum and QuitNet Facebook with statistical analysis and social network analysis. We find that most users of QuitNet Forum are early smoking quitters, and that they participate in discussions more actively than users of QuitNet Facebook. 
However, users of QuitNet Facebook have a wider spectrum of quitting statuses and interaction behaviors. Second, we are interested in user behaviors and in how users exchange social support in online communities. Social support is "an exchange of resources between two individuals perceived by the provider or the recipient to be intended to enhance the well-being of the recipient". As QuitNet Forum attracts many more active users than QuitNet Facebook, it provides a better platform for our research purpose. We therefore focus on QuitNet Forum, developing a classification scheme through qualitative analysis to categorize discussion topics and types of social support on the forum. Patterns of user behavior are defined and identified, and social networks are built to analyze user interactions in social support exchange. We find that users at different quit stages behave differently when exchanging social support, and that different types of social support flow between users at different quit stages. Discussion topics, user behaviors, and patterns of social support exchange are thoroughly analyzed. However, due to the huge amount of information on QuitNet Forum, it is difficult for users to find proper topics or peers to discuss or interact with. It would therefore be helpful to apply machine learning techniques to understand user-generated information in online health communities and to recommend discussion topics for users to participate in. We develop classifiers to categorize posts and comments on QuitNet Forum in terms of user intentions and social support types, using user behaviors and patterns to help develop various feature sets. Then, we develop techniques to recommend threads for users to participate in. Building on traditional Collaborative Filtering and content-based approaches, we integrate classification results and user quit stages to develop recommendation systems. 
The experiments show that integrating classification results or user health statuses achieves the best recommendation results under different percentages of unknown data. In this dissertation, we conduct comprehensive studies of online smoking cessation communities, including analytics and applications. The proposed frameworks and approaches could be applied to other health communities. In the future, we will apply more analytics and techniques to a larger dataset, and develop user-end applications to serve and improve online health intervention programs and communities.
Ph.D., Computer Science -- Drexel University, 201
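
As a point of reference for the recommendation step, the sketch below implements plain user-based Collaborative Filtering over binary thread participation: users are compared by the cosine similarity of the sets of threads they joined, and threads joined by the most similar users are suggested. It does not include the dissertation's integration of classification results or quit stages; user and thread identifiers and the `k` and `top_n` parameters are hypothetical.

```python
import math
from typing import Dict, List, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary participation vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(target: str, participation: Dict[str, Set[str]],
              k: int = 2, top_n: int = 3) -> List[str]:
    """Rank threads the target has not joined by summed similarity of the k nearest users."""
    mine = participation[target]
    neighbours = sorted(
        ((cosine(mine, threads), user)
         for user, threads in participation.items() if user != target),
        reverse=True,
    )[:k]
    scores: Dict[str, float] = {}
    for similarity, user in neighbours:
        if similarity <= 0:
            continue  # users with no shared threads contribute nothing
        for thread in participation[user] - mine:
            scores[thread] = scores.get(thread, 0.0) + similarity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical user -> joined-threads data.
PARTICIPATION = {
    "u1": {"t1", "t2"},
    "u2": {"t1", "t2", "t3"},
    "u3": {"t2", "t4"},
    "u4": {"t5"},
}

if __name__ == "__main__":
    print(recommend("u1", PARTICIPATION))  # ['t3', 't4']
```

A hybrid along the lines described above could, for instance, add per-thread scores derived from post classifications or restrict neighbours to users at a similar quit stage; how the dissertation actually combines these signals is not specified here.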