
    Approximate Data Mining Techniques on Clinical Data

    The past two decades have witnessed an explosion in the number of medical and healthcare datasets available to researchers and healthcare professionals. Such data collection efforts call for appropriate data mining techniques and tools that can automatically extract relevant information from data and, consequently, provide insights into the clinical behaviors or processes captured by the data. Since these tools should support the decision-making activities of medical experts, all the extracted information must be represented in a human-friendly way, that is, in a concise and easy-to-understand form. To this end, we propose a new framework that collects several new mining techniques and tools. These techniques mainly focus on two aspects: the temporal one and the predictive one. All of these techniques were then applied to clinical data and, in particular, to ICU data from the MIMIC III database. This showed the flexibility of the framework, which is able to retrieve different outcomes from the overall dataset. The first two techniques rely on the concept of Approximate Temporal Functional Dependencies (ATFDs). ATFDs have been proposed, with their suitable treatment of temporal information, as a methodological tool for mining clinical data. An example of the knowledge derivable through dependencies may be "within 15 days, patients with the same diagnosis and the same therapy usually receive the same daily amount of drug". However, current ATFD models do not analyze the temporal evolution of the data, such as "For most patients with the same diagnosis, the same drug is prescribed after the same symptom". To this end, we propose a new kind of ATFD called Approximate Pure Temporally Evolving Functional Dependencies (APEFDs). Another limitation of this kind of dependency is that it cannot deal with quantitative data when some tolerance is allowed for numerical values. 
In particular, this limitation arises in clinical data warehouses, where analysis and mining have to consider one or more measures related to quantitative data (such as lab test results and vital signs), concerning multiple dimensional (alphanumeric) attributes (such as patient, hospital, physician, diagnosis) and some time dimensions (such as the day since hospitalization and the calendar date). For this scenario, we introduce a new kind of ATFD, named Multi-Approximate Temporal Functional Dependency (MATFD), which considers dependencies between dimensions and quantitative measures from temporal clinical data. These new dependencies may provide new knowledge such as "within 15 days, patients with the same diagnosis and the same therapy receive a daily amount of drug within a fixed range". The other techniques are based on pattern mining, which has also been proposed as a methodological tool for mining clinical data. However, many methods proposed so far focus on mining temporal rules that describe relationships between data sequences or instantaneous events, without considering the presence of more complex temporal patterns in the dataset. These patterns, such as trends of a particular vital sign, are often very relevant for clinicians. Moreover, it is interesting to discover whether an event, such as a drug administration, can change these trends, and how. To this end, we propose a new kind of temporal pattern, called Trend-Event Patterns (TEPs), that focuses on events and their influence on trends that can be retrieved from some measures, such as vital signs. With TEPs we can express concepts such as "The administration of paracetamol on a patient with an increasing temperature leads to a decreasing trend in temperature after such administration occurs". We also decided to analyze another interesting pattern mining technique that includes prediction. 
This technique discovers a compact set of patterns that aim to describe the condition (or class) of interest. Our framework relies on a classification model that considers and combines various predictive pattern candidates and selects only those that are important to improve the overall class prediction performance. We show that our classification approach achieves a significant reduction in the number of extracted patterns, compared to the state-of-the-art methods based on the minimum predictive pattern mining approach, while preserving the overall classification accuracy of the model. For each technique described above, we developed a tool to retrieve its kind of rule. All the results are obtained by pre-processing and mining clinical data and, as mentioned before, in particular ICU data from the MIMIC III database.
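
The idea of an approximate functional dependency in the abstract above can be illustrated with a small Python sketch. It is not the authors' tool. It assumes the g3 error measure (the minimum fraction of rows that must be removed for the dependency to hold exactly), which the abstract does not name, and it reduces the "within 15 days" grouping to a precomputed, hypothetical `window` column. All records and column names are made up for illustration.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of rows that must be removed so that lhs -> rhs holds exactly.

    Assumption: g3 is used as the error measure; the abstract does not say
    which measure the authors adopt.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        groups[key][value] += 1
    # In each group, keep only the rows carrying the most frequent rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1.0 - kept / len(rows)


def holds_approximately(rows, lhs, rhs, epsilon):
    """True if the dependency holds once at most epsilon of the rows are dropped."""
    return g3_error(rows, lhs, rhs) <= epsilon


# Hypothetical records; "window" stands in for a 15-day time window index.
records = [
    {"window": 0, "diagnosis": "sepsis", "therapy": "antibiotic", "daily_dose": 500},
    {"window": 0, "diagnosis": "sepsis", "therapy": "antibiotic", "daily_dose": 500},
    {"window": 0, "diagnosis": "sepsis", "therapy": "antibiotic", "daily_dose": 750},
    {"window": 1, "diagnosis": "sepsis", "therapy": "antibiotic", "daily_dose": 750},
]

error = g3_error(records, ["window", "diagnosis", "therapy"], ["daily_dose"])
```

Here one of the four rows disagrees with the majority dose in its group, so the dependency holds for any tolerance of at least 0.25.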

    Supporting Governance in Healthcare Through Process Mining: A Case Study

    Healthcare organizations are under increasing pressure to improve productivity, gain competitive advantage and reduce costs. In many cases, although management has already gained some kind of qualitative intuition about inefficiencies and possible bottlenecks related to the enactment of patients' careflows, it does not have the right tools to extract knowledge from available data and make decisions based on a quantitative analysis. To tackle this issue, starting from a real case study conducted in San Carlo di Nancy hospital in Rome (Italy), this article presents the results of a process mining project in the healthcare domain. Process mining techniques are used here to infer meaningful knowledge about the patient careflows from raw event logs consisting of clinical data stored by the hospital information systems. These event logs are analyzed using the ProM framework from three different perspectives: the control flow perspective, the organizational perspective and the performance perspective. The results on the proposed case study show that process mining provided useful insights for the governance of the hospital. In particular, we were able to provide answers to the management of the hospital concerning the value of the latest investments, and the temporal distribution of abandonments from the emergency room and of exams without reservation.
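
To give a rough feel for the control flow perspective mentioned in the abstract above, the following Python sketch counts directly-follows relations between activities in an event log. This is a generic building block of process discovery, not the ProM API and not the analysis performed in the case study. The event log, case identifiers, and activity names are hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b in a case.

    event_log is an iterable of (case_id, timestamp, activity) tuples.
    """
    traces = defaultdict(list)
    # Order events by case, then by time, to rebuild one trace per case.
    for case_id, _timestamp, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical emergency-room log: (case id, timestamp, activity).
log = [
    ("p1", 1, "triage"), ("p1", 2, "exam"), ("p1", 3, "discharge"),
    ("p2", 1, "triage"), ("p2", 2, "abandon"),
    ("p3", 2, "exam"), ("p3", 1, "triage"), ("p3", 3, "discharge"),
]

dfg = directly_follows(log)
```

In this toy log, `("triage", "exam")` occurs twice and `("triage", "abandon")` once, which is the kind of count a control-flow view would surface.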

    Explainable temporal data mining techniques to support the prediction task in Medicine

    In recent decades, the increasing amount of data available in all fields has raised the need to discover new knowledge and explain the hidden information found. On the one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where algorithms and systems often leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the need to explain the result of an artificial intelligence system is legitimate given the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data. Their analysis through data mining techniques provides the possibility to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing patient quality of care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains. 
We analyze the "temporal component" of explainability, focusing on detailing different perspectives such as: the use of temporal data, the temporal task, the temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. For the first one, based on trend abstractions, we start from the concept of Trend-Event Pattern and, moving through the concept of prediction, propose a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps). The framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. For the second one, based on functional dependencies, we propose a methodology for deriving a new kind of approximate temporal functional dependencies, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, the introduction of two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
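
As a loose illustration of a trend abstraction around an event, the Python sketch below labels the trend of a vital sign before and after a given time point. It is a simplification under stated assumptions, not the thesis's definition of a Trend-Event Pattern: the trend is taken from the sign of a least-squares slope, the `tolerance` threshold is arbitrary, and the temperature samples are hypothetical.

```python
def trend(values, tolerance=0.1):
    """Label a series as increasing, decreasing, or steady.

    Assumption: the label comes from the least-squares slope over equally
    spaced samples; tolerance is an arbitrary illustrative threshold.
    """
    n = len(values)
    if n < 2:
        return "steady"
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    slope = numerator / denominator
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "steady"


def trend_around_event(samples, event_time):
    """Return the (trend before, trend after) pair for a list of (time, value)."""
    before = [v for t, v in samples if t < event_time]
    after = [v for t, v in samples if t >= event_time]
    return trend(before), trend(after)


# Hypothetical hourly temperatures; a drug is assumed to be given at time 3.
temperature = [(0, 37.5), (1, 38.0), (2, 38.6), (3, 38.4), (4, 37.9), (5, 37.3)]
pattern = trend_around_event(temperature, event_time=3)
```

With these samples the temperature rises before the event and falls after it, which matches the shape of the paracetamol example quoted in the first abstract.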

    Mining unexpected temporal associations: Applications in detecting adverse drug reactions

    Copyright © 2008 IEEE. In various real-world applications, it is very useful to mine unanticipated episodes where certain event patterns unexpectedly lead to outcomes, e.g., taking two medicines together sometimes causing an adverse reaction. These unanticipated episodes are usually unexpected and infrequent, which makes existing data mining techniques, mainly designed to find frequent patterns, ineffective. In this paper, we propose unexpected temporal association rules (UTARs) to describe them. To handle the unexpectedness, we introduce a new interestingness measure, residual-leverage, and develop a novel case-based exclusion technique for its calculation. Combining it with an event-oriented data preparation technique to handle the infrequency, we develop a new algorithm, MUTARC, to find pairwise UTARs. MUTARC is applied to generate adverse drug reaction (ADR) signals from real-world healthcare administrative databases. It reliably shortlists not only six known ADRs, but also another ADR, flucloxacillin possibly causing hepatitis, which our algorithm designers and experiment runners had not known before the experiments. MUTARC performs much more effectively than existing techniques. This paper clearly illustrates the great potential along the new direction of ADR signal generation from healthcare administrative databases. Huidong (Warren) Jin, Jie Chen, Member, Hongxing He, Graham J. Williams, Chris Kelman and Christine M. O’Keef

    BCAS: A Web-enabled and GIS-based Decision Support System for the Diagnosis and Treatment of Breast Cancer

    For decades, geographical variations in cancer rates have been observed, but the precise determinants of such geographic differences in breast cancer development are unclear. Various statistical models have been proposed. Applications of these models, however, require that the data be assembled from a variety of sources, converted into the statistical models’ parameters and delivered effectively to researchers and policy makers. A web-enabled and GIS-based system can be developed to provide the needed functionality. This article gives an overview of the conceptual web-enabled and GIS-based system (BCAS), illustrates the system’s use in diagnosing and treating breast cancer and examines the potential benefits and implications for breast cancer research and practice.

    A Decision Technology System To Advance the Diagnosis and Treatment of Breast Cancer

    Geographical variations in cancer rates have been observed for decades. Described spatial patterns and trends have provided clues for generating hypotheses about the etiology of cancer. For breast cancer, investigators have demonstrated that some variation can be explained by differences in the population distribution of known breast cancer risk factors such as menstrual and reproductive variables (Laden, Spiegelman, and Neas, 1997; Robbins, Bescianini, and Kelsey, 1997; Sturgeon, Schairer, and Gail, 1995). However, regional patterns also may reflect the effects of the following (Workshop on Hormones, Hormone Metabolism, Environment, and Breast Cancer, 1995): (a) environmental hazards (such as air and water pollution), (b) demographics and the lifestyle of a mobile population, (c) subgroup susceptibility, (d) changes and advances in medical practice and healthcare management, and (e) other factors. To accurately measure breast cancer risk in individuals and population groups, it is necessary to singly and jointly assess the association between such risk and the hypothesized factors. Various statistical models will be needed to determine the potential relationships between breast cancer development and estimated exposures to environmental contamination. To apply the models, data must be assembled from a variety of sources, converted into the statistical models’ parameters, and delivered effectively to researchers and policy makers. A Web-enabled decision technology system can be developed to provide the needed functionality. This chapter will present a conceptual architecture for such a decision technology system. First, there will be a brief overview of a typical geographical analysis. Next, the chapter will present the conceptual Web-based decision technology system and illustrate how the system can assist users in diagnosing and treating breast cancer. 
The chapter will conclude with an examination of the potential benefits from system use and the implications for breast cancer research and practice.

    Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks

    Predicting the future health information of patients from the historical Electronic Health Records (EHR) is a core research task in the development of personalized healthcare. Patient EHR data consist of sequences of visits over time, where each visit contains multiple medical codes, including diagnosis, medication, and procedure codes. The most important challenges for this task are to model the temporality and high dimensionality of sequential EHR data and to interpret the prediction results. Existing work solves this problem by employing recurrent neural networks (RNNs) to model EHR data and utilizing a simple attention mechanism to interpret the results. However, RNN-based approaches suffer from the problem that the performance of RNNs drops when sequences are long, and the relationships between subsequent visits are ignored by current RNN-based approaches. To address these issues, we propose Dipole, an end-to-end, simple and robust model for predicting patients' future health information. Dipole employs bidirectional recurrent neural networks to remember all the information of both the past visits and the future visits, and it introduces three attention mechanisms to measure the relationships of different visits for the prediction. With the attention mechanisms, Dipole can interpret the prediction results effectively. Dipole also allows us to interpret the learned medical code representations, which are confirmed positively by medical experts. Experimental results on two real-world EHR datasets show that the proposed Dipole can significantly improve the prediction accuracy compared with the state-of-the-art diagnosis prediction approaches and provide clinically meaningful interpretation.
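
How an attention mechanism weighs earlier visits can be sketched in a few lines of Python. This is a generic dot-product attention over hidden states, shown only to illustrate the idea. It is not claimed to be any of the three mechanisms in Dipole, and the hidden states are hypothetical hand-written vectors rather than the output of a bidirectional RNN.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(previous_states, current_state):
    """Weigh previous visit states by their dot product with the current state.

    Returns the attention weights and the context vector (the weighted sum
    of the previous states).
    """
    scores = [
        sum(h_d * c_d for h_d, c_d in zip(h, current_state))
        for h in previous_states
    ]
    weights = softmax(scores)
    dim = len(current_state)
    context = [
        sum(w * h[d] for w, h in zip(weights, previous_states))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical 2-dimensional hidden states for three earlier visits.
previous_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
current_state = [1.0, 1.0]
weights, context = attend(previous_states, current_state)
```

The weights sum to one, and the third visit, whose state is most similar to the current one, receives the largest weight. Inspecting such weights is the sense in which attention can support interpretation of a prediction.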