5 research outputs found

    Analysis of Research in Healthcare Data Analytics

    The main aim of this paper is to provide a deep analysis of the research field of healthcare data analytics, as well as to highlight some of the guidelines and gaps in previous studies. The study searched for relevant papers on healthcare analytics in seven popular databases, such as Google Scholar and Springer, using specific keywords, in order to understand the healthcare topic and conduct the literature review. The paper lists some data analytics tools and techniques that have been used to improve healthcare performance in many areas, such as medical operations, reports, decision making, and prediction and prevention systems. Moreover, the systematic review shows an interesting demographic of fields of publication and research approaches, and outlines some of the possible reasons and issues associated with healthcare data analytics, based on a geographical distribution theme.

    Semi-supervised incremental learning with few examples for discovering medical association rules

    Background: Association Rules are one of the main ways to represent structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in the predictive health field. Classic algorithms for association rule mining give rise to huge numbers of possible rules that should be filtered in order to select those most likely to be true. Most of the proposed techniques for these tasks are unsupervised. However, the accuracy provided by unsupervised systems is limited. Conversely, resorting to annotated data for training supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data. Methods: In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting with a small seed of annotated data, the model improves on the results (F-measure) obtained using a fully supervised system (standard supervised ML algorithms). The idea is based on utilising the agreement between the predictions of the supervised system and those of the unsupervised techniques in a series of iterative steps. Results: The new semi-supervised ML algorithm improves on the F-measure results of supervised algorithms in the task of mining medical association rules, while training with an affordable amount of manually annotated data. Conclusions: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system.
The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge. This work has been partially supported by projects DOTT-HEALTH (PID2019-106942RB-C32, MCI/AEI/FEDER, UE). (Design of the study. Analysis and interpretation of data) and EXTRAE II (IMIENS 2019). (Design of the study. Analysis and interpretation of data. HUF corpus manual tagging. Writing of the manuscript), PI18CIII/00004 “Infobanco para uso secundario de datos basado en estándares de tecnología y conocimiento: implementación y evaluación de un infobanco de salud para CoRIS (Info-bank for the secondary use of data based on technology and knowledge standards: implementation and evaluation of a health info-bank for CoRIS) – SmartPITeS” (Data collection and HUF corpus construction), and PI18CIII/00019 - PI18/00890 - PI18/00981 “Arquitectura normalizada de datos clínicos para la generación de infobancos y su uso secundario en investigación: solución tecnológica (Clinical data normalized architecture for the generation of info-banks and their secondary use in research: technological solution) – CAMAMA 4” (Data collection and HUF corpus construction) from Fondo de Investigación Sanitaria (FIS) Plan Nacional de I+D+i.
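
The unsupervised filtering step named in the abstract above (Fisher's exact test) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the 2x2 table layout and the choice of a one-sided test are assumptions made for illustration, and the semi-supervised agreement loop is not reproduced here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for a candidate rule X -> Y.

    Hypothetical 2x2 table layout (an assumption of this sketch):
      a = records containing both X and Y
      b = records containing X but not Y
      c = records containing Y but not X
      d = records containing neither
    Returns the probability of seeing at least `a` co-occurrences
    if X and Y were independent (hypergeometric upper tail).
    """
    row1 = a + b          # records containing X
    col1 = a + c          # records containing Y
    n = a + b + c + d     # all records
    denom = comb(n, col1)
    max_a = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, max_a + 1)
    )
    return tail / denom


# A rule whose items always co-occur in a tiny toy table.
p_value = fisher_exact_greater(3, 0, 0, 3)
```

A small p-value suggests that the co-occurrence is unlikely under independence, so a filter could keep rules below a chosen significance threshold; the threshold itself would be a further assumption.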

    Discovering HIV related information by means of association rules and machine learning

    Acquired immunodeficiency syndrome (AIDS) is still one of the main health problems worldwide. It is therefore essential to keep making progress in improving the prognosis and quality of life of affected patients. One way to advance along this pathway is to uncover connections between other disorders associated with HIV/AIDS, so that they can be anticipated and possibly mitigated. We propose to achieve this by using Association Rules (ARs). They allow us to represent the dependencies between a number of diseases and other specific diseases. However, classical techniques systematically generate every AR meeting some minimal conditions on data frequency, hence generating a vast number of uninteresting ARs, which need to be filtered out. The lack of manually annotated ARs has favored unsupervised filtering techniques, even though they produce limited results. In this paper, we propose a semi-supervised system, able to identify relevant ARs among HIV-related diseases with a minimal amount of annotated training data. Our system has been able to extract a good number of relationships between HIV-related diseases that have been previously detected in the literature but are scattered and often little known.
Furthermore, a number of plausible new relationships have shown up which deserve further investigation by qualified medical experts. This study has been partially supported by the Spanish Ministry of Science and Innovation within the DOTTHEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32, the OBSER-MENH Project (MCIN/AEI/10.13039/501100011033 and UE (“NextGenerationEU”/PRTR)) under Grant TED2021-130398B-C21 and the project RAICES (IMIENS 2022), PI18CIII/00004 “Infobanco para uso secundario de datos basado en estándares de tecnología y conocimiento: implementación y evaluación de un infobanco de salud para CoRIS (Info-bank for the secondary use of data based on technology and knowledge standards: implementation and evaluation of a health info-bank for CoRIS) - SmartPITeS” and PI18CIII/00019 - PI18/00890 - PI18/00981 “Arquitectura normalizada de datos clínicos para la generación de infobancos y su uso secundario en investigación: solución tecnológica (Clinical data normalized architecture for the generation of info-banks and their secondary use in research: technological solution) - CAMAMA 4” from Fondo de Investigación Sanitaria (FIS) Plan Nacional de I+D+i. The RIS cohort (CoRIS) is supported by the Instituto de Salud Carlos III through the Red Temática de Investigación Cooperativa en Sida (RD06/006, RD12/0017/0018 and RD16/0002/0006) as part of the Plan Nacional R+D+I and co-financed by ISCIII-Subdirección General de Evaluación and el Fondo Europeo de Desarrollo Regional (FEDER). The list of members of the Cohort of the Spanish HIV Research Network (CoRIS) is included in the Supplementary Material. Additional relationships between HIV-related diseases confirmed or discarded are included as Supplementary Material. This study would not have been possible without the collaboration of all patients, medical and nursing staff and data managers who have taken part in the Project.
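
The abstract above notes that classical techniques generate every AR that meets minimal frequency conditions. A minimal sketch of that behaviour follows, assuming the usual support and confidence thresholds and restricting rules to single-item antecedents and consequents. The function name, the thresholds and the toy records (with placeholder labels rather than real diagnoses) are all hypothetical and are not taken from the paper.

```python
from itertools import permutations


def mine_pair_rules(records, min_support, min_confidence):
    """Enumerate every rule x -> y over single items that meets the thresholds.

    records: list of sets of item labels (one set per patient record).
    Returns a list of (x, y, support, confidence) tuples.
    """
    n = len(records)
    items = sorted(set().union(*records))
    rules = []
    for x, y in permutations(items, 2):
        both = sum(1 for r in records if x in r and y in r)
        antecedent = sum(1 for r in records if x in r)
        support = both / n
        confidence = both / antecedent if antecedent else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Toy records with placeholder labels (hypothetical data).
toy_records = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
rules = mine_pair_rules(toy_records, min_support=0.5, min_confidence=0.6)
```

Even on four toy records, four of the six possible pair rules pass these thresholds, which hints at why real clinical data yields far more candidate rules than can be reviewed by hand and why a filtering stage is needed.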