3,066 research outputs found

    Machine Learning Methods To Identify Hidden Phenotypes In The Electronic Health Record

    The widespread adoption of Electronic Health Records (EHRs) means an unprecedented amount of patient treatment and outcome data is available to researchers. Research is a tertiary priority in the EHR, where the priorities are patient care and billing. Because of this, the data is not standardized or formatted in a manner easily adapted to machine learning approaches. Data may be missing for a wide variety of reasons, ranging from individual input styles to differences in clinical decision making, for example, which lab tests to order. Few patients are annotated at research quality, limiting sample size and presenting a moving gold standard. Patient progression over time is key to understanding many diseases, but many machine learning algorithms require a snapshot at a single time point to create a usable vector form. In this dissertation, we develop new machine learning methods and computational workflows to extract hidden phenotypes from the EHR. In Part 1, we use a semi-supervised deep learning approach to compensate for the low number of research-quality labels present in the EHR. In Part 2, we examine and provide recommendations for characterizing and managing the large amount of missing data inherent to EHR data. In Part 3, we present an adversarial approach to generate synthetic data that closely resembles the original data while protecting subject privacy. We also introduce a workflow to enable reproducible research even when data cannot be shared. In Part 4, we introduce a novel strategy to first extract sequential data from the EHR and then demonstrate the ability to model these sequences with deep learning.
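As a rough illustration of what "characterizing" missing data can mean in practice, the sketch below computes the fraction of records in which each field is absent or empty. It is not taken from the dissertation: the function name `missing_fraction`, the field names and the toy records are all hypothetical, and real EHR data would need far more careful handling (for example, distinguishing "not measured" from "not recorded").

```python
from collections import Counter


def missing_fraction(records, fields):
    """Return, per field, the fraction of records where it is absent or None."""
    if not records:
        raise ValueError("records must be non-empty")
    missing = Counter()
    for rec in records:
        for field in fields:
            # A field counts as missing if the key is absent or its value is None.
            if rec.get(field) is None:
                missing[field] += 1
    return {field: missing[field] / len(records) for field in fields}


# Hypothetical toy records; values are invented for illustration only.
records = [
    {"age": 61, "hba1c": 7.2, "ldl": None},
    {"age": 54, "hba1c": None, "ldl": None},
    {"age": 47, "hba1c": 6.1, "ldl": 110},
    {"age": 70, "ldl": 95},  # hba1c key absent entirely
]

fractions = missing_fraction(records, ["age", "hba1c", "ldl"])
```

A per-field summary like this is only a first step; it says nothing about why values are missing, which is the harder question the abstract raises.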

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
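To make one of the five challenges concrete, here is a minimal sketch of a common remedy for class imbalance: weighting each class inversely to its frequency, w_c = n / (k * n_c), where n is the number of samples, k the number of classes and n_c the count of class c. The abstract does not say which techniques the review covers, so this is an assumed example rather than a method from the review; the function name `balanced_class_weights` and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * n_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 90 controls (0) and 10 cases (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
```

With these weights, each class contributes equally to a weighted loss in total, so the rare class is not swamped by the common one.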

    Clinical Network for Big Data and Personalized Health: Study Protocol and Preliminary Results

    The use of secondary hospital-based clinical data and electronic health records (EHRs) represents a cost-efficient alternative for investigating chronic conditions. We present the Clinical Network Big Data and Personalised Health project, which collects EHRs for patients accessing hospitals in Central-Southern Italy through an integrated digital platform, creating a digital hub for the collection, management and analysis of personal, clinical and environmental information for patients, associated with a biobank to perform multi-omic analyses. A total of 12,864 participants (61.7% women, mean age 52.6 ± 17.6 years) signed a written informed consent to allow access to their EHRs. The majority of hospital access was in obstetrics and gynaecology (36.3%), while the main reason for hospitalization was diseases of the circulatory system (21.2%). Participants had a secondary education (63.5%), were most often retired (25.45%), reported low levels of physical activity (59.6%), had low adherence to the Mediterranean diet and were smokers (30.2%). A large percentage (35.8%) were overweight, and the prevalence of hypertension, diabetes and hyperlipidemia was 36.4%, 11.1% and 19.6%, respectively. Blood samples were retrieved for 8,686 patients (67.5%). The project aims to create a digital hub for the collection, management and analysis of personal, clinical, diagnostic and environmental information for patients, associated with a biobank to perform multi-omic analyses.

    ASSESSMENT OF RISK SCORES FOR THE PREDICTION AND DETECTION OF TYPE 2 DIABETES MELLITUS IN CLINICAL SETTINGS

    Health and sociological indicators confirm that life expectancy is increasing, and with it the number of years that patients live with chronic diseases and co-morbidities. Type 2 Diabetes is one of the most common chronic diseases, especially linked to overweight and ages over sixty. As a metabolic disease, Type 2 Diabetes affects multiple organs by damaging blood vessels and the nervous system at micro and macro scale. Mortality among subjects with diabetes is three times higher than among subjects with other chronic diseases. On the one hand, diabetes management focuses on keeping blood glucose levels under a threshold through the prescription of anti-diabetic drugs combined with healthy eating habits and moderate physical activity. Recent studies have demonstrated the effectiveness of new strategies that delay and even prevent the onset of Type 2 Diabetes through an active and healthy lifestyle in cohorts of mid- to high-risk subjects. On the other hand, prospective research has been conducted on large population groups to build risk scores, which aim to provide a rule for classifying patients according to their odds of developing the disease. There are currently more than two hundred such models and risk scores, but only a few have been properly evaluated in external groups and, to date, none has been tested in a population-based study. The research presented in this doctoral thesis uses externally validated risk scores for the prediction and detection of Type 2 Diabetes on a population database at Hospital La Fe (Valencia, Spain). The study hypothesis is that integrating existing prediction and detection risk scores into Electronic Health Records increases the early detection of high-risk cases.
    To evaluate this hypothesis, three studies on the clinical, user and technology dimensions were conducted to assess the extent to which the models and the hospital are ready to exploit such models to identify high-risk groups and drive efficient preventive strategies. The findings presented in this thesis suggest that Electronic Health Records are not prepared to feed risk models at scale. Some of the evaluated models showed good classification performance, which, together with the good acceptance of web-based tools and the acceptable technical performance of the information and communication technology system, suggests that after some work these models can effectively drive a new paradigm of active screening for Type 2 Diabetes.
    Martínez Millana, A. (2017). ASSESSMENT OF RISK SCORES FOR THE PREDICTION AND DETECTION OF TYPE 2 DIABETES MELLITUS IN CLINICAL SETTINGS [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86209

    Integration of Distributed Services and Hybrid Models Based on Process Choreography to Predict and Detect Type 2 Diabetes

    Life expectancy is increasing and, with it, the number of years that patients live with chronic diseases and co-morbidities. Type 2 diabetes is one of the most prevalent chronic diseases, specifically linked to being overweight and ages over sixty. Recent studies have demonstrated the effectiveness of new strategies that delay and even prevent the onset of type 2 diabetes through an active and healthy lifestyle in cohorts of mid- to high-risk subjects. Prospective research has been conducted on large population groups to build risk scores that aim to provide a rule for classifying patients according to their odds of developing the disease. Currently, there are more than two hundred models and risk scores for doing this, but only a few have been properly evaluated in external groups and integrated into a clinical application for decision support. In this paper, we present a novel system architecture based on service choreography and hybrid modeling, which enables a distributed integration of clinical databases, statistical and mathematical engines and web interfaces to be deployed in a clinical setting. The system was assessed during an eight-week continuous period with eight endocrinologists of a hospital, who evaluated up to 8080 patients with seven different type 2 diabetes risk models implemented in two mathematical engines. Throughput was assessed through technical key performance indicators, confirming the reliability and efficiency of the proposed architecture to integrate hybrid artificial intelligence tools into daily clinical routine to identify high-risk subjects.
    The authors wish to acknowledge the consortium of the MOSAIC project (funded by the European Commission, Grant No. FP7-ICT 600914) for their commitment during concept development, which led to the development of the research reported in this manuscript.
    Martinez-Millana, A.; Bayo-Monton, JL.; Argente-Pla, M.; Fernández Llatas, C.; Merino-Torres, JF.; Traver Salcedo, V. (2018). Integration of Distributed Services and Hybrid Models Based on Process Choreography to Predict and Detect Type 2 Diabetes. Sensors, 18(1), 79, 1-26. https://doi.org/10.3390/s18010079

    Artificial Intelligence in Healthcare: Transitioning to Routine Clinical Care


    Risk assessment for progression of Diabetic Nephropathy based on patient history analysis

    Diabetic nephropathy (DN) is one of the most common complications in patients with diabetes. It is a chronic disease that progressively affects the kidneys and can result in kidney failure. Digitalization has allowed hospitals to store patient information in electronic health records (EHRs). Applying Machine Learning (ML) algorithms to these data can enable prediction of the risk of progression in these patients, leading to better disease management. The main objective of this work is to create a predictive model that takes advantage of the patient history present in EHRs. This work used the largest dataset of Portuguese patients with DN, followed for 22 years by the Associação Protetora dos Diabéticos de Portugal (APDP). A longitudinal approach was developed in the data pre-processing phase, allowing the data to serve as input to sixteen distinct ML algorithms. After evaluation and analysis of the respective results, the Light Gradient Boosting Machine was identified as the best model, showing good predictive capabilities. This conclusion was supported not only by the evaluation of several classification metrics on training, test and validation data, but also by the evaluation of its performance for each stage of the disease. In addition, the models were analyzed using feature ranking plots and statistical analysis. As a complement, the interpretability of the results through the SHAP method is also presented, as well as the deployment of the model using Gradio and the Hugging Face servers. Through the integration of ML techniques, an interpretation method and a Web application that provides access to the model, this study offers a potentially effective approach to anticipating the progression of DN, allowing healthcare professionals to make informed decisions for personalized care and disease management.