    Social and behavioral determinants of health in the era of artificial intelligence with electronic health records: A scoping review

    Background: There is growing evidence that social and behavioral determinants of health (SBDH) have a substantial effect on a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been little research into how to make the most of SBDH information from EHRs. Methods: A systematic search was conducted in six databases to find relevant, recently published peer-reviewed publications. Relevance was determined by screening and evaluating the articles. Based on the selected studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results: Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, and several natural language processing (NLP) approaches for extracting SBDH from clinical literature. Discussion: The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using NLP to extract SBDH and other clinical concepts simplifies the identification of essential concepts in clinical data, efficiently unlocks unstructured data, and helps resolve issues related to unstructured data. Conclusion: Despite known associations between SBDH and disease, SBDH factors are rarely investigated as interventions to improve patient outcomes. Understanding SBDH, and how SBDH data can be collected from EHRs using NLP approaches and predictive models, improves the chances of influencing health policy for patient wellness and, ultimately, of promoting health and health equity. Keywords: Social and Behavioral Determinants of Health, Artificial Intelligence, Electronic Health Records, Natural Language Processing, Predictive Model. Comment: 32 pages, 5 figures
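
As a rough illustration of what extracting SBDH concepts from unstructured clinical text can look like in its simplest form, the sketch below matches a small phrase lexicon against a note. The lexicon, the category names, and the function `extract_sbdh` are hypothetical and are not taken from the review; the studies it surveys may use far richer methods, and this sketch does not handle negation ("denies tobacco use") or context.

```python
import re
from collections import defaultdict

# Hypothetical lexicon (category -> trigger phrases), invented for illustration.
SBDH_LEXICON = {
    "housing": ["homeless", "lives in a shelter", "eviction"],
    "tobacco_use": ["smoker", "smokes", "tobacco"],
    "social_support": ["lives alone", "no family support"],
}


def extract_sbdh(note: str) -> dict[str, list[str]]:
    """Return the lexicon phrases found in a note, grouped by category."""
    found = defaultdict(list)
    for category, phrases in SBDH_LEXICON.items():
        for phrase in phrases:
            # Word boundaries avoid matching a phrase inside a longer word.
            pattern = rf"\b{re.escape(phrase)}\b"
            if re.search(pattern, note, flags=re.IGNORECASE):
                found[category].append(phrase)
    return dict(found)


note = "Patient is a current smoker, lives alone, and reports a recent eviction."
print(extract_sbdh(note))
```

A lexicon matcher like this is easy to audit, which is one reason it is often used as a baseline before trying statistical models.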

    Machine learning approaches to identifying social determinants of health in electronic health record clinical notes

    Social determinants of health (SDH) represent the complex set of circumstances in which individuals are born, or with which they live, that impact health. Relatively little attention has been given to the processes needed to extract SDH data from electronic health records (EHRs). Despite their importance, SDH data in the EHR remain sparse, typically collected only in clinical notes and thus largely unavailable for clinical decision making. I focus on developing and validating more efficient information extraction approaches to identifying and classifying SDH in clinical notes. In this dissertation, I have three goals. First, I develop a word embedding model to expand SDH terminology in the context of identifying SDH in clinical text. Second, I examine the effectiveness of different machine learning algorithms and a neural network model in classifying two SDH characteristics: financial resource strain and poor social support. Third, I compare the highest-performing approaches to simpler text mining techniques and evaluate the models on performance, cost, and generalizability in the task of classifying SDH in two distinct data sources. Doctor of Philosophy
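
One common way to expand a terminology with word embeddings is to rank vocabulary terms by cosine similarity to a seed term and keep the nearest neighbors. The sketch below shows that idea only; it is not the dissertation's model. The three-dimensional vectors, the vocabulary, and the function `expand_terms` are invented for illustration, whereas a real model would be trained on clinical notes and have hundreds of dimensions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
EMBEDDINGS = {
    "unemployed": [0.9, 0.1, 0.0],
    "jobless": [0.85, 0.15, 0.05],
    "uninsured": [0.7, 0.3, 0.1],
    "lonely": [0.1, 0.9, 0.1],
    "isolated": [0.15, 0.85, 0.2],
    "aspirin": [0.0, 0.1, 0.95],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_terms(seed: str, top_k: int = 2) -> list[str]:
    """Return the top_k vocabulary terms most similar to the seed term."""
    seed_vec = EMBEDDINGS[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in EMBEDDINGS.items()
        if term != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:top_k]]


print(expand_terms("unemployed"))       # nearest neighbors of a financial-strain seed
print(expand_terms("lonely", top_k=1))  # nearest neighbor of a social-support seed
```

In practice the candidate terms returned this way would still need manual review before being added to an SDH lexicon.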

    Heart failure – risk factors and the validity of diagnoses

    Heart failure (HF) is a global health problem. HF risk factors remain understudied. The roles that diabetes and sodium consumption play in HF remain unknown. Furthermore, the validity of HF diagnoses in the Finnish Hospital Discharge Register (FHDR) has not been thoroughly evaluated. This thesis aims to discover sodium- and diabetes-related HF risk factors, validate FHDR-based HF diagnoses, and investigate whether the subtyping of register-based HF diagnoses could be improved through electronic health record (EHR) data mining. Twenty-four-hour urinary sodium excretion (mean 183 mmol/d) was measured in 4,630 individuals to assess the relationship between salt intake and incident HF (Study I). We used data from 3,834 diabetic and 90,177 nondiabetic individuals to evaluate the diabetes status-related differences in risk factors and mediators of HF (Study II). Medical records of 120 HF cases and 120 controls were examined to study the validity of HF diagnoses (Study III). We drew data from 33,983 patients to assess whether HF diagnoses could be subtyped more accurately through EHR data mining (Study IV) and validated the mining-based versus clinical subtyping in 100 randomly selected patients. In Study I, we observed that high sodium intake was associated with incident coronary artery disease (CAD) and diabetes, but not HF. In Study II, the risk of HF was 2.7-fold in individuals with diabetes compared with nondiabetic participants. Conventional cardiovascular disease risk factors and biomarkers for cardiac strain, myocardial injury, and inflammation were associated with incident HF in both groups. The strongest mediators of HF in diabetes were the direct effect of diabetes and the indirect effects mediated by obesity, cardiac strain/volume overload, and hyperglycemia. In Studies III and IV, HF diagnoses of the FHDR had good predictive values (NPV 0.83, PPV 0.85), even when patients with preexisting heart conditions were used as controls. 
With additional EHR-mined data, the accuracy of our algorithm in classifying individuals into HF subtypes, compared with clinical assessment, was 86%. The findings in this thesis show that register-based HF is an accurate endpoint and that EHR data mining can improve this accuracy. Our results also elucidate the role of sodium and diabetes as HF risk factors.

    Sydämen vajaatoiminta: riskitekijät ja diagnoosien validiteetti (Finnish abstract, translated). Heart failure is a global health problem whose risk factors are partly unclear. The association between salt intake and heart failure, and the high risk of heart failure caused by diabetes, have not been studied sufficiently. The validity of heart failure diagnoses in the Hoitoilmoitusjärjestelmä (HILMO) hospital register is not known. This doctoral thesis studied salt- and diabetes-related risk factors for heart failure, validated HILMO-based heart failure diagnoses, and examined whether heart failure can be subtyped using text mining. To assess the relationship between salt intake and heart failure (Study I), 24-hour urinary sodium (mean 183 mmol/d) was measured in 4,630 individuals. To identify diabetes-related risk factors for heart failure (Study II), data on 3,834 people with diabetes and 90,177 controls were analyzed. For the validity of heart failure diagnoses (Study III), we examined the medical records of 120 heart failure cases and 120 controls (who had another heart disease), and for more precise subtyping (Study IV) we collected data on 33,983 patients and validated the data-mining-based subtyping in 100 randomly selected patients. In Study I, salt intake was associated with the development of coronary artery disease and diabetes, but the results were not significant for heart failure. In Study II, the risk of heart failure in people with diabetes was 2.7-fold compared with controls. In both groups, conventional risk factors and markers of cardiac stretch, myocardial injury, and inflammation were associated with heart failure. The most important mediators of heart failure in diabetes were the direct effect of diabetes and the indirect effects of overweight, cardiac stretch, and hyperglycemia. In Studies III and IV, the predictive values of the HILMO register heart failure diagnosis were good (NPV 0.83, PPV 0.85) in comparison with patients with other heart diseases, and the accuracy of the data-mining-based subtyping compared with clinical subtyping was 86%. This thesis shows that HILMO-based heart failure diagnoses work as a scientific endpoint and that the heart failure subtype can be refined with text mining, and it provides new information on salt and diabetes as risk factors for heart failure.
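
For readers less familiar with the predictive values reported for the register diagnoses, the sketch below shows how PPV and NPV are computed from a validation table: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The counts used here are hypothetical and are not the thesis data; the abstract reports only the resulting values, not the underlying table, and the function name `predictive_values` is an assumption made for this example.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from the four cells of a 2x2 validation table.

    tp/fp: register-positive cases confirmed / not confirmed on record review.
    tn/fn: register-negative cases confirmed negative / found to have the disease.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one register-positive and one register-negative case")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical counts chosen only to illustrate the arithmetic.
ppv, npv = predictive_values(tp=90, fp=10, tn=80, fn=20)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both values depend on the case mix of the validation sample, which is why the choice of controls (here, patients with other heart conditions) matters when interpreting them.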