Cardiovascular diseases (CVD) are the leading cause of death worldwide; they constitute a
risk factor for patients with diabetes and are, at the same time, a consequence of
dyslipidemia. Effective lipid management in patients with diabetes remains largely
unattained and requires greater awareness from both patients and healthcare professionals.
To better understand the influence of clinical parameters on Low Density Lipoprotein (LDL)
cholesterol patterns in patients with uncontrolled type 2 diabetes, data mining techniques
were applied to the Electronic Health Records (EHR) provided by APDP (Associação Protetora
de Diabetes Portugal).
The database content was first analyzed to assess data integrity and to avoid using
corrupted values or misleading information from the EHR. The statistical distribution of
each clinical parameter reported in the database was examined to characterize its individual
behavior and to enable a statistically coherent identification of the cohort to be used when
modeling LDL.
As a first approach, linear modeling of LDL was considered, using both ordinary least
squares and stepwise approaches. Non-linear modeling of LDL was then tested on the same
populations used for linear modeling, to identify the most accurate and practical LDL
model. The provided EHR included 32577 medical appointments held by 1767 patients
between January 2008 and February 2018. More than 10 clinical features were studied,
leading to the decision to limit the case-study population to patients who had at least
5 Medical Appointments (MA) during the decade. Of all MAs, 32% and 63% reported
LDL and Glycated Hemoglobin (HbA1c) measurements, respectively, but some MAs did not
report both simultaneously.
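
The cohort rule described above (at least 5 appointments per patient, then only appointments that report both LDL and HbA1c) could be sketched as follows. This is a minimal illustration, not the procedure actually used on the APDP records: the record layout, the sample values and the function names `select_cohort` and `complete_cases` are all hypothetical.

```python
from collections import Counter

# Hypothetical appointment records: (patient_id, ldl, hba1c).
# None marks a measurement that was not reported at that appointment.
appointments = [
    ("p1", 2.1, 7.4), ("p1", None, 7.9), ("p1", 2.4, None),
    ("p1", 2.0, 8.1), ("p1", None, None),
    ("p2", 3.0, 9.0), ("p2", 2.8, 8.7),
]

MIN_APPOINTMENTS = 5  # threshold stated in the text


def select_cohort(records, min_appointments=MIN_APPOINTMENTS):
    """Return the ids of patients with at least `min_appointments` records."""
    counts = Counter(pid for pid, _, _ in records)
    return {pid for pid, n in counts.items() if n >= min_appointments}


def complete_cases(records, cohort):
    """Keep cohort records in which both LDL and HbA1c were measured."""
    return [
        r for r in records
        if r[0] in cohort and r[1] is not None and r[2] is not None
    ]


cohort = select_cohort(appointments)
usable = complete_cases(appointments, cohort)
```

In this toy data only patient `p1` reaches the threshold, and only the appointments with both measurements remain available for modeling.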
Six linear models, each relating a different set of 6 clinical parameters, were tested.
Linear model 3, involving LDL, total cholesterol, HDL, triglycerides, HbA1c and platelets,
is the selected linear model, with a Root Mean Square Error (RMSE) of 0.07. The model in
which platelets are replaced by proteinuria has an RMSE of just 0.054 but was based on only
38 case studies.
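
To make the kind of model concrete, the sketch below fits an ordinary least squares model with five predictors and reports its RMSE. It assumes NumPy and uses synthetic data; the coefficients, noise level and sample size are invented for illustration and do not reproduce the study's data, preprocessing or results.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate the fitted linear model on the rows of X."""
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in for five predictors (e.g. total cholesterol, HDL,
# triglycerides, HbA1c, platelets); all values are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([0.5, 1.0, -0.8, -0.2, 0.1, 0.05])
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.05, size=200)

coef = fit_ols(X, y)
error = rmse(y, predict(coef, X))
```

Here `error` is computed on the same data used for fitting; an RMSE measured on held-out cases would be the fairer comparison between models.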
Neural network-based modeling strategies were tested as an alternative to linear models,
using the Multi-Objective Genetic Algorithm (MOGA). After data preprocessing, MOGA was run
twice using different threshold values. Six models were developed considering different
combinations of clinical parameters. For each model, the population was divided into
3 groups: 60% of the population was used to train the network, 20% to test the model and
the remaining 20% to validate it.
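
The 60/20/20 partition described above could be sketched as below. The text does not say how cases were assigned to each group, so the random shuffle, the fixed seed and the function name `split_60_20_20` are assumptions made only for illustration.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle items and split them into train (60%), test (20%), validation (rest)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding
    n_test = n * 20 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder, about 20%
    return train, test, validation


# Example with 100 hypothetical case ids.
train, test, validation = split_60_20_20(range(100))
```

Giving the remainder to the validation group ensures every case lands in exactly one group even when the population size is not a multiple of five.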
Using the populations employed in each MOGA run, the stepwise algorithm was used to
identify the relevance of each clinical parameter in the model and to create another linear
model from this parameter set. The MOGA model with the best training performance was model 4,
while model 2 performed best in validation, with an RMSE of 0.057. However, linear
model 5, built from the parameter selection identified by MOGA, achieved an RMSE of
0.054 in validation when total cholesterol, HDL, triglyceride, HbA1c, microalbuminuria,
creatinine, MDRD, sex and age are used in the composition of the LDL linear model.
Therefore, we can conclude that LDL can be modeled by a linear model using 6 or 10
clinical variables with a very low root mean square error.

Cardiovascular diseases (CVD) remain the leading cause of death worldwide and constitute
a risk factor for people with diabetes, who are in turn more prone to developing CVD.
However, although recent guidelines cover CVD risk, effective lipid control is far from
being achieved. Moreover, lipid self-management, together with the management of
therapeutic decisions, is not always given adequate priority by either patients or
healthcare professionals.
To better understand the influence of clinical parameters on low-density lipoprotein (LDL)
cholesterol in patients with type 2 diabetes whose lipid management is suspected to be
unstable, electronic health records (EHR) provided by APDP (Associação Protetora de
Diabetes Portugal) were used to carry out a study based on data mining techniques.(…