45 research outputs found

    Proposal and validation of methodologies for the categorisation of continuous variables in the development of prediction models

    158 p. Prediction models are currently relevant in a number of fields such as physics, meteorology, finance or medicine, among others. In the medical field, prediction models are gaining importance as a support for decision-making, whereby increased knowledge of potential predictors helps the decision-making process. Clinical prediction models may provide the necessary input for shared decision-making by estimating an individual's risk of an unfavourable event or of developing a certain disease over a specific time period on the basis of his or her clinical and non-clinical profile. A vital factor in the development of prediction models is the selection of the predictors or covariates (clinical variables) to be used in the model. From a statistical perspective, categorising continuous variables is not advisable, since it may entail a loss of information and power. In addition, there are statistical modelling techniques such as generalised additive models (GAM) which do not require any assumption of linearity between predictors and response variables, and so allow the relationship between the predictor and the outcome to be modelled more appropriately. Yet in clinical research and, more specifically, in the development of prediction models for use in clinical practice, both clinicians and health managers call for the categorisation of continuous parameters. Although categorisation is a common practice in clinical research, there are no unified criteria for the selection of the cut points. Previous work on the categorisation of continuous variables has, in almost all cases, aimed at dichotomising the predictor variable. In this dissertation, we focus on the categorisation of continuous variables to be used in the development of prediction models, considering that the use of more than two categories may be preferable.
This serves to reduce the loss of information and enables the relationship between the covariate and the response variable to be retained. Our goal is to propose a methodology to categorise continuous predictor variables in regression-based prediction models, mainly focussing on the logistic and Cox regression models, which are those most widely used in the medical field for modelling dichotomous and time-to-event outcomes respectively. The work presented in this dissertation was initially motivated by the development of a prediction model in the context of patients with chronic obstructive pulmonary disease (COPD). Clinicians agreed on the use of a categorised version of some clinical parameters, such as the blood gas PCO2 or the respiratory rate, in the prediction model. However, they did not agree on the location and number of cut points. We noticed that these were usually based on quartiles, and when they were based on clinical criteria there was no agreement between them. Several proposals are available in the literature, but most aim at the selection of a single cut point. Thus, we considered developing a methodology to categorise continuous predictor variables in prediction models. In a first stage we considered categorising a continuous predictor variable X by considering its graphical relationship with a binary response variable Y based on a logistic GAM with P-spline smoothers. We proposed to categorise X into a minimum of three categories, taking the limits of the average-risk category as the location of the cut points. The location of the third cut point, if needed, was to be based on clinical criteria or a change in the slope of the graphical display. Nevertheless, this methodology had some restrictions: the location of this third cut point was subjective, it did not allow us to categorise X in a multivariate setting, and it was limited to a binary outcome.
Thus, in a second stage, we sought a proposal that would provide an optimal categorisation of a continuous predictor in a multivariate setting for different distributions of the response variable. We started by developing a methodology in which the location of any given number k of cut points for X could be optimally selected in a logistic regression, with or without a set of other predictor variables, $Z = (Z_1, \dots, Z_p)$. The proposal consisted of the selection of a vector $v_k = (x_1, \dots, x_k)$ of k cut points in such a way that the best logistic predictive model was obtained for the response variable Y. Specifically, given k, the number of cut points set for categorising X into k + 1 intervals, let us denote by $X_{cat_k}$ the corresponding categorised variable taking values from 0 to k. Then, what we propose is that the vector of k cut points $v_k = (x_1, \dots, x_k)$ which maximises the area under the receiver operating characteristic (ROC) curve (AUC) of the logistic regression model shown in equation (2) is the vector of the optimal k cut points. $\operatorname{logit}\bigl(\Pr(Y = 1 \mid Z, X_{cat_k})\bigr) = \beta_0 + \sum_{r=1}^{p} \beta_r Z_r + \dots$
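
As a rough illustration of the idea above, the following Python sketch searches for the k cut points that maximise the AUC in the simplest setting: a single predictor X and no additional covariates Z. In that special case the fitted risks of a logistic model on the category indicators coincide with the per-category event proportions, so no model-fitting library is needed. The exhaustive search over a candidate grid, the function names (`categorical_auc`, `best_cut_points`) and the toy data are all assumptions made for this example; the abstract does not describe the search algorithm actually used in the dissertation.

```python
from bisect import bisect_left
from itertools import combinations


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def categorise(x, cuts):
    """Code each value 0..k; category j covers the interval (cuts[j-1], cuts[j]]."""
    return [bisect_left(cuts, v) for v in x]


def categorical_auc(x, y, cuts):
    """AUC of the logistic model containing only the categorised X (no Z).

    Its fitted risks equal the event proportion of each category (in the
    limit, when a category holds only events or only non-events).
    Returns None when some category is empty.
    """
    cats = categorise(x, cuts)
    n = [0] * (len(cuts) + 1)
    events = [0] * (len(cuts) + 1)
    for c, yi in zip(cats, y):
        n[c] += 1
        events[c] += yi
    if min(n) == 0:
        return None
    risk = [e / m for e, m in zip(events, n)]
    return auc([risk[c] for c in cats], y)


def best_cut_points(x, y, k, candidates):
    """Exhaustive search (an illustrative choice) for the k cuts with the highest AUC."""
    best_cuts, best_auc = None, -1.0
    for cuts in combinations(sorted(candidates), k):
        value = categorical_auc(x, y, cuts)
        if value is not None and value > best_auc:
            best_cuts, best_auc = cuts, value
    return best_cuts, best_auc


# Hypothetical toy data with a U-shaped risk: events cluster at low and high X.
x = [v for v in range(1, 13) for _ in range(2)]
y = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1]

cuts, value = best_cut_points(x, y, k=2, candidates=range(2, 12))
print(cuts, round(value, 3))
```

With covariates Z in the model the shortcut through event proportions no longer applies, and each candidate vector of cut points would require refitting the logistic regression; an exhaustive grid also becomes expensive as k grows, so a smarter search would be needed in practice.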

    On the optimism correction of the area under the receiver operating characteristic curve in logistic prediction models

    When the same data are used to fit a model and to estimate its predictive performance, the estimate may be optimistic and needs to be corrected. The aim of this work is to compare the behaviour of different methods proposed in the literature for correcting the optimism of the estimated area under the receiver operating characteristic curve in logistic regression models. A simulation study (where the theoretical model is known) is conducted considering different numbers of covariates, sample sizes, prevalences and correlations among covariates. The results suggest the use of k-fold cross-validation with replication and bootstrap. Peer reviewed.
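
To make the notion of optimism concrete, here is a minimal Python sketch of one common bootstrap correction: the model is refitted on each bootstrap sample, and the average gap between its AUC on that sample and its AUC on the original data is subtracted from the apparent AUC. The abstract does not say which bootstrap or cross-validation variants were compared, so this particular scheme, the gradient-ascent fitting routine, the simulated data and all names and settings (`fit_logistic`, `n_boot=25`, and so on) are assumptions for illustration only.

```python
import math
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, X):
    """Linear predictor b0 + b1*x1 + ... per row (sufficient for ranking)."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]


def fit_logistic(X, y, steps=150, lr=0.5):
    """Approximate logistic fit by plain gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        resid = [yi - sigmoid(eta) for yi, eta in zip(y, predict(beta, X))]
        grad = [sum(resid)] + [
            sum(r * row[j] for r, row in zip(resid, X)) for j in range(len(X[0]))
        ]
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


def optimism_corrected_auc(X, y, n_boot=25, seed=1):
    """Return (apparent AUC, estimated optimism, corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit_logistic(X, y), X), y)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC is undefined without both classes; redraw
        beta_b = fit_logistic(Xb, yb)
        # Performance on the bootstrap sample minus performance on the original data.
        diffs.append(auc(predict(beta_b, Xb), yb) - auc(predict(beta_b, X), y))
    optimism = sum(diffs) / n_boot
    return apparent, optimism, apparent - optimism


# Simulated data (hypothetical): only the first of three covariates carries signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(80)]
y = [1 if rng.random() < sigmoid(1.5 * row[0]) else 0 for row in X]

apparent, optimism, corrected = optimism_corrected_auc(X, y)
print(round(apparent, 3), round(optimism, 3), round(corrected, 3))
```

The number of bootstrap replicates is kept small here only so the example runs quickly; a real analysis would typically use many more.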

    Clinical prediction rules for adverse evolution in patients with COVID-19 by the Omicron variant

    Objective: We identify factors related to SARS-CoV-2 infection linked to hospitalization, ICU admission, and mortality, and develop clinical prediction rules. Methods: Retrospective cohort study of 380,081 patients with SARS-CoV-2 infection from March 1, 2020 to January 9, 2022, including a subsample of 46,402 patients who attended Emergency Departments (EDs) and had data on vital signs. For derivation and external validation of the prediction rule, two different periods were considered: before and after emergence of the Omicron variant, respectively. Data collected included sociodemographic data, COVID-19 vaccination status, baseline comorbidities and treatments, other background data, and vital signs at triage at EDs. The predictive models for the ED sample and the whole sample were developed using multivariable logistic regression models with Lasso penalization. Results: In the multivariable models, common predictive factors of death among ED patients were greater age; being male; being unvaccinated; dementia; heart failure; liver and kidney disease; hemiplegia or paraplegia; coagulopathy; interstitial pulmonary disease; malignant tumors; chronic systemic use of steroids; higher temperature; low O2 saturation; and altered blood pressure-heart rate. The predictors of an adverse evolution were the same, with the exception of liver disease and the inclusion of cystic fibrosis. Similar predictors were found to be related to hospital admission, including liver disease, arterial hypertension, and basal prescription of immunosuppressants. Similarly, models for the whole sample, without vital signs, are presented.
Conclusions: We propose risk scales that are based on basic information, easy to calculate and highly predictive, that also work with the current Omicron variant and may help manage such patients in primary, emergency, and hospital care. This work was supported in part by the health outcomes group from Galdakao-Barrualde Health Organization; the Kronikgune Institute for Health Service Research; and the thematic network REDISSEC (Red de Investigación en Servicios de Salud en Enfermedades Crónicas) of the Instituto de Salud Carlos III. The work of IB was financially supported in part by grants from the Departamento de Educación, Política Lingüística y Cultura del Gobierno Vasco [IT1456-22], by the Ministry of Science and Innovation through BCAM Severo Ochoa accreditation [CEX2021-001142-S/MICIN/AEI/10.13039/501100011033] and through project [PID2020-115882RB-I00/AEI/10.13039/501100011033], funded by Agencia Estatal de Investigación, with acronym "S3M1P4R", and also by the Basque Government through the BERC 2022–2025 program and the BMTF "Mathematical Modeling Applied to Health" Project.

    Mathematical study of epidemiological models

    This work focuses on the spread of diseases in a population as described by mathematical models, a topic that has grown in prominence in recent years and especially this past year with the COVID-19 pandemic. An analytical and numerical study of these models is carried out. To that end, concepts of Lyapunov stability are covered, along with a brief introduction to bifurcations. Finally, we present a simulation of COVID-19, comparing its evolution with and without containment measures.

    Proposal of a methodology to obtain the best categorisation of continuous variables in prediction models: a specific application in medicine

    In medicine, many clinical parameters are categorised to ease decision-making. Moreover, categorising variables is a common technique in the development of clinical prediction rules. Because the number of categories needed when categorising a predictor depends on the relationship between the predictor and the response variable, the need for more than two categories must be examined. We propose a method for categorising predictor variables in prediction models: based on the smooth function obtained from the model, create at least one average-risk category and as many high-risk and low-risk categories as needed. We applied this methodology to a prospective cohort of patients with severe decompensated heart failure, with blood pressure as the predictor and short-term death as the response variable. Additive logistic regression was used to display the relationship between the predictor and the response. The categorical variable obtained with the proposed method was compared with the original continuous variable using AIC and AUC. The proposed four-category version of systolic blood pressure is: ≤ 120, (120,136], (136,158] and greater than 158. For these four categories, AIC = 344.59 and AUC = 0.72 were obtained. For the continuous variable, AIC = 345.7 and AUC = 0.718 were obtained, with no significant difference between the two AUC values (p = 0.974). The proposed method yields both the number of cut points needed to categorise a continuous variable and their best location. The categorical variable obtained in this way performs as well as the original continuous variable.
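
For readers who want to apply the reported categories, the short Python sketch below maps systolic blood pressure readings onto the four intervals given in the abstract (cut points 120, 136 and 158, with right-closed intervals). The lowest category is read here as "120 or below", which is an inference from the interval notation; the function name `sbp_category` and the sample readings are hypothetical.

```python
from bisect import bisect_left

CUTS = [120, 136, 158]  # cut points reported in the abstract
LABELS = ["<=120", "(120,136]", "(136,158]", ">158"]


def sbp_category(sbp):
    """Return the label of the right-closed interval that contains sbp."""
    return LABELS[bisect_left(CUTS, sbp)]


readings = [98, 120, 121, 136, 150, 158, 172]  # hypothetical values
for r in readings:
    print(r, sbp_category(r))
```

`bisect_left` is used so that a reading equal to a cut point falls in the lower interval, matching the `(a, b]` convention of the abstract.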

    Suspended sediment delivery from small catchments to the Bay of Biscay. What are the controlling factors?

    The transport and yield of suspended sediment (SS) in catchments all over the world have long been topics of great interest. This paper addresses the scarcity of information on SS delivery and its environmental controls in small catchments, especially in the Atlantic region. Five steep catchments in Gipuzkoa (Basque Country), with areas between 56 and 796 km² and draining into the Bay of Biscay, were continuously monitored for precipitation, discharge and suspended sediment concentration (SSC) at their outlets from 2006 to 2013. Environmental characteristics such as elevation, slope, land uses, soil depth and erodibility of the lithology were also calculated. The analysis included consideration of uncertainties in the SSC calibration models in the final suspended sediment yield (SSY) estimations. The total delivery of sediments from the catchments into the Bay of Biscay and its standard deviation was 272 200 ± 38 107 t·yr⁻¹, or 151 ± 21 t·km⁻²·yr⁻¹, and the SSYs ranged from 46 ± 0.48 to 217 ± 106 t·km⁻²·yr⁻¹. Hydro-climatic variables and catchment areas do not explain the spatial variability found in SSY, whereas land use (especially non-native plantations) and management (human impacts) appear to be the main factors that control this variability. Obtaining long-term measurements of sediment delivery would allow the effects of environmental and human-induced changes on SS fluxes to be better detected. However, the data provided in this paper offer valuable quantitative information that will enable decision-makers to make more informed decisions on land management while considering the effects of the delivery of SS.
