9 research outputs found

    On the optimism correction of the area under the receiver operating characteristic curve in logistic prediction models

    When the same data are used both to fit a model and to estimate its predictive performance, the estimate may be optimistic and needs to be corrected. The aim of this work is to compare the behaviour of different methods proposed in the literature for correcting the optimism of the estimated area under the receiver operating characteristic curve (AUC) in logistic regression models. A simulation study (where the theoretical model is known) is conducted, considering different numbers of covariates, sample sizes, prevalences and correlations among covariates. The results suggest the use of k-fold cross-validation with replication and bootstrap. Peer reviewed.
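    The abstract does not say which bootstrap variant was evaluated. As a minimal sketch, assuming the common optimism-corrected bootstrap (refit on each resample, then subtract the average gap between resample AUC and original-data AUC), the idea can be illustrated as follows. All function names, the simulated data and the gradient-ascent fitter are hypothetical stand-ins and are not taken from the paper.

```python
import math
import random


def auc(scores, labels):
    """Empirical AUC (Mann-Whitney form); tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def linear_predictor(beta, row):
    """Intercept plus the dot product of the slopes with one covariate row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def fit_logistic(X, y, steps=100, lr=0.5):
    """Logistic regression by plain gradient ascent (illustrative, not tuned)."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            # Clip the linear predictor so math.exp cannot overflow.
            z = max(-30.0, min(30.0, linear_predictor(beta, row)))
            resid = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, x in enumerate(row, start=1):
                grad[j] += resid * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def model_auc(beta, X, y):
    """AUC of a fitted model on a given data set."""
    return auc([linear_predictor(beta, row) for row in X], y)


def bootstrap_corrected_auc(X, y, n_boot=30, seed=0):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = model_auc(fit_logistic(X, y), X, y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample needs both classes for the AUC to exist
        Xb = [X[i] for i in idx]
        beta_b = fit_logistic(Xb, yb)
        # Optimism: performance on the resample minus performance on the original data.
        optimism.append(model_auc(beta_b, Xb, yb) - model_auc(beta_b, X, y))
    return apparent, apparent - sum(optimism) / n_boot


# Simulated example (assumed setup): 60 observations, 3 covariates, only the first is informative.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [1 if rng.random() < 1 / (1 + math.exp(-0.8 * row[0])) else 0 for row in X]
apparent, corrected = bootstrap_corrected_auc(X, y)
print(round(apparent, 3), round(corrected, 3))
```

    The second printed value is the apparent AUC minus the average bootstrap optimism. A replicated k-fold cross-validation estimate would follow the same pattern, with held-out folds in place of resamples.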

    Comparison and implementation of validation techniques for predictive models (Eredu aurresaleen balidazio tekniken konparaketa eta inplementazioa)

    Predictive models are becoming increasingly important. They are used to predict future situations, and they help experts decide which measures to take with those situations in mind. In medicine, for example, models of this kind are widely used, and they help physicians anticipate what a patient's condition will be over a given period. Given their importance, it goes without saying that predictive models must be reliable and that the difference between what they predict and what actually happens should be as small as possible. To that end, the models are validated.

    Development and validation of prediction models for complex sampling data

    292 p. Complex survey data are becoming increasingly well known among researchers from different fields, including the social and health sciences. This type of data is obtained by sampling the target population through a complex sampling design. One feature that distinguishes this kind of data from simple random samples is the sampling weights, which indicate the number of units that each sampled observation represents in the population. However, the role of sampling weights when modelling complex survey data has generated a large debate over the years. In this thesis, we analyze the impact that sampling weights have on the development of prediction models for survey data obtained through complex sampling designs. In particular, we have made advances in the estimation of logistic regression model parameters, variable selection, estimation of the discrimination ability and classification of individuals. The validity of the new design-based proposals has been analyzed by means of extensive simulation studies in which we compare their performance to that of the traditional unweighted techniques. In addition, the design-based proposals have been applied to real complex survey data and implemented in two R packages (wlasso and wROC) that are freely available.

    Estimation of logistic regression parameters for complex survey data : simulation study based on real survey data

    Other funding: Gobierno Vasco, Departamento de Educación, Política Lingüística y Cultura, IT1456-22; Programa BERC 2022-2025; work of IA supported by PIF18/213. In complex survey data, each sampled observation is assigned a sampling weight, indicating the number of units that it represents in the population. Whether or not sampling weights should be considered when estimating model parameters is a question that continues to generate much discussion among researchers in different fields. We aim to contribute to this debate by means of a simulation study based on real data, in the framework of logistic regression models. Three methods for estimating the coefficients of the logistic regression model are compared: a) the unweighted model, b) the weighted model, and c) the unweighted mixed model. The results suggest that the weighted logistic regression model is superior, showing the importance of using sampling weights when estimating the model parameters.
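    For orientation, a weighted logistic fit is commonly written as the maximisation of a pseudo-log-likelihood in which each sampled observation's contribution is multiplied by its sampling weight. The notation below (sample S, weights w_i, covariates x_i, responses y_i, coefficients β) is an assumption made for illustration and is not taken from the paper; setting w_i = 1 for every i gives the unweighted model.

```latex
\ell_w(\beta) = \sum_{i \in S} w_i \left[ y_i \log p_i(\beta) + (1 - y_i) \log\bigl(1 - p_i(\beta)\bigr) \right],
\qquad
p_i(\beta) = \frac{1}{1 + e^{-x_i^{\top}\beta}}
```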

    Archaeological surveys aimed at locating Iron Age sites in Gipuzkoa (Prospecciones arqueológicas orientadas a la localización de yacimientos de la Edad del Hierro en Gipuzkoa)

    This work presents the results of the second phase of a project aimed at locating habitation and funerary sites of the Late Bronze Age and Iron Age in Gipuzkoa. During the year, survey work was carried out at seven points distributed across different areas of the territory under study.
