91 research outputs found

    p3state.msm: Analyzing Survival Data from an Illness-Death Model

    In longitudinal studies of disease, patients can experience several events across a follow-up period. Such studies can be analyzed successfully with multi-state models. In the multi-state framework, issues of interest include the study of the relationship between covariates and disease evolution, estimation of transition probabilities, and survival rates. This paper introduces p3state.msm, a software application for R which performs inference in an illness-death model. It describes the capabilities of the program for estimating semi-parametric regression models and for implementing nonparametric estimators for several quantities. The main feature of the package is its ability to obtain non-Markov estimates of the transition probabilities. Moreover, the methods can also be used in progressive three-state models. In such a model, estimators for other quantities, such as the bivariate distribution function (for sequentially ordered events), are also given. The software is illustrated using data from the Stanford Heart Transplant Study.

    Selecting the number of categories of the lymph node ratio in cancer research: A bootstrap-based hypothesis test

    The high impact of the lymph node ratio as a prognostic factor is widely established in colorectal cancer, and the ratio is used as a categorized predictor variable in several studies. However, the cut-off points as well as the number of categories considered differ considerably in the literature. Motivated by the need to obtain the best categorization of the lymph node ratio as a predictor of mortality in colorectal cancer patients, we propose a method to select the best number of categories for a continuous variable in a logistic regression framework. To this end, we propose a bootstrap-based hypothesis test, together with a new estimation algorithm for the optimal location of the cut-off points called BackAddFor, which is an updated version of the previously proposed AddFor algorithm. The performance of the hypothesis test was evaluated by means of a simulation study, under different scenarios, yielding type I errors close to the nominal errors and good power values whenever a meaningful difference in terms of prediction ability existed. Finally, the proposed methodology was applied to the CCR-CARESS study, where the lymph node ratio was included as a predictor of five-year mortality, resulting in the selection of three categories.
    The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Basque Government through the Consolidated Research Group MATHMODE (IT1294-19) from the Departamento de Educación, Política Lingüística y Cultura del Gobierno Vasco, the BERC 2018-2021 program and the SPRI Elkartek project 3KIA (KK-2020/00049); by the Spanish Government through the Ministerio de Ciencia, Innovación y Universidades: BCAM Severo Ochoa accreditation SEV-2017-0718 and by Ministerio de Economía y Competitividad and FEDER under research grants MTM2014-55966-P, MTM2016-74931-P and MTM2017-89422-P; and by Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2019-2022) and the EU (ERDF), Ref. ED431G2019/06. Financial support for data collection was provided in part by grants from the Instituto de Salud Carlos III, (PS09/00314, PS09/00910, PS09/00746, PS09/00805, PI09/90460, PI09/90490, PI09/90453, PI09/90441, PI09/90397, and the thematic networks REDISSEC - Red de Investigación en Servicios de Salud en Enfermedades Crónicas), co-funded by European Regional Development Fund/European Social Fund (ERDF/ESF “Investing in your future”); and the Research Committee of the Hospital Galdakao

    A method for determining groups in nonparametric regression curves: application to prefrontal cortex neural activity analysis

    Generalized additive models provide a flexible and easily interpretable method for uncovering a nonlinear relationship between response and covariates. In many situations, the effect of a continuous covariate on the response varies across groups defined by the levels of a categorical variable. When the categorical variable defines a considerable number of groups and a factor-by-curve interaction is detected in the model, it becomes important to compare these regression curves. When the null hypothesis of equality of curves is rejected, leading to the conclusion that at least one curve is different, we may assume that individuals can be grouped into a number of classes whose members all share the same regression function. We propose a method for determining such groups, with their number selected automatically by means of bootstrapping. The validity and behavior of the proposed method were evaluated through simulation studies. The applicability of the proposed method is illustrated using real data from an experimental study in neurology.
    This work was partially supported by project 2017/00001/006/001/097: Ayudas para el mantenimiento de actividades de investigación de institutos universitarios de investigación y grupos de investigación de la Universidad de Oviedo para el ejercicio 2021. Luís Meira-Machado acknowledges financial support from Portuguese Funds through FCT - "Fundação para a Ciência e a Tecnologia", within the projects UIDB/00013/2020, UIDP/00013/2020. Javier Roca-Pardinas acknowledges financial support from Grant PID2020-118101GB-I00, Ministerio de Ciencia e Innovación (MCIN/AEI/10.13039/501100011033).

    Functional Location-Scale Model to Forecast Bivariate Pollution Episodes

    Predicting anomalous emissions of pollutants into the atmosphere well in advance is crucial for industries emitting such elements, since it allows them to take corrective measures aimed at avoiding such emissions and their consequences. In this work, we propose a functional location-scale model to predict in advance pollution episodes where two pollutants are involved. Functional generalized additive models (FGAMs) are used to estimate the means and variances of the model, as well as the correlation between both pollutants. The method not only forecasts the concentrations of both pollutants, it also estimates an uncertainty region where the concentrations of both pollutants should be located, given a specific level of uncertainty. The performance of the model was evaluated using real data of SO2 and NOx emissions from a coal-fired power station, obtaining good results.
    The authors acknowledge financial support from: (1) UO-Proyecto Uni-Ovi (PAPI-18-GR-2014-0014), (2) Project MTM2016-76969-P from Ministerio de Economía y Competitividad—Agencia Estatal de Investigación and European Regional Development Fund (ERDF) and IAP network StUDyS from Belgian Science Policy, (3) Nuevos avances metodológicos y computacionales en estadística no-paramétrica y semiparamétrica—Ministerio de Ciencia e Investigación (MTM2017-89422-P).

    clustcurv: An R Package for Determining Groups in Multiple Curves

    In many situations, it is of interest to ascertain whether curves can be grouped, especially when confronted with a considerable number of curves. This paper introduces an R package, known as clustcurv, for determining clusters of curves with an automatic selection of their number. The package can be used for determining groups in multiple survival curves as well as in multiple regression curves. Moreover, it can be used with large numbers of curves. An illustration of the use of clustcurv is provided, using both real data examples and artificial data.
    The authors acknowledge financial support by the Spanish Ministry of Economy and Competitiveness (MINECO) through projects MTM2017-89422-P and MTM2017-82379-R (funded by AEI/FEDER, UE). Thanks to the Associate Editor and the referee for comments and suggestions that have improved this paper.

    Bootstrap-based procedures for inference in nonparametric ROC regression analysis

    Before the use of a diagnostic test in a routine clinical setting, the rigorous evaluation of its diagnostic accuracy is an essential step. The receiver operating characteristic (ROC) curve is the measure of accuracy most widely used for continuous diagnostic tests. However, the possible impact of extra information about the patient (or even the environment) on diagnostic accuracy also needs to be assessed. In this paper, attention is focused on an estimator for the covariate-specific ROC curve based on direct regression modelling and nonparametric smoothing techniques. This approach defines the class of generalized additive models for the ROC curve (ROC-GAM). The main aim of the paper is to offer new inferential procedures for testing the effect of covariates on the conditional ROC curve within the ROC-GAM context. Specifically, two different bootstrap-based tests are suggested to check (a) the possible effect of continuous covariates on the ROC curve; and (b) the presence of factor-by-curve interaction terms. The validity of the proposed bootstrap-based procedures is supported by simulations. To facilitate the application of these new procedures in practice, an R package, known as npROCRegression, is provided and briefly described. Finally, data derived from a computer-aided diagnostic (CAD) system for the automatic detection of tumour masses in breast cancer are analysed.

    Bootstrap-based procedures for inference in nonparametric receiver-operating characteristic curve regression analysis

    Prior to using a diagnostic test in a routine clinical setting, the rigorous evaluation of its diagnostic accuracy is essential. The receiver-operating characteristic curve is the measure of accuracy most widely used for continuous diagnostic tests. However, the possible impact of extra information about the patient (or even the environment) on diagnostic accuracy also needs to be assessed. In this paper, we focus on an estimator for the covariate-specific receiver-operating characteristic curve based on direct regression modelling and nonparametric smoothing techniques. This approach defines the class of generalised additive models for the receiver-operating characteristic curve. The main aim of the paper is to offer new inferential procedures for testing the effect of covariates on the conditional receiver-operating characteristic curve within the above-mentioned class. Specifically, two different bootstrap-based tests are suggested to check (a) the possible effect of continuous covariates on the receiver-operating characteristic curve and (b) the presence of factor-by-curve interaction terms. The validity of the proposed bootstrap-based procedures is supported by simulations. To facilitate the application of these new procedures in practice, an R package, known as npROCRegression, is provided and briefly described. Finally, data derived from a computer-aided diagnostic system for the automatic detection of tumour masses in breast cancer are analysed.

    A distance correlation approach for optimum multiscale selection in 3D point cloud classification

    Supervised classification of 3D point clouds using machine learning algorithms and handcrafted local features as covariates frequently depends on the size of the neighborhood (scale) around each point used to determine those features. It is therefore crucial to estimate the scale or scales providing the best classification results. In this work, we propose three methods to estimate these scales, all of them based on calculating the maximum values of the distance correlation (DC) functions between the features and the label assigned to each point. The performance of the methods was tested using simulated data, and the method presenting the best results was applied to a benchmark data set for point cloud classification. This method consists of detecting the local maxima of the DC functions, which are smoothed beforehand to avoid choosing scales that are very close to each other. Five different classifiers were used: linear discriminant analysis, support vector machines, random forest, multinomial logistic regression and multilayer perceptron neural network. The results obtained were compared with those from other strategies available in the literature, and the comparison was favorable to our approach.
    Xunta de Galicia; ED431G 2019/01. Ministerio de Ciencia, Innovación y Universidades; MTM2016-76969-P. Xunta de Galicia; ED431C-2020-14. MINECO/AEI/FEDER, UE; MTM2017-89422-

    seq2R: an R package to detect change points in DNA sequences

    Identifying the mutational processes that shape the nucleotide composition of the mitochondrial genome (mtDNA) is fundamental to better understand how these genomes evolve. Several methods have been proposed to analyze DNA sequence nucleotide composition and skewness, but most of them lack any measurement of statistical support or were not developed taking into account the specificities of mitochondrial genomes. A new methodology is presented, which is specifically developed for mtDNA to detect compositional changes or asymmetries (AT and CG skews) based on nonparametric regression models and their derivatives. The proposed method also includes the construction of confidence intervals, which are built using bootstrap techniques. This paper introduces an R package, known as seq2R, that implements the proposed methodology. Moreover, an illustration of the use of seq2R is provided using real data, specifically two publicly available complete mtDNAs: the human (Homo sapiens) sequence and a nematode (Radopholus similis) mitogenome sequence.
    Ministerio de Ciencia e Innovación | Ref. MTM2011-23204. Ministerio de Ciencia e Innovación | Ref. PID2020-118101GB-I00. Xunta de Galicia | Ref. 10PXIB 300 068 P

    Assessing the Genetic Influence of Ancient Sociopolitical Structure: Micro-differentiation Patterns in the Population of Asturias (Northern Spain)

    The human populations of the Iberian Peninsula are the diverse result of a complex mixture of cultures throughout history, and are separated by clear social, cultural, linguistic and geographic barriers. The largest genetic differences between nearby, related populations are found in the northern third of Spain, and are defined by a phenomenon commonly called "micro-differentiation". How this form of genetic structuring may relate to the rugged terrain and ancient societies of northern Iberia has been discussed, but this is difficult to test in many regions because of the intense human mobility of previous centuries. Even so, the Spanish autonomous community of Asturias shows a complex history that seems to indicate a certain isolation of its population. This, together with its difficult terrain full of deep valleys and high mountains, makes it suitable for a study of genetic structuring based on mitochondrial DNA and Y-chromosome markers. Our analyses show not only that micro-differentiation patterns exist within Asturian territory, but also that these patterns are strikingly similar between the two markers. The inference of barriers to gene flow also indicates that the Asturian populations of the coastal north and the mountainous south appear to be relatively isolated from the rest of the territory. These findings are discussed in light of historical and geographic data which, together with previous evidence, show that the origin of the current genetic structuring may well lie in sociopolitical divisions of the Roman and pre-Roman eras.