30 research outputs found
An R package for inference and prediction in an illness-death model
Multi-state models are a useful way of describing a process in which an individual moves through a finite number of states in continuous time. The illness-death model plays a central role in the theory and practice of these models, describing the dynamics of healthy subjects who may move to an intermediate "diseased" state before entering a terminal absorbing state. In these models, one important goal is the modeling of transition rates, which is usually done by studying the relationship between covariates and disease evolution. However, biomedical researchers are also interested in reporting other interpretable results in a simple and summarized manner. These include estimates of predictive probabilities, such as the transition probabilities, occupation probabilities, cumulative incidence functions, prevalence and the sojourn time distributions. An R package was built providing answers to all these topics.
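For orientation, the quantities named in this abstract can be written down as follows. This is only a notational sketch: the state labels (0 = healthy, 1 = diseased, 2 = dead) and the symbols X(t), p_{hj}, p_j and \alpha_{hj} are assumptions chosen here for illustration, not notation taken from the package.

```latex
% Illness-death process X(t) with states 0 (healthy), 1 (diseased), 2 (dead);
% allowed transitions: 0 -> 1, 0 -> 2, 1 -> 2. State 2 is absorbing.
\begin{align*}
  \alpha_{hj}(t) &= \lim_{\Delta \downarrow 0}
      \frac{P\bigl(X(t+\Delta) = j \mid X(t^-) = h\bigr)}{\Delta}
      && \text{(transition rate, } h \neq j\text{)} \\
  p_{hj}(s,t) &= P\bigl(X(t) = j \mid X(s) = h\bigr), \quad s < t
      && \text{(transition probability)} \\
  p_{j}(t) &= P\bigl(X(t) = j\bigr)
      && \text{(occupation probability)}
\end{align*}
```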
survidm: An R package for Inference and Prediction in an Illness-Death Model
Multi-state models are a useful way of describing a process in which an individual moves through a finite number of states in continuous time. The illness-death model plays a central role in the theory and practice of these models, describing the dynamics of healthy subjects who may move to an intermediate "diseased" state before entering a terminal absorbing state. In these models, one important goal is the modeling of transition rates, which is usually done by studying the relationship between covariates and disease evolution. However, biomedical researchers are also interested in reporting other interpretable results in a simple and summarized manner. These include estimates of predictive probabilities, such as the transition probabilities, occupation probabilities, cumulative incidence functions, and the sojourn time distributions. The development of the survidm package has been motivated by recent contributions that provide answers to all these topics. An illustration of the software usage is included using real data.
This research was financed by Portuguese Funds through FCT - "Fundação para a Ciência e a Tecnologia", within the research grant PD/BD/142887/2018. Luís Meira-Machado acknowledges financial support from the Spanish Ministry of Economy and Competitiveness MINECO through project MTM2017-82379-R funded by (AEI/FEDER, UE) and acronym "AFTERAM".
An R package for determining groups in multiple survival curves
Survival analysis includes a wide variety of methods for analyzing time-to-event data. One basic but important goal in survival analysis is the comparison of survival curves between groups. Several nonparametric methods have been proposed in the literature to test for the equality of survival curves for censored data. When the null hypothesis of equality of curves is rejected, leading to the clear conclusion that at least one curve is different, it can be interesting to ascertain whether the curves can be grouped or whether all of them differ from each other. We present the R package clustcurv, which determines such groups and selects their number automatically. The applicability of the proposed method is illustrated using real data.
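The survival curves being compared are commonly estimated with the Kaplan-Meier product-limit estimator. The following is a minimal Python sketch of that standard estimator, added for illustration only; it is not the clustcurv implementation, and the function name `kaplan_meier` and its arguments are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Events and censored subjects at t both leave the risk set.
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Toy data (made up for illustration): five subjects, two censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Applying this to each group separately gives the set of curves whose equality is tested and which a grouping procedure would then try to cluster.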
A method for determining groups in multiple survival curves
Survival analysis includes a wide variety of methods for analyzing time-to-event data. One basic but important goal in survival analysis is the comparison of survival curves between groups. Several nonparametric methods have been proposed in the literature to test for the equality of survival curves for censored data. When the null hypothesis of equality of curves is rejected, leading to the clear conclusion that at least one curve is different, it can be interesting to ascertain whether the curves can be grouped or whether all of them differ from each other. A method is proposed that determines such groups and selects their number automatically. The validity and behavior of the proposed method were evaluated through simulation studies. The applicability of the proposed method is illustrated using real data. Software in the form of an R package has been developed implementing the proposed method.
Fundação para a Ciência e a Tecnologia | Ref. SFRH/BPD/93928/201
Explainable generalized additive neural networks with independent neural network training
Neural networks are one of the most popular methods nowadays given their high performance on diverse tasks, such as computer vision, anomaly detection, computer-aided disease detection and diagnosis or natural language processing. While neural networks are known for their high performance, they often suffer from the so-called “black-box” problem, which means that it is difficult to understand how the model makes decisions. We introduce a neural network topology based on Generalized Additive Models. By training an independent neural network to estimate the contribution of each feature to the output variable, we obtain a highly accurate and explainable deep learning model, providing a flexible framework for training Generalized Additive Neural Networks which does not impose any restriction on the neural network architecture. The proposed algorithm is evaluated through different simulation studies with synthetic datasets, as well as a real-world use case of Distributed Denial of Service cyberattack detection on an Industrial Control System. The results show that our proposal outperforms other GAM-based neural network implementations while providing higher interpretability, making it a promising approach for high-risk AI applications where transparency and accountability are crucial.
Xunta de Galicia; Agencia Estatal de Investigación | Ref. PID2020-118101GB-I0
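As a reminder of the structure this abstract builds on, a generalized additive model has the standard form below. The symbols (g, \alpha, f_j, p) are the usual textbook notation and are assumptions here, not notation taken from the paper; under the approach described, each f_j would be represented by its own independently trained network.

```latex
% Generalized additive model with link function g and p features.
% Each partial function f_j depends on a single feature X_j, so its
% contribution to the prediction can be inspected (e.g. plotted) on its own.
\[
  g\bigl(\mathbb{E}[Y \mid X_1, \dots, X_p]\bigr)
    = \alpha + \sum_{j=1}^{p} f_j(X_j)
\]
```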
clustcurv: An R Package for Determining Groups in Multiple Curves
In many situations, it is of interest to ascertain whether curves can be grouped, especially when confronted with a considerable number of curves. This paper introduces an R package, known as clustcurv, for determining clusters of curves with an automatic selection of their number. The package can be used for determining groups in multiple survival curves as well as for multiple regression curves. Moreover, it can be used with large numbers of curves. An illustration of the use of clustcurv is provided, using both real data examples and artificial data.
The authors acknowledge financial support by the Spanish Ministry of Economy and Competitiveness (MINECO) through projects MTM2017-89422-P and MTM2017-82379-R (funded by AEI/FEDER, UE). Thanks to the Associate Editor and the referee for comments and suggestions that have improved this paper.
A method for determining groups in cumulative incidence curves in competing risk data
The cumulative incidence function is the standard method for estimating the marginal probability of a given event in the presence of competing risks. One basic but important goal in the analysis of competing risk data is the comparison of these curves, for which limited literature exists. We propose a new procedure that not only tests the equality of these curves but also groups them if they are not equal. The proposed method determines the composition of the groups and selects their number automatically. Simulation studies show the good numerical behavior of the proposed methods for finite sample sizes. The applicability of the proposed method is illustrated using real data.
Fundação para a Ciência e a Tecnologia | Ref. UIDB/00013/2020; Fundação para a Ciência e a Tecnologia | Ref. UIDP/00013/2020; Agencia Estatal de Investigación | Ref. PID2020-118101GB-I0
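For readers unfamiliar with the curves being compared, the sketch below computes the standard nonparametric (Aalen-Johansen type) estimate of a cumulative incidence function under competing risks. It is an illustrative sketch, not the authors' procedure; the function name `cumulative_incidence` and the coding of causes (0 = censored, 1, 2, ... = event types) are assumptions made here.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause):
    """Nonparametric cumulative incidence estimate for one event type.

    times  : observed follow-up times
    causes : 0 if censored, otherwise the type of the observed event
    cause  : the event type of interest
    Returns a list of (t, CIF(t)) pairs at the times where `cause` occurs.
    """
    any_event = Counter(t for t, c in zip(times, causes) if c != 0)
    this_cause = Counter(t for t, c in zip(times, causes) if c == cause)
    leaving = Counter(times)
    at_risk = len(times)
    surv_prev = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    curve = []
    for t in sorted(leaving):
        d_k = this_cause.get(t, 0)
        d_all = any_event.get(t, 0)
        if d_k > 0:
            # Probability of being event-free just before t, times the
            # cause-specific hazard estimate at t.
            cif += surv_prev * d_k / at_risk
            curve.append((t, cif))
        if d_all > 0:
            surv_prev *= 1.0 - d_all / at_risk
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Toy data (made up for illustration): two event types, one censored subject.
    times = [1, 2, 3, 4]
    causes = [1, 2, 0, 1]
    print(cumulative_incidence(times, causes, cause=1))
    print(cumulative_incidence(times, causes, cause=2))
```

Computing one such curve per group gives the set of cumulative incidence curves that a testing and grouping procedure would then compare.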
npregfast: an R package for nonparametric estimation and inference in life sciences
We present the R npregfast package via some applications involved with the study of living organisms. The package implements nonparametric estimation procedures in regression models with or without factor-by-curve interactions. The main feature of the package is its ability to perform inference regarding these models. Namely, it implements different procedures to test features of the estimated regression curves: on the one hand, comparisons between curves which may vary across groups defined by levels of a categorical variable or factor; on the other hand, comparisons of some critical points of the curve (e.g., maxima, minima or inflection points), for which the derivatives of the curve are studied.
Ministerio de Ciencia e Innovación | Ref. MTM2011-23204; Xunta de Galicia | Ref. 10PXIB300068PR; Fundação para a Ciência e a Tecnologia | Ref. SFRH/BPD/93928/201
clustcurv: an R package for determining groups in multiple curves
In many situations, it is of interest to ascertain whether curves can be grouped, especially when confronted with a considerable number of curves. This paper introduces an R package, known as clustcurv, for determining clusters of curves with an automatic selection of their number. The package can be used for determining groups in multiple survival curves as well as for multiple regression curves. Moreover, it can be used with large numbers of curves. An illustration of the use of clustcurv is provided, using both real data examples and artificial data.
Agencia Estatal de Investigación | Ref. MTM2017-89422-P; Agencia Estatal de Investigación | Ref. MTM2017-82379-
FWDselect: an R package for variable selection in regression models
In multiple regression models, when there is a large number (p) of explanatory variables which may or may not be relevant for predicting the response, it is useful to be able to reduce the model. To this end, it is necessary to determine the best subset of q (q ≤ p) predictors which will establish the model with the best prediction capacity. The FWDselect package introduces a new forward stepwise-based selection procedure to select the best model in different regression frameworks (parametric or nonparametric). The developed methodology, which can be equally applied to linear models, generalized linear models or generalized additive models, aims to introduce solutions to the following two topics: i) selection of the best combination of q variables by using a step-by-step method; and, perhaps most importantly, ii) search for the number of covariates to be included in the model based on bootstrap resampling techniques. The software is illustrated using real and simulated data.
Fundação para a Ciência e a Tecnologia | Ref. SFRH/BPD/93928/2013; Xunta de Galicia | Ref. 10PXIB300068PR; Fundação para a Ciência e a Tecnologia | Ref. PEst-OE/MAT/UI0013/2014; Ministerio de Ciencia e Innovación | Ref. MTM2011-2320
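To make topic i) concrete, here is a minimal Python sketch of a generic greedy forward selection for a linear model with intercept, where each step adds the variable that most reduces the residual sum of squares. It is an assumption-laden illustration, not the FWDselect procedure: the package's actual selection criterion and its bootstrap-based choice of q (topic ii) are not reproduced, and the name `forward_select` is hypothetical.

```python
def forward_select(columns, y, q):
    """Greedy forward selection of q predictors for a linear model.

    columns : list of predictor columns (each a list of numbers)
    y       : response values
    q       : number of predictors to select
    Returns (selected column indices in order of entry, final residual SS).
    """
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [x - m for x in v]

    def dot(a, b):
        return sum(x * z for x, z in zip(a, b))

    # Centering removes the intercept from the response and all candidates.
    resid = center(y)
    cand = {j: center(col) for j, col in enumerate(columns)}
    selected = []
    for _ in range(q):
        best_j, best_gain = None, 0.0
        for j, z in cand.items():
            zz = dot(z, z)
            if zz < 1e-12:  # candidate is (numerically) in the selected span
                continue
            gain = dot(resid, z) ** 2 / zz  # reduction in residual SS
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        z = cand.pop(best_j)
        zz = dot(z, z)
        # Project the chosen direction out of the residual and of every
        # remaining candidate, so later gains are conditional on the model.
        b = dot(resid, z) / zz
        resid = [r - b * v for r, v in zip(resid, z)]
        for j in list(cand):
            c = dot(cand[j], z) / zz
            cand[j] = [a - c * v for a, v in zip(cand[j], z)]
        selected.append(best_j)
    return selected, dot(resid, resid)


if __name__ == "__main__":
    # Toy data (made up): y depends exactly on columns 1 and 0.
    x0 = [1, 0, 1, 0, 1, 0]
    x1 = [1, 2, 3, 4, 5, 6]
    x2 = [0, 1, 0, 0, 1, 1]
    y = [2 * b + 0.5 * a for a, b in zip(x0, x1)]
    print(forward_select([x0, x1, x2], y, q=2))
```

Because the remaining candidates are orthogonalized against the variables already chosen, each step's gain equals the drop in residual sum of squares that refitting the enlarged least-squares model would give.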