
    Penalized functional spatial regression

    This paper focuses on spatial functional variables whose observations are realizations of a spatio-temporal functional process. In this context, a new smoothing method for functional data presenting spatial dependence is proposed. This approach is based on a P-spline estimation of a functional spatial regression model. As an alternative to other geostatistical smoothing methods (kriging and kernel smoothing, among others), the proposed P-spline approach can be used to estimate the functional form of a set of sample paths observed only at a finite set of time points, and also to predict the corresponding functional variable at a new location within the plane of study. In order to assess the performance of the proposed method, two simulation studies and an application with real data are developed and the results are compared with functional kriging. Financial support from the project P11-FQM-8068 from Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain, and the projects MTM2013-47929-P and MTM2011-28285-C02-C2 from Secretaría de Estado de Investigación, Desarrollo e Innovación, Ministerio de Economía y Competitividad, Spain.
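    For orientation, a generic P-spline smoothing criterion of the kind alluded to here (our notation, not taken from the paper) represents each sample path in a B-spline basis and penalizes differences between adjacent basis coefficients:
    \[ \hat{\theta} = \arg\min_{\theta} \; \| y - B\theta \|^2 + \lambda \, \| D_d \theta \|^2 , \]
    where B is the B-spline basis matrix evaluated at the observed time points, D_d is the d-th order difference matrix, and \lambda \ge 0 controls the roughness of the fitted curve; the functional spatial regression model described above extends this idea to sample paths with spatial dependence.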

    Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models

    This paper introduces the Group Linear Algorithm with Sparse Principal decomposition, an algorithm for supervised variable selection and clustering. Our approach extends the Sparse Group Lasso regularization to calculate clusters as part of the model fit. Therefore, unlike the Sparse Group Lasso, our method does not require prior specification of clusters of variables. To determine the clusters, we solve a particular case of sparse Singular Value Decomposition, with a regularization term that follows naturally from the Group Lasso penalty. Moreover, this paper proposes a unified implementation to deal with, but not limited to, linear regression, logistic regression, and proportional hazards models with right censoring. Our methodology is evaluated using both biological and simulated data, and details of the implementation in R and of the hyperparameter search are discussed. Laria, J.C.; Aguilera-Morillo, M.C.; Lillo, R.E. (2022). Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models. Statistical Papers 64(1):227-253. https://doi.org/10.1007/s00362-022-01313-z
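    For context, the Sparse Group Lasso penalty that this proposal extends combines an element-wise lasso term with a group-wise term; for a coefficient vector \beta partitioned into groups g = 1, ..., G, it reads (standard notation, not specific to this paper)
    \[ P_{\lambda,\alpha}(\beta) = \alpha \lambda \, \| \beta \|_1 + (1-\alpha) \lambda \sum_{g=1}^{G} \sqrt{p_g} \, \| \beta^{(g)} \|_2 , \]
    where p_g is the size of group g and \alpha \in [0,1] trades off within-group sparsity against group selection; the penalty is added to the negative log-likelihood of the chosen linear, logistic, or proportional hazards model. The contribution summarized above is that the groups themselves are estimated through a sparse SVD step rather than fixed in advance.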

    Adaptive sparse group LASSO in quantile regression

    This paper studies the introduction of the sparse group LASSO (SGL) to the quantile regression framework. Additionally, a more flexible version, an adaptive SGL, is proposed based on the adaptive idea, that is, the use of adaptive weights in the penalization. Adaptive estimators are usually focused on the study of the oracle property under asymptotic and double-asymptotic frameworks. A key step in the demonstration of this property is to consider adaptive weights based on an initial root-n-consistent estimator. In practice this implies the use of a non-penalized estimator, which limits the adaptive solutions to low-dimensional scenarios. In this work, several solutions based on the dimension-reduction techniques PCA and PLS are studied for the calculation of these weights in high-dimensional frameworks. The benefits of this proposal are studied in both synthetic and real datasets. We appreciate the work of the referees, which has contributed to substantially improving the scientific contribution of this work. In this research we have made use of Uranus, a supercomputer cluster located at University Carlos III of Madrid and funded jointly by EU-FEDER funds and by the Spanish Government via the National Projects No. UNC313-4E-2361, No. ENE2009-12213-C03-03, No. ENE2012-33219 and No. ENE2015-68265-P. This research was partially supported by research grants and Project ECO2015-66593-P from Ministerio de Economia, Industria y Competitividad, Project MTM2017-88708-P from Ministerio de Economia y Competitividad, FEDER funds, and Project IJCI-2017-34038 from Agencia Estatal de Investigacion, Ministerio de Ciencia, Innovacion y Universidades. Mendez-Civieta, A.; Aguilera-Morillo, M.C.; Lillo, R.E. (2021). Adaptive sparse group LASSO in quantile regression. Advances in Data Analysis and Classification 15:547-573. https://doi.org/10.1007/s11634-020-00413-8
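    As a reminder of the objective being penalized, quantile regression at level \tau minimizes the check (pinball) loss, and the adaptive SGL adds weighted lasso and group-lasso terms; a generic form is (our notation, not taken from the paper)
    \[ \min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \rho_\tau\!\left( y_i - x_i^{\top}\beta \right) + \alpha \lambda \sum_{j} w_j |\beta_j| + (1-\alpha) \lambda \sum_{g} v_g \sqrt{p_g} \, \| \beta^{(g)} \|_2 , \qquad \rho_\tau(u) = u \left( \tau - \mathbf{1}\{ u < 0 \} \right) , \]
    where the adaptive weights w_j and v_g are built from an initial estimator; the point made above is that in high-dimensional settings this initial estimator can be obtained through PCA or PLS rather than an unpenalized fit.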

    Penalized function-on-function partial least-squares regression

    This paper deals with the "function-on-function" or "fully functional" linear regression problem. We address the problem by proposing a novel penalized Function-on-Function Partial Least-Squares (pFFPLS) approach that imposes smoothness on the PLS weights. Our proposal introduces an appropriate finite-dimensional functional space with an associated set of bases on which to represent the data and controls smoothness with a roughness penalty operator. Penalizing the PLS weights imposes smoothness on the resulting coefficient function, improving its interpretability. In a simulation study, we demonstrate the advantages of pFFPLS compared to non-penalized FFPLS. Our comparisons indicate a higher accuracy of pFFPLS when predicting the response and estimating the true coefficient function from which the data were generated. We also illustrate the advantages of our proposal with two case studies involving two well-known datasets from the functional data analysis literature. In the first one, we predict log precipitation curves from the yearly temperature profiles recorded in 35 weather stations in Canada. In the second case study, we predict the hip angle profiles during a gait cycle of children from their corresponding knee angle profiles.
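    For readers new to the setting, the fully functional linear model underlying FFPLS relates a functional response to a functional predictor through a bivariate coefficient surface (standard formulation, not specific to this paper):
    \[ Y(t) = \alpha(t) + \int_{\mathcal{S}} X(s) \, \beta(s,t) \, ds + \varepsilon(t) . \]
    The penalized variant described above estimates the PLS weight functions subject to a roughness penalty, for example of the form \lambda \int ( w''(s) )^2 \, ds on each weight function w (the exact penalty operator is our assumption here), so that the resulting estimate of \beta(s,t) is smooth and easier to interpret.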

    Linear-Phase-Type probability modelling of functional PCA with applications to resistive memories

    Functional principal component analysis (FPCA) based on the Karhunen-Loève (K-L) expansion makes it possible to describe the stochastic evolution of the main characteristics associated with multiple systems and devices. Identifying the probability distribution of the principal component scores is fundamental to characterize the whole process. The aim of this work is to consider a family of statistical distributions that could be accurately adjusted to a previous transformation. Then, a new class of distributions, the linear-phase-type, is introduced to model the principal components. This class is studied in detail in order to prove, through the K-L expansion, that certain linear transformations of the process at each time point are phase-type distributed. This way, the one-dimensional distributions of the process are in the same linear-phase-type class. Finally, an application to model the reset process associated with resistive memories is developed and explained. (C) 2020 Published by Elsevier B.V. on behalf of International Association for Mathematics and Computers in Simulation (IMACS). We would like to thank F. Campabadal and M.B. Gonzalez from the IMB-CNM (CSIC) in Barcelona for fabricating the devices and providing the experimental measurements employed here. We acknowledge the support of the Spanish Ministry of Science, Innovation and Universities under projects TEC2017-84321-C4-3-R, MTM2017-88708-P, IJCI-2017-34038 (also supported by the FEDER, Spain program) and the PhD grant, Spain (FPU18/01779) awarded to Christian Acal. This work has made use of the Spanish ICTS Network MICRONANOFABS. Ruiz-Castro, J.E.; Acal, C.; Aguilera, A.M.; Aguilera-Morillo, M.C.; Roldán, J.B. (2021). Linear-Phase-Type probability modelling of functional PCA with applications to resistive memories. Mathematics and Computers in Simulation 186:71-79. https://doi.org/10.1016/j.matcom.2020.07.006
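    For reference, the Karhunen-Loève expansion on which FPCA rests represents the process as (standard notation, not taken from the paper)
    \[ X(t) = \mu(t) + \sum_{k \ge 1} \xi_k \, \phi_k(t) , \]
    where the \phi_k are the orthonormal eigenfunctions of the covariance operator and the principal component scores \xi_k = \int ( X(t) - \mu(t) ) \, \phi_k(t) \, dt are uncorrelated with variances given by the eigenvalues \lambda_k; the contribution summarized above is a linear-phase-type model for the distribution of these scores.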

    Functional modeling of high-dimensional data: a Manifold Learning approach

    This article belongs to the Special Issue Methodological and Applied Contributions on Stochastic Modelling and Forecasting. This paper introduces stringing via Manifold Learning (ML-stringing), an alternative to the original stringing based on Unidimensional Scaling (UDS). Our proposal is framed within a wider class of methods that map high-dimensional observations to the infinite-dimensional space of functions, allowing the use of Functional Data Analysis (FDA). Stringing handles general high-dimensional data as scrambled realizations of an unknown stochastic process. Therefore, the essential feature of the method is a rearrangement of the observed values. Motivated by the linear nature of UDS and the increasing number of applications to biosciences (e.g., functional modeling of gene expression arrays and single nucleotide polymorphisms, or the classification of neuroimages), we aim to recover more complex relations between predictors through ML. In simulation studies, it is shown that ML-stringing achieves higher-quality orderings and that, in general, this leads to improvements in the functional representation and modeling of the data. The versatility of our method is also illustrated with an application to a colon cancer study that deals with high-dimensional gene expression arrays. This paper shows that ML-stringing is a feasible alternative to the UDS-based version. Also, it opens a window to new contributions to the field of FDA and the study of high-dimensional data. This research was funded in part by Ministerio de Ciencia, Innovación y Universidades grant numbers PID2019-104901RB-I00 and MTM2017-88708-P.
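    A minimal sketch of the stringing idea, assuming a correlation-based dissimilarity between variables and a one-dimensional Isomap embedding in place of unidimensional scaling (function and parameter names are illustrative, not the authors' implementation):

    import numpy as np
    from sklearn.manifold import Isomap

    def ml_stringing_order(X, n_neighbors=10):
        """Return a column permutation for X (n_samples x n_features)."""
        # Dissimilarity between variables derived from their absolute correlation.
        corr = np.corrcoef(X, rowvar=False)
        dist = np.sqrt(np.maximum(0.0, 2.0 * (1.0 - np.abs(corr))))
        # One-dimensional non-linear embedding of the variables; UDS-based
        # stringing would use classical (linear) scaling here instead.
        emb = Isomap(n_components=1, n_neighbors=n_neighbors,
                     metric="precomputed").fit_transform(dist)
        return np.argsort(emb[:, 0])

    # Usage: reorder columns so that neighbouring positions hold similar variables,
    # then smooth each reordered row as a discretely observed curve (the FDA step).
    # order = ml_stringing_order(X); X_strung = X[:, order]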

    Stepwise selection of functional covariates in forecasting peak levels of olive pollen

    High levels of airborne olive pollen represent a problem for a large proportion of the population because of the many allergies it causes. Many attempts have been made to forecast the concentration of airborne olive pollen, using methods such as time series, linear regression, neural networks, a combination of fuzzy systems and neural networks, and functional models. This paper presents a functional logistic regression model used to study the relationship between olive pollen concentration and different climatic factors, and on this basis to predict the probability of high (and possibly extreme) levels of airborne pollen, selecting the best subset of functional climatic variables by means of a stepwise method based on the conditional likelihood ratio test. Projects MTM2010-20502 from Dirección General de Investigación del MEC, Spain, and FQM-307 from Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía, Spain.
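    The functional logistic regression model referred to above has the standard form (our notation; the stepwise procedure then adds or drops functional covariates using the conditional likelihood ratio test)
    \[ \operatorname{logit} P\bigl( Y_i = 1 \mid X_{i1}, \dots, X_{ip} \bigr) = \beta_0 + \sum_{j=1}^{p} \int_{T_j} X_{ij}(t) \, \beta_j(t) \, dt , \]
    where Y_i indicates a high (or extreme) pollen level, the X_{ij} are the functional climatic covariates, and each coefficient function \beta_j(t) weights the influence of covariate j over its observation period.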

    Dendritic cell deficiencies persist seven months after SARS-CoV-2 infection

    Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection induces an exacerbated inflammation driven by innate immunity components. Dendritic cells (DCs) play a key role in the defense against viral infections; for instance, plasmacytoid DCs (pDCs) have the capacity to produce vast amounts of interferon-alpha (IFN-α). In COVID-19 there is a deficit in DC numbers and IFN-α production, which has been associated with disease severity. In this work, we describe that, in addition to the DC deficiency, several DC activation and homing markers were altered in acute COVID-19 patients and were associated with multiple inflammatory markers. Remarkably, previously hospitalized and nonhospitalized patients still showed decreased numbers of CD1c+ myeloid DCs and pDCs seven months after SARS-CoV-2 infection. Moreover, the expression of DC markers such as CD86 and CD4 was restored only in previously nonhospitalized patients, while no restoration of integrin β7 and indoleamine 2,3-dioxygenase (IDO) levels was observed. These findings contribute to a better understanding of the immunological sequelae of COVID-19.

    SARS-CoV-2 viral load in nasopharyngeal swabs is not an independent predictor of unfavorable outcome

    The aim was to assess the ability of nasopharyngeal SARS-CoV-2 viral load at the patient's first hospital evaluation to predict unfavorable outcomes. We conducted a prospective cohort study including 321 adult patients with COVID-19 confirmed through RT-PCR in nasopharyngeal swabs. Quantitative Synthetic SARS-CoV-2 RNA cycle threshold values were used to calculate the viral load in log10 copies/mL. Disease severity at the end of follow-up was categorized as mild, moderate, or severe. The primary endpoint was a composite of intensive care unit (ICU) admission and/or death (n = 85, 26.4%). Univariable and multivariable logistic regression analyses were performed. Nasopharyngeal SARS-CoV-2 viral load over the second quartile (≥ 7.35 log10 copies/mL, p = 0.003) and second tertile (≥ 8.27 log10 copies/mL, p = 0.01) was associated with an unfavorable outcome in the unadjusted logistic regression analysis. However, in the final multivariable analysis, viral load was not independently associated with an unfavorable outcome. Five predictors were independently associated with increased odds of ICU admission and/or death: age ≥ 70 years, SpO2, neutrophils > 7.5 × 10³/µL, lactate dehydrogenase ≥ 300 U/L, and C-reactive protein ≥ 100 mg/L. In summary, nasopharyngeal SARS-CoV-2 viral load on admission is generally high in patients with COVID-19, regardless of illness severity, but it cannot be used as an independent predictor of unfavorable clinical outcome.