10,677 research outputs found

    Variable selection via Lasso with high-dimensional proteomic data

    Multiclass classification with high-dimensional data is an applied topic in both statistics and machine learning, and the classification can be carried out in various ways. In this thesis, we review the theory of the Lasso procedure, which provides a parameter estimator while simultaneously achieving dimension reduction thanks to a property of the L1 norm. The Lasso with an elastic net penalty and the sparse group lasso are also reviewed. Our data are high-dimensional proteomic data (iTRAQ ratios) from breast cancer patients with four subtypes of breast cancer. We use multinomial logistic regression to train our classifier and use the misclassification rates obtained from cross-validation to compare models.
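The selection mechanism the abstract describes can be sketched in a few lines: an L1 penalty on a multinomial logistic model zeroes out irrelevant coefficients while the classifier is fitted. The sketch below uses proximal gradient descent (ISTA) on synthetic data; the dimensions, tuning values and data-generating process are illustrative assumptions, not the thesis's iTRAQ dataset or its exact procedure.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def soft_threshold(B, t):
    # proximal operator of t * ||B||_1, applied elementwise
    return np.sign(B) * np.maximum(np.abs(B) - t, 0.0)

def lasso_multinomial(X, y, lam=0.1, lr=0.1, n_iter=500):
    """L1-penalised multinomial logistic regression fitted by proximal
    gradient descent (ISTA). Returns a (p, K) coefficient matrix; the
    L1 penalty drives coefficients of irrelevant features to exactly zero."""
    n, p = X.shape
    K = y.max() + 1
    Y = np.eye(K)[y]                          # one-hot targets
    B = np.zeros((p, K))
    for _ in range(n_iter):
        G = X.T @ (softmax(X @ B) - Y) / n    # gradient of the smooth loss
        B = soft_threshold(B - lr * G, lr * lam)
    return B

# synthetic stand-in: 4 classes driven by only 3 of 50 features
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
M = np.array([[1., -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]])
y = (X[:, :3] @ M).argmax(axis=1)

B = lasso_multinomial(X, y, lam=0.1)
active = np.where(np.abs(B).sum(axis=1) > 1e-8)[0]
print("selected features:", active)           # a small subset of the 50 columns
```

The same simultaneous estimation-and-selection is what makes the Lasso attractive for proteomic data, where the number of iTRAQ ratios far exceeds the number of patients.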

    Adaptive sparse group LASSO in quantile regression

    [EN] This paper studies the introduction of the sparse group LASSO (SGL) to the quantile regression framework. Additionally, a more flexible version, an adaptive SGL, is proposed based on the adaptive idea, that is, the use of adaptive weights in the penalization. Adaptive estimators usually focus on the study of the oracle property under asymptotic and double-asymptotic frameworks. A key step in the demonstration of this property is to consider adaptive weights based on an initial root-n-consistent estimator. In practice this implies the use of a non-penalized estimator, which limits the adaptive solutions to low-dimensional scenarios. In this work, several solutions based on the dimension reduction techniques PCA and PLS are studied for the calculation of these weights in high-dimensional frameworks. The benefits of this proposal are studied on both synthetic and real datasets.
    Mendez-Civieta, A.; Aguilera-Morillo, MC.; Lillo, RE. (2021). Adaptive sparse group LASSO in quantile regression. Advances in Data Analysis and Classification 15:547-573. https://doi.org/10.1007/s11634-020-00413-8
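The key step of the proposal above — building adaptive weights when an unpenalized initial estimator is infeasible because p > n — can be sketched via the PLS route: fit a few partial least squares components, map the low-dimensional coefficients back to the original variables, and invert their magnitudes. The function name, component count and synthetic data below are illustrative assumptions; the paper's estimator is the adaptive SGL quantile regression built on top of such weights.

```python
import numpy as np

def pls_adaptive_weights(X, y, n_comp=3, gamma=1.0, eps=1e-8):
    """Adaptive-weight sketch for high-dimensional settings: replace the
    unpenalised initial fit by a PLS1 regression on a few components,
    recover coefficients beta0 in the original coordinates, and set
    w_j = 1 / (|beta0_j| + eps)**gamma."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Xr, yr = Xc.copy(), yc.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w_dir = Xr.T @ yr                     # direction of max covariance with y
        w_dir = w_dir / np.linalg.norm(w_dir)
        t = Xr @ w_dir                        # component scores
        p_load = Xr.T @ t / (t @ t)           # X loadings
        q_c = (yr @ t) / (t @ t)              # y loading
        Xr = Xr - np.outer(t, p_load)         # deflate
        yr = yr - q_c * t
        W.append(w_dir); P.append(p_load); q.append(q_c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta0 = W @ np.linalg.solve(P.T @ W, q)   # standard PLS1 coefficients
    return 1.0 / (np.abs(beta0) + eps) ** gamma

# p > n, so an ordinary non-penalized initial fit would not exist
rng = np.random.default_rng(1)
n, p = 50, 200
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=n)
w = pls_adaptive_weights(X, y)
print(w[0], w[1], np.median(w))
```

Because PLS is supervised, the truly relevant variables receive comparatively small weights, so the subsequent adaptive penalty shrinks them less — the behaviour the oracle property asks of the weights.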

    Sparse Linear Models applied to Power Quality Disturbance Classification

    Power quality (PQ) analysis describes the non-pure electric signals that are usually present in electric power systems. The automatic recognition of PQ disturbances can be seen as a pattern recognition problem in which different types of waveform distortion are differentiated based on their features. Like other quasi-stationary signals, PQ disturbances can be decomposed into time-frequency-dependent components using time-frequency or time-scale transforms, also known as dictionaries. These dictionaries are used in the feature-extraction step of pattern recognition systems. The short-time Fourier, wavelet and Stockwell transforms are among the most common dictionaries used in the PQ community to achieve a better signal representation. To the best of our knowledge, previous work on PQ disturbance classification has been restricted to a single dictionary among the several available. Taking advantage of the theory behind sparse linear models (SLM), we introduce a sparse method for PQ representation that starts from overcomplete dictionaries; in particular, we apply the Group Lasso. We employ different types of time-frequency (or time-scale) dictionaries to characterize the PQ disturbances and evaluate their performance under different pattern recognition algorithms. We show that SLM reduce the complexity of PQ classification by promoting sparse basis selection, while improving classification accuracy.
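A minimal sketch of the dictionary-selection idea: stack two sub-dictionaries into one overcomplete design, penalize each sub-dictionary's coefficients as a group, and let the Group Lasso switch off the dictionary that does not help represent the signal. The random "dictionaries", sizes and tuning values below are illustrative assumptions, not the actual transforms or classifiers evaluated in the paper.

```python
import numpy as np

def group_soft_threshold(b, t):
    """Proximal operator of t * ||b||_2: shrink the whole group toward
    zero, and zero it out entirely when its norm is at most t."""
    nrm = np.linalg.norm(b)
    return np.zeros_like(b) if nrm <= t else (1 - t / nrm) * b

def group_lasso(D, y, groups, lam=0.05, n_iter=500):
    """min_b 0.5/n * ||y - D b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2,
    solved by proximal gradient descent. `groups` maps each column of the
    stacked dictionary D to a group id, so whole sub-dictionaries can be
    discarded at once."""
    n, p = D.shape
    b = np.zeros(p)
    L = np.linalg.norm(D, 2) ** 2 / n          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        g = D.T @ (D @ b - y) / n
        z = b - g / L                          # gradient step
        for gid in np.unique(groups):          # groupwise proximal step
            idx = groups == gid
            b[idx] = group_soft_threshold(z[idx], lam * np.sqrt(idx.sum()) / L)
    return b

rng = np.random.default_rng(2)
n = 100
D1 = rng.normal(size=(n, 10))                  # stand-in for, e.g., wavelet atoms
D2 = rng.normal(size=(n, 10))                  # stand-in for, e.g., Fourier atoms
D = np.hstack([D1, D2])
groups = np.repeat([0, 1], 10)
y = D1 @ rng.normal(size=10)                   # signal representable by dictionary 1 only
b = group_lasso(D, y, groups)
print(np.linalg.norm(b[:10]), np.linalg.norm(b[10:]))
```

Zeroing an entire group corresponds to dropping that transform from the representation, which is the reduction in classification complexity the abstract refers to.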