Towards the interpretation of time-varying regularization parameters in streaming penalized regression models
High-dimensional, streaming datasets are ubiquitous in modern applications, ranging from finance and e-commerce to the study of biomedical and neuroimaging data. As a result, many novel algorithms have been proposed to address the challenges such datasets pose. In this work, we focus on ℓ1-regularized linear models in the context of (possibly non-stationary) streaming data. It has recently been noted that the choice of the regularization parameter is fundamental in such models, and several methods have been proposed that iteratively tune this parameter in a time-varying manner, thereby allowing the sparsity of the estimated models to vary over time. Moreover, in many applications, inference on the regularization parameter may itself be of interest, since the parameter is related to the underlying sparsity of the model. In this work, however, we highlight, and provide extensive empirical evidence for, the fact that various (often unrelated) statistical properties of the data can lead to changes in the regularization parameter. In particular, through a range of synthetic experiments, we demonstrate that changes in the regularization parameter may be driven by changes in the true underlying sparsity, in the signal-to-noise ratio, or even by model misspecification. The purpose of this letter is therefore to highlight and catalog the statistical properties that induce changes in the associated regularization parameter. We conclude by presenting two applications, one to financial data and one to neuroimaging data, where this discussion is relevant.
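The moving-window tuning idea described above can be illustrated with a minimal pure-Python sketch. It assumes the standard Lasso objective (1/2n)||y - Xβ||² + λ||β||₁ without intercept, fits it by cyclic coordinate descent, and, for each non-overlapping window of a stream, picks λ from a fixed grid by hold-out error on the last quarter of the window. The window scheme, the grid, the hold-out rule and all function names (`lasso_cd`, `windowed_lambdas`) are hypothetical choices for illustration, not the tuning methods studied in the paper. The simulated stream keeps the support fixed and raises only the noise level halfway through, which is one of the mechanisms the abstract lists as able to move λ.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, which equals y while beta is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that leaves out column j.
            rho = sum(row[j] * r for row, r in zip(X, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i, row in enumerate(X):
                    resid[i] -= row[j] * delta
                beta[j] = new
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of the linear model beta on (X, y)."""
    preds = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def windowed_lambdas(X, y, window, grid, holdout=0.25):
    """Hypothetical tuning rule: in each non-overlapping window, fit on the first part
    and keep the grid value with the lowest MSE on the last `holdout` fraction."""
    lambdas = []
    for start in range(0, len(y) - window + 1, window):
        Xw, yw = X[start:start + window], y[start:start + window]
        cut = int(window * (1 - holdout))
        scores = {lam: mse(Xw[cut:], yw[cut:], lasso_cd(Xw[:cut], yw[:cut], lam)) for lam in grid}
        lambdas.append(min(grid, key=scores.get))
    return lambdas


# Simulated stream: support and coefficients stay fixed, the noise level jumps at t = 200.
random.seed(0)
beta_true = [2.0, -1.5, 0.0, 0.0, 0.0]
X, y = [], []
for t in range(400):
    sigma = 0.5 if t < 200 else 3.0
    row = [random.gauss(0, 1) for _ in beta_true]
    X.append(row)
    y.append(sum(a * b for a, b in zip(row, beta_true)) + random.gauss(0, sigma))

grid = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
lams = windowed_lambdas(X, y, window=100, grid=grid)
print(lams)  # one selected lambda per window of 100 observations
```

With windows this small the selected λ is noisy, so a single run should not be read as evidence in either direction; the point of the sketch is that the series `lams` can move with the data-generating process (here, only the noise level) even though the true sparsity never changes.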
Mode-Based Classifier: A Robust and Flexible Discriminant Analysis for High-Dimensional Data
The file available on this institutional repository is a preprint; it has not been certified by peer review. It is freely available at http://www3.stat.sinica.edu.tw/ss_newpaper/SS-2023-0014_na.pdf.
Supplementary Materials: In the supplementary materials, we present additional results for the simulation examples and the real data analysis, and provide the technical results for Theorems 1-3.
High-dimensional classification is both challenging and of interest in numerous applications.
Componentwise distance-based classifiers, which use partial information about the known categories, such as the mean, median and quantiles, provide a convenient approach. However, when the input features are heavy-tailed or contain outliers, the performance of the centroid classifier can be poor. Beyond that, a population frequently consists of two or more subpopulations; in this scenario the mean, median and quantiles fail to capture such structure, which can instead be preserved by the mode, an appealing measure of considerable significance that may nonetheless be neglected. This paper therefore introduces and investigates componentwise mode-based classifiers that can reveal important structures missed by existing distance-based classifiers. We explore several strategies for defining the family of mode-based classifiers, including the unimodal classifiers, the multimodal classifier and the quantile-mode classifier. The unimodal classifiers are based on componentwise unimodal distance and kernel mode estimation, and the multimodal classifier is constructed by identifying all the local modes of a distribution with a newly introduced algorithm. We establish the asymptotic properties of these methods and demonstrate, through simulation studies and three real datasets, that the mode-based classifiers compare favorably with current state-of-the-art methods.
The research of W. Xiong was supported in part by NSFC grant 12001101 and the Fundamental Research Funds for the Central Universities in UIBE CXTD14-05.
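A componentwise unimodal classifier of the kind described above can be sketched as follows. The sketch assumes a Gaussian-kernel mean-shift to locate each feature's mode within each class, Silverman's robust rule-of-thumb bandwidth, and the sum of absolute componentwise deviations from the class modes as the distance. These choices and the function names (`kernel_mode`, `fit_modes`, `predict`) are assumptions for illustration; the paper's unimodal distance, bandwidth selection and multimodal algorithm may differ. Because the mean-shift search starts at the median, it only finds the mode nearest to it, so this is a unimodal sketch and does not attempt to identify all local modes.

```python
import math
import random
import statistics


def kernel_mode(values, bandwidth=None, n_iter=200, tol=1e-9):
    """Estimate the mode of 1-D data by Gaussian mean-shift started at the median."""
    if bandwidth is None:
        # Silverman's robust rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
        q1, _, q3 = statistics.quantiles(values, n=4)
        spread = min(statistics.stdev(values), (q3 - q1) / 1.34)
        bandwidth = 0.9 * spread * len(values) ** (-0.2) if spread > 0 else 1.0
    x = statistics.median(values)
    for _ in range(n_iter):
        # Each step moves x to the kernel-weighted mean of the data around it.
        weights = [math.exp(-0.5 * ((v - x) / bandwidth) ** 2) for v in values]
        new_x = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def fit_modes(X, labels):
    """Per class, estimate the kernel mode of every feature separately (componentwise)."""
    modes = {}
    for c in sorted(set(labels)):
        rows = [x for x, lab in zip(X, labels) if lab == c]
        modes[c] = [kernel_mode([r[j] for r in rows]) for j in range(len(rows[0]))]
    return modes


def predict(modes, x):
    """Assign x to the class whose mode vector has the smallest L1 distance to x."""
    return min(modes, key=lambda c: sum(abs(xj - mj) for xj, mj in zip(x, modes[c])))


# Usage on simulated data: class 0 around (0, 0) with two gross outliers, class 1 around (5, 5).
random.seed(1)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(40)]
X += [[40.0, 40.0], [45.0, -40.0]]
X += [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(40)]
labels = [0] * 42 + [1] * 40
modes = fit_modes(X, labels)
print(modes)
print(predict(modes, [0.2, -0.1]), predict(modes, [5.1, 4.8]))
```

The two outliers in class 0 would pull a mean-based centroid away from the bulk of that class, whereas the kernel mode is barely affected by them; this is the robustness argument the abstract makes for heavy-tailed or contaminated features.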
Time Varying Quantile Lasso
In the present paper we study the dynamics of the penalization parameter λ of the least absolute shrinkage and selection operator (Lasso) method, proposed by Tibshirani (1996) and extended to the quantile regression context by Li and Zhu (2008). Dynamic behaviour of λ can be observed when the model is assumed to vary over time and the fitting is therefore performed using moving windows. The idea of investigating the time series of λ and its dependence on model characteristics was brought into focus by Härdle et al. (2016), which laid the foundation of the FinancialRiskMeter (http://frm.wiwi.hu-berlin.de). Following the ideas behind these two projects, we use the formula for the penalization parameter λ that results from the optimization problem. This reveals three possible effects driving λ: the variance of the error term, the correlation structure of the covariates and the number of nonzero coefficients of the model. Our aim is to disentangle these three effects and investigate their relationship with the tuning parameter λ, which we do through a simulation study. After examining the theoretical impact of the three model characteristics on λ, we perform an empirical application and present the idea of incorporating λ into a systemic risk measure. The code used to obtain the results in this work is available at http://quantlet.de/d3/ia/
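The link between λ and these three characteristics can be made concrete in the simpler least-squares Lasso, used here only as an illustration: the paper itself works with the quantile (check) loss, and the derivation below is not its formula. Assume the objective with the usual 1/(2n) scaling, let A be the active set of the estimate with sign vector s_A and k = |A| elements, assume X_A^T X_A is invertible, and assume the true coefficient vector β is zero outside A, so that y = X_A β_A + ε.

```latex
% Illustrative least-squares Lasso (not the quantile Lasso of the paper)
\[ \hat\beta = \arg\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1 \]
% Stationarity (KKT) condition on the active set A, with s_A = \operatorname{sign}(\hat\beta_A)
\[ \frac{1}{n}X_A^\top\bigl(y - X_A\hat\beta_A\bigr) = \lambda s_A
   \quad\Longrightarrow\quad
   \hat\beta_A = \bigl(X_A^\top X_A\bigr)^{-1}\bigl(X_A^\top y - n\lambda s_A\bigr) \]
% Multiply by s_A^\top, use s_A^\top\hat\beta_A = \lVert\hat\beta_A\rVert_1, substitute y = X_A\beta_A + \varepsilon
\[ \lambda = \frac{s_A^\top\beta_A - \lVert\hat\beta_A\rVert_1 + s_A^\top\bigl(X_A^\top X_A\bigr)^{-1}X_A^\top\varepsilon}
                 {n\, s_A^\top\bigl(X_A^\top X_A\bigr)^{-1}s_A} \]
```

In this simplified setting the error term enters through ε, the correlation structure of the covariates through (X_A^T X_A)^{-1}, and the number of nonzero coefficients through the length k of s_A. For instance, with an orthogonal design X_A^T X_A = nI the denominator reduces to k, giving λ = (s_A^T β_A - ||β̂_A||₁ + n^{-1} s_A^T X_A^T ε) / k.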
Benthic meltwater fjord habitats formed by rapid glacier recession on King George Island, Antarctica
Bayesian Spatiotemporal Modeling for Costs of Alcohol-related Hospital Discharges
Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (21XNH155); National Natural Science Foundation of China (No.11861042); China Statistical Research Project (No.2020LZ25)
A Bayesian multi-stage spatio-temporally dependent model for spatial clustering and variable selection
Data availability statement: We use publicly available data, and the link to the data source is provided in the paper... National Natural Science Foundation of China (No.11861042), and the China Statistical Research Project (No.2020LZ25)