1,814 research outputs found

    The Lasso for High-Dimensional Regression with a Possible Change-Point

    We consider a high-dimensional regression model with a possible change-point due to a covariate threshold and develop the Lasso estimator of regression coefficients as well as the threshold parameter. Our Lasso estimator not only selects covariates but also selects a model between linear and threshold regression models. Under a sparsity assumption, we derive non-asymptotic oracle inequalities for both the prediction risk and the $\ell_1$ estimation loss for regression coefficients. Since the Lasso estimator selects variables simultaneously, we show that oracle inequalities can be established without pretesting the existence of the threshold effect. Furthermore, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a nearly $n^{-1}$ factor even when the number of regressors can be much larger than the sample size ($n$). We illustrate the usefulness of our proposed estimation method via Monte Carlo simulations and an application to real data.
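
One common way to write a regression with a covariate-threshold change-point and an $\ell_1$-penalized least-squares criterion is sketched below. The notation ($x_i$, $q_i$, $\tau$, $\beta$, $\delta$, $\lambda$) is assumed here for illustration and is not taken from the abstract, and the paper's own penalty may weight the coefficients differently.

```latex
\begin{align*}
y_i &= x_i^\top \beta + x_i^\top \delta \, \mathbf{1}\{q_i < \tau\} + \varepsilon_i,
  \qquad i = 1, \dots, n, \\
(\hat\beta, \hat\delta, \hat\tau) &\in \arg\min_{\beta, \delta, \tau}
  \frac{1}{n} \sum_{i=1}^{n}
  \bigl( y_i - x_i^\top \beta - x_i^\top \delta \, \mathbf{1}\{q_i < \tau\} \bigr)^2
  + \lambda \bigl( \|\beta\|_1 + \|\delta\|_1 \bigr).
\end{align*}
```

In this form, $\delta = 0$ recovers the linear model, which is how a penalty that can shrink $\delta$ exactly to zero selects between the linear and threshold specifications.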

    Factor-Driven Two-Regime Regression

    We propose a novel two-regime regression model where regime switching is driven by a vector of possibly unobservable factors. When the factors are latent, we estimate them by the principal component analysis of a panel data set. We show that the optimization problem can be reformulated as mixed integer optimization, and we present two alternative computational algorithms. We derive the asymptotic distribution of the resulting estimator under the scheme that the threshold effect shrinks to zero. In particular, we establish a phase transition that describes the effect of first-stage factor estimation as the cross-sectional dimension of panel data increases relative to the time-series dimension. Moreover, we develop bootstrap inference and illustrate our methods via numerical studies.

    Testing for threshold effects in regression models

    In this article, we develop a general method for testing threshold effects in regression models, using sup-likelihood-ratio (LR)-type statistics. Although the sup-LR-type test statistic has been considered in the literature, our method for establishing the asymptotic null distribution is new and nonstandard. The standard approach in the literature for obtaining the asymptotic null distribution requires that there exist a certain quadratic approximation to the objective function. The article provides an alternative, novel method that can be used to establish the asymptotic null distribution, even when the usual quadratic approximation is intractable. We illustrate the usefulness of our approach in the examples of the maximum score estimation, maximum likelihood estimation, quantile regression, and maximum rank correlation estimation. We establish consistency and local power properties of the test. We provide some simulation results and also an empirical application to tipping in racial segregation. This article has supplementary materials online.

    Response to Letter to Editor


    Fast Inference for Quantile Regression with Tens of Millions of Observations

    Big data analytics has opened new avenues in economic research, but the challenge of analyzing datasets with tens of millions of observations is substantial. Conventional econometric methods based on extreme estimators require large amounts of computing resources and memory, which are often not readily available. In this paper, we focus on linear quantile regression applied to "ultra-large" datasets, such as U.S. decennial censuses. A fast inference framework is presented, utilizing stochastic sub-gradient descent (S-subGD) updates. The inference procedure handles cross-sectional data sequentially: (i) updating the parameter estimate with each incoming "new observation", (ii) aggregating it as a Polyak-Ruppert average, and (iii) computing a pivotal statistic for inference using only a solution path. The methodology draws from time series regression to create an asymptotically pivotal statistic through random scaling. Our proposed test statistic is calculated in a fully online fashion and critical values are calculated without resampling. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. For inference problems as large as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the number of regressors, our method generates new insights, surpassing current inference methods in computation. Our method specifically reveals trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling over $10^3$ covariates to mitigate confounding effects. Comment: 45 pages, 6 figures
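
Steps (i) and (ii) of the sequential procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the step-size rule `lr0 * t ** (-decay)` with its default values, and the toy data-generating process are all assumptions made here.

```python
import random


def s_subgd_quantile(stream, tau, dim, lr0=1.0, decay=0.501):
    """One pass of stochastic sub-gradient descent for linear quantile regression.

    `stream` yields (x, y) pairs, with x a sequence of `dim` regressors.
    Returns the Polyak-Ruppert average of the iterates.
    The step-size schedule lr0 * t**(-decay) is an assumed choice.
    """
    beta = [0.0] * dim       # current iterate
    beta_bar = [0.0] * dim   # running average of the iterates
    for t, (x, y) in enumerate(stream, start=1):
        resid = y - sum(b * xi for b, xi in zip(beta, x))
        # Sub-gradient of the check loss at beta is x * (1{resid < 0} - tau).
        score = (1.0 if resid < 0 else 0.0) - tau
        step = lr0 * t ** (-decay)
        # (i) update the estimate using only the new observation
        beta = [b - step * score * xi for b, xi in zip(beta, x)]
        # (ii) recursive average, so no past iterate needs to be stored
        beta_bar = [m + (b - m) / t for m, b in zip(beta_bar, beta)]
    return beta_bar


def simulate(n, seed=0):
    """Toy design (an assumption): y = 1 + 2 z + e, with z and e standard normal."""
    rng = random.Random(seed)
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        yield [1.0, z], 1.0 + 2.0 * z + rng.gauss(0.0, 1.0)
```

Calling `s_subgd_quantile(simulate(20000), tau=0.5, dim=2)` runs a single pass over the simulated stream; each observation is touched once and memory use does not grow with the sample size, which is the property the abstract emphasizes.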

    Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

    We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms. We leverage insights from time series regression in econometrics and construct asymptotically pivotal statistics via random scaling. Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem. Our proposed inference method has a couple of key advantages over the existing methods. First, the test statistic is computed in an online fashion with only SGD iterates and the critical values can be obtained without any resampling methods, thereby allowing for efficient implementation suitable for massive online data. Second, there is no need to estimate the asymptotic variance and our inference method is shown to be robust to changes in the tuning parameters for SGD algorithms in simulation experiments with synthetic data. Comment: 16 pages, 5 figures, 5 tables
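
The online computation described above can be sketched for a single coordinate. The sketch assumes the random-scaling quantity takes the form V_n = n^{-2} * sum_{s=1}^{n} s^2 (mean_s - mean_n)^2, where mean_s is the average of the first s iterates; the class name, the restriction to one coordinate, and the `crit` argument are illustrative choices made here. The critical value has to come from the tabulated limiting distribution of the statistic, not from a normal table, so it is left as an input.

```python
import math


class RandomScalingScalar:
    """Online random-scaling variance for one coordinate of an averaged SGD path."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0   # running average of the iterates
        self.a = 0.0      # sum over s of s^2 * mean_s^2
        self.b = 0.0      # sum over s of s^2 * mean_s
        self.c = 0.0      # sum over s of s^2

    def update(self, iterate):
        """Absorb one SGD iterate; only five numbers are kept in memory."""
        self.n += 1
        s = self.n
        self.mean += (iterate - self.mean) / s
        self.a += s * s * self.mean ** 2
        self.b += s * s * self.mean
        self.c += s * s

    def variance(self):
        """V_n, obtained by expanding the square in the assumed definition."""
        m, n = self.mean, self.n
        v = (self.a - 2.0 * m * self.b + m * m * self.c) / (n * n)
        return max(v, 0.0)  # guard against tiny negative rounding error

    def interval(self, crit):
        """Interval mean_n +/- crit * sqrt(V_n / n) for a user-supplied critical value."""
        half = crit * math.sqrt(self.variance() / self.n)
        return self.mean - half, self.mean + half
```

Because the three sums are updated recursively, the statistic is available after every iterate without revisiting the path, and no separate estimate of the asymptotic variance is formed.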

    HMGB1, a potential regulator of tumor microenvironment in KSHV-infected endothelial cells

    High-mobility group box 1 (HMGB1) is a protein that binds to DNA and participates in various cellular processes, including DNA repair, transcription, and inflammation. It is also associated with cancer progression and therapeutic resistance. Despite its known role in promoting tumor growth and immune evasion in the tumor microenvironment, the contribution of HMGB1 to the development of Kaposi's sarcoma (KS) is not well understood. We investigated the effect of HMGB1 on KS pathogenesis using immortalized human endothelial cells infected with Kaposi's sarcoma-associated human herpes virus (KSHV). Our results showed that a higher amount of HMGB1 was detected in the supernatant of KSHV-infected cells compared to that of mock-infected cells, indicating that KSHV infection induced the secretion of HMGB1 in human endothelial cells. By generating HMGB1 knockout clones from immortalized human endothelial cells using CRISPR/Cas9, we elucidated the role of HMGB1 in KSHV-infected endothelial cells. Our findings indicate that the absence of HMGB1 did not induce lytic replication in KSHV-infected cells, but the cell viability of KSHV-infected cells was decreased in both 2D and 3D cultures. Through the antibody array for cytokines and growth factors, CXCL5, PDGF-AA, G-CSF, Emmprin, IL-17A, and VEGF were found to be suppressed in HMGB1 KO KSHV-infected cells compared to the KSHV-infected wild-type control. Mechanistically, phosphorylation of p38 would be associated with transcriptional regulation of CXCL5, PDGF-A and VEGF. These observations suggest that HMGB1 may play a critical role in KS pathogenesis by regulating cytokine and growth factor secretion and emphasize its potential as a therapeutic target for KS by modulating the tumor microenvironment.