Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression
Concentration inequalities form an essential toolkit in the study of high
dimensional (HD) statistical methods. Most of the relevant statistics
literature in this regard is based on sub-Gaussian or sub-exponential tail
assumptions. In this paper, we first bring together various probabilistic
inequalities for sums of independent random variables under much weaker
exponential-type (namely, sub-Weibull) tail assumptions. These results extract a
partial sub-Gaussian tail behavior in finite samples, matching the asymptotics
governed by the central limit theorem, and are compactly represented in terms
of a new Orlicz quasi-norm, the Generalized Bernstein-Orlicz norm, that
typifies such tail behaviors.
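To make the sub-Weibull tail condition concrete, the following sketch (the constant K = 2 and the thresholds are illustrative choices, not values from the paper) checks numerically that a product of two independent standard Gaussians, a textbook sub-Weibull variable with index alpha = 1 (i.e. sub-exponential), satisfies a tail bound of the form P(|X| > t) <= 2 exp(-(t/K)^alpha) even though each factor is sub-Gaussian (alpha = 2):

```python
import math
import random

random.seed(0)

def subweibull_bound(t, K, alpha):
    """Sub-Weibull(alpha) tail bound: P(|X| > t) <= 2 exp(-(t/K)^alpha)."""
    return 2.0 * math.exp(-((t / K) ** alpha))

# A product of two independent standard normals is sub-Weibull(1),
# i.e. sub-exponential, with heavier tails than any sub-Gaussian variable.
n = 200_000
samples = [abs(random.gauss(0, 1) * random.gauss(0, 1)) for _ in range(n)]

for t in (2.0, 4.0, 6.0):
    emp = sum(s > t for s in samples) / n
    print(t, emp, subweibull_bound(t, K=2.0, alpha=1.0))
```

The empirical tail sits below the sub-Weibull(1) envelope at every threshold, while no bound of the sub-Gaussian form 2 exp(-(t/K)^2) with a comparable constant would hold for large t.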
We illustrate the usefulness of these inequalities through the analysis of
four fundamental problems in HD statistics. In the first two problems, we study
the rate of convergence of the sample covariance matrix in terms of the maximum
elementwise norm and the maximum k-sub-matrix operator norm, which are key
quantities of interest in the bootstrap, HD covariance matrix estimation, and HD
inference. The third example concerns the restricted eigenvalue condition,
required in HD linear regression, which we verify for all sub-Weibull random
vectors through a unified analysis, and also prove a more general result
related to restricted strong convexity in the process. In the final example, we
consider the Lasso estimator for linear regression and establish its rate of
convergence under much weaker than usual tail assumptions (on the errors as
well as the covariates), while also allowing for misspecified models and both
fixed and random design. To our knowledge, these are the first such results for
Lasso obtained in this generality. The common feature in all our results over
all the examples is that the convergence rates under most exponential tails
match the usual ones under sub-Gaussian assumptions.
Comment: 64 pages; revised version (discussions added and some results modified in Section 4; minor changes made throughout).
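The Lasso claim can be mimicked in a toy simulation. The sketch below (sample sizes, dimensions, and the tuning parameter lambda are illustrative choices, not the paper's) fits the Lasso by plain coordinate descent on data whose errors are products of independent Gaussians, hence sub-Weibull(1) rather than sub-Gaussian, and still recovers the sparse coefficient vector with lambda on the usual sqrt(log p / n) scale:

```python
import math
import random

random.seed(1)

def soft_threshold(z, g):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    return math.copysign(max(abs(z) - g, 0.0), z)

def lasso_cd(X, y, lam, iters=200):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    resid = list(y)  # residuals for beta = 0
    for _ in range(iters):
        for j in range(p):
            # correlation of column j with the partial residual (beta_j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft_threshold(rho, n * lam) / col_sq[j]
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

n, p = 400, 10
beta_true = [3.0, -2.0] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# sub-Weibull(1) errors: products of two independent standard normals
eps = [random.gauss(0, 1) * random.gauss(0, 1) for _ in range(n)]
y = [sum(X[i][j] * beta_true[j] for j in range(p)) + eps[i] for i in range(n)]

lam = 2.0 * math.sqrt(math.log(p) / n)  # usual sqrt(log p / n) scaling
beta_hat = lasso_cd(X, y, lam)
print([round(bj, 2) for bj in beta_hat])
```

The two active coefficients come out close to their true values (with the familiar soft-thresholding shrinkage bias of order lambda), and the eight noise coordinates are driven to or near zero.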
Robust Semi-Parametric Inference in Semi-Supervised Settings
In this dissertation, we consider semi-parametric estimation problems under semi-supervised (SS) settings, wherein the available data consist of a small or moderately sized labeled dataset (L) and a much larger unlabeled dataset (U). Such data arise naturally in settings where the outcome, unlike the covariates, is expensive to obtain, a frequent scenario in modern studies involving large electronic databases. In SS settings it is often of interest to investigate if and when U can be exploited to improve estimation efficiency, compared to supervised estimators based on L alone.
In Chapter 1, we propose a class of Efficient and Adaptive Semi-Supervised Estimators (EASE) for linear regression. These are semi-non-parametric, imputation-based two-step estimators that adapt to model mis-specification, yielding improved efficiency when the linear model is mis-specified and equal (optimal) efficiency when it holds. This adaptive property is crucial for advocating the safe use of U. We provide asymptotic results establishing our claims, followed by simulations and an application to real data.
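The two-step imputation structure can be sketched schematically. In the toy example below (a quadratic fit stands in for the chapter's semi-non-parametric imputation step, and all data-generating choices are illustrative), the true regression function is nonlinear, so the linear working model is mis-specified; fitting a flexible imputation model on L and refitting the linear model on U with imputed outcomes still recovers the best linear projection (slope 1, intercept 0.5 under this design):

```python
import random

random.seed(2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(m):
        piv = max(range(k, m), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, m):
            f = M[r][k] / M[k][k]
            for c in range(k, m + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * m
    for k in range(m - 1, -1, -1):
        x[k] = (M[k][m] - sum(M[k][c] * x[c] for c in range(k + 1, m))) / M[k][k]
    return x

def fit_ls(xs, ys, degree):
    """Least squares polynomial fit via the normal equations."""
    G = [[x ** d for d in range(degree + 1)] for x in xs]
    m = degree + 1
    A = [[sum(g[i] * g[j] for g in G) for j in range(m)] for i in range(m)]
    rhs = [sum(G[k][i] * ys[k] for k in range(len(xs))) for i in range(m)]
    return solve(A, rhs)

# Small labeled set L, large unlabeled set U; E[Y|X] is nonlinear,
# so the linear working model is mis-specified.
nL, nU = 300, 20000
f = lambda x: x + 0.5 * x ** 2
xL = [random.gauss(0, 1) for _ in range(nL)]
yL = [f(x) + random.gauss(0, 0.5) for x in xL]
xU = [random.gauss(0, 1) for _ in range(nU)]

# Step 1: flexible (here quadratic) imputation model fitted on L.
coef = fit_ls(xL, yL, degree=2)
impute = lambda x: coef[0] + coef[1] * x + coef[2] * x ** 2

# Step 2: refit the linear working model on U with imputed outcomes.
yhat = [impute(x) for x in xU]
b0, b1 = fit_ls(xU, yhat, degree=1)
print(round(b0, 3), round(b1, 3))
```

Because step 2 averages over the large unlabeled set, the variability of the refitted linear parameters is driven mainly by the imputation fit on L, which is the mechanism behind the efficiency gains under mis-specification.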
In Chapter 2, we provide a unified framework for SS M-estimation problems based on general estimating equations, and propose a family of EASE estimators that are always as efficient as the supervised estimator and more efficient whenever U is actually informative for the parameter of interest. For a subclass of problems, we also provide a flexible semi-non-parametric imputation strategy for constructing EASE. We provide asymptotic results establishing our claims, followed by simulations and application to real data.
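The general recipe, augmenting a supervised estimating equation with a mean-zero term computed on U, can be sketched for the simplest M-estimation problem, the mean of Y. The working regression below is a hypothetical simple linear fit, not the chapter's construction, and all sizes are illustrative; when X is informative for Y, the augmented estimator has visibly smaller Monte Carlo variance than the supervised sample mean:

```python
import random
import statistics

random.seed(3)

def ss_mean(xL, yL, xU):
    """Supervised vs. augmented semi-supervised estimators of E[Y]."""
    mx, my = statistics.fmean(xL), statistics.fmean(yL)
    # hypothetical working regression of Y on X (a simple linear fit on L)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xL, yL))
             / sum((x - mx) ** 2 for x in xL))
    m = lambda x: my + slope * (x - mx)
    # augmentation term: mean of m over U minus mean of m over L (mean zero)
    ss = my + statistics.fmean(m(x) for x in xU) - statistics.fmean(m(x) for x in xL)
    return my, ss

sup_reps, ss_reps = [], []
for _ in range(300):
    xL = [random.gauss(0, 1) for _ in range(50)]
    yL = [2.0 * x + random.gauss(0, 0.3) for x in xL]  # X informative for Y
    xU = [random.gauss(0, 1) for _ in range(2000)]
    sup, ss = ss_mean(xL, yL, xU)
    sup_reps.append(sup)
    ss_reps.append(ss)

print(statistics.pvariance(sup_reps), statistics.pvariance(ss_reps))
```

When X carries no information about Y the fitted slope is near zero, the augmentation term vanishes, and the SS estimator falls back to the supervised one, which is the "always as efficient, more efficient when U is informative" behavior described above.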
In Chapter 3, we consider regressing a binary outcome (Y) on covariates (X) based on a large unlabeled dataset with observations only for X and, additionally, a surrogate (S) which can predict Y with high accuracy when it assumes extreme values. Assuming Y and S both follow single index models versus X, we show that, under sparsity assumptions, we can recover the regression parameter of Y versus X through a least squares LASSO estimator based on the subset of the data restricted to the extreme sets of S, with Y imputed using the surrogacy of S. We provide sharp finite sample performance guarantees for our estimator, followed by simulations and an application to real data.
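A stripped-down version of this extreme-set imputation idea can be sketched in two dimensions, with plain least squares standing in for the chapter's high-dimensional LASSO; the index coefficients, noise levels, and the threshold defining the extreme sets are all illustrative:

```python
import math
import random

random.seed(4)

# Toy single index setup: Y = 1{b.X + e > 0}, surrogate S = b.X + u shares
# the same index, so extreme values of S predict Y almost perfectly.
b = (2.0, -1.0)
data = []
for _ in range(20000):
    x = (random.gauss(0, 1), random.gauss(0, 1))
    idx = b[0] * x[0] + b[1] * x[1]
    y = 1.0 if idx + random.gauss(0, 0.5) > 0 else 0.0  # unobserved in practice
    s = idx + random.gauss(0, 0.5)
    data.append((x, y, s))

# Restrict to the extreme sets of S and impute Y there from the sign of S.
sub = [(x, y, 1.0 if s > 0 else 0.0) for x, y, s in data if abs(s) > 2.0]
acc = sum(y == yimp for _, y, yimp in sub) / len(sub)  # surrogacy accuracy

# Centered least squares of the imputed outcome on X via 2x2 normal equations.
n = len(sub)
m0 = sum(x[0] for x, _, _ in sub) / n
m1 = sum(x[1] for x, _, _ in sub) / n
my = sum(yimp for _, _, yimp in sub) / n
a = sum((x[0] - m0) ** 2 for x, _, _ in sub)
c = sum((x[1] - m1) ** 2 for x, _, _ in sub)
ab = sum((x[0] - m0) * (x[1] - m1) for x, _, _ in sub)
d = sum((x[0] - m0) * (yimp - my) for x, _, yimp in sub)
e = sum((x[1] - m1) * (yimp - my) for x, _, yimp in sub)
det = a * c - ab * ab
beta = ((c * d - ab * e) / det, (a * e - ab * d) / det)

# Up to scale, beta should align with the true index direction b.
cos = (beta[0] * b[0] + beta[1] * b[1]) / (math.hypot(*beta) * math.hypot(*b))
print(round(acc, 3), round(cos, 3))
```

The imputed labels on the extreme sets are nearly error-free, and the least squares direction aligns closely with the true index direction, which is the population-level phenomenon the chapter's finite-sample guarantees quantify in the sparse high-dimensional regime.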