45 research outputs found

Statistical Properties of Convex Clustering

Authors: Tan Kean Ming, Witten Daniela
Publication date: 01/01/2015

In this manuscript, we study the statistical properties of convex clustering. We establish that convex clustering is closely related to single linkage hierarchical clustering and k-means clustering. In addition, we derive the range of the tuning parameter for convex clustering that yields a non-trivial solution. We also provide an unbiased estimate of the degrees of freedom, and provide a finite sample bound for the prediction error for convex clustering. We compare convex clustering to some traditional clustering methods in simulation studies. Comment: 20 pages, 5 figures

arXiv.org e-Print Archive

Crossref

Selection Bias Correction and Effect Size Estimation under Dependence

Authors: Simon Noah, Tan Kean Ming, Witten Daniela
Publication date: 28/03/2015

We consider large-scale studies in which it is of interest to test a very large number of hypotheses, and then to estimate the effect sizes corresponding to the rejected hypotheses. For instance, this setting arises in the analysis of gene expression or DNA sequencing data. However, naive estimates of the effect sizes suffer from selection bias, i.e., some of the largest naive estimates are large due to chance alone. Many authors have proposed methods to reduce the effects of selection bias under the assumption that the naive estimates of the effect sizes are independent. Unfortunately, when the effect size estimates are dependent, these existing techniques can have very poor performance, and in practice there will often be dependence. We propose an estimator that adjusts for selection bias under a recently-proposed frequentist framework, without the independence assumption. We study some properties of the proposed estimator, and illustrate that it outperforms past proposals in a simulation study and on two gene expression data sets. Comment: 21 pages, 2 figures

arXiv.org e-Print Archive

CiteSeerX

High-dimensional Inference for Generalized Linear Models with Hidden Confounding

Authors: Ouyang Jing, Tan Kean Ming, Xu Gongjun
Publication date: 07/09/2022

Statistical inference for high-dimensional regression models has been extensively studied because of its wide range of applications, from genomics and neuroscience to economics. In practice, there are often potential unmeasured confounders associated with both the response and covariates, which invalidate the standard debiasing methods. This paper focuses on a generalized linear regression framework with hidden confounding and proposes a debiasing approach to address this high-dimensional problem by adjusting for effects induced by the unmeasured confounders. We establish consistency and asymptotic normality for the proposed debiased estimator. The finite sample performance of the proposed method is demonstrated via extensive numerical studies and an application to a genetic dataset.

arXiv.org e-Print Archive