Clustering of Data with Missing Entries
The analysis of large datasets is often complicated by the presence of
missing entries, mainly because most of the current machine learning algorithms
are designed to work with full data. The main focus of this work is to
introduce a clustering algorithm that provides good clustering even in the
presence of missing data. The proposed technique solves a fusion
penalty-based optimization problem to recover the clusters. We theoretically
analyze the conditions needed for the successful recovery of the clusters. We
also propose an algorithm to solve a relaxation of this problem using
saturating non-convex fusion penalties. The method is demonstrated on simulated
and real datasets, and is observed to perform well in the presence of large
fractions of missing entries. Comment: arXiv admin note: substantial text overlap with arXiv:1709.0187
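
The abstract above does not spell out its objective, so the following is only a minimal sketch of what a fusion-penalty clustering cost with missing entries could look like. It assumes a data-fit term restricted to observed entries plus a pairwise penalty on the per-point estimates, and it assumes an exponential form for the saturating non-convex penalty. The names `data_fit`, `saturating`, `objective`, `lam`, and `sigma` are hypothetical and not taken from the paper.

```python
import math

def data_fit(x, u):
    """Squared error between point x and its estimate u over observed entries.

    Missing entries of x are encoded as None and are skipped (an assumption
    of this sketch, not the paper's notation).
    """
    return sum((a - b) ** 2 for a, b in zip(x, u) if a is not None)

def saturating(d, sigma=1.0):
    """Assumed saturating penalty: grows from 0 and levels off at 1."""
    return 1.0 - math.exp(-d * d / (2.0 * sigma * sigma))

def objective(X, U, lam=1.0, sigma=1.0):
    """Data fit on observed entries plus lam times the pairwise fusion penalty."""
    fit = sum(data_fit(x, u) for x, u in zip(X, U))
    fusion = 0.0
    for i in range(len(U)):
        for j in range(i + 1, len(U)):
            fusion += saturating(math.dist(U[i], U[j]), sigma)
    return fit + lam * fusion

# Three points; the first has a missing second coordinate.
X = [[0.0, None], [0.1, 0.0], [5.0, 5.0]]
# Candidate estimates in which the first two points are fused.
U = [[0.05, 0.0], [0.05, 0.0], [5.0, 5.0]]
cost = objective(X, U)
```

In this sketch, fused estimates contribute nothing to the penalty, while well-separated estimates contribute at most `lam` each because the penalty saturates. Points are then read as belonging to the same cluster when their estimates coincide.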
Regularization and Model Selection with Categorial Predictors and Effect Modifiers in Generalized Linear Models
We consider varying-coefficient models with categorial effect modifiers in the framework of generalized linear models. We distinguish between nominal and ordinal effect modifiers, and propose adequate Lasso-type regularization techniques that allow for (1) selection of relevant covariates, and (2) identification of coefficient functions that actually vary with the level of a potentially effect-modifying factor. We investigate the estimators’ large-sample properties, and show in simulation studies that the proposed approaches also perform very well for finite samples. Furthermore, the presented methods are compared with alternative procedures, and applied to real-world medical data.
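
To make the nominal versus ordinal distinction concrete, here is a hedged sketch of one plausible Lasso-type penalty for the level-specific coefficients of a single covariate. It assumes that a nominal modifier penalizes all pairwise differences between levels, that an ordinal modifier penalizes only differences between adjacent levels, and that both add an absolute-value term for covariate selection. The abstract does not give the exact penalty, so the function names and this form are assumptions for illustration.

```python
def selection_term(beta):
    """Sum of absolute coefficients: shrinks the whole covariate toward zero."""
    return sum(abs(b) for b in beta)

def nominal_penalty(beta):
    """Assumed penalty for an unordered modifier: all pairwise level differences."""
    fusion = sum(
        abs(beta[r] - beta[s])
        for r in range(len(beta))
        for s in range(r)
    )
    return fusion + selection_term(beta)

def ordinal_penalty(beta):
    """Assumed penalty for an ordered modifier: adjacent level differences only."""
    fusion = sum(abs(beta[r] - beta[r - 1]) for r in range(1, len(beta)))
    return fusion + selection_term(beta)

# Coefficients of one covariate at three levels of the effect modifier.
beta = [1.0, 1.0, 3.0]
nominal_value = nominal_penalty(beta)
ordinal_value = ordinal_penalty(beta)
```

Under these assumptions, a difference term that is shrunk to zero means the coefficient does not vary between those levels, and all terms at zero remove the covariate from the model.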
Dynamic Tensor Clustering
Dynamic tensor data are becoming prevalent in numerous applications. Existing
tensor clustering methods either fail to account for the dynamic nature of the
data, or are inapplicable to a general-order tensor. Also, there is often a gap
between statistical guarantees and computational efficiency for existing tensor
clustering solutions. In this article, we aim to bridge this gap by proposing a
new dynamic tensor clustering method, which takes into account both sparsity
and fusion structures, and enjoys strong statistical guarantees as well as high
computational efficiency. Our proposal is based upon a new structured tensor
factorization that encourages both sparsity and smoothness in parameters along
the specified tensor modes. Computationally, we develop a highly efficient
optimization algorithm that benefits from substantial dimension reduction. In
theory, we first establish a non-asymptotic error bound for the estimator from
the structured tensor factorization. Built upon this error bound, we then
derive the rate of convergence of the estimated cluster centers, and show that
the estimated clusters recover the true cluster structures with a high
probability. Moreover, our proposed method can be naturally extended to
co-clustering of multiple modes of the tensor data. The efficacy of our
approach is illustrated via simulations and a brain dynamic functional
connectivity analysis from an autism spectrum disorder study. Comment: Accepted at Journal of the American Statistical Association