TriSig: Assessing the statistical significance of triclusters
Tensor data analysis allows researchers to uncover novel patterns and
relationships that cannot be obtained from matrix data alone. The information
inferred from the patterns provides valuable insights into disease progression,
bioproduction processes, weather fluctuations, and group dynamics. However,
spurious and redundant patterns hamper this process. This work proposes a
statistical frame to assess the probability that patterns in tensor data
deviate from null expectations, extending well-established principles
for assessing the statistical significance of patterns in matrix data. A
comprehensive discussion of binomial testing for false positive discoveries is
provided in light of variable dependencies, temporal dependencies and
misalignments, and p-value corrections under the Benjamini-Hochberg
procedure. Results gathered from the application of state-of-the-art
triclustering algorithms over distinct real-world case studies in biochemical
and biotechnological domains confer validity to the proposed statistical frame
while revealing vulnerabilities of some triclustering searches. The proposed
assessment can be incorporated into existing triclustering algorithms to
mitigate false positive/spurious discoveries and further prune the search
space, reducing their computational complexity.
Availability: The code is freely available at
https://github.com/JupitersMight/TriSig under the MIT license.
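The two statistical ingredients named in the abstract above, a binomial tail test followed by Benjamini-Hochberg correction, can be illustrated with a minimal sketch. This is not the TriSig code or its API: the function names, the example patterns, and the assumption that each pattern's null occurrence probability is already known are all hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that a pattern with
    null probability p occurs in at least k of n observations."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per p-value, True where the null hypothesis is
    rejected while controlling the false discovery rate at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank r with p_(r) <= (r / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical patterns: (observed support k, observations n, null probability p).
patterns = [(9, 20, 0.1), (3, 20, 0.1), (5, 20, 0.1)]
p_values = [binomial_tail(k, n, p) for k, n, p in patterns]
significant = benjamini_hochberg(p_values, alpha=0.05)
for pattern, pv, sig in zip(patterns, p_values, significant):
    print(pattern, f"p-value={pv:.3g}", "significant" if sig else "not significant")
```

How the null probability of a tricluster should be derived, including the variable and temporal dependencies the abstract mentions, is not specified here, so the sketch takes it as an input.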
Dynamic Tensor Clustering
Dynamic tensor data are becoming prevalent in numerous applications. Existing
tensor clustering methods either fail to account for the dynamic nature of the
data, or are inapplicable to a general-order tensor. Also there is often a gap
between statistical guarantee and computational efficiency for existing tensor
clustering solutions. In this article, we aim to bridge this gap by proposing a
new dynamic tensor clustering method, which takes into account both sparsity
and fusion structures, and enjoys strong statistical guarantees as well as high
computational efficiency. Our proposal is based upon a new structured tensor
factorization that encourages both sparsity and smoothness in parameters along
the specified tensor modes. Computationally, we develop a highly efficient
optimization algorithm that benefits from substantial dimension reduction. In
theory, we first establish a non-asymptotic error bound for the estimator from
the structured tensor factorization. Built upon this error bound, we then
derive the rate of convergence of the estimated cluster centers, and show that
the estimated clusters recover the true cluster structures with a high
probability. Moreover, our proposed method can be naturally extended to
co-clustering of multiple modes of the tensor data. The efficacy of our
approach is illustrated via simulations and a brain dynamic functional
connectivity analysis from an Autism spectrum disorder study.
Comment: Accepted at Journal of the American Statistical Association