20 research outputs found
Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization
The separability assumption (Donoho & Stodden, 2003; Arora et al., 2012)
turns non-negative matrix factorization (NMF) into a tractable problem.
Recently, a new class of provably-correct NMF algorithms have emerged under
this assumption. In this paper, we reformulate the separable NMF problem as
that of finding the extreme rays of the conical hull of a finite set of
vectors. From this geometric perspective, we derive new separable NMF
algorithms that are highly scalable and empirically noise robust, and have
several other favorable properties in relation to existing methods. A parallel
implementation of our algorithm demonstrates high scalability on shared- and
distributed-memory machines.Comment: 15 pages, 6 figure
Statistical tests for large tree-structured data
We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton–Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ2 and F random variables. We illustrate our methods on an important application of detecting tumour heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients
Statistical tests for large tree-structured data
We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton–Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ2 and F random variables. We illustrate our methods on an important application of detecting tumour heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients
Stratification of TAD boundaries reveals preferential insulation of super-enhancers by strong boundaries
Topologically associating domains (TADs) detected by Hi-C technologies are megabase-scale areas of highly interacting chromatin. Here Gong, Lazaris et al. develop a computational approach to improve the reproducibility of Hi-C contact matrices and stratify TAD boundaries based on their insulating strength