Factoring nonnegative matrices with linear programs
This paper describes a new approach, based on linear programming, for
computing nonnegative matrix factorizations (NMFs). The key idea is a
data-driven model for the factorization where the most salient features in the
data are used to express the remaining features. More precisely, given a data
matrix X, the algorithm identifies a matrix C that satisfies some linear
constraints and for which X approximately equals CX. The constraints are chosen to ensure that the
matrix C selects features; these features can then be used to find a low-rank
NMF of X. A theoretical analysis demonstrates that this approach has guarantees
similar to those of the recent NMF algorithm of Arora et al. (2012). In
contrast with this earlier work, the proposed method extends to more general
noise models and leads to efficient, scalable algorithms. Experiments with
synthetic and real datasets provide evidence that the new approach is also
superior in practice. An optimized C++ implementation can factor a
multigigabyte matrix in a matter of minutes.
Comment: 17 pages, 10 figures. Modified theorem statement for robust recovery
conditions. Revised proof techniques to make arguments more elementary.
Results on robustness when rows are duplicated have been superseded by
arxiv.org/1211.668
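The relation "X approximately equals CX" can be made concrete with a tiny example. The sketch below does not solve the linear program from the paper; it only builds a small matrix in which two rows generate all the others, constructs a matching C by hand, and checks that X = CX holds. The matrices `W` and `F`, the helper `matmul`, and the rule of reading the selected rows off the nonzero diagonal of `C` are illustrative assumptions.

```python
# Illustrative sketch (assumed data): the self-expression structure X = C X.
# This is NOT the paper's LP solver; C is constructed by hand.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Two "salient" feature rows (hypothetical values).
W = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0]]

# Nonnegative mixing weights. The first two rows form an identity block,
# so rows 0 and 1 of X are exactly the salient rows in W.
F = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5],
     [0.25, 0.75]]

X = matmul(F, W)  # 4 x 3 data matrix with an exact rank-2 NMF

# Pad F with zero columns to get a square C. Only columns 0 and 1 are
# nonzero, so every row of X is expressed using rows 0 and 1 of X.
C = [row + [0.0] * (len(F) - len(row)) for row in F]

CX = matmul(C, X)

# In this sketch, the selected features are the rows with a nonzero diagonal.
selected = [i for i in range(len(C)) if C[i][i] > 0]

print("selected rows:", selected)
print("X == CX:", all(abs(CX[i][j] - X[i][j]) < 1e-12
                      for i in range(len(X)) for j in range(len(X[0]))))
```

Once the selected rows are known, they play the role of one factor and the nonzero columns of C play the role of the other, which is how a low-rank NMF of X follows from C.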
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve
state-of-the-art performance on a variety of machine learning tasks. Several
researchers have recently proposed schemes to parallelize SGD, but all require
performance-destroying memory locking and synchronization. This work aims to
show, using novel theoretical analysis, algorithms, and implementation, that SGD
can be implemented without any locking. We present an update scheme called
HOGWILD! which allows processors access to shared memory with the possibility
of overwriting each other's work. We show that when the associated optimization
problem is sparse, meaning most gradient updates only modify small parts of the
decision variable, then HOGWILD! achieves a nearly optimal rate of convergence.
We demonstrate experimentally that HOGWILD! outperforms alternative schemes
that use locking by an order of magnitude.
Comment: 22 pages, 10 figures
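The lock-free update pattern described above can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: the toy problem (each sample touches only two coordinates), the step size, the thread count, and all names are assumptions. Note also that default CPython threads do not execute bytecode in parallel, so this shows the unsynchronized update scheme rather than any speedup.

```python
# Minimal sketch (assumed toy problem): several threads run SGD on a shared
# parameter vector with no locks, in the style of the scheme described above.
import random
import threading

rng = random.Random(0)
n_coords, n_samples = 20, 60

# Hypothetical ground truth and sparse samples: each sample constrains the
# sum of just two coordinates, so each gradient touches two entries of x.
x_true = [rng.uniform(-1.0, 1.0) for _ in range(n_coords)]
samples = []
for _ in range(n_samples):
    a, b = rng.sample(range(n_coords), 2)
    samples.append((a, b, x_true[a] + x_true[b]))

x = [0.0] * n_coords  # shared decision variable, deliberately unprotected


def loss(v):
    """Sum of squared residuals over all samples."""
    return sum(0.5 * (v[a] + v[b] - y) ** 2 for a, b, y in samples)


def worker(seed, steps, lr):
    """Run SGD steps on the shared vector without taking any lock."""
    local = random.Random(seed)
    for _ in range(steps):
        a, b, y = samples[local.randrange(n_samples)]
        residual = x[a] + x[b] - y  # may read values another thread is updating
        x[a] -= lr * residual       # unsynchronized writes; overwrites are allowed
        x[b] -= lr * residual


initial_loss = loss(x)
threads = [threading.Thread(target=worker, args=(seed, 2000, 0.1))
           for seed in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_loss = loss(x)

print("initial loss:", initial_loss)
print("final loss:", final_loss)
```

Because each update modifies only two of the twenty coordinates, two threads rarely touch the same entries at the same moment, which is the sparsity condition the abstract relies on.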
Lineage-specific interface proteins match up the cell cycle and differentiation in embryo stem cells.
The shortage of molecular information on cell cycle changes along embryonic stem cell (ESC) differentiation prompts an in silico approach, which may provide a novel way to identify candidate genes or mechanisms that coordinate the two programs. We analyzed germ-layer-specific gene expression changes during the cell cycle and ESC differentiation by combining four human cell cycle transcriptome profiles with thirteen in vitro human ESC differentiation studies. To detect cross-talk mechanisms, we then integrated the transcriptome data that displayed differential regulation with protein interaction data. A new class of non-transcriptionally regulated genes was identified. These genes encode proteins that interact systematically with proteins corresponding to genes regulated during the cell cycle or cell differentiation, and that can therefore be seen as interface proteins coordinating the two programs. Functional analysis yielded insights into fate-specific candidates for interface functions. The non-transcriptionally regulated interface proteins were found to be highly regulated by post-translational ubiquitylation, which may synchronize the transition between cell proliferation and differentiation in ESCs.