
    Factoring nonnegative matrices with linear programs

    This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization in which the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that approximately satisfies X = CX and obeys some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes.
    Comment: 17 pages, 10 figures. Modified theorem statement for robust recovery conditions. Revised proof techniques to make arguments more elementary. Results on robustness when rows are duplicated have been superseded by arxiv.org/1211.668
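To make the relation X = CX concrete, here is a minimal sketch on a hypothetical toy matrix. It does not solve the linear program from the paper: C is built by hand from a known factorization, and the names `F`, `W`, and `selected`, as well as the reading of the diagonal of C as a feature indicator, are assumptions made for illustration only.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical toy data: two "salient" rows W, and weights F that express
# every row of X as a nonnegative combination of those two rows.
W = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0]]
F = [[1.0, 0.0],   # row 0 of X is the first salient row itself
     [0.0, 1.0],   # row 1 of X is the second salient row itself
     [0.5, 0.5],   # rows 2 and 3 are mixtures of the two
     [0.2, 0.8]]
X = matmul(F, W)   # 4 x 3 nonnegative data matrix

# C puts the weights in the columns of the salient rows (0 and 1)
# and zeros everywhere else, so that C @ X reproduces X.
C = [row + [0.0, 0.0] for row in F]
CX = matmul(C, X)

# Largest entrywise gap between X and CX (zero up to rounding here).
residual = max(abs(x - cx) for rx, rcx in zip(X, CX) for x, cx in zip(rx, rcx))

# Rows with a large diagonal entry in C are the ones used to express the rest.
selected = [i for i in range(len(C)) if C[i][i] > 0.5]

# The selected rows give a rank-2 nonnegative factorization X = F @ X[selected].
X_rebuilt = matmul(F, [X[i] for i in selected])
```

In this toy case the selected rows are exactly the two salient ones, and the weights F together with those rows form the low-rank factorization. Finding such a C from noisy data is the part the paper delegates to a linear program.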

    Voting and Vice: Criminal Disenfranchisement and the Reconstruction Amendments


    HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

    Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
    Comment: 22 pages, 10 figures
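The lock-free update scheme described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sparse least-squares problem, the function names, the step size, and the thread count are all assumptions, and CPython threads interleave rather than run truly in parallel, so the sketch shows the update pattern rather than any speedup.

```python
import random
import threading


def make_sparse_problem(n_features=20, n_samples=200, nnz=3, seed=0):
    """Noiseless sparse least squares: each sample touches only `nnz` features."""
    rng = random.Random(seed)
    w_true = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    samples = []
    for _ in range(n_samples):
        idx = rng.sample(range(n_features), nnz)
        vals = [rng.uniform(-1.0, 1.0) for _ in idx]
        y = sum(w_true[j] * v for j, v in zip(idx, vals))
        samples.append((idx, vals, y))
    return w_true, samples


def worker(w, samples, steps, lr, seed):
    """Run SGD steps on the shared list `w` without taking any lock."""
    rng = random.Random(seed)
    for _ in range(steps):
        idx, vals, y = rng.choice(samples)
        # Read only the coordinates this sample touches (values may be stale).
        err = sum(w[j] * v for j, v in zip(idx, vals)) - y
        for j, v in zip(idx, vals):
            # Unsynchronized read-modify-write: another thread may
            # overwrite this coordinate between the read and the store.
            w[j] -= lr * err * v


def hogwild_sgd(samples, n_features, n_threads=4, steps=5000, lr=0.2):
    """Launch several workers that all update one shared weight vector."""
    w = [0.0] * n_features
    threads = [
        threading.Thread(target=worker, args=(w, samples, steps, lr, 100 + t))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


if __name__ == "__main__":
    w_true, samples = make_sparse_problem()
    w = hogwild_sgd(samples, n_features=len(w_true))
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_true)))
```

Because each sample touches only three of the twenty coordinates, two workers rarely write to the same entry at the same moment, which is the sparsity condition the abstract refers to.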

    Lineage-specific interface proteins match up the cell cycle and differentiation in embryo stem cells.

    The shortage of molecular information on cell cycle changes during embryonic stem cell (ESC) differentiation prompts an in silico approach, which may provide a novel way to identify candidate genes or mechanisms that coordinate the two programs. We analyzed germ layer specific gene expression changes during the cell cycle and ESC differentiation by combining four human cell cycle transcriptome profiles with thirteen in vitro human ESC differentiation studies. To detect cross-talk mechanisms, we then integrated the transcriptome data that displayed differential regulation with protein interaction data. A new class of non-transcriptionally regulated genes was identified, encoding proteins which interact systematically with proteins corresponding to genes regulated during the cell cycle or cell differentiation, and which can therefore be seen as interface proteins coordinating the two programs. Functional analysis yielded insights into fate-specific candidates for interface functionalities. The non-transcriptionally regulated interface proteins were found to be highly regulated by post-translational ubiquitylation modification, which may synchronize the transition between cell proliferation and differentiation in ESCs.