
    Distributed Basis Pursuit

    We propose a distributed algorithm for solving the optimization problem Basis Pursuit (BP). BP finds the least L1-norm solution of the underdetermined linear system Ax = b and is used, for example, for signal reconstruction in compressed sensing. Our algorithm solves BP on a distributed platform such as a sensor network and is designed to minimize the communication between nodes. The algorithm only requires the network to be connected, has no notion of a central processing node, and no node has access to the entire matrix A at any time. We consider two scenarios in which either the columns or the rows of A are distributed among the compute nodes. Our algorithm, named D-ADMM, is a decentralized implementation of the alternating direction method of multipliers. We show through numerical simulations that our algorithm requires considerably less communication between the nodes than state-of-the-art algorithms. Comment: preprint of the journal version of the paper; IEEE Transactions on Signal Processing, Vol. 60, Issue 4, April 2012.
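    For context, below is a minimal sketch of the classical single-node ADMM splitting for BP, the centralized counterpart of the paper's distributed D-ADMM; the penalty parameter, iteration count, and demo dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

def basis_pursuit_admm(A, b, rho=1.0, n_iter=500):
    """Centralized ADMM sketch for Basis Pursuit: min ||x||_1 s.t. Ax = b.
    D-ADMM distributes updates of this kind across networked nodes; this
    single-node version only illustrates what BP computes."""
    m, n = A.shape
    # Cache the terms of the projection onto the affine set {x : Ax = b}.
    pinv_term = A.T @ np.linalg.inv(A @ A.T)
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        # x-update: Euclidean projection of (z - u) onto {x : Ax = b}.
        v = z - u
        x = v - pinv_term @ (A @ v - b)
        # z-update: soft-thresholding (prox of the L1 norm) with threshold 1/rho.
        w = x + u
        z = np.sign(w) * np.maximum(np.abs(w) - 1.0 / rho, 0.0)
        # Dual variable update.
        u = u + x - z
    return z

# Tiny demo: recover a sparse vector from underdetermined measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 42]] = [1.5, -2.0, 0.7]
x_hat = basis_pursuit_admm(A, A @ x_true)
```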

    Loop optimization for tensor network renormalization

    We introduce a tensor renormalization group scheme for coarse-graining a two-dimensional tensor network that can be successfully applied to both classical and quantum systems, on and off criticality. The key innovation in our scheme is to deform a 2D tensor network into small loops and then optimize the tensors on each loop. In this way, we remove short-range entanglement at each iteration step and significantly improve the accuracy and stability of the renormalization flow. We demonstrate our algorithm on the classical Ising model and on a frustrated 2D quantum model. Comment: 15 pages, 11 figures, accepted version for Phys. Rev. Lett.
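    As a point of reference, the sketch below shows the purely local SVD truncation used in plain TRG (Levin-Nave) together with a standard construction of the classical Ising initial tensor; the paper's loop optimization, which replaces this local truncation in order to also remove short-range entanglement, is not reproduced here, and the bond dimension chi is an illustrative parameter.

```python
import numpy as np

def svd_split(T, chi):
    """Plain-TRG primitive: split a rank-4 tensor T[u, l, d, r] into two
    rank-3 tensors by a truncated SVD along one diagonal. Loop-TNR replaces
    this local truncation with an optimization over the tensors on each loop."""
    Du, Dl, Dd, Dr = T.shape
    M = T.reshape(Du * Dl, Dd * Dr)                 # group (u, l) vs (d, r)
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    k = min(chi, s.size)                            # keep at most chi singular values
    S1 = (U[:, :k] * np.sqrt(s[:k])).reshape(Du, Dl, k)          # S1[u, l, a]
    S2 = (np.sqrt(s[:k])[:, None] * Vh[:k]).reshape(k, Dd, Dr)   # S2[a, d, r]
    return S1, S2

def ising_tensor(beta):
    """Initial vertex tensor of the 2D classical Ising model at inverse
    temperature beta, built by splitting the Boltzmann weight W = X X^T."""
    W = np.array([[np.exp(beta), np.exp(-beta)],
                  [np.exp(-beta), np.exp(beta)]])
    evals, evecs = np.linalg.eigh(W)
    X = evecs * np.sqrt(evals)                      # W = X @ X.T
    return np.einsum('si,sj,sk,sl->ijkl', X, X, X, X)
```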

    Data-efficient learning of feedback policies from image pixels using deep dynamical models

    Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy ( torques ) from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight and an important step toward fully autonomous end-to-end learning from pixels to torques

    A Quality and Cost Approach for Comparison of Small-World Networks

    We propose an approach based on the analysis of cost-quality tradeoffs for comparing the efficiency of various algorithms for small-world network construction. A number of algorithms for constructing complex small-world networks, both known from the literature and original, are briefly reviewed and compared. The networks constructed by these algorithms have the basic structure of a 1D regular lattice with additional shortcuts providing the small-world properties. It is shown that the networks proposed in this work have the best cost-quality ratio in the considered class. Comment: 27 pages, 16 figures, 1 table.
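    An illustrative construction in the same family, assuming networkx is available: a 1D regular ring lattice augmented with random long-range shortcuts, together with generic stand-ins for a cost measure (number of links) and a quality measure (inverse mean shortest-path length); the paper's actual cost and quality definitions are not reproduced here.

```python
import random
import networkx as nx

def ring_with_shortcuts(n, k, n_shortcuts, seed=0):
    """1D regular ring lattice (each node linked to its k nearest neighbours
    on each side) plus randomly placed shortcuts giving small-world behaviour."""
    rng = random.Random(seed)
    G = nx.circulant_graph(n, list(range(1, k + 1)))   # regular 1D lattice
    while G.number_of_edges() < n * k + n_shortcuts:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            G.add_edge(u, v)                           # long-range shortcut
    return G

G = ring_with_shortcuts(n=200, k=2, n_shortcuts=40)
cost = G.number_of_edges()                              # e.g. wiring cost ~ number of links
quality = 1.0 / nx.average_shortest_path_length(G)      # e.g. inverse mean path length
```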

    Maximizing CNN Accelerator Efficiency Through Resource Partitioning

    Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor structure is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers. Using the same FPGA resources as a single large processor, multiple smaller specialized processors increase computational efficiency and lead to a higher overall throughput. Our design methodology achieves 3.8x higher throughput than the state-of-the-art approach when evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x, respectively.
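    A toy sketch of the partitioning idea, under strong simplifying assumptions: layers are grouped into contiguous stages, each stage's processor receives an equal share of the resources, and a stage's rate is taken as resources divided by its layer work, so the pipeline is limited by its slowest stage; the actual design methodology relies on detailed per-layer FPGA performance and resource models.

```python
from itertools import combinations

def best_partition(layer_work, n_parts, total_resources):
    """Brute-force search over contiguous layer partitions, maximizing the
    throughput of the slowest stage under a crude rate = resources / work model."""
    best_tp, best_cuts = 0.0, None
    share = total_resources / n_parts            # equal resource share per processor
    for cuts in combinations(range(1, len(layer_work)), n_parts - 1):
        bounds = [0] + list(cuts) + [len(layer_work)]
        stage_work = [sum(layer_work[a:b]) for a, b in zip(bounds, bounds[1:])]
        tp = min(share / w for w in stage_work)  # slowest stage bounds the pipeline
        if tp > best_tp:
            best_tp, best_cuts = tp, cuts
    return best_tp, best_cuts

# Example: per-layer compute cost (arbitrary units) split across 3 processors.
work = [30, 55, 80, 60, 40]
print(best_partition(work, n_parts=3, total_resources=120))
```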