Distributed Basis Pursuit
We propose a distributed algorithm for solving the optimization problem Basis
Pursuit (BP). BP finds the least L1-norm solution of the underdetermined linear
system Ax = b and is used, for example, in compressed sensing for
reconstruction. Our algorithm solves BP on a distributed platform such as a
sensor network, and is designed to minimize the communication between nodes.
The algorithm only requires the network to be connected, has no notion of a
central processing node, and no node has access to the entire matrix A at any
time. We consider two scenarios in which either the columns or the rows of A
are distributed among the compute nodes. Our algorithm, named D-ADMM, is a
decentralized implementation of the alternating direction method of
multipliers. We show through numerical simulation that our algorithm requires
considerably fewer communications between the nodes than state-of-the-art
algorithms.
Comment: Preprint of the journal version of the paper; IEEE Transactions on
Signal Processing, Vol. 60, Issue 4, April, 201
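For reference, the problem described in this abstract can be written out as follows. The first line is the standard Basis Pursuit formulation. The second is a common consensus reformulation for the row-partitioned scenario; it is an assumption for illustration, not quoted from the paper. The symbols P (number of nodes), A_p and b_p (the rows of A and b held by node p), x_p (node p's local copy of x), and E (the set of network edges) are introduced here and do not appear in the abstract.

```latex
% Basis Pursuit: least L1-norm solution of an underdetermined system (m < n)
\min_{x \in \mathbb{R}^n} \; \|x\|_1
\quad \text{subject to} \quad Ax = b,
\qquad A \in \mathbb{R}^{m \times n}

% Assumed consensus form when the rows of A are split across P nodes:
% node p holds only (A_p, b_p) and keeps its own copy x_p of the variable.
\min_{x_1, \dots, x_P} \; \frac{1}{P} \sum_{p=1}^{P} \|x_p\|_1
\quad \text{subject to} \quad
A_p x_p = b_p, \;\; p = 1, \dots, P,
\qquad x_p = x_q \;\; \text{for all } (p, q) \in E
```

When the network is connected, the edge constraints force all copies x_p to agree, so the second problem has the same solutions as the first while each node touches only its own block of A.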
Loop optimization for tensor network renormalization
We introduce a tensor renormalization group scheme for coarse-graining a
two-dimensional tensor network that can be successfully applied to both
classical and quantum systems on and off criticality. The key innovation in our
scheme is to deform a 2D tensor network into small loops and then optimize the
tensors on each loop. In this way, we remove short-range entanglement at each
iteration step and significantly improve the accuracy and stability of the
renormalization flow. We demonstrate our algorithm in the classical Ising model
and a frustrated 2D quantum model.
Comment: 15 pages, 11 figures, accepted version for Phys. Rev. Let
Data-efficient learning of feedback policies from image pixels using deep dynamical models
Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy (torques) from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.
A Quality and Cost Approach for Comparison of Small-World Networks
We propose an approach based on the analysis of cost-quality tradeoffs for
comparing the efficiency of various algorithms for small-world network
construction. A number of algorithms for constructing complex small-world
networks, both known from the literature and original, are briefly reviewed and
compared. The networks constructed by these algorithms have the basic
structure of a 1D regular lattice with additional shortcuts that provide the
small-world properties. It is shown that the networks proposed in this work have
the best cost-quality ratio in the considered class.
Comment: 27 pages, 16 figures, 1 tabl
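The structure described in this abstract, a 1D regular lattice with added shortcuts, can be sketched in a few lines. This is a minimal illustration, not the paper's method: the abstract does not define "cost" or "quality", so the code assumes cost is the number of edges and quality is the average shortest-path length, and it places shortcuts uniformly at random. All function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k=1):
    """Adjacency sets of a 1D regular lattice closed into a ring.

    Each node is linked to its k nearest neighbours on each side.
    """
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_shortcuts(adj, m, rng):
    """Add m random shortcut edges between distinct, non-adjacent nodes.

    Assumption: uniform random placement; m must be small enough that
    m free node pairs exist, otherwise the loop would not terminate.
    """
    n = len(adj)
    added = 0
    while added < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all ordered node pairs (BFS per node)."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


def edge_count(adj):
    """Number of undirected edges; used here as an assumed cost measure."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2
```

Comparing `average_path_length` before and after `add_shortcuts` against the increase in `edge_count` gives one simple cost-quality pair per construction; other definitions of cost, such as total shortcut length, would change the ranking.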
Maximizing CNN Accelerator Efficiency Through Resource Partitioning
Convolutional neural networks (CNNs) are revolutionizing machine learning,
but they present significant computational challenges. Recently, many
FPGA-based accelerators have been proposed to improve the performance and
efficiency of CNNs. Current approaches construct a single processor that
computes the CNN layers one at a time; the processor is optimized to maximize
the throughput at which the collection of layers is computed. However, this
approach leads to inefficient designs because the same processor structure is
used to compute CNN layers of radically varying dimensions.
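The inefficiency described above can be illustrated with a toy utilization model. This is a sketch under stated assumptions, not the paper's cost model: it assumes a processor that unrolls `tn` input channels by `tm` output channels per cycle, so a layer whose channel counts are not multiples of those tile sizes leaves arithmetic units idle. The function name, tile sizes, and layer shapes are hypothetical.

```python
from math import ceil


def utilization(n_in, m_out, tn, tm):
    """Fraction of multiply-accumulate slots doing useful work for one layer.

    Assumed model: the processor handles tn input channels and tm output
    channels per cycle, padding the last tile in each dimension.
    """
    padded_slots = ceil(n_in / tn) * tn * ceil(m_out / tm) * tm
    return (n_in * m_out) / padded_slots


# Hypothetical layers given as (input channels, output channels).
layers = [(3, 64), (64, 128)]

# One shared processor sized for the larger layer.
shared = [utilization(n, m, tn=8, tm=64) for n, m in layers]

# Two processors, each with tile sizes matched to its own layer.
tailored = [utilization(3, 64, tn=3, tm=64), utilization(64, 128, tn=8, tm=64)]
```

In this toy setup the shared processor wastes most of its units on the 3-channel layer, while the tailored pair keeps both layers fully utilized. A real design would also have to split a fixed resource budget between the processors.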
We present a new CNN accelerator paradigm and an accompanying automated
design methodology that partitions the available FPGA resources into multiple
processors, each of which is tailored for a different subset of the CNN
convolutional layers. Using the same FPGA resources as a single large
processor, multiple smaller specialized processors increase computational
efficiency and lead to a higher overall throughput. Our design methodology
achieves 3.8x higher throughput than the state-of-the-art approach when
evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more
recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x