
Selective optical broadcasting in reconfigurable multiprocessor interconnects - art. no. 61850J

Author: Artundo I
Dambre Joni
Debaes C
Desmet L
Heirman Wim
Van Campenhout Jan
Publication date: 01/01/2006

Highly parallel sparse Cholesky factorization

Author: Gilbert John R.
Schreiber Robert

Several fine-grained parallel algorithms for computing the Cholesky factorization of a sparse matrix were developed and compared. The experimental implementations run on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although architectural limitations currently prevent the dense factorization from realizing its potential efficiency, the authors conclude that a regular data-parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and used to analyze the algorithms.
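The abstract notes that virtually any dense Cholesky routine can serve as the key subroutine of the sparse code. As a minimal sequential illustration of that kernel only (not the authors' data-parallel Connection Machine implementation, and not the sparse driver around it), the hypothetical function cholesky below computes the lower-triangular factor L with A = L L^T for a small symmetric positive-definite matrix stored as nested lists.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T (row-oriented Cholesky-Banachiewicz).

    a must be a symmetric positive-definite matrix given as a list of lists.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j of L.
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


# Usage: factor a 2 x 2 SPD matrix; mathematically L = [[2, 0], [1, sqrt(2)]].
L = cholesky([[4.0, 2.0], [2.0, 3.0]])
print(L)
```

In a sparse setting, a routine like this would presumably be applied to the dense submatrices that arise during elimination; the simple loop order here is chosen for readability, not for parallel efficiency.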

NASA Technical Reports Server

Gaussian Belief Propagation Based Multiuser Detection

Author: Bickson Danny
Dolev Danny
Shental Ori
Siegel Paul H.
Wolf Jack K.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

In this work, we present a novel construction for solving the linear multiuser detection problem using the Gaussian Belief Propagation algorithm. Our algorithm yields an efficient, iterative and distributed implementation of the MMSE detector. We compare our algorithm's performance to a recent result and show improved memory consumption, fewer computation steps and fewer sent messages. We prove that recent work by Montanari et al. is an instance of our general algorithm, providing new convergence results for both algorithms. Comment: 6 pages, 1 figure, appeared in the 2008 IEEE International Symposium on Information Theory, Toronto, July 2008
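Assuming unit-power users and white noise of variance $\sigma^2$, the linear MMSE detector computes $\hat{x} = (S^T S + \sigma^2 I)^{-1} S^T y$, i.e. it solves $A x = b$ with $A = S^T S + \sigma^2 I$ and $b = S^T y$. The sketch below solves that system with textbook Gaussian belief propagation (GaBP); it is not the paper's exact construction, and the names gabp_solve and mmse_system, the 4-chip, 3-user spreading matrix S, and $\sigma^2 = 0.5$ are illustrative assumptions. GaBP is exact on tree-structured A and converges, for example, when A is diagonally dominant, as it is here.

```python
def gabp_solve(A, b, max_iter=200, tol=1e-12):
    """Gaussian belief propagation for A x = b, A symmetric (illustrative sketch)."""
    n = len(A)
    nbrs = [[j for j in range(n) if j != i and A[i][j] != 0.0] for i in range(n)]
    prior_p = [A[i][i] for i in range(n)]            # node precisions
    prior_m = [b[i] / A[i][i] for i in range(n)]     # node means
    P = [[0.0] * n for _ in range(n)]  # P[i][j]: precision of message i -> j
    M = [[0.0] * n for _ in range(n)]  # M[i][j]: mean of message i -> j
    x = prior_m[:]
    for _ in range(max_iter):
        new_P = [[0.0] * n for _ in range(n)]
        new_M = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in nbrs[i]:
                # Combine the prior at i with all incoming messages except the one from j.
                others = [k for k in nbrs[i] if k != j]
                p_ex = prior_p[i] + sum(P[k][i] for k in others)
                m_ex = (prior_p[i] * prior_m[i]
                        + sum(P[k][i] * M[k][i] for k in others)) / p_ex
                new_P[i][j] = -A[i][j] ** 2 / p_ex
                new_M[i][j] = -A[i][j] * m_ex / new_P[i][j]
        P, M = new_P, new_M
        # Marginal means: prior combined with every incoming message.
        new_x = []
        for i in range(n):
            p_i = prior_p[i] + sum(P[k][i] for k in nbrs[i])
            new_x.append((prior_p[i] * prior_m[i]
                          + sum(P[k][i] * M[k][i] for k in nbrs[i])) / p_i)
        converged = max(abs(u - v) for u, v in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x


def mmse_system(S, y, sigma2):
    """Build A = S^T S + sigma2 * I and b = S^T y (S has one row per chip)."""
    chips, users = len(S), len(S[0])
    A = [[sum(S[c][i] * S[c][j] for c in range(chips)) + (sigma2 if i == j else 0.0)
          for j in range(users)] for i in range(users)]
    b = [sum(S[c][i] * y[c] for c in range(chips)) for i in range(users)]
    return A, b


# Hypothetical example: 4 chips, 3 users with mildly correlated signatures.
S = [[0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.5, 0.5, 0.5], [0.5, -0.5, -0.5]]
bits = [1, -1, 1]
y = [sum(S[c][k] * bits[k] for k in range(3)) for c in range(4)]  # noiseless chips
A, b = mmse_system(S, y, sigma2=0.5)
x_hat = gabp_solve(A, b)
decisions = [1 if v > 0 else -1 for v in x_hat]
print(x_hat, decisions)
```

Each node only needs its own row of A and the messages from its neighbours, which is what makes this style of solver attractive for a distributed detector.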

arXiv.org e-Print Archive

CiteSeerX

Crossref

Large-Scale Optical Neural Networks based on Photoelectric Multiplication

Author: Bernstein Liane
Englund Dirk
Hamerly Ryan
Sludds Alexander
Soljačić Marin
Publication venue: American Physical Society (APS)
Publication date: 01/05/2019

Recent success in deep neural networks has generated strong interest in hardware accelerators to improve speed and energy consumption. This paper presents a new type of photonic accelerator based on coherent detection that is scalable to large ($N \gtrsim 10^6$) networks and can be operated at high (GHz) speeds and very low (sub-aJ) energies per multiply-and-accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components. In contrast to previous approaches, both weights and inputs are optically encoded so that the network can be reprogrammed and trained on the fly. Simulations of the network using models for digit- and image-classification reveal a "standard quantum limit" for optical neural networks, set by photodetector shot noise. This bound, which can be as low as 50 zJ/MAC, suggests performance below the thermodynamic (Landauer) limit for digital irreversible computation is theoretically possible in this device. The proposed accelerator can implement both fully-connected and convolutional networks. We also present a scheme for back-propagation and training that can be performed in the same hardware. This architecture will enable a new class of ultra-low-energy processors for deep learning. Comment: Text: 10 pages, 5 figures, 1 table. Supplementary: 8 pages, 5 figures, 2 tables
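Coherent (homodyne) detection multiplies two optical fields: if an input amplitude x and a weight amplitude w interfere on a 50:50 beamsplitter, the outputs are (x + w)/√2 and (x − w)/√2, and the difference of the two detected intensities is 2xw for real amplitudes. Integrating that difference over many input/weight pairs accumulates a dot product. The sketch below is an idealized, noiseless model of that identity only; it ignores phase, detector scaling and the shot noise that sets the paper's standard quantum limit, and the names homodyne_mac and homodyne_matvec and the one-detector-pair-per-output layout are assumptions for illustration.

```python
def homodyne_mac(x, w):
    """Noiseless balanced-homodyne multiply-and-accumulate for real field amplitudes."""
    charge = 0.0
    for xi, wi in zip(x, w):
        # Intensities at the two outputs of a 50:50 beamsplitter.
        i_plus = (xi + wi) ** 2 / 2.0
        i_minus = (xi - wi) ** 2 / 2.0
        # Subtracted photocurrent equals 2 * xi * wi; integrating it accumulates the sum.
        charge += i_plus - i_minus
    return charge / 2.0  # rescale so the result equals sum(xi * wi)


def homodyne_matvec(W, x):
    """Assumed layout: one balanced detector pair per output neuron (row of W)."""
    return [homodyne_mac(x, row) for row in W]


W = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
x = [1.0, 2.0, 3.0]
print(homodyne_matvec(W, x))  # prints [-2.0, 3.0]
```

Because both factors enter as optical amplitudes, changing W in this model needs no reconfiguration of fixed hardware, which mirrors the abstract's point that weights are optically encoded.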

arXiv.org e-Print Archive

Directory of Open Access Journals

Scalability of broadcast performance in wireless network-on-chip

Author: Abadal Cavallé Sergi
Alarcón Cot Eduardo José
Cabellos Aparicio Alberto
González Colás Antonio María
Lee Heekwan
Mestres Sugrañes Albert
Nemirovsky Mario
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) in which all cores share a single broadband channel is presented. This design is intended to provide low latency and ordered delivery for multicast/broadcast traffic, complementing a wireline NoC that carries the remaining communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC. Peer Reviewed. Postprint (published version)
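A back-of-the-envelope illustration of why broadcast scales poorly on a wireline mesh: in a k × k 2-D mesh, a broadcast from a corner core must travel 2(k − 1) hops to reach the farthest core, and even an ideal tree broadcast needs one link traversal per destination, whereas a shared wireless channel reaches every core in one transmission. This sketch assumes shortest-path tree broadcast, no contention or router pipeline delay, and a collision-free wireless channel; it is not the paper's evaluation methodology, and all names are hypothetical.

```python
def mesh_broadcast_hops(k, src=(0, 0)):
    """Broadcast depth from src in a k x k mesh: largest Manhattan distance to any core."""
    sx, sy = src
    return max(abs(x - sx) + abs(y - sy) for x in range(k) for y in range(k))


def mesh_broadcast_link_traversals(k):
    """Minimum link traversals for a tree-based broadcast: one per destination core."""
    return k * k - 1


# Assumption: one transmission on the shared channel reaches all cores (no collisions).
WIRELESS_HOPS = 1

for k in (4, 8, 16, 32):
    print(f"{k * k:5d} cores: mesh depth {mesh_broadcast_hops(k):3d} hops, "
          f"{mesh_broadcast_link_traversals(k):5d} link traversals; "
          f"wireless {WIRELESS_HOPS} hop")
```

Under these assumptions, mesh broadcast depth grows with the square root of the core count and total link traffic grows linearly, while the wireless hop count stays constant; in practice the wireless side instead pays for channel contention, which is why the MAC protocol matters.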

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

AirSync: Enabling Distributed Multiuser MIMO with Full Spatial Multiplexing

Author: Balan Horia Vlad
Caire Giuseppe
Michaloliakos Antonios
Psounis Konstantinos
Rogalin Ryan
Publication date: 14/08/2012

The enormous success of advanced wireless devices is pushing the demand for higher wireless data rates. Denser spectrum reuse through the deployment of more access points per square mile has the potential to meet the increasing demand for more bandwidth. In theory, the best approach to increasing density is distributed multiuser MIMO, where several access points are connected to a central server and operate as a large distributed multi-antenna access point, ensuring that all transmitted signal power serves the purpose of data transmission rather than creating "interference." In practice, while enterprise networks offer a natural setup in which distributed MIMO might be possible, there are serious implementation difficulties, the primary one being the need to eliminate phase and timing offsets between the jointly coordinated access points. In this paper we propose AirSync, a novel scheme that provides not only time but also phase synchronization, thus enabling distributed MIMO with full spatial multiplexing gains. AirSync locks the phase of all access points using a common reference broadcast over the air, in conjunction with a Kalman filter that closely tracks the phase drift. We have implemented AirSync as a digital circuit in the FPGA of the WARP radio platform. Our experimental testbed, consisting of two access points and two clients, shows that AirSync achieves phase synchronization within a few degrees and allows the system to nearly achieve the theoretically optimal multiplexing gain. We also discuss MAC and higher-layer aspects of a practical deployment. To the best of our knowledge, AirSync offers the first realization of the full multiuser MIMO gain, namely the ability to increase the number of wireless clients linearly with the number of jointly coordinated access points, without reducing the per-client rate. Comment: Submitted to Transactions on Networking
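The abstract states that AirSync tracks the phase drift of a common over-the-air reference with a Kalman filter. The sketch below is a generic phase-tracking Kalman filter, not AirSync's published filter or its FPGA design: the two-state [phase, frequency offset] constant-frequency model, the diagonal process noise q, the measurement noise r, and the hypothetical name track_phase are all assumptions. Measurements may be wrapped to (−π, π]; wrapping the innovation keeps the estimate continuous.

```python
import math


def track_phase(measurements, dt=1.0, q=1e-6, r=1e-2):
    """Kalman filter over state [phase, frequency offset]; returns (phases, final_freq)."""
    x = [measurements[0], 0.0]      # phase (rad), frequency offset (rad per dt)
    P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance
    phases = []
    for z in measurements:
        # Predict with F = [[1, dt], [0, 1]] and process noise Q = diag(q, q).
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1], P[1][1] + q]]
        # Update with a scalar phase measurement: H = [1, 0], noise variance r.
        s = P[0][0] + r
        k0, k1 = P[0][0] / s, P[1][0] / s
        innov = math.atan2(math.sin(z - x[0]), math.cos(z - x[0]))  # wrap to (-pi, pi]
        x = [x[0] + k0 * innov, x[1] + k1 * innov]
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        phases.append(x[0])
    return phases, x[1]


# Usage: a reference whose phase drifts by 0.02 rad per step, observed modulo 2*pi.
true_phase = [0.5 + 0.02 * n for n in range(300)]
observed = [math.remainder(p, 2 * math.pi) for p in true_phase]
est, freq = track_phase(observed)
print(est[-1], freq)
```

A smaller q gives smoother estimates but slower reaction to changes in the frequency offset; a larger r makes the filter trust individual noisy phase measurements less.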

arXiv.org e-Print Archive

CiteSeerX