An Overview of Multi-Processor Approximate Message Passing
Approximate message passing (AMP) is an algorithmic framework for solving
linear inverse problems from noisy measurements, with exciting applications
such as reconstructing images, audio, hyperspectral images, and various other
signals, including those acquired in compressive signal acquisition systems. The
growing prevalence of big data systems has increased interest in large-scale
problems, which may involve huge measurement matrices that are unsuitable for
conventional computing systems. To address the challenge of large-scale
processing, multiprocessor (MP) versions of AMP have been developed. We provide
an overview of two such MP-AMP variants. In row-MP-AMP, each computing node
stores a subset of the rows of the matrix and processes corresponding
measurements. In column-MP-AMP, each node stores a subset of columns and is
solely responsible for reconstructing a portion of the signal. We will discuss
pros and cons of both approaches, summarize recent research results for each,
and explain when each one may be a viable approach. Highlighted aspects include
recent results on state evolution for both MP-AMP algorithms and the use of
data compression to reduce communication in the MP network.
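To make the row-partitioned setting concrete, the following is a minimal illustrative sketch (not the surveyed algorithms verbatim), assuming a soft-thresholding denoiser for sparse signals: each node p holds a row block (A_p, y_p) and maintains its own residual, and a fusion step sums the per-node correlations A_p^T z_p before denoising.

```python
import numpy as np

def soft_threshold(v, theta):
    # elementwise soft thresholding: a common AMP denoiser for sparse signals
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def row_mp_amp(y_parts, A_parts, n, iters=30):
    """Row-partitioned MP-AMP sketch: node p holds the row block (A_p, y_p)
    and its own residual z_p; a fusion step sums the per-node correlations
    before denoising (illustrative only)."""
    m = sum(A_p.shape[0] for A_p in A_parts)
    x = np.zeros(n)
    z_parts = [y_p.copy() for y_p in y_parts]
    for _ in range(iters):
        # fusion: sum the partial correlations computed locally at each node
        r = x + sum(A_p.T @ z_p for A_p, z_p in zip(A_parts, z_parts))
        # set the threshold from the empirical residual energy
        sigma = np.sqrt(sum(np.sum(z_p ** 2) for z_p in z_parts) / m)
        x = soft_threshold(r, sigma)
        deriv = np.count_nonzero(x) / m  # Onsager correction factor
        # each node updates its local residual with the broadcast estimate
        z_parts = [y_p - A_p @ x + deriv * z_p
                   for A_p, y_p, z_p in zip(A_parts, y_parts, z_parts)]
    return x
```

Because the per-node correlations and residual energies sum exactly to their centralized counterparts, this iteration matches centralized AMP; only the communication pattern changes.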
Multiprocessor Approximate Message Passing with Column-Wise Partitioning
Solving a large-scale regularized linear inverse problem using multiple
processors is important in various real-world applications due to the
limitations of individual processors and constraints on data sharing policies.
This paper focuses on the setting where the matrix is partitioned column-wise.
We extend the algorithmic framework and the theoretical analysis of approximate
message passing (AMP), an iterative algorithm for solving linear inverse
problems, whose asymptotic dynamics are characterized by state evolution (SE).
In particular, we show that column-wise multiprocessor AMP (C-MP-AMP) obeys an
SE under the same assumptions under which the SE for AMP holds. The SE results imply
that (i) the SE of C-MP-AMP converges to a state that is no worse than that of
AMP and (ii) the asymptotic dynamics of C-MP-AMP and AMP can be identical.
Moreover, for a setting that is not covered by SE, numerical results show that
damping can improve the convergence performance of C-MP-AMP.
Comment: This document contains complete details of the previous version
(i.e., arXiv:1701.02578v1), which was accepted for publication in ICASSP 201
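As a rough illustration of the column-wise setting and of damping, the sketch below (a simplification, not the paper's exact inner/outer-loop schedule) lets each node own a column block A_p and reconstruct only its own coefficients against a shared residual; the hypothetical parameter beta damps both updates, and with beta = 1 the iteration reduces to ordinary AMP on the concatenated blocks.

```python
import numpy as np

def soft_threshold(v, theta):
    # elementwise soft thresholding denoiser
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def c_mp_amp(y, A_parts, iters=50, beta=1.0):
    """Simplified column-partitioned AMP sketch. Node p owns column block A_p
    and estimates only its coefficients x_p from the shared residual z.
    beta < 1 damps the signal and residual updates (beta = 1: no damping)."""
    m = len(y)
    x_parts = [np.zeros(A_p.shape[1]) for A_p in A_parts]
    z = y.copy()
    for _ in range(iters):
        sigma = np.sqrt(np.mean(z ** 2))  # residual-based threshold
        nnz = 0
        new_parts = []
        for A_p, x_p in zip(A_parts, x_parts):
            r_p = x_p + A_p.T @ z                 # node p's local pseudo-data
            x_t = soft_threshold(r_p, sigma)
            nnz += np.count_nonzero(x_t)          # for the Onsager correction
            new_parts.append(beta * x_t + (1 - beta) * x_p)  # damped update
        x_parts = new_parts
        resid = y - sum(A_p @ x_p for A_p, x_p in zip(A_parts, x_parts))
        z = beta * (resid + (nnz / m) * z) + (1 - beta) * z  # damped residual
    return np.concatenate(x_parts)
```

Each node only ever touches its own columns, so the matrix never needs to be assembled in one place; the nodes exchange the length-m residual instead.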
Resource management and application customization for hardware accelerated systems
Computational demands are continuously increasing, driven by the growing resource demands of applications. In the era of big data, large-scale applications, and real-time applications, there is an enormous need for quick processing of large amounts of data. To meet these demands, computer systems have shifted towards multi-core solutions. Technology scaling has allowed the incorporation of ever larger numbers of transistors and cores into chips. Nevertheless, area constraints, power consumption limitations, and thermal dissipation limit the ability to design and sustain ever-larger chips. To overcome these limitations, system designers have turned to hardware accelerators. These accelerators can take the form of modules attached to each core of a multi-core system, forming a network-on-chip of cores with attached accelerators. Another form of hardware accelerator is the Graphics Processing Unit (GPU). GPUs can be connected to a general-purpose system through a host-device model and are used to offload parts of a workload. Additionally, accelerators can be function-dedicated units: they can be part of a chip, and the main processor can offload specific workloads to the hardware accelerator unit.
In this dissertation we present: (a) a microcoded synchronization mechanism for systems with hardware accelerators that provide distributed shared memory, (b) a Streaming Multiprocessor (SM) allocation policy for single-application execution on GPUs, (c) an SM allocation policy for concurrent applications that execute on GPUs, and (d) a framework to map neural network (NN) weights to approximate multiplier accuracy levels.
The aforementioned mechanisms coexist in the resource management domain. Specifically, the methodologies introduce ways to boost system performance by using hardware accelerators. In tandem with improved performance, the methodologies explore and balance the trade-offs that the use of hardware accelerators introduces.
On the Achievable Rates of Decentralized Equalization in Massive MU-MIMO Systems
Massive multi-user (MU) multiple-input multiple-output (MIMO) promises
significant gains in spectral efficiency compared to traditional, small-scale
MIMO technology. Linear equalization algorithms, such as zero forcing (ZF) or
minimum mean-square error (MMSE)-based methods, typically rely on centralized
processing at the base station (BS), which results in (i) excessively high
interconnect and chip input/output data rates, and (ii) high computational
complexity. In this paper, we investigate the achievable rates of decentralized
equalization that mitigates both of these issues. We consider two distinct BS
architectures that partition the antenna array into clusters, each associated
with independent radio-frequency chains and signal processing hardware, and the
results of each cluster are fused in a feedforward network. For both
architectures, we consider ZF, MMSE, and a novel, non-linear equalization
algorithm that builds upon approximate message passing (AMP), and we
theoretically analyze the achievable rates of these methods. Our results
demonstrate that decentralized equalization with our AMP-based methods incurs
no or only a negligible loss in achievable rates compared to centralized
solutions.
Comment: Will be presented at the 2017 IEEE International Symposium on
Information Theory
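For the linear equalizers, the feedforward fusion described above can be sketched as follows. This is an illustrative decomposition under assumed notation (H_c: per-cluster channel matrix, U users, Es: symbol energy), not the paper's exact architecture: each antenna cluster computes only local statistics, and a fusion node sums them and solves one small U-by-U MMSE system.

```python
import numpy as np

def decentralized_mmse(y_clusters, H_clusters, noise_var, Es=1.0):
    """Feedforward decentralized linear MMSE sketch: cluster c computes the
    local Gram matrix H_c^H H_c and matched-filter output H_c^H y_c; the
    fusion node sums these small quantities and solves a U x U system,
    so raw per-antenna samples never need to be centralized."""
    U = H_clusters[0].shape[1]
    G = sum(H.conj().T @ H for H in H_clusters)                      # fused Gram
    b = sum(H.conj().T @ y for H, y in zip(H_clusters, y_clusters))  # fused MF
    return np.linalg.solve(G + (noise_var / Es) * np.eye(U), b)
```

Since the cluster-wise Gram matrices and matched-filter outputs sum exactly to their centralized counterparts, this particular linear scheme loses nothing relative to centralized MMSE; only U x U matrices and length-U vectors cross the interconnect.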