A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices
We present the submatrix method, a highly parallelizable method for the
approximate calculation of inverse p-th roots of large sparse symmetric
matrices which are required in different scientific applications. We follow the
idea of Approximate Computing, allowing imprecision in the final result in
order to be able to utilize the sparsity of the input matrix and to allow
massively parallel execution. For an n x n matrix, the proposed algorithm
allows the calculations to be distributed over n nodes with little
communication overhead. The approximate result matrix exhibits the same
sparsity pattern as the input matrix, allowing for efficient reuse of allocated
data structures.
We evaluate the algorithm with respect to the error that it introduces into
calculated results, as well as its performance and scalability. We demonstrate
that the error is relatively limited for well-conditioned matrices and that
results are still valuable for error-resilient applications like
preconditioning even for ill-conditioned matrices. We discuss the execution
time and scaling of the algorithm on a theoretical level and present a
distributed implementation of the algorithm using MPI and OpenMP. We
demonstrate the scalability of this implementation by running it on a
high-performance compute cluster comprising 1024 CPU cores, showing a speedup
of 665x compared to single-threaded execution.
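As a rough illustration of why the approximate result keeps the sparsity pattern of the input, the following sketch applies a column-wise submatrix scheme for p = 1 (the plain inverse). The abstract does not spell out the construction, so the details are assumptions: one dense principal submatrix is built per column from the rows where that column is nonzero, it is inverted with a small Gauss-Jordan routine, and only the matching column of the dense result is copied back. The function names `invert_dense` and `submatrix_inverse` are hypothetical, and the diagonal is assumed to be nonzero.

```python
def invert_dense(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("submatrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def submatrix_inverse(a):
    """Approximate inverse of a sparse symmetric matrix (assumed scheme, p = 1).

    Each column is processed independently, so the loop body could run
    on a separate node; here it simply runs sequentially.
    """
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Rows where column j is nonzero define the principal submatrix.
        idx = [i for i in range(n) if a[i][j] != 0.0]
        sub = [[a[r][c] for c in idx] for r in idx]
        sub_inv = invert_dense(sub)
        local_j = idx.index(j)  # requires a[j][j] != 0
        # Copy back only the column belonging to j, at the original nonzero rows.
        for k, i in enumerate(idx):
            result[i][j] = sub_inv[k][local_j]
    return result


if __name__ == "__main__":
    tridiagonal = [[2.0, 1.0, 0.0],
                   [1.0, 2.0, 1.0],
                   [0.0, 1.0, 2.0]]
    for row in submatrix_inverse(tridiagonal):
        print(row)
```

In this sketch, entries that are zero in the input stay zero in the output, whereas the exact inverse of the tridiagonal example is dense. That difference is the approximation error the abstract refers to.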
Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets
The scale of functional magnetic resonance image data is rapidly increasing
as large multi-subject datasets are becoming widely available and
high-resolution scanners are adopted. The inherent low-dimensionality of the
information in this data has led neuroscientists to consider factor analysis
methods to extract and analyze the underlying brain activity. In this work, we
consider two recent multi-subject factor analysis methods: the Shared Response
Model and Hierarchical Topographic Factor Analysis. We perform analytical,
algorithmic, and code optimization to enable multi-node parallel
implementations to scale. Single-node improvements result in 99x and 1812x
speedups on these two methods and enable the processing of larger datasets.
Our distributed implementations show strong scaling of 3.3x and 5.5x
respectively with 20 nodes on real datasets. We also demonstrate weak scaling
on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768
cores.
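To put the strong-scaling figures in context, parallel efficiency can be computed as speedup divided by the number of workers. The short sketch below applies that standard definition to the numbers quoted in the abstract; the function name `parallel_efficiency` is only illustrative, and the mapping of 3.3x and 5.5x to the two methods follows the order in which the abstract lists them.

```python
def parallel_efficiency(speedup, workers):
    """Return speedup per worker (1.0 would be ideal linear scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


if __name__ == "__main__":
    nodes = 20
    reported = (
        ("Shared Response Model", 3.3),
        ("Hierarchical Topographic Factor Analysis", 5.5),
    )
    for method, speedup in reported:
        print(f"{method}: {parallel_efficiency(speedup, nodes):.1%} efficiency on {nodes} nodes")
```

On 20 nodes this gives efficiencies of roughly 16.5% and 27.5%, well below linear scaling.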
High Speed Railway Wireless Communications: Efficiency v.s. Fairness
High speed railways (HSRs) have been deployed widely all over the world in
recent years. Unlike in traditional cellular communication, the high
mobility of HSRs makes it essential to allocate power over time. In
the HSR case, the transmission rate depends greatly on the distance between the
base station (BS) and the train. As a result, the train receives a time varying
data rate service when passing by a BS. It is clear that the most efficient
power allocation will spend all the power when the train is nearest to the
BS, which will cause great unfairness over time. On the other hand, the
channel inversion allocation achieves the best fairness in terms of constant
rate transmission. However, its power efficiency is much lower. Therefore, the
power efficiency and fairness over time are two incompatible objectives. For
the HSR cellular system considered in this paper, a trade-off between the two
is achieved by proposing a temporal proportional fair power allocation scheme.
In addition, a near-optimal closed-form solution and an algorithm for finding the
ε-optimal allocation are presented.
Comment: 16 pages, 6 figures
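The tension between efficiency and fairness described in this abstract can be seen in a small numerical sketch. It compares constant power with channel inversion under the same average power as a train passes one BS. Everything concrete here is an assumption made for illustration and is not taken from the paper: the distance-based path-loss model, the parameter values, and the function names.

```python
import math


def channel_gains(num_slots=201, half_span_m=1000.0, bs_offset_m=50.0, path_loss_exp=3.0):
    """Power gains along the track, assuming gain = distance ** -path_loss_exp."""
    gains = []
    for k in range(num_slots):
        # Train position along the track, from -half_span_m to +half_span_m.
        x = -half_span_m + 2.0 * half_span_m * k / (num_slots - 1)
        distance = math.hypot(x, bs_offset_m)
        gains.append(distance ** -path_loss_exp)
    return gains


def rates(powers, gains, noise=1e-9):
    """Per-slot rate in bit/s/Hz from the Shannon formula."""
    return [math.log2(1.0 + p * g / noise) for p, g in zip(powers, gains)]


def constant_power(gains, avg_power):
    return [avg_power] * len(gains)


def channel_inversion(gains, avg_power):
    """Power proportional to 1/gain, scaled to meet the average power budget."""
    inverse = [1.0 / g for g in gains]
    scale = avg_power * len(gains) / sum(inverse)
    return [scale * v for v in inverse]


if __name__ == "__main__":
    gains = channel_gains()
    for name, scheme in (("constant", constant_power), ("inversion", channel_inversion)):
        r = rates(scheme(gains, avg_power=1.0), gains)
        print(f"{name}: mean rate {sum(r) / len(r):.2f}, min {min(r):.2f}, max {max(r):.2f}")
```

Under these assumptions, channel inversion yields the same rate in every slot but a lower mean rate than constant power. The proportional fair scheme proposed in the paper is aimed at the region between such extremes.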
Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions
Massive MIMO is a compelling wireless access concept that relies on the use
of an excess number of base-station antennas, relative to the number of active
terminals. This technology is a main component of 5G New Radio (NR) and
addresses all important requirements of future wireless standards: a great
capacity increase, the support of many simultaneous users, and improvement in
energy efficiency. Massive MIMO requires the simultaneous processing of signals
from many antenna chains, and computational operations on large matrices. The
complexity of the digital processing has been viewed as a fundamental obstacle
to the feasibility of Massive MIMO in the past. Recent advances in
system-algorithm-hardware co-design have led to extremely energy-efficient
implementations. These exploit opportunities in deeply-scaled silicon
technologies and perform partly distributed processing to cope with the
bottlenecks encountered in the interconnection of many signals. For example,
prototype ASIC implementations have demonstrated zero-forcing precoding in real
time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing
of 8 terminals). Coarse and even error-prone digital processing in the antenna
paths permits a reduction of consumption by a factor of 2 to 5. This article
summarizes the fundamental technical contributions to efficient digital signal
processing for Massive MIMO. The opportunities and constraints on operating on
low-complexity RF and analog hardware chains are clarified. It illustrates how
terminals can benefit from improved energy efficiency. The status of technology
and real-life prototypes is discussed. Open challenges and directions for future
research are suggested.
Comment: submitted to IEEE Transactions on Signal Processing
Polar Codes over Fading Channels with Power and Delay Constraints
The inherent nature of polar codes being channel specific makes it difficult
to use them in a setting where the communication channel changes with time. In
particular, to be able to use polar codes in a wireless scenario, varying
attenuation due to fading needs to be mitigated. To the best of our knowledge,
there has been no comprehensive work in this direction thus far. In this work,
a practical scheme involving channel inversion with knowledge of the
channel state at the transmitter is proposed. An additional practical
constraint on the permissible average and peak power is imposed, which in turn
makes the channel equivalent to an additive white Gaussian noise (AWGN) channel
cascaded with an erasure channel. It is shown that the constructed polar code
could be made to achieve the symmetric capacity of this channel. Further, a
means to compute the optimal design rate of the polar code for a given power
constraint is also discussed.
Comment: 6 pages, 6 figures
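The erasure behaviour mentioned in this abstract can be illustrated with a short sketch: under channel inversion, any fading state that would require more than the peak power is simply not used, which the receiver sees as an erasure. The sketch assumes Rayleigh fading (an exponentially distributed power gain with unit mean) and models only the peak constraint, not the average one. Neither this fading model nor the function names come from the paper.

```python
import math
import random


def erasure_probability(target_rx_power, peak_power):
    """Closed form for a unit-mean exponential gain: P(gain < target / peak)."""
    return 1.0 - math.exp(-target_rx_power / peak_power)


def simulate_erasures(target_rx_power, peak_power, trials=100_000, seed=0):
    """Monte Carlo estimate of the fraction of erased channel uses."""
    rng = random.Random(seed)
    erased = 0
    for _ in range(trials):
        gain = rng.expovariate(1.0)  # power gain under the Rayleigh assumption
        # Inversion needs target / gain watts; erase if that exceeds the peak.
        if gain * peak_power < target_rx_power:
            erased += 1
    return erased / trials


if __name__ == "__main__":
    target, peak = 1.0, 2.0
    print("closed form:", erasure_probability(target, peak))
    print("simulated:  ", simulate_erasures(target, peak))
```

Whenever transmission does occur, the received power is constant, so the remaining channel behaves like an AWGN channel. This matches the AWGN-plus-erasure cascade described in the abstract.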