FDD Massive MIMO Based on Efficient Downlink Channel Reconstruction
Massive multiple-input multiple-output (MIMO) systems deploying a large
number of antennas at the base station considerably increase the spectrum
efficiency by serving multiple users simultaneously without causing severe
interference. However, the advantage relies on the availability of the downlink
channel state information (CSI) of multiple users, which is still a challenge
in frequency-division-duplex transmission systems. This paper aims to solve
this problem by developing a full transceiver framework that includes downlink
channel training (or estimation), CSI feedback, and channel reconstruction
schemes. Our framework provides accurate reconstruction results for multiple
users with small amounts of training and feedback overhead. Specifically, we
first develop an enhanced Newtonized orthogonal matching pursuit (eNOMP)
algorithm to extract the frequency-independent parameters (i.e., downtilts,
azimuths, and delays) from the uplink. Then, by leveraging the information from
these frequency-independent parameters, we develop an efficient downlink
training scheme to estimate the downlink channel gains for multiple users. This
training scheme achieves acceptable gain-estimation error with a limited
number of pilots. Numerical results verify the precision of the eNOMP
algorithm and demonstrate that the sum-rate performance of the system using the
reconstructed downlink channel can approach that of the system using perfect
CSI.
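The eNOMP algorithm itself is not reproduced in the abstract; as a rough illustration of the matching-pursuit family it extends, here is a minimal plain orthogonal matching pursuit (OMP) sketch. The Newton refinement of continuous angle/delay parameters that distinguishes (e)NOMP is omitted, and all names and dimensions are illustrative, not from the paper.

```python
import numpy as np

def omp(A, y, k):
    """Plain orthogonal matching pursuit: recover a k-sparse x with y ~= A @ x.
    A stand-in for the family eNOMP belongs to; eNOMP additionally refines
    continuous parameters (downtilts, azimuths, delays) with Newton steps."""
    residual = y.astype(float)
    support = []
    x = np.zeros(A.shape[1])
    coef = np.zeros(0)
    for _ in range(k):
        # pick the column most correlated with the current residual
        idx = int(np.argmax(np.abs(A.T @ residual)))
        support.append(idx)
        # least-squares re-fit on the chosen support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

# toy example: 3-sparse vector, 40 measurements of a 100-dim signal
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 17, 63]] = [1.0, -2.0, 0.5]
y = A @ x_true
x_hat = omp(A, y, 3)
```

With a well-conditioned random dictionary and a noiseless sparse signal, the greedy selection typically recovers the exact support, after which the least-squares re-fit makes the residual vanish.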
A Second-Order Distributed Trotter-Suzuki Solver with a Hybrid Kernel
The Trotter-Suzuki approximation leads to an efficient algorithm for solving
the time-dependent Schr\"odinger equation. Using existing highly optimized CPU
and GPU kernels, we developed a distributed version of the algorithm that runs
efficiently on a cluster. Our implementation also improves single node
performance, and is able to use multiple GPUs within a node. The scaling is
close to linear using the CPU kernels, whereas the efficiency of GPU kernels
improves with larger matrices. We also introduce a hybrid kernel that
simultaneously uses multicore CPUs and GPUs in a distributed system. This
kernel is shown to be efficient when the matrix size would not fit in the GPU
memory. Larger quantum systems scale especially well with a high number of nodes.
The code is available under an open source license.
Comment: 11 pages, 10 figures
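The second-order Trotter-Suzuki scheme corresponds to Strang splitting of the kinetic and potential parts of the Hamiltonian. The following is a minimal single-node split-step Fourier sketch of one such step, without the MPI distribution or CPU/GPU kernels the paper describes; the grid and parameters are illustrative.

```python
import numpy as np

def strang_step(psi, V, dx, dt, hbar=1.0, m=1.0):
    """One second-order (Strang) Trotter-Suzuki step for
    i*hbar dpsi/dt = H psi with H = -hbar^2/(2m) d^2/dx^2 + V(x).
    Half potential step, full kinetic step in Fourier space, half potential step."""
    n = len(psi)
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)       # angular wavenumbers
    half_v = np.exp(-0.5j * V * dt / hbar)        # exp(-i V dt / (2 hbar))
    kin = np.exp(-0.5j * hbar * k**2 * dt / m)    # exp(-i hbar k^2 dt / (2m))
    psi = half_v * psi
    psi = np.fft.ifft(kin * np.fft.fft(psi))
    return half_v * psi

# free Gaussian wave packet on a periodic grid
n, L = 256, 40.0
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
dx = x[1] - x[0]
psi = np.exp(-x**2) * np.exp(1j * x)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)       # normalize
V = np.zeros(n)
for _ in range(100):
    psi = strang_step(psi, V, dx, dt=0.01)
norm = np.sum(np.abs(psi)**2) * dx                # unitary evolution preserves this
```

The distributed version in the paper partitions the spatial grid across nodes; the per-step arithmetic is the same.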
Faster Geometric Algorithms via Dynamic Determinant Computation
The computation of determinants or their signs is the core procedure in many
important geometric algorithms, such as convex hull, volume and point location.
As the dimension of the computation space grows, a higher percentage of the
total computation time is consumed by these computations. In this paper we
study the sequences of determinants that appear in geometric algorithms. The
computation of a single determinant is accelerated by using the information
from the previous computations in that sequence.
We propose two dynamic determinant algorithms with quadratic arithmetic
complexity when employed in convex hull and volume computations, and with
linear arithmetic complexity when used in point location problems. We implement
the proposed algorithms and perform an extensive experimental analysis. On one
hand, our analysis serves as a performance study of state-of-the-art
determinant algorithms and implementations. On the other hand, we demonstrate
the superiority of our methods over state-of-the-art implementations of
determinant and geometric algorithms. Our experimental results include 20x and
78x speed-ups in volume and point location computations in dimensions 6 and
11, respectively.
Comment: 29 pages, 8 figures, 3 tables
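The core idea, accelerating each determinant using the previous ones in the sequence, can be illustrated with a single-column replacement: if det(A) and A^(-1) are maintained, replacing column j by a vector b scales the determinant by (A^(-1) b)_j and permits an O(d^2) Sherman-Morrison update of the inverse, versus O(d^3) for a fresh determinant. A minimal sketch; the class name and interface are illustrative, not the authors' code.

```python
import numpy as np

class DynamicDet:
    """Maintain det(A) and A^{-1} under single-column replacements in O(d^2)
    per update, the kind of reuse behind dynamic determinant algorithms."""
    def __init__(self, A):
        self.inv = np.linalg.inv(A)
        self.det = np.linalg.det(A)

    def replace_column(self, j, b):
        w = self.inv @ b                  # O(d^2)
        factor = w[j]                     # det(A') = det(A) * (A^{-1} b)_j
        row_j = self.inv[j, :].copy()
        v = w.copy()
        v[j] -= 1.0                       # A^{-1}(b - a_j) = w - e_j
        self.inv -= np.outer(v / factor, row_j)   # Sherman-Morrison update
        self.det *= factor

# check one update against a fresh O(d^3) determinant
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
dd = DynamicDet(A)
b = rng.standard_normal(6)
dd.replace_column(2, b)
A[:, 2] = b          # apply the same replacement directly for comparison
```

In a convex hull or point location loop, successive predicate matrices differ in one column (one point), which is exactly the case this update exploits.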
Efficient implementation of the Hardy-Ramanujan-Rademacher formula
We describe how the Hardy-Ramanujan-Rademacher formula can be implemented to
allow the partition function p(n) to be computed with softly optimal
complexity and very little overhead. A new implementation based on these
techniques achieves speedups in excess of a factor 500 over previously
published software and has been used by the author to calculate p(10^19), an
exponent twice as large as in previously reported computations.
We also investigate performance for multi-evaluation of p(n), where our
implementation of the Hardy-Ramanujan-Rademacher formula becomes superior to
power series methods on far denser sets of indices than previous
implementations. As an application, we determine over 22 billion new
congruences for the partition function, extending Weaver's tabulation of 76,065
congruences.
Comment: updated version containing an unconditional complexity proof;
accepted for publication in LMS Journal of Computation and Mathematics
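For contrast with the Hardy-Ramanujan-Rademacher approach, the power-series-style multi-evaluation the abstract mentions can be illustrated with Euler's pentagonal-number recurrence, which tabulates p(0), ..., p(N) in O(N^(3/2)) arithmetic operations. This is a minimal sketch of the competing method, not the paper's implementation.

```python
def partitions(N):
    """Tabulate p(0..N) via Euler's pentagonal-number recurrence:
    p(n) = sum_{k>=1} (-1)^(k-1) * (p(n - k(3k-1)/2) + p(n - k(3k+1)/2)),
    with terms dropped once the argument goes negative."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total, k = 0, 1
        while True:
            g1 = k * (3 * k - 1) // 2   # generalized pentagonal numbers
            g2 = k * (3 * k + 1) // 2
            if g1 > n:
                break
            sign = -1 if k % 2 == 0 else 1
            total += sign * p[n - g1]
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p

p = partitions(100)
# p[100] == 190569292
```

The recurrence is efficient when all values up to N are needed; the HRR formula instead evaluates a single p(n) directly, which is why it wins once the requested indices become sparse.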