Full Diversity Unitary Precoded Integer-Forcing
We consider a point-to-point flat-fading MIMO channel with channel state
information known both at transmitter and receiver. At the transmitter side, a
lattice coding scheme is employed at each antenna to map information symbols to
independent lattice codewords drawn from the same codebook. Each lattice
codeword is then multiplied by a unitary precoding matrix and sent
through the channel. At the receiver side, an integer-forcing (IF) linear
receiver is employed. We denote this scheme as unitary precoded integer-forcing
(UPIF). We show that UPIF can achieve full diversity under a constraint based
on the shortest vector of a lattice generated by the precoding matrix. This
constraint, together with a simpler version of it, provides design criteria for
two types of full-diversity UPIF. Type I uses a unitary precoder that adapts at
each channel realization. Type II uses a unitary precoder that remains fixed
for all channel realizations. We then verify our results by computer
simulations in , and MIMO using different QAM
constellations. Finally, we show that the proposed Type II UPIF outperforms
MIMO precoding with X-codes at high data rates.
Comment: 12 pages, 8 figures, to appear in IEEE-TW
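The shortest-vector idea behind the constraint can be illustrated in the simplest real-valued setting. The sketch below is not the paper's design criterion: it only shows how a shortest nonzero vector of the lattice generated by the columns of a 2×2 matrix can be found with classical Lagrange-Gauss reduction. UPIF's precoders are complex unitary matrices, and the matrix P used here is a hypothetical example chosen so the answer is easy to check (det P = 1, so P generates the integer lattice and the shortest length is 1).

```python
import math


def dot(u, v):
    """Euclidean inner product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def shortest_vector_2d(b1, b2):
    """Return a shortest nonzero vector of the lattice spanned by b1 and b2.

    Lagrange-Gauss reduction: keep b1 as the shorter basis vector and
    subtract from b2 the nearest-integer multiple of b1, swapping while
    b2 keeps getting shorter. Assumes b1 and b2 are linearly independent.
    """
    b1, b2 = list(b1), list(b2)
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        mu = round(dot(b1, b2) / dot(b1, b1))  # nearest-integer projection
        b2 = [b2[0] - mu * b1[0], b2[1] - mu * b1[1]]
        if dot(b2, b2) >= dot(b1, b1):
            return b1  # basis is reduced, so b1 is a shortest vector
        b1, b2 = b2, b1


# Hypothetical real 2x2 generator matrix; its columns span the lattice.
P = [[5.0, 3.0],
     [3.0, 2.0]]
columns = ([P[0][0], P[1][0]], [P[0][1], P[1][1]])
v = shortest_vector_2d(*columns)
print(v, math.hypot(*v))  # [1.0, 0.0] 1.0
```

Each pass strictly shortens the basis until it is reduced, so the loop terminates; for larger or complex lattices one would turn to LLL-type reduction or an exact shortest-vector search instead.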
A Blocked Linear Method for Optimizing Large Parameter Sets in Variational Monte Carlo
We present a modification to variational Monte Carlo's linear method
optimization scheme that addresses a critical memory bottleneck while
maintaining compatibility with both the traditional ground state variational
principle and our recently-introduced variational principle for excited states.
For wave function ansatzes with tens of thousands of variables, our
modification reduces the required memory per parallel process from tens of
gigabytes to hundreds of megabytes, making the methodology a much better fit
for modern supercomputer architectures in which data communication and
per-process memory consumption are primary concerns. We verify the efficacy of
the new optimization scheme in small molecule tests involving both the Hilbert
space Jastrow antisymmetric geminal power ansatz and real space multi-Slater
Jastrow expansions. Satisfied with its performance, we have added the optimizer
to the QMCPACK software package and use it to demonstrate, on a hydrogen ring, a
prototype approach for making systematically convergent, non-perturbative
predictions of the optical band gaps of Mott insulators.
Comment: 9 pages, 3 tables, 4 figures
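To make the blocking idea concrete, here is a heavily simplified sketch under stated assumptions, not the QMCPACK implementation. The linear method solves the generalized eigenproblem H c = E S c, where index 0 stands for the current wave function and the remaining indices for parameter changes. The hypothetical blocked_lm below solves one small eigenproblem per parameter block, keeps a single update direction per block, and then solves a final eigenproblem in the span of those directions. It assumes a symmetric H (the sampled linear-method H is generally not symmetric), and for brevity it slices blocks out of full toy matrices, whereas the point of a blocked scheme is that only block-sized matrices need to be held in memory.

```python
import numpy as np


def lowest_gen_eig(H, S):
    """Lowest eigenpair of H c = E S c, assuming H symmetric and S positive definite."""
    L = np.linalg.cholesky(S)                  # S = L L^T
    Linv = np.linalg.inv(L)
    w, Y = np.linalg.eigh(Linv @ H @ Linv.T)   # eigenvalues in ascending order
    return w[0], Linv.T @ Y[:, 0]              # back-transform: c = L^{-T} y


def full_lm(H, S):
    """Standard linear-method step; index 0 is the current wave function."""
    E, c = lowest_gen_eig(H, S)
    return E, c[1:] / c[0]


def blocked_lm(H, S, blocks):
    """Hypothetical simplified blocked step: one direction per parameter block,
    then a small eigenproblem in the span of those directions plus index 0."""
    n = H.shape[0]
    dirs = [np.eye(n)[:, 0]]                   # keep the current wave function
    for B in blocks:
        idx = [0] + list(B)
        _, cb = lowest_gen_eig(H[np.ix_(idx, idx)], S[np.ix_(idx, idx)])
        d = np.zeros(n)
        d[list(B)] = cb[1:] / cb[0]            # this block's best update direction
        dirs.append(d)
    V = np.column_stack(dirs)                  # n x (1 + number of blocks)
    E, a = lowest_gen_eig(V.T @ H @ V, V.T @ S @ V)
    c = V @ a
    return E, c[1:] / c[0]


# Hypothetical toy matrices: current wave function plus 6 parameters.
rng = np.random.default_rng(0)
n = 7
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)                    # symmetric positive definite overlap
G = rng.standard_normal((n, n))
H = (G + G.T) / 2                              # symmetric toy Hamiltonian

E_full, u_full = full_lm(H, S)
E_blk, u_blk = blocked_lm(H, S, [[1, 2, 3], [4, 5, 6]])
print(E_full, E_blk)
```

With a single block containing every parameter, the final subspace contains the exact eigenvector, so blocked_lm reproduces full_lm; with several blocks the returned energy can be no lower than the full one (a Rayleigh-Ritz bound), which is the accuracy traded for the smaller memory footprint.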
Efficient computation of partition of unity interpolants through a block-based searching technique
In this paper we propose a new efficient interpolation tool that is well suited
to large scattered data sets. It is based on the partition of unity method,
which blends Radial Basis Functions (RBFs), used as local approximants, through
locally supported weight functions. In particular, we present a new
space-partitioning data structure based on a partition of the underlying
generic domain into blocks. With this approach, only a reduced number of
blocks must be examined when searching for the nearest neighbour points, which
yields an optimized searching routine. Complexity analysis and numerical
experiments in two- and three-dimensional interpolation support our findings.
Some applications to geometric modelling are also considered. Moreover, the
associated software package, written in MATLAB, is discussed and made
available to the scientific community.
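A minimal sketch of block-based neighbour searching, assuming 2-D data in the unit square, square blocks of side delta, and a subdomain radius no larger than delta (the paper's actual block-size rule is not given in this abstract). Under that assumption, every point within the radius of a subdomain centre lies in the centre's block or one of its eight neighbours, so the search never touches the remaining blocks. The names build_blocks and range_search are hypothetical, and the local RBF fits and weight functions of the partition of unity method are not shown.

```python
import math
import random
from collections import defaultdict


def block_index(coord, delta, q):
    """Block index along one axis; clamp so coord == 1.0 lands in the last block."""
    return min(int(coord / delta), q - 1)


def build_blocks(points, delta):
    """Bucket points of [0,1]^2 into a q-by-q grid of square blocks of side delta."""
    q = math.ceil(1.0 / delta)
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        blocks[(block_index(x, delta, q), block_index(y, delta, q))].append(idx)
    return blocks, q


def range_search(points, blocks, q, delta, centre, radius):
    """Indices of points within `radius` of `centre`.

    Assumes radius <= delta, so any hit lies in the centre's block or one of
    its 8 neighbours; all other blocks are never inspected.
    """
    assert radius <= delta
    cx, cy = centre
    bi, bj = block_index(cx, delta, q), block_index(cy, delta, q)
    hits = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for idx in blocks.get((bi + di, bj + dj), ()):
                px, py = points[idx]
                if (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2:
                    hits.append(idx)
    return sorted(hits)


# Usage: collect the nodes for the local fit of one hypothetical PU subdomain.
random.seed(1)
pts = [(random.random(), random.random()) for _ in range(2000)]
delta = 0.05
blocks, q = build_blocks(pts, delta)
local = range_search(pts, blocks, q, delta, centre=(0.5, 0.5), radius=delta)
print(len(local), "points support the subdomain centred at (0.5, 0.5)")
```

Each query inspects at most nine blocks instead of all points, which is where the savings over a naive scan come from when the data set is large.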
Fast k-NNG construction with GPU-based quick multi-select
In this paper we describe a new brute-force algorithm for building the
k-Nearest Neighbor Graph (k-NNG). The k-NNG has many
applications in areas such as machine learning, bio-informatics, and clustering
analysis. While very efficient algorithms exist for low-dimensional data,
brute-force search is the best approach for high-dimensional data. The
algorithm has two main parts: the first computes the distances between the
input vectors, which can be formulated as a matrix multiplication problem; the
second selects the k-NNs of each query vector. For the second part, we describe
a novel graphics processing unit (GPU)-based multi-select algorithm based on
quicksort. Our optimization exploits the warp voting functions available on
the latest GPUs together with user-controlled cache. Benchmarks show significant
improvement over state-of-the-art implementations of k-NN search on GPUs.
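The two parts of the brute-force algorithm can be sketched on the CPU as follows. This is an illustrative Python version, not the authors' GPU code: distances come from the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y, whose cross terms form a matrix product, and each row's k nearest neighbours are picked by a quickselect-style multi-select. The warp-vote and cache optimizations described in the abstract have no counterpart here, and the function names are hypothetical.

```python
def sq_dists(X):
    """All pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y.

    The x.y terms form the Gram matrix X X^T; on a GPU this is the part
    that maps onto a matrix-multiplication call.
    """
    norms = [sum(v * v for v in x) for x in X]
    gram = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
    n = len(X)
    # clamp tiny negative values caused by floating-point cancellation
    return [[max(0.0, norms[i] + norms[j] - 2.0 * gram[i][j]) for j in range(n)]
            for i in range(n)]


def k_smallest(items, k):
    """Quickselect: return the k smallest items, in no particular order."""
    if k <= 0:
        return []
    if k >= len(items):
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    if k <= len(less):
        return k_smallest(less, k)            # answer lies entirely left of pivot
    if k <= len(less) + len(equal):
        return less + equal[: k - len(less)]
    return less + equal + k_smallest(greater, k - len(less) - len(equal))


def knng(X, k):
    """Brute-force k-NNG: neighbour lists ordered by (distance, index)."""
    graph = []
    for i, row in enumerate(sq_dists(X)):
        cands = [(d, j) for j, d in enumerate(row) if j != i]  # skip self
        graph.append([j for _, j in sorted(k_smallest(cands, k))])
    return graph


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
print(knng(X, 2))  # [[1, 2], [0, 2], [0, 1], [4, 1], [3, 1]]
```

Quickselect only partitions around pivots until the k smallest candidates are isolated, so each row avoids a full sort; only the k survivors are sorted for output.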
A Similarity Measure for GPU Kernel Subgraph Matching
Accelerator architectures specialize in executing SIMD (single instruction,
multiple data) workloads in lockstep. Because the majority of CUDA applications are
parallelized loops, control flow information can provide an in-depth
characterization of a kernel. CUDAflow is a tool that statically separates CUDA
binaries into basic block regions and dynamically measures instruction and
basic block frequencies. CUDAflow captures this information in a control flow
graph (CFG) and performs subgraph matching across the CFGs of different kernels
to gain insight into an application's resource requirements, based on the shape
and traversal of the graph, the instruction operations executed, and the
registers allocated, among other information. The utility of CUDAflow is
demonstrated with SHOC and Rodinia application case studies on a variety of GPU
architectures, revealing novel thread divergence characteristics that help end
users, autotuners, and compilers generate high-performing code.
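The abstract does not define CUDAflow's similarity measure, so the sketch below shows only the structural step that subgraph matching performs: a backtracking search for an injective mapping of a small pattern CFG into a kernel CFG that preserves every pattern edge. The CFGs, block names, and the function find_subgraph are hypothetical; CUDAflow additionally draws on information such as instructions executed and registers allocated, which this sketch ignores.

```python
def find_subgraph(pattern, graph):
    """Find an injective map of pattern nodes to graph nodes that preserves
    every pattern edge (non-induced subgraph matching) by backtracking.

    Both graphs are dicts mapping a basic-block name to its set of successors.
    Returns the mapping as a dict, or None if the pattern does not occur.
    """
    order = list(pattern)

    def consistent(u, cand, mapping):
        # a self-loop in the pattern needs a self-loop in the graph
        if u in pattern[u] and cand not in graph[cand]:
            return False
        for w, mw in mapping.items():
            if w in pattern[u] and mw not in graph[cand]:   # edge u -> w
                return False
            if u in pattern[w] and cand not in graph[mw]:   # edge w -> u
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        u = order[len(mapping)]
        used = set(mapping.values())
        for cand in graph:
            if cand not in used and consistent(u, cand, mapping):
                mapping[u] = cand
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[u]  # backtrack
        return None

    return extend({})


# Hypothetical CFGs: a loop pattern, a kernel with a loop, a straight-line kernel.
LOOP = {"head": {"body"}, "body": {"body", "exit"}, "exit": set()}
kernel_a = {"entry": {"loop"}, "loop": {"loop", "epilogue"},
            "epilogue": {"ret"}, "ret": set()}
kernel_b = {"entry": {"a"}, "a": {"b"}, "b": set()}
print(find_subgraph(LOOP, kernel_a))  # {'head': 'entry', 'body': 'loop', 'exit': 'epilogue'}
print(find_subgraph(LOOP, kernel_b))  # None
```

Backtracking is exponential in the worst case, which is acceptable for small pattern graphs; a similarity score could then be built on top of such matches, but how CUDAflow does this is not stated here.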
- …