Full Diversity Unitary Precoded Integer-Forcing

Authors: Sakzad Amin; Viterbo Emanuele
Publication date: 31/03/2015

We consider a point-to-point flat-fading MIMO channel with channel state information known at both the transmitter and the receiver. At the transmitter side, a lattice coding scheme is employed at each antenna to map information symbols to independent lattice codewords drawn from the same codebook. Each lattice codeword is then multiplied by a unitary precoding matrix ${\bf P}$ and sent through the channel. At the receiver side, an integer-forcing (IF) linear receiver is employed. We denote this scheme as unitary precoded integer-forcing (UPIF). We show that UPIF can achieve full diversity under a constraint based on the shortest vector of a lattice generated by the precoding matrix ${\bf P}$. This constraint and a simpler version of it provide design criteria for two types of full-diversity UPIF. Type I uses a unitary precoder that adapts at each channel realization. Type II uses a unitary precoder that remains fixed for all channel realizations. We then verify our results by computer simulations in $2\times 2$ and $4\times 4$ MIMO using different QAM constellations. We finally show that the proposed Type II UPIF outperforms the MIMO precoding X-codes at high data rates.

Comment: 12 pages, 8 figures, to appear in IEEE-TW

arXiv.org e-Print Archive

CiteSeerX

A Blocked Linear Method for Optimizing Large Parameter Sets in Variational Monte Carlo

Authors: Neuscamman Eric; Zhao Luning
Publication date: 05/02/2017

We present a modification to variational Monte Carlo's linear method optimization scheme that addresses a critical memory bottleneck while maintaining compatibility with both the traditional ground state variational principle and our recently introduced variational principle for excited states. For wave function ansatzes with tens of thousands of variables, our modification reduces the required memory per parallel process from tens of gigabytes to hundreds of megabytes, making the methodology a much better fit for modern supercomputer architectures in which data communication and per-process memory consumption are primary concerns. We verify the efficacy of the new optimization scheme in small molecule tests involving both the Hilbert space Jastrow antisymmetric geminal power ansatz and real space multi-Slater Jastrow expansions. Satisfied with its performance, we have added the optimizer to the QMCPACK software package, with which we demonstrate on a hydrogen ring a prototype approach for making systematically convergent, non-perturbative predictions of Mott insulators' optical band gaps.

Comment: 9 pages, 3 tables, 4 figures

arXiv.org e-Print Archive

eScholarship - University of California

FigShare

Efficient computation of partition of unity interpolants through a block-based searching technique

Authors: Cavoretto R.; De Rossi A.; Perracchione E.
Publication date: 01/01/2016

In this paper we propose a new efficient interpolation tool that is well suited to large scattered data sets. The partition of unity method is used, blending Radial Basis Functions (RBFs) as local approximants by means of locally supported weight functions. In particular, we present a new space-partitioning data structure based on a partition of the underlying generic domain into blocks. This approach allows us to examine only a reduced number of blocks when searching for the nearest neighbour points, leading to an optimized searching routine. Complexity analysis and numerical experiments in two- and three-dimensional interpolation support our findings. Some applications to geometric modelling are also considered. Moreover, the associated software package, written in MATLAB, is discussed here and made available to the scientific community.
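The block-based search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB package: the function names `build_blocks` and `neighbours_within`, the use of square blocks in two dimensions, and the dictionary layout are all assumptions made for illustration. The idea shown is only that a query inspects the few blocks overlapping its search radius instead of every point.

```python
from collections import defaultdict
from math import ceil, floor


def build_blocks(points, block_size):
    """Map each block index (ix, iy) to the indices of the points inside it."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        blocks[(floor(x / block_size), floor(y / block_size))].append(idx)
    return blocks


def neighbours_within(points, blocks, block_size, centre, radius):
    """Return sorted indices of points within `radius` of `centre`.

    Only blocks that can overlap the search disc are scanned.
    """
    cx, cy = centre
    bx, by = floor(cx / block_size), floor(cy / block_size)
    reach = ceil(radius / block_size)  # number of blocks to look at on each side
    found = []
    for ix in range(bx - reach, bx + reach + 1):
        for iy in range(by - reach, by + reach + 1):
            # .get avoids creating empty entries in the defaultdict
            for idx in blocks.get((ix, iy), ()):
                px, py = points[idx]
                if (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2:
                    found.append(idx)
    return sorted(found)


points = [(0.1, 0.1), (0.4, 0.2), (0.9, 0.9), (0.15, 0.3), (0.6, 0.1)]
blocks = build_blocks(points, block_size=0.25)
local = neighbours_within(points, blocks, 0.25, centre=(0.2, 0.2), radius=0.25)
```

In a partition of unity setting, a list such as `local` would hold the nodes used to build the local RBF approximant on one subdomain. Choosing the block size close to the subdomain radius keeps the number of scanned blocks small.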

arXiv.org e-Print Archive

Institutional Research Information System University of Turin

Fast $k$-NNG construction with GPU-based quick multi-select

Authors: D'Souza Roshan; Dashti Ali; Komarov Ivan
Publication venue: Public Library of Science (PLoS)
Publication date: 21/09/2013

In this paper we describe a new brute force algorithm for building the $k$-Nearest Neighbor Graph ($k$-NNG). The $k$-NNG algorithm has many applications in areas such as machine learning, bio-informatics, and clustering analysis. While there are very efficient algorithms for data of low dimensions, for high dimensional data the brute force search is the best algorithm. There are two main parts to the algorithm: the first is finding the distances between the input vectors, which may be formulated as a matrix multiplication problem. The second is the selection of the $k$-NNs for each of the query vectors. For the second part, we describe a novel graphics processing unit (GPU)-based multi-select algorithm based on quick sort. Our optimization makes clever use of warp voting functions available on the latest GPUs along with user-controlled cache. Benchmarks show significant improvement over state-of-the-art implementations of the $k$-NN search on GPUs.
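The two-part structure described in this abstract can be sketched on the CPU in plain Python. This is an illustrative assumption-laden sketch, not the paper's GPU implementation: it uses no warp voting or cache control, and the names `squared_distances`, `quick_select_k`, and `knn_graph` are hypothetical. Part one uses the identity $\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\,x_i \cdot x_j$, where the inner products form the matrix product $X X^T$; part two selects the $k$ smallest entries of each row with quickselect-style partitioning rather than a full sort.

```python
import random


def squared_distances(points):
    """All-pairs squared Euclidean distances computed from the Gram matrix X X^T."""
    n = len(points)
    norms = [sum(c * c for c in p) for p in points]
    # gram[i][j] = <x_i, x_j>; this is the matrix-multiplication part.
    gram = [[sum(a * b for a, b in zip(points[i], points[j])) for j in range(n)]
            for i in range(n)]
    return [[norms[i] + norms[j] - 2.0 * gram[i][j] for j in range(n)]
            for i in range(n)]


def quick_select_k(items, k, rng):
    """Return the k smallest items (in no particular order) by repeated partitioning."""
    items = list(items)
    result = []
    while k > 0 and items:
        pivot = items[rng.randrange(len(items))]
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        upper = [x for x in items if x > pivot]
        if k < len(lower):
            items = lower  # the answer lies entirely below the pivot
        elif k <= len(lower) + len(equal):
            result.extend(lower)
            result.extend(equal[:k - len(lower)])
            return result
        else:
            result.extend(lower)
            result.extend(equal)
            k -= len(lower) + len(equal)
            items = upper  # still need more items from above the pivot
    return result


def knn_graph(points, k, seed=0):
    """For each point, list the indices of its k nearest other points, closest first."""
    rng = random.Random(seed)
    graph = []
    for i, row in enumerate(squared_distances(points)):
        candidates = [(d, j) for j, d in enumerate(row) if j != i]
        nearest = sorted(quick_select_k(candidates, k, rng))  # sort only k items
        graph.append([j for _, j in nearest])
    return graph


graph = knn_graph([(0, 0), (1, 0), (0, 2), (5, 5)], k=2)
```

Selection by partitioning costs expected linear time per row, compared with $O(n \log n)$ for sorting the whole row, which is why the selection step is worth treating separately from the distance computation.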

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

A Similarity Measure for GPU Kernel Subgraph Matching

Authors: A Sabne; BP Miller; C Böhm; F Zhang; G Ammons; L Adhianto; MH Williams; R Lim; R Singh; RC Gonzales; SS Shende; T Ball
Publication date: 21/03/2019

Accelerator architectures specialize in executing SIMD (single instruction, multiple data) in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures this information in a control flow graph (CFG) and performs subgraph matching across various kernels' CFGs to gain insight into an application's resource requirements, based on the shape and traversal of the graph, the instruction operations executed, and the registers allocated, among other information. The utility of CUDAflow is demonstrated with SHOC and Rodinia application case studies on a variety of GPU architectures, revealing novel thread divergence characteristics that help end users, autotuners, and compilers generate high-performing code.

arXiv.org e-Print Archive

Crossref