
7,125 research outputs found

Generating optimized Fourier interpolation routines for density functional theory using SPIRAL

Author: Franchetti F
Kelly PHJ
Popovici T
Russell FP
Skylaris CK
Wilkinson KA
Publication date: 12/12/2014

© 2015 IEEE. Upsampling of a multi-dimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, as seen in the quantum chemistry code ONETEP, a time-shift based implementation can be a good choice: it shifts samples by a fraction of the original grid spacing, using a frequency-domain Fourier property, to fill in the intermediate values. Readily available, highly optimized multidimensional FFT implementations are leveraged at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant for ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations using highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
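The time-shift idea in this abstract can be illustrated with a minimal one-dimensional sketch. It is not the SPIRAL-generated code: it assumes a periodic, band-limited signal and an upsampling factor of 2, uses a naive O(N^2) DFT in place of FFTW or MKL, and the function names `dft` and `upsample2_time_shift` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for an optimized FFT library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n)
                  for j in range(n))
        out.append(acc / n if inverse else acc)
    return out


def upsample2_time_shift(x):
    """Double the sampling rate of a periodic, band-limited real signal.

    The half-sample-shifted copy is obtained by multiplying each frequency
    bin by a phase factor (the Fourier shift property) and transforming back.
    """
    n = len(x)
    spectrum = dft(x)
    shifted_spectrum = []
    for k, coeff in enumerate(spectrum):
        if 2 * k == n:
            # Assumed convention: drop the ambiguous Nyquist bin.
            shifted_spectrum.append(0.0)
            continue
        freq = k if 2 * k < n else k - n  # signed frequency index
        shifted_spectrum.append(coeff * cmath.exp(1j * math.pi * freq / n))
    shifted = dft(shifted_spectrum, inverse=True)

    # Interleave original samples with the half-sample-shifted ones.
    out = []
    for j in range(n):
        out.append(x[j])
        out.append(shifted[j].real)
    return out
```

For a signal containing only frequencies below the Nyquist limit, the interleaved output matches the underlying function sampled on the finer grid up to rounding error. The extra forward and inverse transforms correspond to the "extra passes through the entire working set" that the abstract mentions.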

Spiral - Imperial College Digital Repository

Randomized cache placement for eliminating conflicts

Author: González Colás Antonio María
Topham Nigel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

Applications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors, conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques such as blocking, which are introduced within a single thread to improve locality, can result in yet more conflict misses. The tension between minimizing cache conflicts and the other transformations needed for efficient parallelization leads to complex optimization problems for parallelizing compilers. This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache where miss ratio depends solely on working set behavior. We examine the impact of pseudorandom cache indexing on processor cycle times and present practical solutions to some of the major implementation issues for this type of cache. Our conclusions are supported by simulations of a superscalar out-of-order processor executing the SPEC95 benchmarks, as well as from cache simulations of individual loop kernels to illustrate specific effects. We present measurements of instructions committed per cycle (IPC) when comparing the performance of different cache architectures on whole-program benchmarks such as the SPEC95 suite.
Peer Reviewed. Postprint (published version).
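The effect described in this abstract can be sketched by comparing a conventional modulo index with a hashed one. The abstract does not specify the index function, so the XOR folding below is only an assumed stand-in, and the cache geometry (`LINE_BYTES`, `NUM_SETS`) and all function names are hypothetical.

```python
LINE_BYTES = 32   # hypothetical cache line size
INDEX_BITS = 8    # hypothetical number of index bits
NUM_SETS = 1 << INDEX_BITS


def modulo_index(addr):
    """Conventional placement: low-order bits of the block address."""
    return (addr // LINE_BYTES) % NUM_SETS


def xor_folded_index(addr):
    """Assumed stand-in hash: XOR together all INDEX_BITS-wide slices
    of the block address, so high-order bits also influence the set."""
    block = addr // LINE_BYTES
    index = 0
    while block:
        index ^= block & (NUM_SETS - 1)
        block >>= INDEX_BITS
    return index


def distinct_sets(index_fn, stride, count):
    """Number of different sets touched by `count` strided accesses."""
    return len({index_fn(i * stride) for i in range(count)})
```

With a stride of `LINE_BYTES * NUM_SETS` bytes, every access maps to the same set under `modulo_index`, which is the repetitive conflict pattern the abstract refers to, while `xor_folded_index` spreads the same accesses over many sets.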

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Optimal Interleaving on Tori

Author: Bruck Jehoshua
Cook Matthew
Jiang Anxiao (Andrew)
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/2004

We study t-interleaving on two-dimensional tori, which is defined by the property that any connected subgraph with t or fewer vertices in the torus is labelled by all distinct integers. It has applications in distributed data storage and burst error correction, and is closely related to Lee metric codes. We say that a torus can be perfectly t-interleaved if its t-interleaving number (the minimum number of distinct integers needed to t-interleave the torus) meets the sphere-packing lower bound. We prove the necessary and sufficient conditions for tori that can be perfectly t-interleaved, and present efficient perfect t-interleaving constructions. The most important contribution of this paper is to prove that the t-interleaving numbers of tori large enough in both dimensions, which constitute by far the majority of all existing cases, are at most one more than the sphere-packing lower bound, and to present an optimal and efficient t-interleaving scheme for them. Then we prove some bounds on the t-interleaving numbers for other cases, completing a general picture for the t-interleaving problem on two-dimensional tori.
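The defining property can be checked by brute force on small tori. The sketch below relies on an equivalent reformulation: two vertices lie in a common connected subgraph with at most t vertices exactly when their graph distance is at most t - 1, so a labelling is a t-interleaving when any two vertices at wrap-around (Lee) distance less than t carry different labels. The function names and the example labellings are illustrative assumptions, not constructions taken from the paper.

```python
from itertools import product


def lee_distance(a, b, rows, cols):
    """Shortest-path distance between two cells of a rows x cols torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def is_t_interleaved(label, rows, cols, t):
    """Return True if label(r, c) assigns distinct integers to every pair
    of distinct cells whose distance on the torus is less than t."""
    cells = list(product(range(rows), range(cols)))
    for a in cells:
        for b in cells:
            if a < b and lee_distance(a, b, rows, cols) < t:
                if label(*a) == label(*b):
                    return False
    return True
```

For example, on a 5 x 5 torus the labelling `(r + 2 * c) % 5` passes the check for t = 3 using five labels, whereas `(r + c) % 5` fails because the cells (0, 1) and (1, 0) share a label at distance 2.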

CiteSeerX

Caltech Authors

A performance comparison between block interleaved and helically interleaved concatenated coding systems

Author: Cheung K.-M.
Swanson L.

The performance (bit-error rate vs. signal-to-noise ratio) of two different interleaving systems, block interleaving and the newer helical interleaving, is compared. Both systems are studied with and without error forecasting. Without error forecasting, the two systems have identical performance. When error forecasting is used with shallow interleaving, helical interleaving gains less than 0.05 dB over block interleaving. For higher interleaving depth, the systems have almost indistinguishable performance.
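For readers unfamiliar with the baseline scheme, block interleaving can be sketched as writing codewords into the rows of a matrix and transmitting it column by column. This is a generic textbook illustration with hypothetical function names; it does not reproduce the helical interleaver or the error-forecasting decoder studied in the report.

```python
def block_interleave(symbols, rows, cols):
    """Write `rows` codewords of length `cols` row by row,
    then read the matrix out column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("symbols must fill the rows x cols matrix exactly")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(symbols, rows, cols):
    """Invert block_interleave: restore the row-by-row codeword order."""
    if len(symbols) != rows * cols:
        raise ValueError("symbols must fill the rows x cols matrix exactly")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]
```

With an interleaving depth of `rows`, a channel burst of up to `rows` consecutive symbols is spread so that each codeword receives at most one corrupted symbol after deinterleaving.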

NASA Technical Reports Server

Gate-Level Simulation of Quantum Circuits

Author: George F. Viamontes
Igor L. Markov
John P. Hayes
Manoj Rajagopalan
Publication date: 01/08/2002

While thousands of experimental physicists and chemists are currently trying to build scalable quantum computers, it appears that simulation of quantum computation will be at least as critical as circuit simulation in classical VLSI design. However, since the work of Richard Feynman in the early 1980s, little progress has been made in practical quantum simulation. Most researchers have focused on polynomial-time simulation of restricted types of quantum circuits that fall short of the full power of quantum computation. Simulating quantum computing devices and useful quantum algorithms on classical hardware now requires excessive computational resources, making many important simulation tasks infeasible. In this work we propose a new technique for gate-level simulation of quantum circuits which greatly reduces the difficulty and cost of such simulations. The proposed technique is implemented in a simulation tool called the Quantum Information Decision Diagram (QuIDD) and evaluated by simulating Grover's quantum search algorithm. The back-end of our package, QuIDD Pro, is based on Binary Decision Diagrams, well-known for their ability to efficiently represent many seemingly intractable combinatorial structures. This reliance on a well-established area of research allows us to take advantage of existing software for BDD manipulation and achieve unparalleled empirical results for quantum simulation.
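To make the resource problem concrete, the sketch below simulates Grover's search with a dense state vector, which stores all 2^n amplitudes explicitly. This is the kind of baseline whose memory cost grows exponentially, not the QuIDD technique itself; the function name and the choice of iteration count (the integer part of (pi/4) * sqrt(2^n)) are assumptions of this sketch.

```python
import math


def grover_success_probability(num_qubits, marked):
    """Simulate Grover's search for one marked item with a dense state
    vector and return the probability of measuring the marked index."""
    size = 2 ** num_qubits
    # Uniform superposition over all basis states.
    amps = [1 / math.sqrt(size)] * size
    iterations = int(math.pi / 4 * math.sqrt(size))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked amplitude.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amps) / size
        amps = [2 * mean - a for a in amps]
    return amps[marked] ** 2
```

Each additional qubit doubles the length of `amps`, which is why compressed representations such as decision diagrams are attractive for structured states.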

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Which Way Was I Going? Contextual Retrieval Supports the Disambiguation of Well Learned Overlapping Navigational Routes

Author: Brown Thackery I
Hasselmo Michael E.
Keller Joseph B.
Ross Robert S.
Stern Chantal E.
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 26/05/2010

Groundbreaking research in animals has demonstrated that the hippocampus contains neurons that distinguish between overlapping navigational trajectories. These hippocampal neurons respond selectively to the context of specific episodes despite interference from overlapping memory representations. The present study used functional magnetic resonance imaging in humans to examine the role of the hippocampus and related structures when participants need to retrieve contextual information to navigate well learned spatial sequences that share common elements. Participants were trained outside the scanner to navigate through 12 virtual mazes from a ground-level first-person perspective. Six of the 12 mazes shared overlapping components. Overlapping mazes began and ended at distinct locations, but converged in the middle to share some hallways with another maze. Non-overlapping mazes did not share any hallways with any other maze. Successful navigation through the overlapping hallways required the retrieval of contextual information relevant to the current navigational episode. Results revealed greater activation during the successful navigation of the overlapping mazes compared with the non-overlapping mazes in regions typically associated with spatial and episodic memory, including the hippocampus, parahippocampal cortex, and orbitofrontal cortex. When combined with previous research, the current findings suggest that an anatomically integrated system including the hippocampus, parahippocampal cortex, and orbitofrontal cortex is critical for the contextually dependent retrieval of well learned overlapping navigational routes.

PubMed Central

UNH Scholars' Repository