Generating optimized Fourier interpolation routines for density functional theory using SPIRAL
© 2015 IEEE. Upsampling of a multi-dimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, as seen in the quantum chemistry code ONETEP, a time-shift based implementation can be a good choice: it uses the Fourier shift property in the frequency domain to shift samples by a fraction of the original grid spacing and so fill in the intermediate values. Readily available, highly optimized multidimensional FFT implementations are leveraged at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant for ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations using highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
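The time-shift idea in the abstract above can be illustrated with a minimal one-dimensional sketch. It is not the SPIRAL-generated code or ONETEP's routine: the function names (`dft`, `shift_samples`, `upsample`) are hypothetical, the input is assumed to be real and periodic, and a naive O(N^2) DFT stands in for an optimized FFT library so that the example stays self-contained.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        out.append(acc / n if inverse else acc)
    return out

def shift_samples(x, shift):
    """Evaluate the periodic band-limited interpolant of x at positions j + shift."""
    n = len(x)
    shifted = []
    for k, coeff in enumerate(dft(x)):
        if n % 2 == 0 and k == n // 2:
            # Nyquist bin: use a real factor so that real input stays real.
            factor = math.cos(math.pi * shift)
        else:
            freq = k if k < n / 2 else k - n  # signed frequency index
            factor = cmath.exp(2j * math.pi * freq * shift / n)
        shifted.append(coeff * factor)
    return [v.real for v in dft(shifted, inverse=True)]

def upsample(x, factor):
    """Upsample by interleaving copies shifted by s / factor of a grid spacing."""
    copies = [shift_samples(x, s / factor) for s in range(factor)]
    return [copies[s][j] for j in range(len(x)) for s in range(factor)]

# Usage: a cosine sampled at 8 points is refined to 16 points.
coarse = [math.cos(2 * math.pi * j / 8) for j in range(8)]
fine = upsample(coarse, 2)
```

Each extra shifted copy costs one more forward and inverse transform over the whole array, which is the "extra passes through the entire working set" trade-off the abstract mentions.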
Total variation on a tree
We consider the problem of minimizing the continuous-valued total variation
subject to different unary terms on trees and propose fast direct algorithms
based on dynamic programming to solve these problems. We treat both the convex
and the non-convex case and derive worst-case complexities that are equal to or
better than those of existing methods. We show applications to total variation
based 2D image processing and computer vision problems based on a Lagrangian
decomposition approach. The resulting algorithms are very efficient, offer a
high degree of parallelism and come with memory requirements that are
only on the order of the number of image pixels. Comment: accepted to SIAM Journal on Imaging Sciences (SIIMS)
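To give a feel for the dynamic programming structure, here is a deliberately simplified sketch. It is not the paper's algorithm: it restricts the unknowns to a finite label set (the paper treats continuous values), uses a chain (the simplest tree) and assumes a quadratic unary term. The function name `tv_chain_discrete` and its parameters are hypothetical.

```python
def tv_chain_discrete(f, labels, lam):
    """Minimize sum (u_i - f_i)^2 + lam * sum |u_i - u_{i+1}| with u_i in labels."""
    # cost[a]: best energy of the nodes processed so far if the last node takes labels[a]
    cost = [(lab - f[0]) ** 2 for lab in labels]
    back = []  # back[i-1][a]: best label index of node i-1 given node i takes labels[a]
    for i in range(1, len(f)):
        new_cost, choices = [], []
        for lab in labels:
            best = min(range(len(labels)),
                       key=lambda a: cost[a] + lam * abs(lab - labels[a]))
            new_cost.append(cost[best] + lam * abs(lab - labels[best]) + (lab - f[i]) ** 2)
            choices.append(best)
        cost = new_cost
        back.append(choices)
    # Backtrack from the best label of the last node.
    a = min(range(len(labels)), key=lambda idx: cost[idx])
    path = [a]
    for choices in reversed(back):
        a = choices[a]
        path.append(a)
    path.reverse()
    return [labels[idx] for idx in path]

# Usage: a single spike is flattened once the regularization weight is large.
smoothed = tv_chain_discrete([0, 0, 10, 0, 0], list(range(11)), lam=100)
```

This brute-force version costs O(n L^2) for n nodes and L labels; the complexities derived in the paper refer to its continuous-valued algorithms, not to this discretized stand-in.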
ASIP Design and Prototyping for Wireless Communication Applications
International audience
Indexed dependence metadata and its applications in software performance optimisation
To achieve continued performance improvements, modern microprocessor design is tending to concentrate
an increasing proportion of hardware on computation units with less automatic management
of data movement and extraction of parallelism. As a result, architectures increasingly include multiple
computation cores and complicated, software-managed memory hierarchies. Compilers have
difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic
generation of efficient code in any but the most straightforward of cases.
We propose the concept of indexed dependence metadata to improve application development and
mapping onto such architectures. The metadata represent both the iteration space of a kernel and the
mapping of that iteration space from a given index to the set of data elements that iteration might
use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping
allows the compiler or runtime to optimise the program more efficiently, and improves the program
structure for the developer. We argue that this form of explicit interface specification reduces the need
for premature, architecture-specific optimisation. It improves program portability, supports inter-component
optimisation and enables generation of efficient data movement code.
We offer the following contributions: an introduction to the concept of indexed dependence metadata
as a generalisation of stream programming, a demonstration of its advantages in a component
programming system, the decoupled access/execute model for C++ programs, and how indexed dependence
metadata might be used to improve the programming model for GPU-based designs. Our
experimental results with prototype implementations show that indexed dependence metadata supports
automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive
loop fusion optimisations in image processing, linear algebra and multigrid application case
studies.
A Kogbetliantz-type algorithm for the hyperbolic SVD
In this paper a two-sided, parallel Kogbetliantz-type algorithm for the
hyperbolic singular value decomposition (HSVD) of real and complex square
matrices is developed, with a single assumption that the input matrix, of order
$n$, admits such a decomposition into the product of a unitary, a non-negative
diagonal, and a $J$-unitary matrix, where $J$ is a given diagonal matrix of
positive and negative signs. When $J = \pm I$, the proposed algorithm computes
the ordinary SVD. The paper's most important contribution, a derivation of
formulas for the HSVD of $2 \times 2$ matrices, is presented first, followed
by the details of their implementation in floating-point arithmetic. Next, the
effects of the hyperbolic transformations on the columns of the iteration
matrix are discussed. These effects then guide a redesign of the dynamic pivot
ordering, already a well-established pivot strategy for the ordinary
Kogbetliantz algorithm, for the general $n \times n$ HSVD. A heuristic but
sound convergence criterion is then proposed, which contributes to the high
accuracy demonstrated in the numerical testing results. Such a $J$-Kogbetliantz
algorithm as presented here is intrinsically slow, but is nevertheless usable
for matrices of small orders. Comment: a heavily revised version with 32 pages and 4 figures
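The third factor of the decomposition is unitary with respect to the diagonal sign matrix rather than the identity. As a small illustration, and not the paper's algorithm, the sketch below writes the sign matrix as J (a notational assumption) and checks the real-case defining property V^T J V = J for 2 x 2 matrices; all function names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def is_j_unitary(v, j, tol=1e-12):
    """Check V^T J V = J entrywise for real square matrices."""
    lhs = matmul(matmul(transpose(v), j), v)
    size = len(j)
    return all(abs(lhs[r][c] - j[r][c]) <= tol
               for r in range(size) for c in range(size))

def hyperbolic_rotation(t):
    """J-unitary for J = diag(1, -1), since cosh^2(t) - sinh^2(t) = 1."""
    return [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]

def plane_rotation(t):
    """Orthogonal, i.e. J-unitary for J = I (the ordinary SVD case)."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

J_MIXED = [[1.0, 0.0], [0.0, -1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
hyperbolic_ok = is_j_unitary(hyperbolic_rotation(0.7), J_MIXED)
ordinary_ok = is_j_unitary(plane_rotation(0.7), IDENTITY)
```

With equal signs on the diagonal the property reduces to ordinary orthogonality, which matches the abstract's remark that the ordinary SVD is recovered in that case.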
Opto-VLSI based WDM multifunction device
The tremendous expansion of telecommunication services in the past decade, in part due to the growth of the Internet, has made the development of high-bandwidth optical networks a focus of research interest. The implementation of Dense Wavelength Division Multiplexing (DWDM) optical fiber transmission systems has the potential to meet this demand. However, crucial components of DWDM networks (add/drop multiplexers, filters, gain equalizers and interconnects between optical channels) are currently not implemented as dynamically reconfigurable devices. Electronic cross-connects, the traditional solution for reconfigurable optical networks, are becoming infeasible because of the rapidly increasing bandwidth of the optical channels. Thus, optically transparent, dynamically reconfigurable DWDM components are important for alleviating the bottleneck in telecommunication systems of the future. In this study, we develop a promising class of Opto-VLSI based devices, including a dynamic multi-function WDM processor, which combines the functions of optical filter, channel equalizer and add/drop multiplexer, as well as a reconfigurable optical power splitter. We review the technological options for all-optical WDM components and compare their advantages and disadvantages. We develop a model for designing Opto-VLSI based WDM devices, and demonstrate the Opto-VLSI multi-function WDM device experimentally. Finally, we discuss the feasibility of Opto-VLSI WDM components in meeting the stringent requirements of the optical communications industry.
Implicit Hari–Zimmermann algorithm for the generalized SVD on the GPUs
A parallel, blocked, one-sided Hari–Zimmermann algorithm for the generalized singular value decomposition (GSVD) of a real or a complex matrix pair (F,G) is proposed, where F and G have the same number of columns and are both of full column rank. The algorithm targets either a single graphics processing unit (GPU) or a cluster of GPUs, performs all non-trivial computation exclusively on the GPUs, requires the minimal amount of memory that can reasonably be expected, scales acceptably as the number of available GPUs increases, and guarantees reproducible, bitwise identical output for runs repeated over the same input and with the same number of GPUs.
Towards hybrid molecular simulations
In many biology, chemistry and physics applications, molecular simulations can be used to study material and process properties. The level of detail needed in such simulations depends on the application. In some cases quantum mechanical simulations are indispensable. However, traditional ab-initio methods, usually employing plane waves or a linear combination of atomic orbitals as a basis, are extremely expensive in terms of both computational and memory requirements. The well-known fact that electronic wave functions vary much more rapidly near the atomic nuclei than in inter-atomic regions calls for a multi-resolution approach, which allows one to use low resolution and to add extra resolution only in those regions where it is necessary, thus limiting the costs. This is provided by an alternative basis formed of wavelets. Using such a wavelet basis, a method has been developed for solving electronic structure problems that has been applied successfully to 2D quantum dots and 3D molecular systems. In other cases, it suffices to describe the atomic interaction with effective potentials instead of the electronic structure, enabling the simulation of larger systems. Molecular dynamics simulations with such effective potentials have been used for a systematic study of the influence of surface wettability on particle and heat flow in nanochannels, showing that the effects at the solid-gas interface are crucial for the behavior of the whole nanochannel. In still other cases, even coarse-grained models can be used, where the average behavior of several atoms is combined into a single particle. Such a model, refraining from as much detail as possible while maintaining realistic behavior, has been developed for lipids, and with this model the dynamics of membranes and vesicle formation have been studied in detail. A disadvantage of molecular dynamics simulations with effective potentials is that no reactions are possible.
Therefore a new method has been developed, in which molecular dynamics is coupled with stochastic reactions. Using this method, both unilamellar and multilamellar vesicle formation, as well as vesicle growth, bursting, and healing, are shown. Still larger systems can be simulated using other methods, such as the direct simulation Monte Carlo method. However, as shown for nanochannels, these methods are not always accurate enough. Exploiting again the fact that the finest level of detail is often needed in only part of the domain, a hybrid method has been developed that couples molecular dynamics, where needed for accuracy, with direct simulation Monte Carlo, where possible, in order to speed up the calculation. Further development of such hybrid simulations will further increase the scientific role of molecular simulation.