Generating optimized Fourier interpolation routines for density functional theory using SPIRAL

Author: Franchetti F
Kelly PHJ
Popovici T
Russell FP
Skylaris CK
Wilkinson KA
Publication date: 12/12/2014

© 2015 IEEE. Upsampling of a multidimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, such as those used in the quantum chemistry code ONETEP, a time-shift based implementation can be a good choice: it shifts samples by a fraction of the original grid spacing, using a frequency-domain Fourier property, to fill in the intermediate values. This approach leverages readily available, highly optimized multidimensional FFT implementations at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading itself, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant to ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations, which use highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.

Spiral - Imperial College Digital Repository
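
The time-shift idea described in the abstract above can be illustrated in one dimension. The sketch below is not SPIRAL-generated code or ONETEP's implementation; it assumes an upsampling factor of 2, a periodic, band-limited real signal, and a naive O(N²) DFT standing in for an optimized FFT library such as FFTW or MKL. The helpers `dft`, `half_sample_shift` and `upsample2` are hypothetical names. Each original sample is kept, and the in-between values come from multiplying the spectrum by a linear phase that shifts the signal by half a grid spacing.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) DFT; a stand-in for an optimized FFT in this sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[j] * cmath.exp(sign * 2j * math.pi * k * j / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def half_sample_shift(x: Sequence[float]) -> List[float]:
    """Return x(k + 1/2) for k = 0..N-1 via the Fourier shift property."""
    n = len(x)
    spectrum = dft(x)
    for m in range(n):
        if 2 * m == n:
            # Nyquist bin: a half-sample shift is ambiguous for real data; dropped here.
            spectrum[m] = 0
            continue
        f = m if 2 * m < n else m - n  # signed frequency index
        spectrum[m] *= cmath.exp(1j * math.pi * f / n)  # linear phase = shift by 1/2
    return [v.real for v in dft(spectrum, inverse=True)]


def upsample2(x: Sequence[float]) -> List[float]:
    """Interleave original samples with half-sample-shifted ones (factor 2)."""
    shifted = half_sample_shift(x)
    out: List[float] = []
    for original, between in zip(x, shifted):
        out.extend([original, between])
    return out
```

For a 3D grid, the same phase shift would be applied per axis, and one shifted copy would be needed for each nonzero combination of half-sample offsets (seven for a factor of 2).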

Total variation on a tree

Author: Kolmogorov Vladimir
Pock Thomas
Rolinek Michal
Publication date: 01/01/2016

We consider the problem of minimizing the continuous-valued total variation subject to different unary terms on trees and propose fast direct algorithms based on dynamic programming to solve these problems. We treat both the convex and the non-convex case and derive worst-case complexities that are equal to or better than those of existing methods. We show applications to total-variation-based 2D image processing and computer vision problems based on a Lagrangian decomposition approach. The resulting algorithms are very efficient, offer a high degree of parallelism, and have memory requirements only on the order of the number of image pixels.
Comment: accepted to SIAM Journal on Imaging Sciences (SIIMS)

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)
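
The abstract above does not spell out its algorithms, so the following is only a generic illustration of dynamic programming for total variation on a tree, not the authors' method. As an assumption of this sketch, each variable is restricted to a finite, sorted list of candidate values (the paper treats continuous values), which permits arbitrary, including non-convex, unary terms. Messages pass from the leaves to the root, and the min-convolution with the absolute-value penalty is computed in linear time by a two-sweep L1 distance transform. The name `tv_tree_dp` and the parent-array tree encoding are hypothetical.

```python
from typing import Callable, List, Sequence


def tv_tree_dp(parent: Sequence[int], labels: Sequence[float],
               unary: Callable[[int, float], float], lam: float) -> List[float]:
    """Minimise sum_i unary(i, x_i) + lam * sum_{i>0} |x_i - x_parent[i]|
    with every x_i drawn from the sorted candidate list `labels` (lam >= 0).

    Assumes node 0 is the root (parent[0] == -1) and parent[i] < i for i > 0,
    so visiting nodes in decreasing index order handles children first.
    """
    n, num_labels = len(parent), len(labels)
    # cost[i][k]: best energy of the subtree rooted at i when x_i = labels[k]
    cost = [[unary(i, v) for v in labels] for i in range(n)]

    def l1_distance_transform(c: List[float]) -> List[float]:
        # m[k] = min_j c[j] + lam * |labels[k] - labels[j]|, via two linear sweeps
        m = list(c)
        for k in range(1, num_labels):
            m[k] = min(m[k], m[k - 1] + lam * (labels[k] - labels[k - 1]))
        for k in range(num_labels - 2, -1, -1):
            m[k] = min(m[k], m[k + 1] + lam * (labels[k + 1] - labels[k]))
        return m

    # Upward pass: each finished subtree sends its min-marginal to its parent.
    for i in range(n - 1, 0, -1):
        msg = l1_distance_transform(cost[i])
        p = parent[i]
        cost[p] = [a + b for a, b in zip(cost[p], msg)]

    # Downward pass: fix the root, then give each child its best value
    # conditioned on the value already chosen for its parent.
    choice = [0] * n
    choice[0] = min(range(num_labels), key=lambda k: cost[0][k])
    for i in range(1, n):
        vp = labels[choice[parent[i]]]
        choice[i] = min(range(num_labels),
                        key=lambda k: cost[i][k] + lam * abs(vp - labels[k]))
    return [labels[k] for k in choice]
```

This sketch runs in O(n·L) time for n nodes and L candidate values; with a quadratic unary term and a large `lam`, all nodes collapse to a common value, as expected for strong TV regularization.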

ASIP Design and Prototyping for Wireless Communication Applications

Author: Amer Baghdadi
Atif Raza Jafri
Michel Jezequel
Publication venue: IntechOpen
Publication date: 01/01/2011

International audience

IntechOpen

HAL-Université de Bretagne Occidentale

HAL Descartes

Indexed dependence metadata and its applications in software performance optimisation

Author: Howes Lee William
Publication venue: Computing, Imperial College London
Publication date: 01/04/2010

To achieve continued performance improvements, modern microprocessor design increasingly concentrates hardware on computation units, with less automatic management of data movement and less automatic extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel generally enough to generate efficient code automatically in all but the most straightforward cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping from each index in that iteration space to the set of data elements that iteration might use: thus the dependence metadata are indexed by the kernel’s iteration space. This explicit mapping allows the compiler or runtime to optimise the program more effectively, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports inter-component optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming; a demonstration of its advantages in a component programming system; the decoupled access/execute model for C++ programs; and a discussion of how indexed dependence metadata might be used to improve the programming model for GPU-based designs.
Our experimental results with prototype implementations show that indexed dependence metadata support automatic synthesis of double-buffered data movement for the Cell processor and enable aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies.

Spiral - Imperial College Digital Repository
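
As a rough illustration of the idea in the abstract above, and not the thesis's actual interface, the sketch below attaches hypothetical metadata to a kernel: its iteration space plus a function from each index to the input elements that iteration may read. Because the mapping is explicit, a runtime can compute, before executing a block of iterations, exactly which elements must be transferred, which is the information needed to schedule double-buffered data movement. `IndexedKernel`, `footprint` and `plan_transfers` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class IndexedKernel:
    """Hypothetical metadata record for a kernel: its iteration space and,
    for each index, the set of input elements that iteration may read."""
    name: str
    iterations: range
    reads: Callable[[int], Set[int]]


def footprint(kernel: IndexedKernel, block: range) -> Set[int]:
    """Union of the input elements read by the iterations in `block`."""
    needed: Set[int] = set()
    for i in block:
        if i not in kernel.iterations:
            raise ValueError(f"{i} is outside the iteration space of {kernel.name}")
        needed |= kernel.reads(i)
    return needed


def plan_transfers(kernel: IndexedKernel, block_size: int) -> List[Tuple[range, List[int]]]:
    """Split the (unit-stride) iteration space into blocks and list, per block,
    the input elements to copy in before that block runs. With double
    buffering, block b+1's transfer could overlap block b's computation."""
    its = kernel.iterations
    plan = []
    for start in range(its.start, its.stop, block_size):
        block = range(start, min(start + block_size, its.stop))
        plan.append((block, sorted(footprint(kernel, block))))
    return plan


# Example: a 3-point blur over 16 elements, clamped at the boundaries.
N = 16
blur = IndexedKernel("blur3", range(N), lambda i: {max(i - 1, 0), i, min(i + 1, N - 1)})
for block, elems in plan_transfers(blur, 4):
    print(f"iterations {block.start}-{block.stop - 1}: load elements {elems[0]}-{elems[-1]}")
```

Note how neighbouring blocks overlap by one halo element on each side; that overlap falls directly out of the per-index read sets rather than being hand-coded.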

A Kogbetliantz-type algorithm for the hyperbolic SVD

Author: Novaković Vedran
Singer Sanja
Publication date: 05/12/2020

In this paper a two-sided, parallel Kogbetliantz-type algorithm for the hyperbolic singular value decomposition (HSVD) of real and complex square matrices is developed, under the single assumption that the input matrix, of order $n$, admits such a decomposition into the product of a unitary, a non-negative diagonal, and a $J$-unitary matrix, where $J$ is a given diagonal matrix of positive and negative signs. When $J=\pm I$, the proposed algorithm computes the ordinary SVD. The paper's most important contribution, a derivation of formulas for the HSVD of $2\times 2$ matrices, is presented first, followed by the details of their implementation in floating-point arithmetic. Next, the effects of the hyperbolic transformations on the columns of the iteration matrix are discussed. These effects then guide a redesign of the dynamic pivot ordering, already a well-established pivot strategy for the ordinary Kogbetliantz algorithm, for the general $n\times n$ HSVD. A heuristic but sound convergence criterion is then proposed, which contributes to the high accuracy demonstrated in the numerical testing results. Such a $J$-Kogbetliantz algorithm as presented here is intrinsically slow, but is nevertheless usable for matrices of small orders.
Comment: a heavily revised version with 32 pages and 4 figures

arXiv.org e-Print Archive
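
The abstract above notes that for $J=\pm I$ the algorithm computes the ordinary SVD. The sketch below covers only that special case, for a real $2\times 2$ matrix, using the classical two-sided (Kogbetliantz-style) step: a left rotation makes the matrix symmetric, then a symmetric Jacobi rotation diagonalizes it. It is not the paper's hyperbolic $2\times 2$ formulas, which also handle $J$-unitary transformations and complex inputs, nor its careful floating-point implementation; `svd2x2` and the small matrix helpers are hypothetical names.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]


def rot(theta: float) -> Mat:
    """Plane rotation [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a: Mat) -> Mat:
    return [[a[j][i] for j in range(2)] for i in range(2)]


def svd2x2(a: Mat) -> Tuple[Mat, List[float], Mat]:
    """Return (U, sigma, V) with a = U @ diag(sigma) @ V^T, sigma[0] >= sigma[1] >= 0."""
    (p, q), (r, s) = a
    # 1) Left rotation G chosen so that B = G^T A is symmetric (B[0][1] == B[1][0]).
    g = rot(math.atan2(r - q, p + s))
    b = matmul(transpose(g), a)
    # 2) Symmetric Jacobi rotation K with K^T B K diagonal.
    k = rot(0.5 * math.atan2(2.0 * b[0][1], b[0][0] - b[1][1]))
    d = matmul(matmul(transpose(k), b), k)
    # Now A = G B = (G K) D K^T, so U = G K and V = K up to signs and ordering.
    u = matmul(g, k)
    sigma = [abs(d[0][0]), abs(d[1][1])]
    for j in range(2):  # move negative diagonal entries of D into U's columns
        if d[j][j] < 0:
            for i in range(2):
                u[i][j] = -u[i][j]
    v = k
    if sigma[0] < sigma[1]:  # conventional descending order
        sigma.reverse()
        u = [row[::-1] for row in u]
        v = [row[::-1] for row in v]
    return u, sigma, v
```

In a full Kogbetliantz sweep, rotations like these would be applied to pivot pairs of rows and columns of an $n\times n$ iteration matrix; the pivot ordering discussed in the abstract decides which pairs are processed.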

Opto-VLSI based WDM multifunction device

Author: Ahderom Selam T.
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2004

The tremendous expansion of telecommunication services in the past decade, in part due to the growth of the Internet, has made the development of high-bandwidth optical networks a focus of research interest. The implementation of Dense-Wavelength Division Multiplexing (DWDM) optical fiber transmission systems has the potential to meet this demand. However, crucial components of DWDM networks, such as add/drop multiplexers, filters, gain equalizers and interconnects between optical channels, are currently not implemented as dynamically reconfigurable devices. Electronic cross-connects, the traditional solution for reconfigurable optical networks, are increasingly infeasible because of the rapidly increasing bandwidth of optical channels. Thus, optically transparent, dynamically reconfigurable DWDM components are important for alleviating the bottleneck in future telecommunication systems. In this study, we develop a promising class of Opto-VLSI based devices, including a dynamic multifunction WDM processor, which combines the functions of optical filter, channel equalizer and add-drop multiplexer, as well as a reconfigurable optical power splitter. We review the technological options for all-optical WDM components and compare their advantages and disadvantages. We develop a model for designing Opto-VLSI based WDM devices, and experimentally demonstrate the Opto-VLSI multifunction WDM device. Finally, we discuss the feasibility of Opto-VLSI WDM components in meeting the stringent requirements of the optical communications industry.

Research Online @ ECU

Implicit Hari–Zimmermann algorithm for the generalized SVD on the GPUs

Author: Novaković Vedran
Singer Sanja
Publication venue: SAGE Publications
Publication date: 13/10/2020

A parallel, blocked, one-sided Hari–Zimmermann algorithm for the generalized singular value decomposition (GSVD) of a real or a complex matrix pair (F,G) is proposed here, where F and G have the same number of columns and are both of full column rank. The algorithm targets either a single graphics processing unit (GPU) or a cluster of GPUs, performs all non-trivial computation exclusively on the GPUs, requires the minimal amount of memory that can reasonably be expected, scales acceptably as the number of available GPUs increases, and guarantees reproducible, bitwise identical output across repeated runs on the same input with the same number of GPUs.

arXiv.org e-Print Archive

Repositori Institucional de la Universitat Jaume I
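
The abstract above describes the algorithm's engineering rather than its mathematics. For orientation only, the sketch below computes what a GSVD produces, the generalized singular values, in the smallest nontrivial case of two columns and real data. It uses the relation that, for F and G of full column rank, the squared generalized singular values are the eigenvalues of the pencil (F^T F, G^T G), and it solves the resulting 2×2 characteristic equation directly. It is not the implicit, one-sided Hari–Zimmermann algorithm itself; `gram2` and `gsv_two_columns` are hypothetical names.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def gram2(m: Matrix) -> List[List[float]]:
    """M^T M for a matrix with exactly two columns, given as a list of rows."""
    return [[sum(row[i] * row[j] for row in m) for j in range(2)] for i in range(2)]


def gsv_two_columns(f: Matrix, g: Matrix) -> List[float]:
    """Generalized singular values of (F, G), where F is m-by-2 and G is p-by-2,
    both of full column rank: sigma^2 are the eigenvalues of the pencil
    (F^T F, G^T G), i.e. the roots of det(F^T F - lam * G^T G) = 0.
    Returned in descending order."""
    a, b = gram2(f), gram2(g)
    det_a = a[0][0] * a[1][1] - a[0][1] ** 2
    det_b = b[0][0] * b[1][1] - b[0][1] ** 2
    mid = a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2.0 * a[0][1] * b[0][1]
    # det(A - lam B) = det_b * lam^2 - mid * lam + det_a
    disc = max(mid * mid - 4.0 * det_a * det_b, 0.0)  # clamp rounding noise
    lam_big = (mid + math.sqrt(disc)) / (2.0 * det_b)
    lam_small = det_a / (det_b * lam_big)  # product of roots, avoids cancellation
    return [math.sqrt(lam_big), math.sqrt(lam_small)]
```

When G is the identity, the result reduces to the ordinary singular values of F, which is a quick sanity check for any GSVD routine.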

Towards hybrid molecular simulations

Author: Markvoort Albert J.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2006

In many biology, chemistry and physics applications, molecular simulations can be used to study material and process properties. The level of detail needed in such simulations depends on the application. In some cases quantum mechanical simulations are indispensable. However, traditional ab initio methods, usually employing plane waves or a linear combination of atomic orbitals as a basis, are extremely expensive in terms of both computation and memory. The well-known fact that electronic wave functions vary much more rapidly near the atomic nuclei than in inter-atomic regions calls for a multi-resolution approach, which uses low resolution by default and adds extra resolution only in those regions where it is necessary, thereby limiting the costs. This is provided by an alternative basis formed of wavelets. Using such a wavelet basis, a method for solving electronic structure problems has been developed and applied successfully to 2D quantum dots and 3D molecular systems. In other cases, it suffices to describe the atomic interaction with effective potentials instead of the electronic structure, enabling the simulation of larger systems. Molecular dynamics simulations with such effective potentials have been used for a systematic study of the influence of surface wettability on particle and heat flow in nanochannels, showing that the effects at the solid-gas interface are crucial for the behavior of the whole nanochannel. In yet other cases, even coarse-grained models can be used, in which the average behavior of several atoms is combined into a single particle. Such a model, omitting as much detail as possible while maintaining realistic behavior, has been developed for lipids, and with this model the dynamics of membranes and vesicle formation have been studied in detail. A disadvantage of molecular dynamics simulations with effective potentials is that no reactions are possible.
Therefore a new method has been developed in which molecular dynamics is coupled with stochastic reactions. Using this method, both unilamellar and multilamellar vesicle formation, as well as vesicle growth, bursting and healing, are shown. Still larger systems can be simulated using other methods, such as the direct simulation Monte Carlo method. However, as shown for nanochannels, these methods are not always accurate enough. Exploiting again the fact that the finest level of detail is often needed in only part of the domain, a hybrid method has been developed that couples molecular dynamics, where needed for accuracy, with direct simulation Monte Carlo, where possible, to speed up the calculation. Further development of such hybrid simulations will further increase the scientific role of molecular simulation.

Repository TU/e

Pure OAI Repository