    Generating optimized Fourier interpolation routines for density functional theory using SPIRAL

    © 2015 IEEE. Upsampling of a multi-dimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, such as those used in the quantum chemistry code ONETEP, a time-shift based implementation can be a good choice: it shifts samples by a fraction of the original grid spacing, using the Fourier shift property in the frequency domain, to fill in the intermediate values. This approach leverages readily available, highly optimized multidimensional FFT implementations at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading itself, we address the memory hierarchy and SIMD vectorization, and focus on the problem dimensions relevant to ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations, which use highly optimized FFT libraries (FFTW and MKL). We demonstrate speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
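The time-shift idea can be sketched in one dimension. The Python below is a minimal illustration, not SPIRAL-generated code. It assumes a periodic, band-limited signal whose length is a power of two, uses a textbook radix-2 FFT in place of FFTW or MKL, and treats the Nyquist bin symmetrically. The helper names `fft`, `ifft`, `fractional_shift` and `upsample2` are hypothetical. Upsampling by 2 keeps the original samples and fills each midpoint with the signal shifted by half a grid spacing. That shift is obtained by multiplying each Fourier coefficient by a phase factor.

```python
# Illustrative sketch only (assumptions: 1D, periodic, power-of-two length);
# this is not the SPIRAL-generated library described in the paper.
import cmath
import math


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def fractional_shift(x, s):
    """Evaluate the trigonometric interpolant of periodic samples x at n + s
    by multiplying each Fourier coefficient by a phase factor (shift property)."""
    n = len(x)
    spectrum = fft(x)
    for k in range(n):
        if 2 * k == n:
            # Nyquist bin: split symmetrically between +n/2 and -n/2, which
            # reduces the phase factor to cos(pi * s) and keeps real input real.
            spectrum[k] *= math.cos(math.pi * s)
        else:
            signed_k = k if 2 * k < n else k - n  # map k to a signed frequency
            spectrum[k] *= cmath.exp(2j * math.pi * signed_k * s / n)
    return [v.real for v in ifft(spectrum)]


def upsample2(x):
    """Upsample by 2: keep the original samples and interleave the values
    obtained from a half-sample shift."""
    midpoints = fractional_shift(x, 0.5)
    out = []
    for original, mid in zip(x, midpoints):
        out.extend((original, mid))
    return out
```

In three dimensions, upsampling by 2 in this way would need one shifted copy of the grid for each of the seven non-zero offsets in {0, 1/2}^3. This illustrates why the approach costs extra passes through the working set.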

    Total variation on a tree

    We consider the problem of minimizing the continuous-valued total variation subject to different unary terms on trees, and propose fast direct algorithms based on dynamic programming to solve these problems. We treat both the convex and the non-convex case and derive worst-case complexities that are equal to or better than those of existing methods. We show applications to total variation based 2D image processing and computer vision problems based on a Lagrangian decomposition approach. The resulting algorithms are very efficient, offer a high degree of parallelism, and have memory requirements only on the order of the number of image pixels. Comment: accepted to SIAM Journal on Imaging Sciences (SIIMS)
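To illustrate the dynamic-programming idea on a tree, the Python below solves a discretized relative of the problem. Node values are restricted to a uniformly spaced label set, and a standard min-sum recursion passes messages from the leaves to the root. A two-pass L1 distance transform keeps each message linear in the number of labels. This is a swapped-in textbook technique, not the paper's continuous-valued direct algorithm. It assumes that node 0 is the root and that every parent has a smaller index than its children. The name `tv_tree_dp` is hypothetical.

```python
# Hypothetical sketch: discrete-label min-sum DP for TV on a tree.
# Not the paper's continuous-valued algorithm; labels must be uniformly spaced.


def tv_tree_dp(parent, unary, lam, labels):
    """Minimise sum_i unary[i][l_i] + lam * sum_{i>0} |labels[l_i] - labels[l_parent[i]]|.
    Assumes parent[0] == -1 and parent[i] < i for every other node."""
    n, num_labels = len(parent), len(labels)
    step = lam * (labels[1] - labels[0]) if num_labels > 1 else 0.0
    # belief[i][a]: unary cost of node i plus the messages from its children
    belief = [list(u) for u in unary]
    for i in range(n - 1, 0, -1):  # leaves-to-root sweep
        msg = belief[i][:]
        # L1 distance transform: msg[a] = min_b belief[i][b] + lam * |labels[a] - labels[b]|
        for a in range(1, num_labels):
            msg[a] = min(msg[a], msg[a - 1] + step)
        for a in range(num_labels - 2, -1, -1):
            msg[a] = min(msg[a], msg[a + 1] + step)
        p = parent[i]
        for a in range(num_labels):
            belief[p][a] += msg[a]
    # Root-to-leaves backtracking recovers one optimal labelling
    choice = [0] * n
    choice[0] = min(range(num_labels), key=lambda a: belief[0][a])
    for i in range(1, n):
        lp = labels[choice[parent[i]]]
        choice[i] = min(range(num_labels),
                        key=lambda a: belief[i][a] + lam * abs(labels[a] - lp))
    return [labels[a] for a in choice]
```

With `lam = 0` each node takes its best unary label. Increasing `lam` pulls neighbouring nodes toward equal values, which is the piecewise-constant behaviour associated with total variation.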

    Indexed dependence metadata and its applications in software performance optimisation

    To achieve continued performance improvements, modern microprocessor design tends to devote an increasing proportion of hardware to computation units, with less automatic management of data movement and extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic generation of efficient code in any but the most straightforward cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping from each index in that space to the set of data elements that iteration might use: thus the dependence metadata are indexed by the kernel's iteration space. This explicit mapping allows the compiler or runtime to optimise the program more effectively, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports inter-component optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming; a demonstration of its advantages in a component programming system; the decoupled access/execute model for C++ programs; and a discussion of how indexed dependence metadata might be used to improve the programming model for GPU-based designs.
Our experimental results with prototype implementations show that indexed dependence metadata support automatic synthesis of double-buffered data movement for the Cell processor and enable aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies.
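The following Python is a rough illustration of the idea. It attaches to a kernel its iteration space and a per-index read set. From these it derives what each block of iterations must have resident in local memory, which is the information a runtime would need to schedule double-buffered transfers. The `IndexedKernel` class, `footprint`, `transfer_plan` and the `blur3` stencil are all hypothetical. They do not reproduce the thesis's C++ decoupled access/execute interface.

```python
# Hypothetical sketch of indexed dependence metadata; names are illustrative only.
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class IndexedKernel:
    """Iteration space plus, for each iteration index, the input elements it may read."""
    name: str
    iterations: range
    reads: Callable[[int], FrozenSet[int]]


def footprint(kernel: IndexedKernel, block: range) -> FrozenSet[int]:
    """Union of the input elements touched by a block of iterations."""
    touched = set()
    for i in block:
        touched |= kernel.reads(i)
    return frozenset(touched)


def transfer_plan(kernel: IndexedKernel, block_size: int) -> List[Tuple[int, int, int, int]]:
    """For each block [start, stop) of iterations, report the input range [lo, hi)
    that must be resident before it runs, so a runtime could fetch block k+1
    while block k computes. Assumes contiguous footprints, as for stencils."""
    plan = []
    space = kernel.iterations
    for start in range(space.start, space.stop, block_size):
        block = range(start, min(start + block_size, space.stop))
        touched = footprint(kernel, block)
        plan.append((block.start, block.stop, min(touched), max(touched) + 1))
    return plan


N = 10
# A 3-point blur: iteration i reads its neighbours, clipped at the array edges.
blur3 = IndexedKernel(
    "blur3", range(N),
    lambda i: frozenset(j for j in (i - 1, i, i + 1) if 0 <= j < N),
)
```

Here `transfer_plan(blur3, 4)` yields `[(0, 4, 0, 5), (4, 8, 3, 9), (8, 10, 7, 10)]`. Each block of four iterations needs a one-element halo on each side, clipped at the array edges. The halo is derived from the metadata rather than written by hand.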

    A Kogbetliantz-type algorithm for the hyperbolic SVD

    In this paper a two-sided, parallel Kogbetliantz-type algorithm for the hyperbolic singular value decomposition (HSVD) of real and complex square matrices is developed. Its single assumption is that the input matrix, of order $n$, admits such a decomposition into the product of a unitary, a non-negative diagonal, and a $J$-unitary matrix, where $J$ is a given diagonal matrix of positive and negative signs. When $J=\pm I$, the proposed algorithm computes the ordinary SVD. The paper's most important contribution, a derivation of formulas for the HSVD of $2\times 2$ matrices, is presented first, followed by the details of their implementation in floating-point arithmetic. Next, the effects of the hyperbolic transformations on the columns of the iteration matrix are discussed. These effects then guide a redesign of the dynamic pivot ordering, already a well-established pivot strategy for the ordinary Kogbetliantz algorithm, for the general $n\times n$ HSVD. A heuristic but sound convergence criterion is then proposed, which contributes to the high accuracy demonstrated in the numerical testing results. Such a $J$-Kogbetliantz algorithm as presented here is intrinsically slow, but is nevertheless usable for matrices of small orders. Comment: a heavily revised version with 32 pages and 4 figures
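In the ordinary case ($J = I$), a Kogbetliantz step diagonalizes a $2\times 2$ pivot submatrix with a rotation on each side. The Python below sketches that building block using a well-known closed form that writes $A = R(\varphi)\,\mathrm{diag}(s_1, s_2)\,R(\theta)$. It is a stand-in under that assumption, not the paper's hyperbolic $2\times 2$ formulas, and it omits their floating-point safeguards. The names `svd2x2`, `rot` and `matmul` are hypothetical.

```python
# Hypothetical sketch: ordinary (J = I) 2x2 SVD by two plane rotations.
# Not the paper's hyperbolic 2x2 formulas.
import math


def rot(angle):
    """2x2 plane rotation R(angle) as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def svd2x2(a, b, c, d):
    """Factor A = [[a, b], [c, d]] as R(phi) @ diag(s1, s2) @ R(theta).
    s1 >= |s2|; s2 may be negative, in which case the singular values are
    s1 and -s2 (flipping the sign of one singular vector fixes this)."""
    e, f = (a + d) / 2, (a - d) / 2
    g, h = (c + b) / 2, (c - b) / 2
    q, r = math.hypot(e, h), math.hypot(f, g)
    a1, a2 = math.atan2(g, f), math.atan2(h, e)   # phi - theta and phi + theta
    phi, theta = (a2 + a1) / 2, (a2 - a1) / 2
    return phi, q + r, q - r, theta
```

The left singular vectors are the columns of R(phi), and the right singular vectors are the rows of R(theta). In a full Kogbetliantz sweep, these rotations would be applied to the corresponding rows and columns of the $n\times n$ iteration matrix for each pivot pair.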

    Opto-VLSI based WDM multifunction device

    The tremendous expansion of telecommunication services in the past decade, due in part to the growth of the Internet, has made the development of high-bandwidth optical networks a focus of research interest. Dense Wavelength Division Multiplexing (DWDM) optical fiber transmission systems have the potential to meet this demand. However, crucial components of DWDM networks (add/drop multiplexers, filters, gain equalizers, and interconnects between optical channels) are currently not implemented as dynamically reconfigurable devices. Electronic cross-connects, the traditional solution for reconfigurable optical networks, are becoming increasingly infeasible because of the rapidly increasing bandwidth of optical channels. Optically transparent, dynamically reconfigurable DWDM components are therefore important for alleviating this bottleneck in future telecommunication systems. In this study, we develop a promising class of Opto-VLSI based devices. These include a dynamic multi-function WDM processor that combines the functions of optical filter, channel equalizer and add-drop multiplexer, as well as a reconfigurable optical power splitter. We review the technological options for all-optical WDM components and compare their advantages and disadvantages. We develop a model for designing Opto-VLSI based WDM devices, and experimentally demonstrate the Opto-VLSI multi-function WDM device. Finally, we discuss the feasibility of Opto-VLSI WDM components in meeting the stringent requirements of the optical communications industry.

    Implicit Hari–Zimmermann algorithm for the generalized SVD on the GPUs

    A parallel, blocked, one-sided Hari–Zimmermann algorithm is proposed here for the generalized singular value decomposition (GSVD) of a real or complex matrix pair (F,G), where F and G have the same number of columns and both have full column rank. The algorithm has the following properties:

    - It targets either a single graphics processing unit (GPU) or a cluster of GPUs.
    - It performs all non-trivial computation exclusively on the GPUs.
    - It requires only the minimal amount of memory that can reasonably be expected.
    - It scales acceptably as the number of available GPUs increases.
    - It guarantees reproducible, bitwise identical output for runs repeated on the same input with the same number of GPUs.
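The generalized singular values that the algorithm computes can be made concrete in the smallest case. For a pair (F, G) with two columns each, they are the square roots of the eigenvalues of the pencil (F^T F, G^T G). The Python below computes them directly from that 2×2 characteristic equation, as a reference check only. It is not the Hari–Zimmermann method, which, roughly speaking, instead transforms pairs of columns of F and G until F^T F and G^T G are simultaneously diagonal. It is also not numerically robust. The name `generalized_singular_values_2col` is hypothetical.

```python
# Hypothetical reference check for the GSVD definition (two columns only);
# not the Hari-Zimmermann algorithm and not numerically robust.
import math


def gram(m, i, j):
    """Inner product of columns i and j of matrix m (given as a list of rows)."""
    return sum(row[i] * row[j] for row in m)


def generalized_singular_values_2col(F, G):
    """Square roots of the roots of det(F^T F - lam * G^T G) = 0, for F and G
    with two columns each and full column rank, in decreasing order."""
    a11, a12, a22 = gram(F, 0, 0), gram(F, 0, 1), gram(F, 1, 1)
    b11, b12, b22 = gram(G, 0, 0), gram(G, 0, 1), gram(G, 1, 1)
    qa = b11 * b22 - b12 * b12               # det(G^T G) > 0 by full column rank
    qb = -(a11 * b22 + a22 * b11 - 2 * a12 * b12)
    qc = a11 * a22 - a12 * a12               # det(F^T F) > 0
    disc = max(qb * qb - 4 * qa * qc, 0.0)   # clamp tiny negative rounding error
    roots = [(-qb + math.sqrt(disc)) / (2 * qa), (-qb - math.sqrt(disc)) / (2 * qa)]
    return sorted((math.sqrt(max(lam, 0.0)) for lam in roots), reverse=True)
```

For F equal to the 2×2 identity and G = diag(2, 4), it returns [0.5, 0.25], the ratios of the corresponding diagonal entries.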

    Towards hybrid molecular simulations

    In many biology, chemistry and physics applications, molecular simulations can be used to study material and process properties. The level of detail needed in such simulations depends on the application. In some cases quantum mechanical simulations are indispensable. However, traditional ab initio methods, which usually employ plane waves or a linear combination of atomic orbitals as a basis, are extremely expensive in terms of both computation and memory. Electronic wave functions are well known to vary much more rapidly near the atomic nuclei than in inter-atomic regions. This calls for a multi-resolution approach, which uses low resolution overall and adds extra resolution only where it is necessary, thus limiting the cost. Such an approach is provided by an alternative basis formed of wavelets. Using such a wavelet basis, a method for solving electronic structure problems has been developed and applied successfully to 2D quantum dots and 3D molecular systems. In other cases, it suffices to describe the atomic interaction with effective potentials instead of the electronic structure, enabling the simulation of larger systems. Molecular dynamics simulations with such effective potentials have been used for a systematic study of the influence of surface wettability on particle and heat flow in nanochannels. This study showed that the effects at the solid-gas interface are crucial for the behavior of the whole nanochannel. In yet other cases, even coarse-grained models can be used, in which the average behavior of several atoms is combined into a single particle. Such a model, which omits as much detail as possible while maintaining realistic behavior, has been developed for lipids. With this model, the dynamics of membranes and vesicle formation have been studied in detail. A disadvantage of molecular dynamics simulations with effective potentials is that no reactions are possible.
Therefore a new method has been developed in which molecular dynamics is coupled with stochastic reactions. Using this method, both unilamellar and multilamellar vesicle formation, as well as vesicle growth, bursting, and healing, are demonstrated. Still larger systems can be simulated using other methods, such as the direct simulation Monte Carlo method. However, as shown for nanochannels, these methods are not always accurate enough. The finest level of detail is often needed in only part of the domain. Exploiting this again, a hybrid method has been developed that couples molecular dynamics, where needed for accuracy, with direct simulation Monte Carlo, where possible, to speed up the calculation. Further development of such hybrid simulations will further increase the scientific role of molecular simulation.
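Molecular dynamics with an effective potential can be sketched with a pair potential and a symplectic integrator. The Python below assumes a Lennard-Jones potential in reduced units and two particles on a line, integrated with velocity Verlet. Neither the potential nor the integrator is specified in the abstract, and all function names are hypothetical. Good total-energy conservation over many steps is the usual sanity check for such an integrator.

```python
# Hypothetical sketch: the Lennard-Jones potential and velocity Verlet are
# assumed stand-ins, not the specific effective potentials used in the thesis.


def lj_force(r, eps=1.0, sigma=1.0):
    """Pair force -dU/dr for U(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6];
    a positive value means repulsion."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair potential energy U(r)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def velocity_verlet(x, v, mass, dt, steps):
    """Integrate two particles on a line (x[0] < x[1]) with velocity Verlet."""
    def forces(pos):
        f = lj_force(pos[1] - pos[0])
        return [-f, f]  # equal and opposite forces (Newton's third law)

    x, v = list(x), list(v)
    f = forces(x)
    for _ in range(steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]                # drift
        f = forces(x)
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
    return x, v


def total_energy(x, v, mass):
    """Kinetic plus Lennard-Jones potential energy of the pair."""
    kinetic = 0.5 * mass * sum(vi * vi for vi in v)
    return kinetic + lj_energy(x[1] - x[0])
```

Velocity Verlet is a common choice here because it is time-reversible and symplectic. For a small enough time step, the total energy typically oscillates within a bounded band rather than drifting.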