Search CORE

9 research outputs found

Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs

Author: Moura Jose M. F.
Pueschel Markus
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/02/2007
Field of study

This paper presents a systematic methodology based on the algebraic theory of signal processing to classify and derive fast algorithms for linear transforms. Instead of manipulating the entries of transform matrices, our approach derives the algorithms by stepwise decomposition of the associated signal models, or polynomial algebras. This decomposition is based on two generic methods or algebraic principles that generalize the well-known Cooley-Tukey FFT and make the algorithms' derivations concise and transparent. Application to the 16 discrete cosine and sine transforms yields a large class of fast algorithms, many of which have not been found before.Comment: 31 pages, more information at http://www.ece.cmu.edu/~smar

arXiv.org e-Print Archive

CiteSeerX

A comparative performance analysis of the phase recovery algorithm for microstructure reconstruction

Author: Kurpad Anupama Shankar
Publication venue: Drexel University
Publication date: 01/06/2009
Field of study

This thesis explores the high-performance implementation of a phase recovery algorithm for microstructure reconstruction of materials. Implementations on a variety of high-performance computing platforms, including multi-core and Graphics Processing Unit (GPU), were investigated and compared. The phase recovery algorithm is an iterative process requiring multiple Discrete Fourier Transform (DFT) computations each iteration. In order to achieve high-performance, it is necessary to use highly optimized fast Fourier transform (FFT) code to compute the DFTs. In our investigation, several FFT libraries, including FFTW, the Intel R Math Kernel Library (MKL), the CUFFT library for the NVIDIAR GPU, and the SPIRAL generated code, were used and compared. The SPIRAL system provides an extensible framework for generating and automatically optimizing implementations of DSP (digital signal processing) algorithms described using mathematical formulas, and is the most extensible of the platforms investigated here. The phas recovery algorithm intersperses FFT computations with point-wise computations, and while the FFTs are the dominant computation, the point-wise operations can have a signi cant impact on the overall performance. Therefore, simply relying on the performance of an optimized FFT library is insu cient to obtain optimal performance. Unlike the FFTW, MKL, and CUFFT libraries, the SPIRAL system allows the FFTs to be combined with the point-wise operations and the entire algorithm to be optimized. In this thesis, we obtained a mathematical formula representing the phase recovery algorithm that can be incorporated into the SPIRAL framework and utilize SPIRAL's parallel and vector code generation and optimization facilities. The SPIRAL code generated in this thesis is sequential. We estimate that with a vectorized and parallelized SPIRAL implementation, it is possible to obtain a 1.5-fold speedup for two-dimensional (2D) phase recovery and 1.88-fold speed up for 3D phase recovery over the MKL implementation.M.S., Computer Engineering -- Drexel University, 200

Drexel Libraries E-Repository and Archives

How to Architect a Query Compiler

Author: Brown Lewis
Dashti Rahmat Abadi Mohammad
Klonatos Ioannis
Koch Christoph
Parreaux Lionel Emile Vincent
Shaikhha Amir
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/04/2016
Field of study

This paper studies architecting query compilers. The state of the art in query compiler construction is lagging behind that in the compilers field. We attempt to remedy this by exploring the key causes of technical challenges in need of well founded solutions, and by gathering the most relevant ideas and approaches from the PL and compilers communities for easy digestion by database researchers. All query compilers known to us are more or less monolithic template expanders that do the bulk of the compilation task in one large leap. Such systems are hard to build and maintain. We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems. We attempt to derive our advice for creating such DSL stacks from widely acceptable principles. We have also re-created a well-known query compiler following these ideas and report on this effort

Infoscience - École polytechnique fédérale de Lausanne

Recommended from our members

Efficient Generation of Sequences of Dense Linear Algebra through Auto-Tuning

Author: Belter Geoffrey
Publication venue: CU Scholar
Publication date: 01/01/2012
Field of study

It is rare for a programmer to solve a numerical problem with a single library call; most problems require a sequence of calls. In the case of linear algebra, programmers will chain a series of Basic Linear Algebra Subprogram (BLAS) library calls to achieve the desired result. When a sequence of BLAS calls is memory bound, a great deal of performance is missed because optimization has not occurred between library routines. It is not practical to create a library with every required sequence of linear algebra operations, but at the same time it is difficult for programmers to write their own high performance implementation. One solution is for programmers to use an auto-tuning tool capable of optimizing the sequence of operations that exactly suits their need. This thesis presents a matrix representation and type system that describes basic linear algebra operations, the loops required to implement those operations, and the legality of key optimizations. This is demonstrated in an auto-tuning tool which generates loops and performs data parallelism and loop fusion. Results show that this approach can match or exceed performance of vendor tuned BLAS libraries, general purpose optimizing compilers, and hand written code. Further, this approach is shown to be both portable and work with a range of dense matrix storage formats. All of this is achieved with search times in the range of several minutes to a few hours

CU Scholar Institutional Repository

Formal Loop Merging for Signal Transforms

Author: Franz Franchetti
Markus Püschel
Yevgen Voronenko
Publication venue
Publication date: 01/01/2005
Field of study

A critical optimization in the domain of linear signal transforms, such as the discrete Fourier transform (DFT), is loop merging, which increases data locality and reuse and thus performance. In particular, this includes the conversion of shuffle operations into array reindexings. To date, loop merging is well understood only for the DFT, and only for Cooley-Tukey FFT based algorithms, which excludes DFT sizes divisible by large primes. In this paper, we present a formal loop merging framework for general signal transforms and its implementation within the SPIRAL code generator. The framework consists of Σ-SPL, a mathematical language to express loops and index mappings; a rewriting system to merge loops in Σ-SPL; and a compiler that translates Σ-SPL into code. We apply the framework to DFT sizes that cannot be handled using only the Cooley-Tukey FFT and compare our method to FFTW 3.0.1 and the vendor library Intel MKL 7.2.1. Compared to FFTW our generated code is a factor of 2–4 faster under equal implementation conditions (same algorithms, same unrolling threshold). For some sizes we show a speed-up of a factor of 9 using Bluestein’s algorithm. Further, we give a detailed comparison against the Intel vendor library MKL; our generated code is between 2 times faster and 4.5 times slower

CiteSeerX

Crossref

Compilation and Code Optimization for Data Analytics

Author: Abdullah Abdul Rahim
Kasim Rizanaliah
Mohamad Basir Muhammad Sufyan Safwan
Ramli Mohd Zulkifli
Selamat Nur Asmiza
Publication venue: Lausanne, EPFL
Publication date: 01/03/2016
Field of study

The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance. The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation. As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive. The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code

Infoscience - École polytechnique fédérale de Lausanne

Universiti Teknikal Malaysia Melaka (UTeM) Repository

Compilation and Code Optimization for Data Analytics

Author: Shaikhha Amir
Publication venue: Lausanne, EPFL
Publication date: 29/08/2018
Field of study

Infoscience - École polytechnique fédérale de Lausanne