
    Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP

    Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results show the processing time, accuracy, energy consumption and flexibility of the implementation.
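To illustrate why a non-power-of-two FFT is less regular, the following is a minimal Python sketch of a mixed-radix decimation-in-time FFT. It is not the Montium TP implementation from the paper; the function names (`smallest_factor`, `dft`, `mixed_radix_fft`) are hypothetical and chosen only for this example. Each recursion level splits by the smallest prime factor of the current length, so the butterfly size can change from stage to stage.

```python
import cmath


def smallest_factor(n):
    """Return the smallest prime factor of n (n itself when n is prime or 1)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def dft(x):
    """Direct O(n^2) DFT, used here for prime-length sub-transforms."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def mixed_radix_fft(x):
    """Recursive mixed-radix FFT for any length (illustrative, not optimized)."""
    n = len(x)
    p = smallest_factor(n)
    if n <= 1 or p == n:
        # Length 0, 1, or prime: fall back to the direct DFT.
        return dft(x)
    m = n // p
    # Split into p interleaved sub-sequences of length m and transform each.
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    # Recombine: X[k] = sum_r W_n^(r*k) * S_r[k mod m], a radix-p butterfly.
    return [
        sum(subs[r][k % m] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(p))
        for k in range(n)
    ]
```

For a length such as 12 = 2 x 2 x 3 the recursion mixes radix-2 and radix-3 butterflies, whereas a power-of-two length uses the same butterfly at every stage.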

    Compression and Conditional Emulation of Climate Model Output

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. The statistical model can be used to generate realizations representing the full dataset, along with characterizations of the uncertainties in the generated data. Thus, the methods are capable of both compression and conditional emulation of the climate models. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.
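The idea of storing summary statistics and then drawing realizations conditional on them can be sketched with a toy example. This is not the paper's model, which addresses spatial nonstationarity in global fields; it assumes, purely for illustration, a one-dimensional series whose values within each block are independent Gaussians with a common variance. The function names `compress` and `emulate` and the block length are hypothetical.

```python
import numpy as np


def compress(data, block):
    """Store one mean per block plus a pooled residual standard deviation.

    Assumes len(data) is a multiple of `block` (an assumption of this sketch).
    """
    blocks = np.asarray(data, dtype=float).reshape(-1, block)
    means = blocks.mean(axis=1)
    resid = blocks - means[:, None]
    # Pooled estimate: one degree of freedom is used up per block mean.
    resid_sd = np.sqrt((resid ** 2).sum() / (blocks.size - means.size))
    return means, resid_sd


def emulate(means, resid_sd, block, rng):
    """Draw one realization conditional on the stored block means.

    For i.i.d. Gaussians, conditioning on the block mean is equivalent to
    drawing noise, removing its own mean, and adding the stored mean back.
    """
    z = rng.normal(0.0, resid_sd, size=(means.size, block))
    z -= z.mean(axis=1, keepdims=True)
    return (means[:, None] + z).ravel()
```

Calling `emulate` repeatedly with the same stored statistics yields different realizations whose spread indicates the uncertainty left by the compression, while every realization reproduces the stored block means exactly.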

    Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs

    This paper presents a systematic methodology based on the algebraic theory of signal processing to classify and derive fast algorithms for linear transforms. Instead of manipulating the entries of transform matrices, our approach derives the algorithms by stepwise decomposition of the associated signal models, or polynomial algebras. This decomposition is based on two generic methods or algebraic principles that generalize the well-known Cooley-Tukey FFT and make the algorithms' derivations concise and transparent. Application to the 16 discrete cosine and sine transforms yields a large class of fast algorithms, many of which have not been found before. Comment: 31 pages, more information at http://www.ece.cmu.edu/~smar

    A general framework for pricing Asian options under stochastic volatility on parallel architectures

    In this paper, we present a transform-based algorithm for pricing discretely monitored arithmetic Asian options with remarkable accuracy in a general stochastic volatility framework, including affine models and time-changed Lévy processes. The accuracy is justified both theoretically and experimentally. In addition, to speed up the valuation process, we employ high-performance computing technologies. More specifically, we develop a parallel option pricing system that can be easily reproduced on parallel computers, including those realized as a cluster of personal computers. Numerical results showing the accuracy, speed and efficiency of the procedure are reported in the paper.

    Novel models and algorithms for systems reliability modeling and optimization

    Recent growth in the scale and complexity of products and technologies in the defense and other industries is challenging product development, realization, and sustainment costs. Uncontrolled costs and routine budget overruns are causing all parties involved to seek lean product development processes and treatment of reliability, availability, and maintainability of the system as a true design parameter. To this effect, accurate estimation and management of the system reliability of a design during the earliest stages of new product development is critical not only for managing product development and manufacturing costs but also for controlling life cycle costs (LCC). In this regard, the overall objective of this research study is to develop an integrated framework for design for reliability (DFR) during upfront product development by treating reliability as a design parameter. The aim here is to develop the theory, methods, and tools necessary for: 1) accurate assessment of system reliability and availability and 2) optimization of the design to meet system reliability targets. In modeling the system reliability and availability, we aim to address the limitations of existing methods, in particular the Markov chains method and the Dynamic Bayesian Network approach, by incorporating a Continuous Time Bayesian Network framework for more effective modeling of sub-system/component interactions, dependencies, and various repair policies. We also propose a multi-objective optimization scheme to aid the designer in obtaining optimal design(s) with respect to system reliability/availability targets and other system design requirements. In particular, the optimization scheme would entail optimal selection of sub-system and component alternatives. The theory, methods, and tools to be developed will be extensively tested and validated using simulation test-bed data and actual case studies from our industry partners.

    A comparative performance analysis of the phase recovery algorithm for microstructure reconstruction

    This thesis explores the high-performance implementation of a phase recovery algorithm for microstructure reconstruction of materials. Implementations on a variety of high-performance computing platforms, including multi-core and Graphics Processing Unit (GPU), were investigated and compared. The phase recovery algorithm is an iterative process requiring multiple Discrete Fourier Transform (DFT) computations each iteration. In order to achieve high performance, it is necessary to use highly optimized fast Fourier transform (FFT) code to compute the DFTs. In our investigation, several FFT libraries, including FFTW, the Intel® Math Kernel Library (MKL), the CUFFT library for the NVIDIA® GPU, and the SPIRAL generated code, were used and compared. The SPIRAL system provides an extensible framework for generating and automatically optimizing implementations of DSP (digital signal processing) algorithms described using mathematical formulas, and is the most extensible of the platforms investigated here. The phase recovery algorithm intersperses FFT computations with point-wise computations, and while the FFTs are the dominant computation, the point-wise operations can have a significant impact on the overall performance. Therefore, simply relying on the performance of an optimized FFT library is insufficient to obtain optimal performance. Unlike the FFTW, MKL, and CUFFT libraries, the SPIRAL system allows the FFTs to be combined with the point-wise operations and the entire algorithm to be optimized. In this thesis, we obtained a mathematical formula representing the phase recovery algorithm that can be incorporated into the SPIRAL framework and utilize SPIRAL's parallel and vector code generation and optimization facilities. The SPIRAL code generated in this thesis is sequential. We estimate that with a vectorized and parallelized SPIRAL implementation, it is possible to obtain a 1.5-fold speedup for two-dimensional (2D) phase recovery and a 1.88-fold speedup for 3D phase recovery over the MKL implementation. M.S., Computer Engineering -- Drexel University, 200
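The interleaving of FFTs with point-wise operations described in this abstract can be sketched with a generic error-reduction loop in NumPy. This is an assumption-laden illustration, not the thesis's algorithm or its SPIRAL formula: it assumes the known data is a 2D Fourier magnitude and that the real-space constraint is simply that values lie in [0, 1]. The function name `error_reduction` and its parameters are hypothetical.

```python
import numpy as np


def error_reduction(magnitude, n_iter=50, seed=0):
    """Generic error-reduction phase retrieval (illustrative sketch).

    magnitude: known 2D Fourier magnitudes, same shape as the image.
    Returns a real image with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    x = rng.random(magnitude.shape)  # random starting guess in [0, 1)
    for _ in range(n_iter):
        # FFT step: move to the Fourier domain.
        spectrum = np.fft.fft2(x)
        # Point-wise step: keep the current phase, impose the known magnitude.
        spectrum = magnitude * np.exp(1j * np.angle(spectrum))
        # Inverse FFT step: return to real space.
        x = np.fft.ifft2(spectrum).real
        # Point-wise step: enforce the assumed real-space constraint.
        x = np.clip(x, 0.0, 1.0)
    return x
```

Each iteration performs two FFTs separated by element-wise passes over the whole array. With a stand-alone FFT library those passes are separate sweeps through memory, which is the overhead the abstract says can be reduced by optimizing the FFTs and point-wise operations together.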

    Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM

    This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay Commutator. The presented parallel architecture utilizes fewer hardware resources than a Radix-2 architecture, while maintaining the simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency design allows enhancing system throughput without requiring the additional parallel data paths common in other current approaches. The presented design can process two and four independent data streams in parallel and is suitable for scaling to any power-of-two FFT size N. FPGA implementation of the architecture demonstrated significant resource efficiency and high throughput in comparison to relevant current approaches in the literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated on both Virtex-5 and Virtex-7 FPGA devices. Post-place-and-route results demonstrated maximum frequency values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices, respectively.