400 research outputs found
Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP
Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption and flexibility of the implementation.
Compression and Conditional Emulation of Climate Model Output
Numerical climate model simulations run at high spatial and temporal
resolutions generate massive quantities of data. As our computing capabilities
continue to increase, storing all of the data is not sustainable, and thus it
is important to develop methods for representing the full datasets by smaller
compressed versions. We propose a statistical compression and decompression
algorithm based on storing a set of summary statistics as well as a statistical
model describing the conditional distribution of the full dataset given the
summary statistics. The statistical model can be used to generate realizations
representing the full dataset, along with characterizations of the
uncertainties in the generated data. Thus, the methods are capable of both
compression and conditional emulation of the climate models. Considerable
attention is paid to accurately modeling the original dataset--one year of
daily mean temperature data--particularly with regard to the inherent spatial
nonstationarity in global fields, and to determining the statistics to be
stored, so that the variation in the original data can be closely captured,
while allowing for fast decompression and conditional emulation on modest
computers.
Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs
This paper presents a systematic methodology based on the algebraic theory of
signal processing to classify and derive fast algorithms for linear transforms.
Instead of manipulating the entries of transform matrices, our approach derives
the algorithms by stepwise decomposition of the associated signal models, or
polynomial algebras. This decomposition is based on two generic methods or
algebraic principles that generalize the well-known Cooley-Tukey FFT and make
the algorithms' derivations concise and transparent. Application to the 16
discrete cosine and sine transforms yields a large class of fast algorithms,
many of which have not been found before. Comment: 31 pages, more information at http://www.ece.cmu.edu/~smar
A general framework for pricing Asian options under stochastic volatility on parallel architectures
In this paper, we present a transform-based algorithm for pricing discretely monitored arithmetic Asian options with remarkable accuracy in a general stochastic volatility framework, including affine models and time-changed Lévy processes. The accuracy is justified both theoretically and experimentally. In addition, to speed up the valuation process, we employ high-performance computing technologies. More specifically, we develop a parallel option pricing system that can be easily reproduced on parallel computers, including ones realized as a cluster of personal computers. Numerical results showing the accuracy, speed and efficiency of the procedure are reported in the paper.
Novel models and algorithms for systems reliability modeling and optimization
Recent growth in the scale and complexity of products and technologies in the defense and other industries is challenging product development, realization, and sustainment costs. Uncontrolled costs and routine budget overruns are causing all parties involved to seek lean product development processes and treatment of reliability, availability, and maintainability of the system as a true design parameter. To this effect, accurate estimation and management of the system reliability of a design during the earliest stages of new product development is critical not only for managing product development and manufacturing costs but also for controlling life cycle costs (LCC). In this regard, the overall objective of this research study is to develop an integrated framework for design for reliability (DFR) during upfront product development by treating reliability as a design parameter. The aim here is to develop the theory, methods, and tools necessary for: 1) accurate assessment of system reliability and availability and 2) optimization of the design to meet system reliability targets. In modeling the system reliability and availability, we aim to address the limitations of existing methods, in particular the Markov chains method and the Dynamic Bayesian Network approach, by incorporating a Continuous Time Bayesian Network framework for more effective modeling of sub-system/component interactions, dependencies, and various repair policies. We also propose a multi-object optimization scheme to aid the designer in obtaining optimal design(s) with respect to system reliability/availability targets and other system design requirements. In particular, the optimization scheme would entail optimal selection of sub-system and component alternatives. The theory, methods, and tools to be developed will be extensively tested and validated using simulation test-bed data and actual case studies from our industry partners.
Dynamic Fault Tree Analysis: State-of-the-Art in Modeling, Analysis, and Tools
Safety and reliability are two important aspects of dependability that need to be rigorously evaluated throughout the development life-cycle of a system. Over the years, several methodologies have been developed for the analysis of failure behavior of systems. Fault tree analysis (FTA) is one of the well-established and widely used methods for safety and reliability engineering of systems. The fault tree, in its classical static form, is inadequate for modeling dynamic interactions between components and is unable to include temporal and statistical dependencies in the model. Several attempts have been made to alleviate these limitations of static fault trees (SFT). Dynamic fault trees (DFT) were introduced to enhance the modeling power of their static counterpart. In DFT, the expressiveness of the fault tree was improved by introducing new dynamic gates. While the introduction of the dynamic gates helps to overcome many limitations of SFT and allows a wide range of complex systems to be analyzed, it brings some overhead with it. One such overhead is that the existing combinatorial approaches used for qualitative and quantitative analysis of SFTs are no longer applicable to DFTs. This has led to several successful attempts at developing new approaches for DFT analysis. The methodologies used so far for DFT analysis include, but are not limited to, algebraic solution, Markov models, Petri Nets, Bayesian Networks, and Monte Carlo simulation. To illustrate the usefulness of the modeling capability of DFTs, many benchmark studies have been performed in different industries. Moreover, software tools have been developed to aid in the DFT analysis process. Firstly, this chapter provides a brief description of the DFT methodology. Secondly, it reviews a number of prominent DFT analysis techniques, such as Markov chains, Petri Nets, Bayesian networks, and the algebraic approach, and provides insight into their working mechanism, applicability, strengths, and challenges. The reviewed techniques cover both qualitative and quantitative analysis of DFTs. Thirdly, it discusses the emerging trends in machine learning based approaches to DFT analysis. Fourthly, it reviews the research performed on sensitivity analysis in DFTs. Finally, it provides some potential future research directions for DFT-based safety and reliability analysis.
A comparative performance analysis of the phase recovery algorithm for microstructure reconstruction
This thesis explores the high-performance implementation of a phase recovery algorithm for microstructure reconstruction of materials. Implementations on a variety of high-performance computing platforms, including multi-core and Graphics Processing Unit (GPU), were investigated and compared. The phase recovery algorithm is an iterative process requiring multiple Discrete Fourier Transform (DFT) computations each iteration. In order to achieve high performance, it is necessary to use highly optimized fast Fourier transform (FFT) code to compute the DFTs. In our investigation, several FFT libraries, including FFTW, the Intel® Math Kernel Library (MKL), the CUFFT library for the NVIDIA® GPU, and the SPIRAL generated code, were used and compared. The SPIRAL system provides an extensible framework for generating and automatically optimizing implementations of DSP (digital signal processing) algorithms described using mathematical formulas, and is the most extensible of the platforms investigated here. The phase recovery algorithm intersperses FFT computations with point-wise computations, and while the FFTs are the dominant computation, the point-wise operations can have a significant impact on the overall performance. Therefore, simply relying on the performance of an optimized FFT library is insufficient to obtain optimal performance. Unlike the FFTW, MKL, and CUFFT libraries, the SPIRAL system allows the FFTs to be combined with the point-wise operations and the entire algorithm to be optimized. In this thesis, we obtained a mathematical formula representing the phase recovery algorithm that can be incorporated into the SPIRAL framework and utilize SPIRAL's parallel and vector code generation and optimization facilities. The SPIRAL code generated in this thesis is sequential. We estimate that with a vectorized and parallelized SPIRAL implementation, it is possible to obtain a 1.5-fold speedup for two-dimensional (2D) phase recovery and a 1.88-fold speedup for 3D phase recovery over the MKL implementation. M.S., Computer Engineering -- Drexel University, 200
Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM
This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier
Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output
Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay
Commutator. The presented parallel architecture utilizes fewer hardware resources compared to Radix-2 architecture,
while maintaining simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency
design presented allows enhancing system throughput without requiring additional parallel data paths common in
other current approaches. The presented design can process two and four independent data streams in parallel
and is suitable for scaling to any power of two FFT size N. FPGA implementation of the architecture demonstrated
significant resource efficiency and high-throughput in comparison to relevant current approaches within
literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated
on both Virtex-5 and Virtex-7 FPGA devices. Post-place-and-route results demonstrated maximum frequency
values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices, respectively.