220 research outputs found

    Simultaneous floating-point sine and cosine for VLIW integer processors

    Get PDF
    Accepted for publication in the proceedings of the 23rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2012).International audienceGraphics and signal processing applications often require that sines and cosines be evaluated at a same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited in order to perform this task accurately for IEEE single precision. We describe software implementations for sinf, cosf, and sincosf over [-pi/4,pi/4] that have a proven 1-ulp accuracy and whose latency on STMicroelectronics' ST231 VLIW integer processor is 19, 18, and 19 cycles, respectively. Such performances are obtained by introducing a novel algorithm for simultaneous sine and cosine that combines univariate and bivariate polynomial evaluation schemes

    Characterization and Acceleration of High Performance Compute Workloads

    Get PDF

    Characterization and Acceleration of High Performance Compute Workloads

    Get PDF

    DCT Implementation on GPU

    Get PDF
    There has been a great progress in the field of graphics processors. Since, there is no rise in the speed of the normal CPU processors; Designers are coming up with multi-core, parallel processors. Because of their popularity in parallel processing, GPUs are becoming more and more attractive for many applications. With the increasing demand in utilizing GPUs, there is a great need to develop operating systems that handle the GPU to full capacity. GPUs offer a very efficient environment for many image processing applications. This thesis explores the processing power of GPUs for digital image compression using Discrete cosine transform

    Exploring Computational Chemistry on Emerging Architectures

    Get PDF
    Emerging architectures, such as next generation microprocessors, graphics processing units, and Intel MIC cards, are being used with increased popularity in high performance computing. Each of these architectures has advantages over previous generations of architectures including performance, programmability, and power efficiency. With the ever-increasing performance of these architectures, scientific computing applications are able to attack larger, more complicated problems. However, since applications perform differently on each of the architectures, it is difficult to determine the best tool for the job. This dissertation makes the following contributions to computer engineering and computational science. First, this work implements the computational chemistry variational path integral application, QSATS, on various architectures, ranging from microprocessors to GPUs to Intel MICs. Second, this work explores the use of analytical performance modeling to predict the runtime and scalability of the application on the architectures. This allows for a comparison of the architectures when determining which to use for a set of program input parameters. The models presented in this dissertation are accurate within 6%. This work combines novel approaches to this algorithm and exploration of the various architectural features to develop the application to perform at its peak. In addition, this expands the understanding of computational science applications and their implementation on emerging architectures while providing insight into the performance, scalability, and programmer productivity

    Techniques and errors in measuring cross- correlation and cross-spectral density functions

    Get PDF
    Techniques and errors in measuring cross spectral density and cross correlation functions of stationary dynamic pressure dat

    Performance Comparison of 3D Sinc Interpolation for fMRI Motion Correction by Language of Implementation and Hardware Platform

    Get PDF
    Substantial effort is devoted to improving neuroimaging data processing; this effort however, is typically from the algorithmic perspective only. I demonstrate that substantive running time performance improvements to neuroscientific data processing algorithms can be realized by considering their implementation. Focusing specifically on 3D sinc interpolation, an algorithm used for processing functional magnetic resonance imaging (fMRI) data, I compare the performance of Python, C and OpenCL implementations of this algorithm across multiple hardware platforms. I also benchmark the performance of a novel implementation of 3D sinc interpolation on a field programmable gate array (FPGA). Together, these comparisons demonstrate that the performance of a neuroimaging data processing algorithm is significantly impacted by its implementation. I also present a case study demonstrating the practical benefits of improving a neuroscientific data processing algorithm\u27s implementation, then conclude by addressing threats to the validity of the study and discussing future directions

    Digital radar receiver design based on highly efficient bandpass sampling FPGA architecture

    Get PDF
    Thesis (M.S.)--University of Oklahoma, 2009.Includes bibliographical references (leaves 62-64).As digital electronics become faster and more efficient, it becomes possible to move the analog/digital interface in a radar downconversion system further towards the antenna. Instead of digitizing radar echoes at the end of the down-conversion process, digital logic can perform the same operations previously performed by analog components. Taking full advantage of this opportunity will result in a more highly integrated and reconfigurable design. By removing unnecessary analog components, the error from component variability and noise injected into the signal of interest is reduced, the size of the receiver and the power required for operation is minimized, and the overall cost of the system can be lowered. This research is focused on employing software defined radio concepts for weather observation, thus creating a low-cost digital radar receiver at the University of Oklahoma for use in radar projects as a way of obviating the need for commmercial radar receivers, which can be many times more expensive. Software-defined radio techniques, such as bandpass sampling, are used to achieve a high data processing bandwidth and oversampling ratio with the smallest logic resource utilization. Two novel digital receiver designs are discussed in this thesis. A prototype compact single-channel digital receiver based on a 14-bit analog-to-digital converter and a hand-solderable Xilinx FPGA was built and tested both in the laboratory and at the National Weather Radar Testbed (NWRT). Building on the lessons learned from testing the single-channel digital radar receiver, a second digital receiver was designed for expanded capabilities. Through the utilization of a low-power, simultaneous-sampling eight channel ADC with high-speed serial data links and a cost-efficient FPGA with integrated DSP slices, eight data channels can be digitized, processed and transferred at the same time in a compact form factor. An ethernet interface has been included which allows for a scalable control channel so that the digital receiver's operations can be quickly modified. This also makes it possible to remotely change the firmware of the FPGA in seconds, without the need for physical access. Development of host computer platforms to store and process each digital receiver's output are also discussed

    Evolvable hardware system for automatic optical inspection

    Get PDF
    corecore