2,529 research outputs found

Exploiting iteration-level parallelism in dataflow programs

Author: Bic Lubomir
Nagel Mark
Roy John M.A.
Publication venue: eScholarship, University of California
Publication date: 01/01/1991

The term "dataflow" generally encompasses three distinct aspects of computation: a data-driven model of computation, a functional/declarative programming language, and a special-purpose multiprocessor architecture. In this paper we decouple the language and architecture issues by demonstrating that declarative programming is a suitable vehicle for the programming of conventional distributed-memory multiprocessors. This is achieved by applying several transformations to the compiled declarative program to achieve iteration-level (rather than instruction-level) parallelism. The transformations first group individual instructions into sequential light-weight processes, and then insert primitives to: (1) cause array allocation to be distributed over multiple processors, and (2) cause computation to follow the data distribution by inserting an index filtering mechanism into a given loop and spawning a copy of it on all processing elements (PEs); the filter causes each instance of that loop to operate on a different subrange of the index variable. The underlying model of computation is a dataflow/von Neumann hybrid in that execution within a process is control-driven while the creation, blocking, and activation of processes is data-driven. The performance of this process-oriented dataflow system (PODS) is demonstrated using the hydrodynamics simulation benchmark called SIMPLE, where a 19-fold speedup on a 32-processor architecture has been achieved.
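The index filtering idea in the abstract above can be illustrated with a small sketch. This is not the PODS implementation: the names `owned_subrange` and `filtered_loop` are hypothetical, the block partitioning is an assumed distribution, and the PEs are simulated by a sequential Python loop.

```python
def owned_subrange(lo, hi, pe, num_pes):
    """Return the half-open index range [start, stop) owned by PE `pe`.

    Assumes a block distribution; leftover indices go to the lowest-numbered PEs.
    """
    base, extra = divmod(hi - lo, num_pes)
    start = lo + pe * base + min(pe, extra)
    stop = start + base + (1 if pe < extra else 0)
    return start, stop


def filtered_loop(pe, num_pes, data, out):
    """Copy of the loop spawned on every PE; the filter restricts its indices."""
    start, stop = owned_subrange(0, len(data), pe, num_pes)
    for i in range(start, stop):
        out[i] = data[i] * data[i]


data = list(range(10))
out = [None] * len(data)
for pe in range(4):  # sequential stand-in for spawning the loop on 4 PEs
    filtered_loop(pe, 4, data, out)
```

Every PE runs identical loop code, and only the filter differs per instance, so each element is computed exactly once by the PE assumed to hold it.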

eScholarship - University of California

Multiprocessor vision system.

Author: Alexiou G.A.
Bourbakis N.G.
Papazoglou M.

Research Papers in Economics

34th Midwest Symposium on Circuits and Systems-Final Program

Publication date: 01/05/1991

Organized by the Naval Postgraduate School, Monterey, California. Cosponsored by the IEEE Circuits and Systems Society. Symposium Organizing Committee: General Chairman: Sherif Michael; Technical Program: Roberto Cristi; Publications: Michael Soderstrand; Special Sessions: Charles W. Therrien; Publicity: Jeffrey Burl; Finance: Ralph Hippenstiel; Local Arrangements: Barbara Cristi.

Calhoun, Institutional Archive of the Naval Postgraduate School

Graphics Processing Unit Bloom Filters: Classical and Probabilistic

Author: Pyle Joshua Michael
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2014

Graphics Processing Units (GPUs) have been used to enhance the speed and efficiency of data structures and algorithms alike. A common data structure in Computer Science is the Bloom Filter, which is used in many types of applications, including databases and security logging. The Bloom Filter is a lossy data structure that uses several hash functions to store keys in a bit array. A new Bloom Filter meant for use in internet traffic detection, called the Probabilistic Bloom Filter, has recently been developed. In practice, this new Bloom Filter typically makes use of more hash functions than its classical counterpart. Because both of these data structures contain information that can be inserted in independent batch operations, each is a prime target for parallelization on a GPU. This paper develops a scalable, optimized GPU implementation of the classical and Probabilistic Bloom Filters. The results of processing the Bloom Filter on the GPU are compared to processing the same Bloom Filter on the Central Processing Unit (CPU). Processing the data structures on GPUs yielded a substantial decrease in processing time. For most cases, the decrease in time was linearly proportional to the number of keys inserted and the number of hash functions used.
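For readers unfamiliar with the data structure, here is a minimal CPU-side sketch of a classical Bloom Filter. It is not the thesis's GPU code and does not cover the Probabilistic Bloom Filter. The class name, the sizes, and the use of salted SHA-256 to stand in for several hash functions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter: k hash functions set k bits per inserted key."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: emulate k independent hashes by salting one hash with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # May report false positives, but never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=4)
for key in ["alpha", "beta", "gamma"]:
    bf.add(key)
```

Each insertion only ORs bits into the array, so insertions of different keys do not depend on one another. That independence is the property the abstract points to when it calls the structure a target for batch parallelization.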

University of Tennessee, Knoxville: Trace

High-performance SIMT code generation in an active visual effects library

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

Crossref

A metadata-enhanced framework for high performance visual effects

Author: Cornwall Jay L. T.
Publication venue: Computing, Imperial College London
Publication date: 01/08/2010

This thesis is devoted to reducing the interactive latency of image processing computations in visual effects. Film and television graphic artists depend upon low-latency feedback to receive a visual response to changes in effect parameters. We tackle latency with a domain-specific optimising compiler which leverages high-level program metadata to guide key computational and memory hierarchy optimisations. This metadata encodes static and dynamic information about data dependence and patterns of memory access in the algorithms constituting a visual effect (features that are typically difficult to extract through program analysis) and presents it to the compiler in an explicit form. By using domain-specific information as a substitute for program analysis, our compiler is able to target a set of complex source-level optimisations that a vendor compiler does not attempt, before passing the optimised source to the vendor compiler for lower-level optimisation. Three key metadata-supported optimisations are presented. The first is an adaptation of space and schedule optimisation, based upon well-known compositions of the loop fusion and array contraction transformations, to the dynamic working sets and schedules of a runtime-parameterised visual effect. This adaptation sidesteps the costly solution of runtime code generation by specialising static parameters in an offline process and exploiting dynamic metadata to adapt the schedule and contracted working sets at runtime to user-tunable parameters. The second optimisation comprises a set of transformations to generate SIMD ISA-augmented source code. Our approach differs from autovectorisation by using static metadata to identify parallelism, in place of data dependence analysis, and runtime metadata to tune the data layout to user-tunable parameters for optimal aligned memory access. The third optimisation comprises a related set of transformations to generate code for SIMT architectures, such as GPUs. Static dependence metadata is exploited to guide large-scale parallelisation for tens of thousands of in-flight threads. Optimal use of the alignment-sensitive, explicitly managed memory hierarchy is achieved by identifying inter-thread and intra-core data sharing opportunities in memory access metadata. A detailed performance analysis of these optimisations is presented for two industrially developed visual effects. In our evaluation we demonstrate up to 8.1x speed-ups on Intel and AMD multicore CPUs and up to 6.6x speed-ups on NVIDIA GPUs over our best hand-written implementations of these two effects. Programmability is enhanced by automating the generation of SIMD and SIMT implementations from a single programmer-managed scalar representation.
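The loop fusion and array contraction transformations named in the abstract above can be shown on a toy two-stage pipeline. This is a generic textbook-style sketch, not the thesis compiler's output. The function names and the arithmetic in each stage are made up for illustration.

```python
def unfused(src):
    """Two separate loops communicating through a full-size intermediate array."""
    tmp = [2.0 * x for x in src]       # stage 1 writes the whole temporary
    return [t + 1.0 for t in tmp]      # stage 2 reads it back


def fused_contracted(src):
    """Same computation after loop fusion and array contraction."""
    out = []
    for x in src:
        t = 2.0 * x                    # intermediate array contracted to a scalar
        out.append(t + 1.0)            # stage 2 runs in the same iteration
    return out


sample = [1.0, 2.0, 3.0]
results_match = unfused(sample) == fused_contracted(sample)
```

Fusion places the producer and consumer in the same iteration, which is what allows the temporary array to shrink to a single value and reduces the working set.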

Spiral - Imperial College Digital Repository

Energy efficient transport technology: Program summary and bibliography

Author: Bartlett D. W.
Hood R. V.
Middleton D. B.

The Energy Efficient Transport (EET) Program began in 1976 as an element of the NASA Aircraft Energy Efficiency (ACEE) Program. The EET Program and the results of various applications of advanced aerodynamics and active controls technology (ACT) as applicable to future subsonic transport aircraft are discussed. Advanced aerodynamics research areas included high aspect ratio supercritical wings, winglets, advanced high lift devices, natural laminar flow airfoils, hybrid laminar flow control, nacelle aerodynamic and inertial loads, propulsion/airframe integration (e.g., long duct nacelles), and wing and empennage surface coatings. In-depth analytical/trade studies, numerous wind tunnel tests, and several flight tests were conducted. Improved computational methodology was also developed. The active control functions considered were maneuver load control, gust load alleviation, flutter mode control, angle-of-attack limiting, and pitch augmented stability. Current and advanced active control laws were synthesized, and alternative control system architectures were developed and analyzed. Integrated application and fly-by-wire implementation of the active control functions were design requirements in one major subprogram. Additional EET research included interdisciplinary technology applications, integrated energy management, handling qualities investigations, reliability calculations, and economic evaluations related to fuel savings and cost of ownership of the selected improvements.

NASA Technical Reports Server

Low power digital signal processing

Author: Paker Ozgun
Publication venue: Technical University of Denmark
Publication date: 01/01/2003

Online Research Database In Technology

Image and video processing using graphics hardware

Author: Lanes Børge
Publication venue: University of Tromsø
Publication date: 01/01/2010

In recent years, graphics processing units have evolved into inexpensive, high-performance many-core computing units. Previously accessible only through graphics APIs, these devices can now be programmed with arbitrary data types and standard languages like C, thanks to new hardware architectures and programming tools. This thesis investigates the development process and performance of image and video processing algorithms on graphics processing units, regardless of vendor. The tool used for programming the graphics processing units is OpenCL, a relatively new specification for heterogeneous computing. Two image algorithms are investigated: the bilateral filter and the histogram. In addition, an attempt was made to build a template-based solution for generating and auto-optimizing device code, but this approach seemed to have too many shortcomings to be usable at this time.
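As a rough illustration of the histogram algorithm mentioned above, the sketch below builds per-chunk partial histograms and then merges them, a pattern commonly used for parallel histograms. It is plain Python, not the thesis's OpenCL code. The function names, the chunk count, and the assumption of integer pixel values in the range 0 to 255 are illustrative only.

```python
def partial_histogram(pixels, num_bins=256):
    """Count pixel values in one chunk; assumes integers in [0, num_bins)."""
    hist = [0] * num_bins
    for p in pixels:
        hist[p] += 1
    return hist


def histogram(pixels, num_chunks=4, num_bins=256):
    """Split the input into chunks, histogram each one, then sum bin-wise."""
    chunk = max(1, -(-len(pixels) // num_chunks))  # ceiling division
    partials = [partial_histogram(pixels[i:i + chunk], num_bins)
                for i in range(0, len(pixels), chunk)]
    if not partials:
        return [0] * num_bins
    return [sum(column) for column in zip(*partials)]


pixels = [0, 1, 1, 255, 3, 3, 3]
h = histogram(pixels)
```

Keeping a private histogram per chunk avoids concurrent updates to shared bins, and the final bin-wise sum is a simple reduction.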

Munin - Open Research Archive

NORA - Norwegian Open Research Archives