8,740 research outputs found
High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures
This article presents two high-efficient parallel realizations of the context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weaken, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and be implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on massively parallel architecture GPU. Both of them exploited rich data-level parallelism. Experiments results show that compared with the CPU version, more than 70 times of speedup can be obtained for STORM and over 50 times for GPU. The implementation of encoder on STORM can make a real-time processing for 1080p @30fps and GPU-based version can satisfy the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms
A Simple Multiprocessor Management System for Event-Parallel Computing
Offline software using TCP/IP sockets to distribute particle physics events
to multiple UNIX/RISC workstations is described. A modular, building block
approach was taken, which allowed tailoring to solve specific tasks efficiently
and simply as they arose. The modest, initial cost was having to learn about
sockets for interprocess communication. This multiprocessor management software
has been used to control the reconstruction of eight billion raw data events
from Fermilab Experiment E791.Comment: 10 pages, 3 figures, compressed Postscript, LaTeX. Submitted to NI
Hierarchical N-Body problem on graphics processor unit
Galactic simulation is an important cosmological computation, and represents a classical N-body problem suitable for implementation on vector processors. Barnes-Hut algorithm is a hierarchical N-Body method used to simulate such galactic evolution systems.
Stream processing architectures expose data locality and concurrency available in multimedia applications. On the other hand, there are numerous compute-intensive scientific or engineering applications that can potentially benefit from such computational and communication models. These applications are traditionally implemented on vector processors.
Stream architecture based graphics processor units (GPUs) present a novel computational alternative for efficiently implementing such high-performance applications. Rendering on a stream architecture sustains high performance, while user-programmable modules allow implementing complex algorithms efficiently. GPUs have evolved over the years, from being fixed-function pipelines to user programmable processors.
In this thesis, we focus on the implementation of Barnes-Hut algorithm on typical current-generation programmable GPUs. We exploit computation and communication requirements present in Barnes-Hut algorithm to expose their suitability for user-programmable GPUs. Our implementation of the Barnes-Hut algorithm is formulated as a fragment shader targeting the selected GPU. We discuss implementation details, design issues, results, and challenges encountered in programming the fragment shader
Sorting as a Streaming Application Executing on Chip Multiprocessors
Expressing concurrency in applications has always been a difficult and error-prone endeavor, yet effective utilization of multi-core processors requires that the concurrency in applications be understood. One approach to the expression of concurrency is streaming, which has shown real promise as a safe and effective method for many application classes. Here, we express a classic problem, sorting, in the streaming paradigm and explore the implications of various algorithm and architectural design parameters on the performance of the application
The physical world as a virtual reality: a prima facie case
This paper explores the idea that the universe is a virtual reality created by information
processing, and relates this strange idea to the findings of modern physics about the physical
world. The virtual reality concept is familiar to us from online worlds, but the world as a virtual
reality is usually a subject for science fiction rather than science. Yet logically the world could be
an information simulation running on a three-dimensional space-time screen. Indeed, that the
essence of the universe is information has advantages, e.g. if matter, charge, energy and
movement are aspects of information, the many conservation laws could become a single law of
conservation of information. If the universe were a virtual reality, its creation at the big bang
would no longer be paradoxical, as every virtual system must be booted up. It is suggested that
whether the world is an objective or a virtual reality is a matter for science to resolve, and
computer science could help. If one could derive core properties like space, time, light, matter and
movement from information processing, such a model could reconcile relativity and quantum
theories, with the former being how information processing creates space-time, and the latter how
it creates energy and matter
Grids and the Virtual Observatory
We consider several projects from astronomy that benefit from the Grid paradigm and
associated technology, many of which involve either massive datasets or the federation
of multiple datasets. We cover image computation (mosaicking, multi-wavelength
images, and synoptic surveys); database computation (representation through XML,
data mining, and visualization); and semantic interoperability (publishing, ontologies,
directories, and service descriptions)
High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP
This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on performance per watt criterion and perform better than all other platforms on performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs
- …