Designing a CPU model: from a pseudo-formal document to fast code

Author: Blanqui Frédéric
Helmstetter Claude
Joloboff Vania
Monin Jean-François
Shi Xiaomu
Publication date: 22/01/2011

For validating low-level embedded software, engineers use simulators that take the real binary as input. Like the real hardware, these full-system simulators are organized as a set of components. The main component is the CPU simulator (ISS), because it is the usual bottleneck for the simulation speed, and its development is a long and repetitive task. Previous work showed that an ISS can be generated from an Architecture Description Language (ADL). In the work reported in this paper, we generate a CPU simulator directly from the pseudo-formal descriptions of the reference manual. For each instruction, we extract the information describing its behavior, its binary encoding, and its assembly syntax. Next, after automatically applying many optimizations on the extracted information, we generate a SystemC/TLM ISS. We also generate tests for the decoder and a formal specification in Coq. Experiments show that the generated ISS is as fast and stable as our previous hand-written ISS.

Comment: 3rd Workshop on: Rapid Simulation and Performance Evaluation: Methods and Tools (2011)

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-CIRAD

HAL-Rennes 1

A GPU-based hyperbolic SVD algorithm

Author: A.H. Sameh
F.T. Luk
G.S. Sachdev
H. Zha
I. Slapničar
J.R. Bunch
K. Veselić
R. Mathias
R.P. Brent
S. Lahabar
S. Singer
S. Zhang
Sanja Singer
V. Hari
Vedran Novaković
Z. Drmač
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm, using a massively parallel graphics processing unit (GPU), is developed. The algorithm also serves as the final stage of solving a symmetric indefinite eigenvalue problem. Numerical testing demonstrates the gains in speed and accuracy over sequential and MPI-parallelized variants of similar Jacobi-type HSVD algorithms. Finally, possibilities of hybrid CPU-GPU parallelism are discussed.

Comment: Accepted for publication in BIT Numerical Mathematics

arXiv.org e-Print Archive

CiteSeerX

Crossref

FAMENA Repository

Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit

Author: Anderson
Asadchev
Badal
Birdsall
Conte
Denis Eremin
Harvey
Hockney
Kersevan
Marsaglia
Peter Awakowicz
Phelps
Philipp Mertmann
Rajasekaran
Ralf Peter Brinkmann
Stantchev
Tajima
Thomas Mussenbrock
Tskhakaya
Turner
Verboncoeur
Yanguas-Gil
Publication venue: 'Elsevier BV'
Publication date: 20/04/2011

Particle-in-cell (PIC) simulations with Monte-Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run-time of such simulations. Even on modern computer systems, PIC codes take a considerable amount of time to converge. Most of the computations can be massively parallelized, since particles behave independently of each other within one time step. Current graphics processing units (GPUs) offer an attractive means for executing the parallelized code. In this contribution we show a one-dimensional PIC code running on Nvidia GPUs using the CUDA environment. A distinctive feature of the code is that the size of the cells used to sort the particles with respect to their coordinates is comparable to the size of the grid cells used for discretization of the electric field. Hence, we call the corresponding algorithm "fine-sorting". Implementation details and optimization of the code are discussed, and the speed-up compared to classical CPU approaches is computed.

arXiv.org e-Print Archive

Crossref

Understanding Evolutionary Potential in Virtual CPU Instruction Set Architectures

Author: Bryson David M.
Ofria Charles
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 28/10/2013

We investigate fundamental decisions in the design of instruction set architectures for linear genetic programs that are used as both model systems in evolutionary biology and underlying solution representations in evolutionary computation. We subjected digital organisms with each tested architecture to seven different computational environments designed to present a range of evolutionary challenges. Our goal was to engineer a general purpose architecture that would be effective under a broad range of evolutionary conditions. We evaluated six different types of architectural features for the virtual CPUs: (1) genetic flexibility: we allowed digital organisms to more precisely modify the function of genetic instructions, (2) memory: we provided an increased number of registers in the virtual CPUs, (3) decoupled sensors and actuators: we separated input and output operations to enable greater control over data flow. We also tested a variety of methods to regulate expression: (4) explicit labels that allow programs to dynamically refer to specific genome positions, (5) position-relative search instructions, and (6) multiple new flow control instructions, including conditionals and jumps. Each of these features also adds complication to the instruction set and risks slowing evolution due to epistatic interactions. Two features (multiple argument specification and separated I/O) demonstrated substantial improvements in the majority of test environments. Some of the remaining tested modifications were detrimental, though most exhibit no systematic effects on evolutionary potential, highlighting the robustness of digital evolution. Combined, these observations enhance our understanding of how instruction architecture impacts evolutionary potential, enabling the creation of architectures that support more rapid evolution of complex solutions to a broad range of challenges.

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM

Author: Grass Eckhard
Jagdhold Ulrich
Maharatna Koushik
Publication date: 01/03/2004

In this article, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for application in the OFDM-based IEEE 802.11a Wireless LAN (WLAN) baseband processor. The 64-point FFT is realized by decomposing it into a 2-D structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use any 2-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25 µm BiCMOS technology. The core area of this chip is 6.8 mm². The average dynamic power consumption is 41 mW @ 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that though it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption.

Southampton (e-Prints Soton)

Explore Bristol Research
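
The abstract above describes realizing a 64-point FFT as a 2-D structure of 8-point FFTs. As a rough illustration only, the following Python sketch shows one standard way to do such a split: the textbook Cooley-Tukey index mapping with N = 8 × 8. It is an assumption that the chip uses this particular mapping. The function names `dft` and `fft64_via_8x8` are hypothetical, and the sketch uses ordinary floating-point complex arithmetic rather than the fixed-point shift-and-add multiplication the abstract mentions.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT; used for the 8-point sub-transforms and as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft64_via_8x8(x):
    """64-point DFT built from two stages of 8-point DFTs (illustrative sketch).

    Input index:  n = 8*n1 + n2
    Output index: k = k1 + 8*k2
    """
    if len(x) != 64:
        raise ValueError("expected exactly 64 samples")

    # Stage 1: for each n2, an 8-point DFT over n1 (samples spaced 8 apart).
    stage1 = [dft([x[8 * n1 + n2] for n1 in range(8)]) for n2 in range(8)]

    out = [0j] * 64
    for k1 in range(8):
        # Inter-stage twiddle factors W64^(n2*k1).
        row = [
            stage1[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / 64)
            for n2 in range(8)
        ]
        # Stage 2: an 8-point DFT over n2 yields outputs k = k1 + 8*k2.
        for k2, value in enumerate(dft(row)):
            out[k1 + 8 * k2] = value
    return out
```

Comparing `fft64_via_8x8(x)` against `dft(x)` on any 64-sample input is a quick way to check the index mapping. The only multiplications outside the 8-point transforms are the inter-stage twiddle factors.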

B-LOG: A branch and bound methodology for the parallel execution of logic programs

Author: Hermenegildo Manuel V.
Lipovski Gerald John
Publication venue: Facultad de Informática (UPM)
Publication date: 01/08/1985

We propose a computational methodology, "B-LOG", which offers the potential for an effective implementation of Logic Programming on a parallel computer. We also propose a weighting scheme to guide the search process through the graph, and we apply the concepts of parallel "branch and bound" algorithms in order to perform a "best-first" search using an information-theoretic bound. The concept of "session" is used to speed up the search process in a succession of similar queries. Within a session, we strongly modify the bounds in a local database, while bounds kept in a global database are weakly modified to provide a better initial condition for other sessions. We also propose an implementation scheme based on a database machine using "semantic paging", and the "B-LOG processor" based on a scoreboard-driven controller.

Archivo Digital UPM

On-board B-ISDN fast packet switching architectures. Phase 2: Development. Proof-of-concept architecture definition report

Author: Redman Wayne
Shyy Dong-Jye

For the next-generation packet-switched communications satellite system with onboard processing and spot-beam operation, a reliable onboard fast packet switch is essential to route packets from different uplink beams to different downlink beams. The rapid emergence of point-to-point services such as video distribution, and the large demand for video conference, distributed data processing, and network management make the multicast function essential to a fast packet switch (FPS). The satellite's inherent broadcast features give the satellite network an advantage over the terrestrial network in providing multicast services. This report evaluates alternate multicast FPS architectures for onboard baseband switching applications and selects a candidate for subsequent breadboard development. Architecture evaluation and selection will be based on the study performed in phase 1, 'Onboard B-ISDN Fast Packet Switching Architectures', and other switch architectures which have become commercially available as large scale integration (LSI) devices.

NASA Technical Reports Server