1,994 research outputs found
A methodology for analyzing commercial processor performance numbers
The wealth of performance numbers provided by benchmarking corporations makes it difficult to detect trends across commercial machines. A proposed methodology, based on statistical data analysis, simplifies exploration of these machines' large datasets
Hardware Accelerator of Cartesian Genetic Programming with Multiple Fitness Units
A new accelerator of Cartesian genetic programming is presented in this paper. The accelerator is completely implemented in a single FPGA. The proposed architecture contains multiple instances of virtual reconfigurable circuit to evaluate several candidate solutions in parallel. An advanced memory organization was developed to achieve the maximum throughput of processing. The search algorithm is implemented using the on-chip PowerPC processor. In the benchmark problem (image filter evolution) the proposed platform provides a significant speedup (170) in comparison with a highly optimized software implementation. Moreover, the accelerator is 8 times faster than previous FPGA accelerators of image filter evolution
Description and Optimization of Abstract Machines in a Dialect of Prolog
In order to achieve competitive performance, abstract machines for Prolog and
related languages end up being large and intricate, and incorporate
sophisticated optimizations, both at the design and at the implementation
levels. At the same time, efficiency considerations make it necessary to use
low-level languages in their implementation. This makes them laborious to code,
optimize, and, especially, maintain and extend. Writing the abstract machine
(and ancillary code) in a higher-level language can help tame this inherent
complexity. We show how the semantics of most basic components of an efficient
virtual machine for Prolog can be described using (a variant of) Prolog. These
descriptions are then compiled to C and assembled to build a complete bytecode
emulator. Thanks to the high level of the language used and its closeness to
Prolog, the abstract machine description can be manipulated using standard
Prolog compilation and optimization techniques with relative ease. We also show
how, by applying program transformations selectively, we obtain abstract
machine implementations whose performance can match and even exceed that of
state-of-the-art, highly-tuned, hand-crafted emulators.Comment: 56 pages, 46 figures, 5 tables, To appear in Theory and Practice of
Logic Programming (TPLP
Relaxed Operational Semantics of Concurrent Programming Languages
We propose a novel, operational framework to formally describe the semantics
of concurrent programs running within the context of a relaxed memory model.
Our framework features a "temporary store" where the memory operations issued
by the threads are recorded, in program order. A memory model then specifies
the conditions under which a pending operation from this sequence is allowed to
be globally performed, possibly out of order. The memory model also involves a
"write grain," accounting for architectures where a thread may read a write
that is not yet globally visible. Our formal model is supported by a software
simulator, allowing us to run litmus tests in our semantics.Comment: In Proceedings EXPRESS/SOS 2012, arXiv:1208.244
Accelerating FPGA-based evolution of wavelet transform filters by optimized task scheduling
Adaptive embedded systems are required in various applications. This work addresses these needs in the
area of adaptive image compression in FPGA devices. A simplified version of an evolution strategy is utilized
to optimize wavelet filters of a Discrete Wavelet Transform algorithm. We propose an adaptive image compression system in FPGA where optimized memory architecture, parallel processing and optimized task scheduling allow reducing the time of evolution. The proposed solution has been extensively evaluated in terms of the quality of compression as well as the processing time. The proposed architecture
reduces the time of evolution by 44% compared to our previous reports while maintaining the quality of compression unchanged with respect to existing implementations. The system is able to find an
optimized set of wavelet filters in less than 2 min whenever the input type of data changes
Re-Engineering of the CERN Accelerators and Services Control System based on VMEbus and PowerPC Technology
A new generation of PowerPC VMEbus front-end computers is being introduced in the CERN accelerators and services control system infrastructure. This new technology is aimed at replacing existing PC based front-end computers and at offering a high performance microprocessor platform for present and future engineering developments for the LHC era. This paper describes the re-engineering strategy and the core architecture of the new systems. Special performance issues are also addressed in this paper
PRISE: An Integrated Platform for Research and Teaching of Critical Embedded Systems
In this paper, we present PRISE, an integrated workbench for Research and Teaching of critical embedded systems at ISAE, the French Institute for Space and Aeronautics Engineering. PRISE is built around state-of-the-art technologies for the engineering of space and avionics systems used in Space and Avionics domain. It aims at demonstrating key aspects of critical, real-time, embedded systems used in the transport industry, but also validating new scientific contributions for the engineering of software functions. PRISE combines embedded and simulation platforms, and modeling tools. This platform is available for both research and teaching. Being built around widely used commercial and open source software; PRISE aims at being a reference platform for our teaching and research activities at ISAE
- …