331 research outputs found
Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation
Detailed modeling of processors and high performance cycle-accurate
simulators are essential for today's hardware and software design. These
problems are challenging enough by themselves and have seen many previous
research efforts. Addressing both simultaneously is even more challenging, with
many existing approaches focusing on one over another. In this paper, we
propose the Reduced Colored Petri Net (RCPN) model that has two advantages:
first, it offers a very simple and intuitive way of modeling pipelined
processors; second, it can generate high performance cycle-accurate simulators.
RCPN benefits from all the useful features of Colored Petri Nets without
suffering from their exponential growth in complexity. RCPN processor models
are very intuitive since they are a mirror image of the processor pipeline
block diagram. Furthermore, in our experiments on the generated cycle-accurate
simulators for XScale and StrongArm processor models, we achieved an order of
magnitude (~15 times) speedup over the popular SimpleScalar ARM simulator.Comment: Submitted on behalf of EDAA (http://www.edaa.com/
Programming MPSoC platforms: Road works ahead
This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developerÂŽs viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful\ud
deployment of heterogeneous embedded MPSoC. On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool and code replacement strategy
Fast approximately timed simulation
International audienceIn this paper we present a technique for fast approximately timed simulation of software within a virtual prototyping framework. Our method performs a static analysis of the program control flow graph to construct annotations of the simulated program, combined with dynamic performance information. The static analysis estimates execution time based on a target architecture model. The delays introduced by instruction fetch and data cache misses are evaluated dynamically. At the end of each block, static and dynamic information are combined with branch target prediction to compute the total execution time of the blocks. As a result, we can provide approximate performance estimates with a high simulation speed that is still usable for software developers
cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
Modern processor architectures, in addition to having still more cores, also
require still more consideration to memory-layout in order to run at full
capacity. The usefulness of most languages is deprecating as their
abstractions, structures or objects are hard to map onto modern processor
architectures efficiently.
The work in this paper introduces a new abstract machine framework, cphVB,
that enables vector oriented high-level programming languages to map onto a
broad range of architectures efficiently. The idea is to close the gap between
high-level languages and hardware optimized low-level implementations. By
translating high-level vector operations into an intermediate vector bytecode,
cphVB enables specialized vector engines to efficiently execute the vector
operations.
The primary success parameters are to maintain a complete abstraction from
low-level details and to provide efficient code execution across different,
modern, processors. We evaluate the presented design through a setup that
targets multi-core CPU architectures. We evaluate the performance of the
implementation using Python implementations of well-known algorithms: a jacobi
solver, a kNN search, a shallow water simulation and a synthetic stencil
simulation. All demonstrate good performance
Test exploration and validation using transaction level models
The complexity of the test infrastructure and test strategies in systems-on-chip approaches the complexity of the functional design space. This paper presents test design space exploration and validation of test strategies and schedules using transaction level models (TLMs). Since many aspects of testing involve the transfer of a significant amount of test stimuli and responses, the communication-centric view of TLMs suits this purpose exceptionally wel
Efficient Dual-ISA Support in a Retargetable, Asynchronous Dynamic Binary Translator
Dynamic Binary Translation (DBT) allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Some modern DBT systems decouple their main execution loop from the built-inJust-In-Time (JIT) compiler, i.e. the JIT compiler can operate asynchronously in a different thread without blocking program execution. However, this creates a problem for target architectures with dual-ISA support such as ARM/THUMB, where the ISA of the currently executed instruction stream may be different to the one processed by the JIT compiler due to their decoupled operation and dynamic mode changes. In this paper we present a new approach for dual-ISA support in such an asynchronous DBT system, which integrates ISA mode tracking and hot-swapping of software instruction decoders. We demonstrate how this can be achieved in a retargetable DBT system, where the target ISA is not hard-coded, but a processor-specific module is generated from a high-level architecture description. We have implemented ARM V5T support in our DBT and demonstrate execution rates of up to 1148 MIPS for the SPEC CPU 2006 benchmarks compiled for ARM/THUMB, achieving on average 192%, and up to 323%, of the speed of QEMU, which has been subject to intensive manual performance tuning and requires significant low-level effort for retargeting
Recommended from our members
Core level thermal estimation techniques for early design space exploration
textThe primary objective of this thesis is to develop a methodology for fast, yet accurate temperature estimation during design space exploration. Power and temperature of modern day systems have become important metrics in addition to performance. Static and dynamic power dissipation leads to an increase in temperature, which creates cooling and packaging issues. Furthermore, the transient thermal profile determines temperature gradients, hotspots and thermal cycles. Traditional solutions rely on cycle-accurate simulations of detailed micro-architectural structures and are slow. The thesis shows that the periodic power estimation is the key bottleneck in such approaches. It also demonstrates an approach (FastSpot) that integrates accurate thermal estimation into existing host-compiled simulations. The developed methodology can incorporate different sampling-based thermal models. It achieves a 32000x increase in simulation throughput for temperature trace generation, while incurring low measurement errors (0.06 K- transient,0.014 K- steady-state) compared to a cycle-accurate reference method.Electrical and Computer Engineerin
Using Rapid Prototyping in Computer Architecture Design Laboratories
This paper describes the undergraduate computer architecture courses and laboratories introduced at Georgia Tech during the past two years. A core sequence of six required courses for computer engineering students has been developed. In this paper, emphasis is placed upon the new core laboratories which utilize commercial CAD tools, FPGAs, hardware emulators, and a VHDL based rapid prototyping approach to simulate, synthesize, and implement prototype computer hardware
- âŠ