From High Level Architecture Descriptions to Fast Instruction Set Simulators
As computer systems become increasingly complex and diverse, so too do the architectures
they implement. This leads to an increase in complexity in the tools used to design
new hardware and software. One particularly important tool in hardware and software
design is the Instruction Set Simulator, which is used to prototype new architectures and
hardware features, verify hardware, and test and debug software. Many Architecture
Description Languages exist which facilitate the description of new architectural or
hardware features, and generate tools such as simulators. However, these typically
suffer from poor performance, are difficult to test effectively, and may be limited in
functionality.
This thesis considers three objectives when developing Instruction Set Simulators:
performance, correctness, and completeness, and presents techniques which contribute
to each of these. Performance is obtained by combining Dynamic Binary Translation
techniques with a novel analysis of high level architecture descriptions. This makes use
of partial evaluation techniques in order to both improve the translation system, and to
improve the quality of the translated code, leading to a performance improvement of over
2.5x compared to a naïve implementation.
This thesis also presents techniques which contribute to the correctness objective.
Each possible behaviour of each described instruction is used to guide the generation
of a test case. Constraint satisfaction techniques are used to determine the necessary
instruction encoding and context for each behaviour to be produced. It is shown that
this is a significant improvement over benchmark-driven testing, and this technique
has led to the discovery of several bugs and inconsistencies in multiple state of the art
instruction set simulators.
Finally, several challenges in ‘Full System’ simulation are addressed, contributing
to both the performance and completeness objectives. Full System simulation generally
carries significant performance costs compared with other simulation strategies. Crucially,
instructions which access memory require virtual to physical address translation
and can now cause exceptions. Both of these processes must be correctly and efficiently
handled by the simulator. This thesis presents novel techniques to address this issue
which provide up to a 1.65x speedup over a state of the art solution.
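To make the address translation cost mentioned in this abstract concrete, the following is a minimal sketch of a software translation cache placed in front of a page table lookup. It is not the thesis's technique: the 4 KiB page size, the flat dictionary page table, and the names `SoftTLB` and `PageFault` are all assumptions made for illustration.

```python
PAGE_SHIFT = 12                       # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Raised when a guest virtual address has no mapping (hypothetical name)."""


class SoftTLB:
    """Caches virtual-to-physical page translations in front of a page table."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.cache = {}               # translations already looked up
        self.walks = 0                # count of slow-path page table lookups

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        ppn = self.cache.get(vpn)
        if ppn is None:
            # Slow path: consult the page table, which may raise a guest exception.
            self.walks += 1
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))
            ppn = self.page_table[vpn]
            self.cache[vpn] = ppn
        return (ppn << PAGE_SHIFT) | (vaddr & PAGE_MASK)


tlb = SoftTLB({0x10: 0x80})
paddr_first = tlb.translate(0x10ABC)   # slow path, fills the cache
paddr_second = tlb.translate(0x10DEF)  # same page, served from the cache
```

Every simulated memory access goes through `translate`, so both the hit path and the exception path sit on the simulator's critical path, which is why the abstract singles them out.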
Simulator for VLIW-architecture digital signal processors
Electronics and Telecommunications Engineering. Dissertation presented to Universidade de Aveiro in fulfilment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Professor Manuel Bernardo Salvador Cunha, PhD, Assistant Professor (Professor Auxiliar) at the Department of Electronics, Telecommunications and Informatics of Universidade de Aveiro, and Dr Mohamed Bamakhrama, Hardware Tools Engineer in the "Processor and Compiler Tools" team of the "Imaging and Camera Technologies" group, Intel Eindhoven, the Netherlands.
Efficient Dual-ISA Support in a Retargetable, Asynchronous Dynamic Binary Translator
Dynamic Binary Translation (DBT) allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Some modern DBT systems decouple their main execution loop from the built-in Just-In-Time (JIT) compiler, i.e. the JIT compiler can operate asynchronously in a different thread without blocking program execution. However, this creates a problem for target architectures with dual-ISA support such as ARM/THUMB, where the ISA of the currently executed instruction stream may be different to the one processed by the JIT compiler due to their decoupled operation and dynamic mode changes. In this paper we present a new approach for dual-ISA support in such an asynchronous DBT system, which integrates ISA mode tracking and hot-swapping of software instruction decoders. We demonstrate how this can be achieved in a retargetable DBT system, where the target ISA is not hard-coded, but a processor-specific module is generated from a high-level architecture description. We have implemented ARM V5T support in our DBT and demonstrate execution rates of up to 1148 MIPS for the SPEC CPU 2006 benchmarks compiled for ARM/THUMB, achieving on average 192%, and up to 323%, of the speed of QEMU, which has been subject to intensive manual performance tuning and requires significant low-level effort for retargeting.
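The problem described in this abstract can be illustrated with a small sketch in which translated code is keyed by both the guest address and the ISA mode that was active when the block was dispatched for compilation. This is only an illustration of why the mode has to travel with the work item. It is not the paper's implementation, and the names `Mode`, `TranslationCache` and `select_decoder` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    """Guest ISA mode; the value is the instruction width in bytes."""
    ARM = 4
    THUMB = 2


class TranslationCache:
    """Maps (guest pc, ISA mode) to translated code."""

    def __init__(self):
        self.entries = {}

    def install(self, pc, mode, code):
        # Called by the (possibly asynchronous) compiler. The mode is the one
        # recorded at dispatch time, not the mode the guest is in right now.
        self.entries[(pc, mode)] = code

    def lookup(self, pc, mode):
        # The execution loop passes the mode the guest is currently in.
        return self.entries.get((pc, mode))


def select_decoder(mode):
    """Pick the software decoder for a work item (placeholder names)."""
    return {Mode.ARM: "arm_decoder", Mode.THUMB: "thumb_decoder"}[mode]


cache = TranslationCache()
cache.install(0x8000, Mode.ARM, "arm-block")
hit = cache.lookup(0x8000, Mode.ARM)     # same address, same mode
miss = cache.lookup(0x8000, Mode.THUMB)  # same address, different mode
```

Keying on the pair means a block compiled while the guest was in ARM mode is never executed after the guest has switched to THUMB at the same address.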
Application specific instruction set processor design for embedded application using the coware tool
An Application Specific Instruction Set Processor (ASIP) is widely used as a System on a Chip (SoC) component. ASIPs possess an instruction set which is tailored to benefit a specific application. Such specialization allows ASIPs to serve as an intermediate between two dominant processor design styles: ASICs, which have high processing abilities at the cost of limited programmability, and programmable solutions such as FPGAs, which provide programming flexibility at the cost of lower energy efficiency. In this dissertation the goal is to design an ASIP with a temperature sensor system in mind. The platform used for processor design is the LISA 2.0 description language and the processor design environment from CoWare. CoWare Processor Designer allows the processor architecture to be defined at an abstract level and automatically generates a chain of software tools, such as an assembler, linker and simulator, for functional verification, followed by an RTL-level description. The RTL-level description is used to generate a synthesis report of the design using RTL Compiler, and finally the layout is created using Cadence Encounter.
Efficient cross-architecture hardware virtualisation
Hardware virtualisation is the provision of an isolated virtual environment that
represents real physical hardware. It enables operating systems, or other system-level
software (the guest), to run unmodified in a “container” (the virtual machine)
that is isolated from the real machine (the host).
There are many use-cases for hardware virtualisation that span a wide-range
of end-users. For example, home-users wanting to run multiple operating systems
side-by-side (such as running a Windows® operating system inside an OS
X environment) will use virtualisation to accomplish this. In research and development
environments, developers building experimental software and hardware
want to prototype their designs quickly, and so will virtualise the platform
they are targeting to isolate it from their development workstation. Large-scale
computing environments employ virtualisation to consolidate hardware, enforce
application isolation, migrate existing servers or provision new servers.
However, the majority of these use-cases call for same-architecture virtualisation,
where the architecture of the guest and the host machines match—a situation
that can be accelerated by the hardware-assisted virtualisation extensions
present on modern processors. But, there is significant interest in virtualising
the hardware of different architectures on a host machine, especially in the
architectural research and development worlds.
Typically, the instruction set architecture of a guest platform will be different
to the host machine, e.g. an ARM guest on an x86 host will use an ARM instruction
set, whereas the host will be using the x86 instruction set. Therefore, to
enable this cross-architecture virtualisation, each guest instruction must be emulated
by the host CPU—a potentially costly operation. This thesis presents a
range of techniques for accelerating this instruction emulation, improving over
a state-of-the-art instruction set simulator by 2.64x. But, emulation of the guest
platform’s instruction set is not enough for full hardware virtualisation. In fact,
this is just one challenge in a range of issues that must be considered. Specifically,
another challenge is efficiently handling the way external interrupts are
managed by the virtualisation system. This thesis shows that when employing
efficient instruction emulation techniques, it is not feasible to arbitrarily
divert control-flow without consideration being given to the state of the emulated
processor. Furthermore, it is shown that it is possible for the virtualisation
environment to behave incorrectly if particular care is not given to the point
at which control-flow is allowed to diverge. To solve this, a technique is developed
that maintains efficient instruction emulation, and correctly handles
external interrupt sources.
Finally, modern processors have built-in support for hardware virtualisation
in the form of instruction set extensions that enable the creation of an abstract
computing environment, indistinguishable from real hardware. These extensions
enable guest operating systems to run directly on the physical processor,
with minimal supervision from a hypervisor. However, these extensions are
geared towards same-architecture virtualisation, and as such are not immediately
well-suited for cross-architecture virtualisation. This thesis presents a
technique for exploiting these existing extensions, and using them in a cross-architecture
virtualisation setting, improving the performance of a novel cross-architecture
virtualisation hypervisor over state-of-the-art by 2.5x.
Navigating the Landscape for Real-time Localisation and Mapping for Robotics, Virtual and Augmented Reality
Visual understanding of 3D environments in real-time, at low power, is a huge
computational challenge. Often referred to as SLAM (Simultaneous Localisation
and Mapping), it is central to applications spanning domestic and industrial
robotics, autonomous vehicles, virtual and augmented reality. This paper
describes the results of a major research effort to assemble the algorithms,
architectures, tools, and systems software needed to enable delivery of SLAM,
by supporting applications specialists in selecting and configuring the
appropriate algorithm and the appropriate hardware, and compilation pathway, to
meet their performance, accuracy, and energy consumption goals. The major
contributions we present are (1) tools and methodology for systematic
quantitative evaluation of SLAM algorithms, (2) automated,
machine-learning-guided exploration of the algorithmic and implementation
design space with respect to multiple objectives, (3) end-to-end simulation
tools to enable optimisation of heterogeneous, accelerated architectures for
the specific algorithmic requirements of the various SLAM algorithmic
approaches, and (4) tools for delivering, where appropriate, accelerated,
adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context.
Comment: Proceedings of the IEEE 201
Software test and evaluation study phase I and II : survey and analysis
Issued as Final report, Project no. G-36-661 (continues G-36-636; includes A-2568
Discrete Event Simulations
Considered by many authors as a technique for modelling stochastic, dynamic and discretely evolving systems, discrete event simulation (DES) has gained widespread acceptance among practitioners who want to represent and improve complex systems. Since DES is applied in very different areas, this book reflects many different points of view about it: each author describes how DES is understood and applied within their own context of work, which together provides an extensive understanding of what DES is. The name of the book itself reflects the plurality that these points of view represent. The book covers theory, methods and applications across a wide range of sectors and problem areas, categorised into five groups. Beyond this variety of viewpoints, the book is notable for its use of actual data and data-based analysis: while most academic areas lack application cases, roughly half of the chapters deal with actual problems or are at least based on actual data. Thus, the editor firmly believes that this book will be interesting for both beginners and practitioners in the area of DES.
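For readers new to the technique, the core of a discrete event simulation can be sketched as a time-ordered event queue. The example below is a generic illustration and is not taken from the book: the event names, the `run` function and the fixed service time of 2.0 time units are assumptions, and a stochastic model would draw the service time from a random distribution instead.

```python
import heapq


def run(events, until):
    """Process (time, name) events in time order up to and including `until`."""
    queue = list(events)
    heapq.heapify(queue)          # earliest event is always at queue[0]
    log = []
    while queue and queue[0][0] <= until:
        time, name = heapq.heappop(queue)
        log.append((time, name))
        # Handling an event may schedule future events: here each arrival
        # schedules its own departure after a fixed service time.
        if name == "arrive":
            heapq.heappush(queue, (time + 2.0, "depart"))
    return log


trace = run([(1.0, "arrive"), (0.5, "arrive")], until=3.0)
```

The simulated clock jumps directly from one event time to the next, so the state changes only at discrete instants, which is the defining property of the approach.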
Applications of information sharing for code generation in process virtual machines
As the backbone of many computing environments today, it is important that process virtual
machines be both performant and robust in mobile, personal desktop, and enterprise applications.
This thesis focusses on code generation within these virtual machines, particularly
addressing situations where redundant work is being performed. The goal is to exploit information
sharing in order to improve the performance and robustness of virtual machines that are
accelerated by native code generation. First, the thesis investigates the potential to share generated
code between multiple threads in a dynamic binary translator used to perform instruction
set simulation. This is done through a code generation design that allows native code to be
executed by any simulated core and adding a mechanism to share native code regions between
threads. This is shown to improve the average performance of multi-threaded benchmarks by
1.4x when simulating 128 cores on a quad-core host machine. Secondly, the ahead-of-time
code generation system used for executing Android applications is improved through the use
of profiling. The thesis investigates the potential for profiles produced by individual users of
applications to be shared and merged together to produce a generic profile that still provides
a lot of benefit for a new user who is then able to skip the expensive profiling phase. These
profiles can not only be used for selective compilation to reduce code-size and installation
time, but can also be used for focussed optimisation on vital code regions of an application
in order to improve overall performance. With selective compilation applied to a set of popular
Android applications, code-size can be reduced by 49.9% on average, while installation
time can be reduced by 31.8%, with only an average 8.5% increase in the amount of sequential
runtime required to execute the collected profiles. The thesis also shows that, among the
tested users, the use of a crowd-sourced and merged profile does not significantly affect their
estimated performance loss from selective compilation (0.90x-0.92x) in comparison to when
they perform selective compilation with their own unique profile (0.93x). Furthermore, by
proposing a new, more powerful code generator for Android’s virtual machine, these same profiles
can be used to perform focussed optimisation, which preliminary results show to increase
runtime performance across a set of common Android benchmarks by 1.46x-10.83x. Finally,
in such a situation where a new code generator is being added to a virtual machine, it is also
important to test the code generator for correctness and robustness. The methods of execution
of a virtual machine, such as interpreters and code generators, must share a set of semantics
about how programs must be executed, and this can be exploited in order to improve testing.
This is done through the application of domain-aware binary fuzzing and differential testing
within Android’s virtual machine. The thesis highlights a series of actual code generation and
verification bugs that were found in Android’s virtual machine using this testing methodology,
as well as comparing the proposed approach to other state-of-the-art fuzzing techniques.
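The differential testing idea in this abstract, where two execution methods must agree on program semantics, can be sketched on a toy scale. The example below is not the thesis's tooling and does not involve Android's virtual machine: the three-operation instruction set, the functions `interpret`, `compile_program` and `differential_test`, and the random program generator are all assumptions chosen to keep the sketch self-contained.

```python
import random

OPS = ("add", "sub", "mul")


def interpret(program, x):
    """First executor: applies one (op, constant) pair at a time."""
    for op, k in program:
        if op == "add":
            x += k
        elif op == "sub":
            x -= k
        else:
            x *= k
    return x


def compile_program(program):
    """Second executor: builds a single Python expression and compiles it."""
    symbols = {"add": "+", "sub": "-", "mul": "*"}
    expr = "x"
    for op, k in program:
        expr = f"({expr} {symbols[op]} {k})"
    return eval("lambda x: " + expr)


def differential_test(trials, seed=0):
    """Run random programs through both executors and collect disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        length = rng.randint(0, 6)
        program = [(rng.choice(OPS), rng.randint(-9, 9)) for _ in range(length)]
        x = rng.randint(-100, 100)
        if interpret(program, x) != compile_program(program)(x):
            mismatches.append((program, x))
    return mismatches


mismatches = differential_test(200)
```

Any entry in `mismatches` is a program and input on which the two executors disagree, which points to a bug in at least one of them without needing a separate specification of the expected result.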