Decompiler For Pseudo Code Generation
Decompiling is an area of interest for researchers in the field of software reverse engineering. When source code written in a high-level programming language is compiled, a great deal of information is lost, including code structure, syntax, and punctuation. The purpose of this research is to develop an algorithm that can efficiently decompile assembly language into pseudo-C code. Tools are available that claim to extract high-level code from an executable file, but their results tend to be inaccurate and unreadable. Our proposed algorithm can decompile assembly code to recover many basic high-level programming structures, including if/else statements, loops, switches, and math instructions. The approach adopted here differs from that of existing tools: our algorithm performs three passes through the assembly code and includes a virtual execution of each assembly instruction. We also construct a dependency graph and an incidence list to aid in the decompilation.
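The abstract names a dependency graph, an incidence list and a virtual execution pass but does not spell out how they work. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it uses a toy two-operand, Intel-style syntax (`op dst, src`), handles only straight-line code, and all function names are hypothetical. One function links each register read to the instruction that last wrote it and records, per instruction, the edges incident on it; the other symbolically executes the code to rebuild pseudo-C expressions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy two-operand syntax "op dst, src" (an assumption for illustration only).
BINOPS = {"add": "+", "sub": "-", "imul": "*", "and": "&", "or": "|"}


def parse(line: str) -> Tuple[str, List[str]]:
    """Split 'op dst, src' into (op, [operands])."""
    op, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest else []
    return op, operands


def build_dependency_graph(program: List[str]):
    """Return (edges, incidence): each edge is (def_index, use_index, register);
    incidence maps an instruction index to the ids of the edges touching it."""
    last_def: Dict[str, int] = {}      # register -> index of its most recent writer
    edges: List[Tuple[int, int, str]] = []
    incidence: Dict[int, List[int]] = defaultdict(list)
    for i, line in enumerate(program):
        op, ops = parse(line)
        dst = ops[0]
        # A plain mov only reads its source; binary ops also read the destination.
        reads = ops[1:] if op == "mov" else ops
        for reg in reads:
            if reg in last_def:        # immediates and unknown registers have no writer
                edge_id = len(edges)
                edges.append((last_def[reg], i, reg))
                incidence[last_def[reg]].append(edge_id)
                incidence[i].append(edge_id)
        last_def[dst] = i
    return edges, dict(incidence)


def virtually_execute(program: List[str]) -> Dict[str, str]:
    """Symbolically execute straight-line code, returning a pseudo-C
    expression for every register that was written."""
    state: Dict[str, str] = {}
    for line in program:
        op, (dst, src) = parse(line)
        src_expr = state.get(src, src)   # unknown registers and immediates stay as names
        if op == "mov":
            state[dst] = src_expr
        elif op in BINOPS:
            state[dst] = f"({state.get(dst, dst)} {BINOPS[op]} {src_expr})"
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return state
```

For example, symbolically executing `mov eax, ebx`, `add eax, 4`, `imul eax, ecx` maps `eax` to `((ebx + 4) * ecx)`. Recovering if/else, loops and switches would additionally require control-flow analysis over conditional jumps, which this sketch omits.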
Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator
Instruction set simulators (ISS) are vital tools for compiler and processor architecture design space exploration and verification. State-of-the-art simulators using just-in-time (JIT) dynamic binary translation (DBT) techniques are able to simulate complex embedded processors at speeds above 500 MIPS. However, these functional ISS do not provide microarchitectural observability. In contrast, low-level cycle-accurate ISS are too slow to simulate full-scale applications, forcing developers to revert to FPGA-based simulations. In this paper we demonstrate that it is possible to run ultra-high-speed cycle-accurate instruction set simulations surpassing FPGA-based simulation speeds. We extend the JIT DBT engine of our ISS and augment JIT-generated code with a verified cycle-accurate processor model. Our approach can model any microarchitectural configuration, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded processor implementing the ARCompact™ instruction set architecture (ISA). We achieve simulation speeds of up to 88 MIPS on a standard x86 desktop computer for the industry-standard EEMBC, COREMARK and BIOPERF benchmark suites.
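The paper's processor model is specific to its ARCompact™ target and is not described here. As a rough, hypothetical illustration of what augmenting JIT-generated code with a cycle-accurate model can involve, the sketch below shows a single-issue, in-order timing model with a register scoreboard. The latencies in `LATENCY` and all class names are assumptions; in a DBT setting, translated code would call a hook such as `issue()` for each simulated instruction.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical result latencies in cycles; real values depend on the modelled core.
LATENCY: Dict[str, int] = {"alu": 1, "mul": 3, "load": 2, "branch": 1}


@dataclass
class Insn:
    kind: str                       # key into LATENCY
    dst: Optional[str]              # destination register, or None
    srcs: Tuple[str, ...] = ()      # source registers


@dataclass
class InOrderPipelineModel:
    """Single-issue, in-order timing model with a register scoreboard."""
    cycle: int = 0                                           # issue cycle of the latest instruction
    ready_at: Dict[str, int] = field(default_factory=dict)   # register -> cycle its value is ready
    stall_cycles: int = 0

    def issue(self, insn: Insn) -> None:
        earliest = self.cycle + 1   # at most one instruction issues per cycle
        # Read-after-write hazard: wait until every source operand is available.
        operands_ready = max((self.ready_at.get(r, 0) for r in insn.srcs), default=0)
        issue_cycle = max(earliest, operands_ready)
        self.stall_cycles += issue_cycle - earliest
        self.cycle = issue_cycle
        if insn.dst is not None:
            self.ready_at[insn.dst] = issue_cycle + LATENCY[insn.kind]

    def total_cycles(self) -> int:
        # The run ends once the last instruction has issued and every result is ready.
        return max([self.cycle, *self.ready_at.values()])
```

With these assumed latencies, a load feeding a multiply that feeds an ALU operation completes after 7 cycles, 3 of which are spent stalling on read-after-write hazards.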
SimBench: A Portable Benchmarking Methodology for Full-System Simulators
We acknowledge funding by the EPSRC grant PAMELA EP/K008730/1. Full-system simulators are increasingly finding their way into the consumer space for the purposes of backwards compatibility and hardware emulation (e.g. for games consoles). For such compute-intensive applications, simulation performance is paramount. In this paper we argue that existing benchmark suites such as SPEC CPU2006, originally designed for architecture and compiler performance evaluation, are not well suited for identifying performance bottlenecks in full-system simulators. While their large, complex workloads indicate how a simulator performs on ‘real-world’ workloads, they give no indication of why a particular simulator might run an application faster or slower than another. We present SimBench, an extensive suite of targeted micro-benchmarks designed to run bare-metal on a full-system simulator. SimBench exercises dynamic binary translation (DBT) performance, interrupt and exception handling, memory-access performance, I/O and other performance-sensitive areas. SimBench is a cross-platform benchmarking framework and can be retargeted to new architectures with minimal effort. For several simulators, including QEMU, Gem5 and SimIt-ARM, targeting ARM and Intel x86 architectures, we demonstrate that SimBench is capable of accurately pinpointing and explaining real-world performance anomalies, which are largely obscured by existing application-oriented benchmarks.
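SimBench itself runs bare-metal inside the simulated machine, so the following is only a host-side Python analogue of the methodology, under stated assumptions: each hypothetical kernel isolates one simulator feature, a warm-up run lets any translation caches fill, and the median time per iteration over several repeats is reported. The kernel and function names are illustrative and are not part of SimBench.

```python
import statistics
import time
from typing import Callable, Dict


def run_microbenchmark(kernel: Callable[[int], None], iterations: int,
                       repeats: int = 5, warmup: int = 1) -> float:
    """Return the median time per iteration, in seconds, of kernel(iterations)."""
    for _ in range(warmup):
        kernel(iterations)      # under a DBT simulator, this lets hot code be translated first
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(iterations)
        samples.append((time.perf_counter() - start) / iterations)
    return statistics.median(samples)   # the median is robust to occasional outliers


# Hypothetical kernels, each stressing a single feature.
def memory_access_kernel(n: int) -> None:
    buf = [0] * 4096
    for i in range(n):
        buf[(i * 64) % 4096] += 1       # strided accesses stress the memory path


def indirect_call_kernel(n: int) -> None:
    targets = [len, sum, max]
    for i in range(n):
        targets[i % 3]([i])             # changing call targets stress indirect-branch handling


def run_suite(kernels: Dict[str, Callable[[int], None]], iterations: int) -> Dict[str, float]:
    return {name: run_microbenchmark(k, iterations) for name, k in kernels.items()}
```

Reporting one number per isolated feature, rather than one number per application, is what lets a slowdown be attributed to a specific part of the simulator.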
Support Environment for Embedded Systems Design
Advisor: Roberto André Hexsel. Includes appendix. Dissertation (Master's) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 2006. Includes bibliography.
Embedded Processor Selection/Performance Estimation using FPGA-based Profiling
In embedded systems, modeling the performance of candidate processor architectures is very important, as it enables the designer to estimate the capability of each architecture for the target application. Given the large number of available embedded processors, there is a growing need for an infrastructure that can estimate the performance of a given application on a given processor with a minimum of time and resources. This dissertation presents a framework that uses the MicroBlaze softcore processor as a reference architecture, on which FPGA-based profiling is implemented to extract the functional statistics that characterize the target application. Linear regression analysis is used to map the functional statistics of the target application to the performance of each candidate processor architecture. Hence, this approach does not require running the target application on each candidate processor; instead, it is run only on the reference processor, which allows many processor architectures to be tested in a very short time.
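A minimal sketch of the regression step, assuming a plain multiple linear model (an intercept plus one weight per statistic) fitted by ordinary least squares; the dissertation's exact model and features may differ. The feature vector used here (ALU, memory and branch instruction counts from the reference core) and all function names are hypothetical. The fit solves the normal equations by Gaussian elimination so that the example needs no third-party libraries.

```python
from typing import List, Sequence


def fit_linear_model(features: Sequence[Sequence[float]],
                     targets: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solving (X^T X) w = X^T y
    by Gaussian elimination with partial pivoting."""
    X = [[1.0, *row] for row in features]     # prepend the intercept column
    n = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(X, targets))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent; profile more benchmarks")
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (A[i][n] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Estimated performance on the candidate for one application's statistics."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))
```

For instance, with hypothetical profiles `[[10, 2, 1], [20, 5, 3], [15, 8, 2], [30, 4, 6], [25, 10, 4]]` and candidate cycle counts `[21.5, 42.0, 43.0, 54.0, 61.0]`, the fit recovers weights close to `[5, 1, 2.5, 1.5]`, and `predict` estimates about 27.5 for a new application profiled as `[12, 3, 2]` without running it on the candidate core. One such model would be fitted per candidate processor; for larger or ill-conditioned data, a QR-based solver is numerically safer than the normal equations.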
From High Level Architecture Descriptions to Fast Instruction Set Simulators
As computer systems become increasingly complex and diverse, so too do the architectures
they implement. This leads to an increase in complexity in the tools used to design
new hardware and software. One particularly important tool in hardware and software
design is the Instruction Set Simulator, which is used to prototype new architectures and
hardware features, verify hardware, and test and debug software. Many Architecture
Description Languages exist which facilitate the description of new architectural or
hardware features and generate tools such as simulators. However, these typically
suffer from poor performance, are difficult to test effectively, and may be limited in
functionality.
This thesis considers three objectives when developing Instruction Set Simulators:
performance, correctness, and completeness, and presents techniques which contribute
to each of these. Performance is obtained by combining Dynamic Binary Translation
techniques with a novel analysis of high level architecture descriptions. This makes use
of partial evaluation techniques both to improve the translation system and to
improve the quality of the translated code, leading to a performance improvement of over
2.5x compared to a naïve implementation.
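The thesis applies partial evaluation to architecture descriptions; its actual analysis is not shown here. As a hedged illustration of the general idea, the sketch contrasts a generic ADD behaviour, which re-inspects every decoded field each time it executes, with a translator that specialises that behaviour once the fields are known at translation time. The ISA (including a hard-wired zero register) and all names are hypothetical, and Python closures stand in for the native code a real DBT system would emit.

```python
from typing import Callable, List

Regs = List[int]
MASK32 = 0xFFFFFFFF


def add_behaviour(regs: Regs, rd: int, rs: int, use_imm: bool, imm: int, rt: int) -> None:
    """Generic ADD behaviour as a description might state it: every decoded
    field is inspected on every execution."""
    operand = imm if use_imm else regs[rt]
    if rd != 0:                    # hypothetical ISA rule: register 0 is hard-wired to zero
        regs[rd] = (regs[rs] + operand) & MASK32


def specialise_add(rd: int, rs: int, use_imm: bool, imm: int, rt: int) -> Callable[[Regs], None]:
    """Partially evaluate add_behaviour with respect to the decoded fields, which are
    static at translation time; only the register-file accesses remain dynamic."""
    if rd == 0:
        return lambda regs: None   # the write is dead, so translate to a no-op
    if use_imm:
        if imm == 0:
            # add rd, rs, #0 is just a register move.
            return lambda regs: regs.__setitem__(rd, regs[rs])
        return lambda regs: regs.__setitem__(rd, (regs[rs] + imm) & MASK32)
    return lambda regs: regs.__setitem__(rd, (regs[rs] + regs[rt]) & MASK32)
```

The specialised handlers drop the run-time checks on `rd` and `use_imm` and fold `add rd, rs, #0` into a move; for every combination of fields, the generic and specialised versions must leave the register file in the same state.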
This thesis also presents techniques which contribute to the correctness objective.
Each possible behaviour of each described instruction is used to guide the generation
of a test case. Constraint satisfaction techniques are used to determine the necessary
instruction encoding and context for each behaviour to be produced. It is shown that
this is a significant improvement over benchmark-driven testing, and this technique
has led to the discovery of several bugs and inconsistencies in multiple state-of-the-art
instruction set simulators.
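The sketch below illustrates behaviour-guided test generation under simplifying assumptions: a hypothetical signed-add instruction `ADDS rd, rs, rt` has three behaviours, each written as a predicate over the encoding field `rd` and the register context `a` and `b`, and a brute-force search over small boundary-value domains stands in for a real constraint solver. None of these names come from the thesis.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & (1 << 31) else x


def overflows(a: int, b: int) -> bool:
    s = to_signed(a) + to_signed(b)
    return not (-(1 << 31) <= s < (1 << 31))


# Each behaviour of the hypothetical ADDS instruction, as a predicate on one test context.
BEHAVIOURS: Dict[str, Callable[[Dict[str, int]], bool]] = {
    "discarded_write": lambda c: c["rd"] == 0,
    "normal":          lambda c: c["rd"] != 0 and not overflows(c["a"], c["b"]),
    "signed_overflow": lambda c: c["rd"] != 0 and overflows(c["a"], c["b"]),
}

# Small candidate domains: register encodings plus boundary operand values.
BOUNDARY = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
DOMAINS: Dict[str, Iterable[int]] = {"rd": range(4), "a": BOUNDARY, "b": BOUNDARY}


def solve(predicate: Callable[[Dict[str, int]], bool],
          domains: Dict[str, Iterable[int]]) -> Optional[Dict[str, int]]:
    """Brute-force stand-in for a constraint solver: first satisfying assignment."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        candidate = dict(zip(names, values))
        if predicate(candidate):
            return candidate
    return None                    # behaviour unreachable within these domains


def generate_tests(behaviours, domains) -> Dict[str, Optional[Dict[str, int]]]:
    """One concrete test case per behaviour."""
    return {name: solve(pred, domains) for name, pred in behaviours.items()}
```

Each behaviour yields one concrete test case, e.g. `rd = 1`, `a = 1`, `b = 0x7FFFFFFF` for the overflow path. Running such cases on a simulator and comparing the results against reference semantics exercises paths that benchmark programs may never reach.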
Finally, several challenges in ‘Full System’ simulation are addressed, contributing
to both the performance and completeness objectives. Full System simulation generally
carries significant performance costs compared with other simulation strategies. Crucially,
instructions which access memory require virtual-to-physical address translation
and can now cause exceptions. Both of these processes must be correctly and efficiently
handled by the simulator. This thesis presents novel techniques to address this issue
which provide up to a 1.65x speedup over a state-of-the-art solution.
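The thesis's techniques are not detailed in this abstract. To make the problem concrete, the sketch shows what every simulated memory access must do in full-system mode: translate a virtual address through a page table (here a hypothetical single-level one) and raise a guest exception when no valid mapping exists. A software TLB caches successful translations so the slow walk is rarely taken; this is a common general technique, not necessarily the thesis's contribution. The page size and all class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

PAGE_BITS = 12                     # assumed 4 KiB pages
OFFSET_MASK = (1 << PAGE_BITS) - 1


class PageFault(Exception):
    """Guest exception raised when an access has no valid mapping or lacks permission."""

    def __init__(self, vaddr: int, is_write: bool):
        super().__init__(f"page fault at {vaddr:#x} ({'write' if is_write else 'read'})")
        self.vaddr = vaddr
        self.is_write = is_write


@dataclass
class Mmu:
    # Hypothetical page table: virtual page number -> (physical frame number, writable).
    page_table: Dict[int, Tuple[int, bool]] = field(default_factory=dict)
    # Software TLB holding translations that have already been walked.
    tlb: Dict[int, Tuple[int, bool]] = field(default_factory=dict)
    walks: int = 0                 # number of slow-path page-table walks

    def translate(self, vaddr: int, is_write: bool = False) -> int:
        vpn, offset = vaddr >> PAGE_BITS, vaddr & OFFSET_MASK
        entry = self.tlb.get(vpn)
        if entry is None:
            self.walks += 1        # slow path: walk the page table
            entry = self.page_table.get(vpn)
            if entry is None:
                raise PageFault(vaddr, is_write)
            self.tlb[vpn] = entry
        frame, writable = entry
        if is_write and not writable:
            raise PageFault(vaddr, is_write)
        return (frame << PAGE_BITS) | offset

    def flush_tlb(self) -> None:
        # Must be called whenever the guest modifies its page tables.
        self.tlb.clear()
```

With page `0x10` mapped to frame `0x80`, `translate(0x10ABC)` returns `0x80ABC` after one walk, and later accesses to the same page hit the TLB; an access to an unmapped page raises `PageFault`, which the simulator would deliver to the guest as an exception.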
Handling Information and its Propagation to Engineer Complex Embedded Systems
In today's data-driven technology, it is easy to assume that information is at our fingertips, ready to be exploited. Research methodologies and tools are often built on this assumption. However, this illusion of abundance often breaks down when existing techniques are transferred to industrial applications. For instance, research has produced various methodologies to optimize the resource usage of large complex systems, such as the avionics of the Airbus A380. These approaches require knowledge of metrics such as execution time, memory consumption and communication delays. The design of these complex systems, however, draws on a mix of expertise from different fields (likely with limited knowledge of software engineering), which can lead to incomplete or missing specifications. Moreover, the unavailability of relevant information makes it difficult to properly describe the system, predict its behavior, and improve its performance. We fall back on probabilistic models and machine learning techniques to address this lack of relevant information. Probability theory, in particular, has great potential for describing partially observable systems. Our objective is to provide approaches and solutions that produce relevant information. This enables a proper description of complex systems to ease integration, and allows the use of existing optimization techniques. Our first step is to tackle one of the difficulties encountered during system integration: ensuring the proper timing behavior of critical systems. Due to technology scaling and the growing reliance on multi-core architectures, the overhead of software running on different cores and sharing memory space is no longer negligible. To address this, we extend the real-time system toolkit with a static probabilistic timing analysis technique that accurately estimates the execution time of software while accounting for shared-memory contention. The model is then incorporated into a simulator for scheduling multiprocessor real-time systems.
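The thesis's analysis is not reproduced here. As a hedged sketch of the basic operations behind static probabilistic timing analysis, the code below represents execution times as discrete distributions, combines independent components by convolution, and reads off an exceedance probability and a probabilistic WCET bound. Independence between blocks and contention delays is a strong simplifying assumption, and the function names and numbers in the usage note are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict

Dist = Dict[int, Fraction]   # execution time in cycles -> probability


def convolve(a: Dist, b: Dist) -> Dist:
    """Distribution of the sum of two independent execution times."""
    out: Dict[int, Fraction] = defaultdict(Fraction)
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] += pa * pb
    return dict(out)


def exceedance(dist: Dist, budget: int) -> Fraction:
    """Probability that the execution time exceeds `budget` cycles."""
    return sum((p for t, p in dist.items() if t > budget), Fraction(0))


def pwcet(dist: Dist, target: Fraction) -> int:
    """Smallest budget whose exceedance probability is at most `target`."""
    for budget in sorted(dist):
        if exceedance(dist, budget) <= target:
            return budget
    raise ValueError("empty distribution")
```

For example, a fixed 100-cycle block, a block taking 50 or 80 cycles with probabilities 3/4 and 1/4, and a shared-memory contention delay of 0, 10 or 40 cycles (probabilities 1/2, 3/8 and 1/8) combine into a distribution whose probability of exceeding 190 cycles is 1/32, so 190 cycles is the smallest bound with an exceedance probability of at most 1/10. Exact `Fraction` arithmetic keeps the probabilities summing to exactly 1.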
MiADL: A Language for the Automatic Generation of Retargetable Simulators
PhD thesis in Industrial Electronics (knowledge area: Industrial Informatics).
Embedded systems are becoming increasingly important in everyday life, and also
increasingly sophisticated and complex. Embedded systems development teams must
deal with this rising complexity while meeting higher performance targets and more
demanding requirements (e.g. complex memory hierarchies). These devices require
software tools, such as simulators, debuggers, linkers, assemblers and compilers, that
must keep pace with their evolution. It is therefore essential to develop the software
toolkit for a target embedded system quickly, in order to secure a place in a
competitive market, in other words, to shorten time-to-market. Applications that
generate such toolkits efficiently are thus of great importance in embedded systems
development.
Among these software tools, the simulator is essential for the development of new
computational architectures. It provides flexibility and reduces cost, since it allows
hardware to be simulated in early stages of the development process, before the
hardware physically exists. The first simulators were hand-coded.
Since then, Architecture Description Languages (ADLs) have emerged that make it
easier to generate these tools automatically, quickly, retargetably and with fewer
errors. One advantage of these languages is that they allow several tools to be
generated from a single description, which guarantees compatibility and consistency
between them. Besides their practical industrial applications, these languages can
also be used for educational purposes, namely for teaching microprocessor architectures.
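MiADL syntax is not shown in this abstract, so the following is only a Python illustration of the "several tools from a single description" idea, using a hypothetical 16-bit ISA (a 4-bit opcode followed by three 4-bit register fields). Both the simulator step function and the disassembler are derived from the same `ISA` table, so they cannot disagree about encodings; all names are assumptions.

```python
from typing import Callable, Dict, List

Regs = List[int]


def _write(regs: Regs, rd: int, value: int) -> None:
    regs[rd] = value & 0xFFFF      # 16-bit registers wrap around


# The single description: opcode -> assembly syntax and behaviour.
ISA: Dict[int, dict] = {
    0x1: {"syntax": "add r{rd}, r{rs}, r{rt}",
          "behaviour": lambda r, rd, rs, rt: _write(r, rd, r[rs] + r[rt])},
    0x2: {"syntax": "sub r{rd}, r{rs}, r{rt}",
          "behaviour": lambda r, rd, rs, rt: _write(r, rd, r[rs] - r[rt])},
    0x3: {"syntax": "and r{rd}, r{rs}, r{rt}",
          "behaviour": lambda r, rd, rs, rt: _write(r, rd, r[rs] & r[rt])},
}


def decode_fields(word: int) -> Dict[str, int]:
    """Field layout shared by every generated tool."""
    return {"rd": (word >> 8) & 0xF, "rs": (word >> 4) & 0xF, "rt": word & 0xF}


def generate_disassembler(isa: Dict[int, dict]) -> Callable[[int], str]:
    def disassemble(word: int) -> str:
        return isa[word >> 12]["syntax"].format(**decode_fields(word))
    return disassemble


def generate_simulator(isa: Dict[int, dict]) -> Callable[[Regs, int], None]:
    def step(regs: Regs, word: int) -> None:
        isa[word >> 12]["behaviour"](regs, **decode_fields(word))
    return step
```

For example, the word `0x1123` disassembles to `add r1, r2, r3`, and executing it with `r2 = 5` and `r3 = 7` sets `r1` to 12. Adding an instruction to `ISA` updates both tools at once.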
The purpose of this PhD work is to help simplify the construction of simulators used
in the design of embedded systems. The aim is to demonstrate that this is best
achieved with a structured language that exploits the behaviour and syntax shared by
the instructions of the target instruction set architecture. To support this
approach, a new language is proposed that introduces a different way of describing
Instruction Set Architectures (ISAs). This language, named Minho Architecture
Description Language (MiADL), has a structure that exploits what instructions have
in common (behaviour and assembly syntax) and is organized in blocks. These blocks
can be described once and reused several times. This eases validation and the
detection of inconsistencies, and leads to clear, robust and easy-to-debug
descriptions. The main features of this language are: scopes; argument
inference; a structure organized in sections, which allows reuse in different parts of the
description; the way it handles the variability and regularities present in
complex ISAs; and its similarity to the information usually available in processor
manuals. MiADL descriptions of several complex ISAs of well-known architectures are
rather compact, structured and easy to explore when compared with those obtained
using other ADLs.
This thesis also presents the retargetable framework designed for the automatic
generation of statically compiled simulators from MiADL descriptions, using generic
programming to execute instruction behaviours. The use of function templates is made
possible by MiADL's features; with other ADLs this is not as easy to accomplish. Besides
making the code more compact and structured, this allows the native compiler to apply
optimizations that improve the simulator's performance.
Ministério da Educação - Prodep II
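The framework described in the MiADL abstract generates simulators that use C++ function templates. Since the examples here are in Python, the sketch below is only a loose analogue under stated assumptions: a generic ALU behaviour is "instantiated" once per operation, and a static compiled-simulation step decodes the whole (hypothetical) program once, before execution, so the simulation loop performs no decoding. All names and the 16-bit register width are assumptions. In C++, each template instantiation can additionally be inlined and optimized by the native compiler, which Python cannot show.

```python
import operator
from typing import Callable, List, Sequence, Tuple

Regs = List[int]
Handler = Callable[[Regs], None]
MASK = 0xFFFF                      # assumed 16-bit registers


def alu_template(op: Callable[[int, int], int]) -> Callable[[int, int, int], Handler]:
    """Generic ALU behaviour parameterised by its operation, loosely analogous
    to a C++ function template instantiated once per operation."""
    def bind(rd: int, rs: int, rt: int) -> Handler:
        def handler(regs: Regs) -> None:
            regs[rd] = op(regs[rs], regs[rt]) & MASK
        return handler
    return bind


# "Instantiations" of the generic behaviour for a hypothetical ISA.
BEHAVIOURS = {
    "add": alu_template(operator.add),
    "sub": alu_template(operator.sub),
    "xor": alu_template(operator.xor),
}


def compile_program(program: Sequence[Tuple[str, int, int, int]]) -> List[Handler]:
    """Static compiled simulation: decode every instruction once, ahead of execution."""
    return [BEHAVIOURS[name](rd, rs, rt) for name, rd, rs, rt in program]


def run(compiled: Sequence[Handler], regs: Regs) -> Regs:
    for handler in compiled:       # no decoding remains in the simulation loop
        handler(regs)
    return regs
```

For example, compiling `("add", 1, 2, 3)` and `("sub", 4, 1, 2)` and running them with `r2 = 10` and `r3 = 20` leaves `r1 = 30` and `r4 = 20`.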