112 research outputs found
Automatic C Compiler Generation from Architecture Description Language ISAC
This paper deals with retargetable compiler generation. After an introduction to application-specific instruction set processor design and a review of code generation in compiler backends, ISAC architecture description language is introduced. Automatic approach to instruction semantics extraction from ISAC models which result is usable for backend generation is presented.
This approach was successfully tested on three models of MIPS, ARM and TI MSP430 architectures. Further backend generation process that uses extracted instruction is semantics presented.
This process was currently tested on the MIPS architecture and some preliminary results are shown
A formally verified compiler back-end
This article describes the development and formal verification (proof of
semantic preservation) of a compiler back-end from Cminor (a simple imperative
intermediate language) to PowerPC assembly code, using the Coq proof assistant
both for programming the compiler and for proving its correctness. Such a
verified compiler is useful in the context of formal methods applied to the
certification of critical software: the verification of the compiler guarantees
that the safety properties proved on the source code hold for the executable
compiled code as well
Vector Operation Support for Transport Triggered Architectures
High performance and low power consumption requirements usually restrict the design process of embedded processors. Traditional design solutions do not apply to the requirements today, but instead demands exploiting varying levels of parallelism. In order to reduce design time and effort, a powerful toolset is required to design new parallel processors effectively.
TTA-based Co-design Environment (TCE) is a toolset developed in Tampere University of Technology for designing customized parallel processors. It is based on a modular Transport Triggered Architecture (TTA) processor architecture template, which provides easy customization and allows exploiting instruction-level parallelism for high performance execution.
Single Instruction, Multiple Data (SIMD) paradigm provides powerful data-level parallel vector computation for many applications in embedded processing. It is one of the most common ways to exploit parallelism in today's processor designs in order to gain greater execution efficiency and, therefore, to meet the performance requirements.
This work describes how data-level parallel SIMD support is introduced and integrated to the TCE design flow for more diverse parallelism support. The support allows designers to customize and program processors with wide vector operations. The work presents the required modification points along with the new tools that were added to the toolset. Much weight is given for the retargetable compiler, which must be able to adapt to all resources on TTA machines. The added tools were required to provide as much automatic behavior as possible to maintain effective design flow. In addition, the thesis presents how the modifications and new features were verified
Higher levels of process synchronisation
Four new synchronisation primitives (SEMAPHOREs, RESOURCEs, EVENTs and BUCKETs) were introduced in the KRoC 0.8beta release of occam for SPARC (SunOS/Solaris) and Alpha (OSF/1) UNIX workstations [1][2][3]. This paper reports on the rationale, application and implementation of two of these (SEMAPHOREs and EVENTs). Details on the other two may be found on the web [4]. The new primitives are designed to support higher-level mechanisms of SHARING between parallel processes and give us greater powers of expression. They will also let greater levels of concurrency be safely exploited from future parallel architectures, such as those providing (virtual) shared-memory. They demonstrate that occam is neutral in any debate between the merits of message-passing versus shared-memory parallelism, enabling applications to take advantage of whichever paradigm (or mixture of paradigms) is the most appropriate. The new primitives could be (but are not) implemented in terms of traditional channels, but only at the expense of increased complexity and computational overhead. The primitives are immediately useful even for uni-processors - for example, the cost of a fair ALT can be reduced from O(n) to O(1). In fact, all the operations associated with new primitives have constant space and time complexities; and the constants are very low. The KRoC release provides an Abstract Data Type interface to the primitives. However, direct use of such mechanisms still allows the user to misuse them. They must be used in the ways prescribed (in this paper and in [4]) else their semantics become unpredictable. No tool is provided to check correct usage at this level. The intention is to bind those primitives found to be useful into higher level versions of occam. Some of the primitives (e.g. SEMAPHOREs) may never themselves be made visible in the language, but may be used to implement bindings of higher-level paradigms (such as SHARED channels and BLACKBOARDs). The compiler will perform the relevant usage checking on all new language bindings, closing the security loopholes opened by raw use of the primitives. The paper closes by relating this work with the notions of virtual transputers, microcoded schedulers, object orientation and Java threads
A compiler approach to scalable concurrent program design
The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to
separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined
abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The
transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same
transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support.
The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This
toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory
multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial
applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics
Analysis and transformation of legacy code
Hardware evolves faster than software. While a hardware system might need replacement
every one to five years, the average lifespan of a software system is a decade,
with some instances living up to several decades. Inevitably, code outlives the platform
it was developed for and may become legacy: development of the software stops,
but maintenance has to continue to keep up with the evolving ecosystem. No new features
are added, but the software is still used to fulfil its original purpose. Even in the
cases where it is still functional (which discourages its replacement), legacy code is
inefficient, costly to maintain, and a risk to security.
This thesis proposes methods to leverage the expertise put in the development of
legacy code and to extend its useful lifespan, rather than to throw it away. A novel
methodology is proposed, for automatically exploiting platform specific optimisations
when retargeting a program to another platform. The key idea is to leverage the optimisation
information embedded in vector processing intrinsic functions. The performance
of the resulting code is shown to be close to the performance of manually
retargeted programs, however with the human labour removed.
Building on top of that, the question of discovering optimisation information when
there are no hints in the form of intrinsics or annotations is investigated. This thesis
postulates that such information can potentially be extracted from profiling the data
flow during executions of the program. A context-aware data dependence profiling
system is described, detailing previously overlooked aspects in related research. The
system is shown to be essential in surpassing the information that can be inferred statically,
in particular about loop iterators.
Loop iterators are the controlling part of a loop. This thesis describes and evaluates
a system for extracting the loop iterators in a program. It is found to significantly
outperform previously known techniques and further increases the amount of information
about the structure of a program that is available to a compiler. Combining this
system with data dependence profiling improves its results even more. Loop iterator
recognition enables other code modernising techniques, like source code rejuvenation
and commutativity analysis. The former increases the use of idiomatic code and as
a result increases the maintainability of the program. The latter can potentially drive
parallelisation and thus dramatically improve runtime performance
HardScope: Thwarting DOP with Hardware-assisted Run-time Scope Enforcement
Widespread use of memory unsafe programming languages (e.g., C and C++)
leaves many systems vulnerable to memory corruption attacks. A variety of
defenses have been proposed to mitigate attacks that exploit memory errors to
hijack the control flow of the code at run-time, e.g., (fine-grained)
randomization or Control Flow Integrity. However, recent work on data-oriented
programming (DOP) demonstrated highly expressive (Turing-complete) attacks,
even in the presence of these state-of-the-art defenses. Although multiple
real-world DOP attacks have been demonstrated, no efficient defenses are yet
available. We propose run-time scope enforcement (RSE), a novel approach
designed to efficiently mitigate all currently known DOP attacks by enforcing
compile-time memory safety constraints (e.g., variable visibility rules) at
run-time. We present HardScope, a proof-of-concept implementation of
hardware-assisted RSE for the new RISC-V open instruction set architecture. We
discuss our systematic empirical evaluation of HardScope which demonstrates
that it can mitigate all currently known DOP attacks, and has a real-world
performance overhead of 3.2% in embedded benchmarks
- …