17 research outputs found

    Formal Executable Models for Automatic Detection of Timing Anomalies

    A timing anomaly is a counterintuitive timing behavior in the sense that a locally fast execution slows down the overall global execution. The presence of such behaviors is inconvenient for WCET analysis, which requires, via abstractions, a certain monotonicity property to compute safe bounds. In this paper we explore how to make a previously proposed formal definition of timing anomalies systematically executable. We ground our work on formal architecture models, upon which we employ guided model-checking techniques. Our goal is the automatic detection of timing anomalies in given computer architecture designs.
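
    The following is an illustrative sketch, not the paper's formal models or model-checking setup: a toy greedy scheduler over two functional units in which a locally faster load (a cache hit) produces a longer overall schedule, i.e. the monotonicity assumed by WCET analysis is violated. All instruction names, units, and latencies are invented.

```python
# Illustrative sketch only (not the paper's formal models): a toy greedy
# "oldest-ready-first" scheduler on two units, showing that a locally faster load
# (cache hit) can yield a longer overall schedule, i.e. non-monotone timing.

from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    unit: str          # functional unit required
    deps: tuple        # names of instructions that must complete first
    latency: int

def makespan(prog):
    """Cycle-by-cycle greedy simulation: each free unit dispatches the oldest
    (program-order) ready, not-yet-dispatched instruction that needs it."""
    finish = {}                                     # name -> completion cycle
    busy_until = {u: 0 for u in {i.unit for i in prog}}
    dispatched = set()
    t = 0
    while len(finish) < len(prog):
        for i in prog:                              # program order == priority order
            if (i.name not in dispatched
                    and busy_until[i.unit] <= t
                    and all(d in finish and finish[d] <= t for d in i.deps)):
                dispatched.add(i.name)
                busy_until[i.unit] = t + i.latency
                finish[i.name] = t + i.latency
        t += 1
    return max(finish.values())

def program(load_latency):
    return [
        Instr("X", "IU",  (),     2),               # occupies the integer unit early
        Instr("A", "LSU", (),     load_latency),    # the load: 1 on a hit, 3 on a miss
        Instr("B", "IU",  ("A",), 2),
        Instr("C", "IU",  (),     2),
        Instr("D", "LSU", ("C",), 4),               # critical chain goes through C
    ]

hit, miss = makespan(program(1)), makespan(program(3))
print(f"cache hit : {hit} cycles")   # 10 -> locally fast, globally slow
print(f"cache miss: {miss} cycles")  # 8
assert hit > miss                    # monotonicity violated: a timing anomaly
```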

    Study of the effects of SEU-induced faults on a pipeline protected microprocessor

    This paper presents a detailed analysis of the behavior of a novel fault-tolerant 32-bit embedded CPU, compared with a default (non-fault-tolerant) implementation of the same processor, during a fault-injection campaign of single and double faults. The fault-tolerant processor tested is characterized by per-cycle voting of the microarchitectural and flop-based architectural state, redundancy at the pipeline level, and a distributed voting scheme. Its fault-tolerant behavior is characterized for three different workloads from the automotive application domain. The study proposes statistical methods for both the single and dual fault-injection campaigns and characterizes the fault-tolerant capability of both processors in terms of fault latencies, the probability of fault manifestation, and the behavior of latent faults.
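
    The paper's statistical treatment is not reproduced here; as a generic sketch of the kind of quantity involved, the snippet below estimates a fault-manifestation probability from a randomized single-fault injection campaign and attaches a 95% Wilson confidence interval. The campaign itself is a placeholder simulation, not the tested processors.

```python
# Generic sketch (not the paper's actual campaign or processor model): estimating the
# probability that an injected fault manifests as an error, with a 95% confidence
# interval, from the outcomes of a randomized single-fault injection campaign.

import math
import random

def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for a binomial proportion k/n."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def run_campaign(n_injections, manifestation_rate=0.12, seed=0):
    """Placeholder for an RTL/FPGA fault-injection run: here each injection
    manifests with a fixed hypothetical probability."""
    rng = random.Random(seed)
    return sum(rng.random() < manifestation_rate for _ in range(n_injections))

n = 10_000
manifested = run_campaign(n)
lo, hi = wilson_interval(manifested, n)
print(f"manifested {manifested}/{n}: p = {manifested/n:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```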

    Energy analysis and optimisation techniques for automatically synthesised coprocessors

    The primary outcome of this research project is a methodology enabling fast, automated, early-stage power and energy analysis of configurable processors for system-on-chip platforms. Such a capability is essential for selecting energy-efficient processors during design-space exploration, when potential savings are highest. This has been achieved by developing dynamic and static energy consumption models for the constituent blocks within the processors. Several optimisations have been identified, specifically targeting the blocks that contribute most to energy consumption: an instruction encoding mechanism reduces both the energy and area requirements of the instruction cache, and modifications to the multiplier unit reduce its energy consumption during inactive cycles. Both techniques are demonstrated to offer substantial energy savings. These techniques have undergone detailed evaluation and, based on the positive outcomes obtained, have been incorporated into Cascade, a system-on-chip coprocessor synthesis tool developed by Critical Blue, to provide automated analysis and optimisation of processor energy requirements. This thesis details the process of identifying and examining each method, along with the results obtained. Finally, a case study demonstrates the benefits of the developed functionality from the perspective of someone using Cascade to automate the creation of an energy-efficient configurable processor for a system-on-chip platform.
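
    As a loose illustration of the general shape of such early-stage models (not Cascade's actual models), the sketch below sums, over a few hypothetical processor blocks, a static term proportional to execution time and a dynamic term proportional to an activity count; all block names and coefficients are invented placeholders.

```python
# Minimal illustrative sketch of an early-stage, per-block energy estimate of the kind
# described above: E_total = sum over blocks of (P_static * T + E_dyn_per_event * events).
# Block names and coefficients are placeholders, not the thesis's calibrated models.

BLOCKS = {
    # block               P_static [mW]  E_dyn per event [pJ]
    "instruction_cache": (0.80,          15.0),
    "register_file":     (0.30,           4.0),
    "multiplier":        (0.50,          25.0),
    "alu":               (0.40,           6.0),
}

def energy_estimate(cycles, f_mhz, events):
    """Return total energy in nanojoules for a run of `cycles` at `f_mhz`,
    given per-block event counts (cache accesses, multiplies, ...)."""
    t_us = cycles / f_mhz                      # execution time in microseconds
    total_nj = 0.0
    for block, (p_static_mw, e_dyn_pj) in BLOCKS.items():
        static_nj = p_static_mw * t_us         # mW * us = nJ
        dynamic_nj = e_dyn_pj * events.get(block, 0) * 1e-3   # pJ -> nJ
        total_nj += static_nj + dynamic_nj
    return total_nj

run = {"instruction_cache": 120_000, "register_file": 300_000,
       "multiplier": 8_000, "alu": 150_000}
print(f"estimated energy: {energy_estimate(cycles=200_000, f_mhz=100, events=run):.1f} nJ")
```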

    Timing model derivation: pipeline analyzer generation from hardware description languages

    Safety-critical systems are forced to finish their execution within strict deadlines, so worst-case execution time (WCET) guarantees are a crucial part of their verification. Timing models of the analyzed hardware form the basis of static-analysis-based approaches like the aiT WCET analyzer. Currently, timing models are hand-crafted from frequently incorrect documentation, making the process error-prone and time-consuming. This thesis bridges the gap between automatic hardware synthesis and WCET analysis development by introducing a process for deriving timing models from VHDL specifications. We propose a set of transformations and abstractions that reduce the hardware design's complexity, enabling the generation of efficient and provably correct WCET analyzers. These employ an abstract-interpretation-based simulation of program executions built on a defined abstract simulation semantics. We have defined workflow patterns showing how to gradually apply the derivation process to VHDL models, thereby removing timing-irrelevant constructs. Interval property checking is used to validate the transformations. A further contribution of this thesis is the implementation of a tool set that realizes the introduced derivation process and shows its applicability to non-trivial industrial designs in experimental evaluations. The influence of design choices on the quality of the derived timing model is discussed, building an informal notion of predictability for VHDL.
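
    A tiny sketch of the abstract-simulation idea described above, not the thesis's derived VHDL semantics: once a hardware component such as the cache has been abstracted away, an access may be a hit or a miss, so the simulation tracks an interval of possible cycle counts instead of a single number (a purely additive toy model that ignores timing anomalies).

```python
# Tiny sketch of abstract timing simulation in the spirit described above (not the
# thesis's derived semantics): the cache is abstracted away, so each access may be a
# hit or a miss, and the simulator explores both, returning a [best, worst] cycle
# interval instead of an exact count. Latencies are placeholders.

HIT_CYCLES, MISS_CYCLES = 1, 10

def abstract_cycles(accesses):
    """`accesses` is a list of 'H' (known hit), 'M' (known miss) or '?' (unknown
    after abstraction). Returns the interval of possible total latencies."""
    best = worst = 0
    for a in accesses:
        if a == "H":
            best += HIT_CYCLES;  worst += HIT_CYCLES
        elif a == "M":
            best += MISS_CYCLES; worst += MISS_CYCLES
        else:                     # the abstraction lost this information
            best += HIT_CYCLES;  worst += MISS_CYCLES
    return best, worst

print(abstract_cycles(["H", "?", "M", "?"]))   # (13, 31)
```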

    Timing model derivation: static analysis of hardware description languages

    Safety-critical hard real-time systems are subject to strict timing constraints. In order to derive guarantees on the timing behavior, the worst-case execution time (WCET) of each task comprising the system has to be known. The aiT tool has been developed for computing safe upper bounds on the WCET of a task. Its computation is mainly based on abstract interpretation of timing models of the processor and its periphery. These models are currently hand-crafted by human experts, which is a time-consuming and error-prone process. Modern processors are automatically synthesized from formal hardware specifications, and besides the processor's functional behavior, these descriptions also capture its timing behavior. This thesis describes a methodology for deriving sound timing models from hardware specifications. To ease the process of timing model derivation, the methodology is embedded into a sound framework, a key part of which is static analysis of hardware specifications. The thesis presents an analysis framework, built on the theory of abstract interpretation, that allows classical program analyses to be applied to hardware description languages. Its suitability for automating parts of the derivation methodology is shown by several analyses. Practical experiments demonstrate the applicability of the approach to deriving timing models, and the soundness of the analyses and of their results is proved.
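
    A small sketch of the classical interval analysis that this kind of framework lifts to hardware descriptions; it is applied here to a toy clocked counter process rather than to real VHDL, and is not taken from the thesis.

```python
# Small sketch of interval abstract interpretation, the kind of classical program
# analysis the thesis lifts to hardware description languages; applied here to a toy
# clocked counter ("if cnt = 7 then cnt <= 0; else cnt <= cnt + 1;") rather than VHDL.

WIDTH_MAX = 255   # assume an 8-bit counter register

def join(a, b):
    """Least upper bound of two intervals (None represents 'unreachable')."""
    if a is None: return b
    if b is None: return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def filter_ne7(lo, hi):
    """Best interval approximation of {v in [lo, hi] : v != 7}."""
    if lo == hi == 7: return None
    if hi == 7: return (lo, 6)
    if lo == 7: return (8, hi)
    return (lo, hi)

def step(interval):
    """Abstract effect of one clock edge on the interval of counter values."""
    lo, hi = interval
    wraps = (0, 0) if lo <= 7 <= hi else None            # then-branch: cnt <= 0
    ne7 = filter_ne7(lo, hi)                             # else-branch guard: cnt /= 7
    counts = None if ne7 is None else (ne7[0] + 1, min(ne7[1] + 1, WIDTH_MAX))
    return join(wraps, counts)

# Fixed-point iteration from the reset state cnt = 0.
reach = (0, 0)
while True:
    new = join(reach, step(reach))
    if new == reach:
        break
    reach = new
print("counter stays within", reach)   # (0, 7): a sound invariant of the process
```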

    Parallelism and the software-hardware interface in embedded systems

    This thesis by publications addresses issues in the architecture and microarchitecture of next-generation, high-performance streaming Systems-on-Chip by quantifying the most important forms of parallelism in current and emerging embedded system workloads. The work consists of three major research tracks, relating to data-level parallelism, thread-level parallelism, and the software-hardware interface, which together reflect the research interests of the author as they have formed over the last nine years. Published works confirm that parallelism at the data level is widely accepted as the most important performance lever for the efficient execution of embedded media and telecom applications and has been exploited via a number of approaches, the most efficient being vector/SIMD architectures. A further, complementary and substantial form of parallelism exists at the thread level, but this has not been researched to the same extent in the context of embedded workloads. For the efficient execution of such applications, exploitation of both forms of parallelism is of paramount importance. This calls for a new architectural approach to the software-hardware interface, as its rigidity, manifested in all desktop-based and the majority of embedded CPUs, directly affects the performance of vectorized, threaded codes. The author advocates a holistic, mature approach where parallelism is extracted via automatic means while, at the same time, the traditionally rigid hardware-software interface is optimized to match the temporal and spatial behaviour of the embedded workload. This ultimate goal calls for the precise study of these forms of parallelism for a number of applications executing on theoretical models such as instruction set simulators and parallel RAM machines, as well as the development of highly parametric microarchitectural frameworks to encapsulate that functionality.

    New techniques for functional testing of microprocessor based systems

    Electronic devices may be affected by failures, for example due to physical defects. These defects may be introduced during the manufacturing process or during the normal operating life of the device through aging. Detecting all of these defects is not a trivial task, especially in complex systems such as processor cores. Safety-critical applications, however, do not tolerate failures, which is why such devices must be tested to guarantee correct behavior at all times. Moreover, testing is a key parameter for assessing the quality of a manufactured product. Consolidated testing techniques are based on special Design for Testability (DfT) features added to the original design to facilitate test effectiveness. Design, integration, and usage of the available DfT for testing purposes are fully supported by commercial EDA tools; hence, approaches based on DfT are the standard solutions adopted by silicon vendors for testing their devices. Tests exploiting the available DfT, such as scan chains, manipulate the internal state of the system in ways that differ from the normal functional mode, passing through otherwise unreachable configurations. Alternative solutions that do not violate the functional mode are defined as functional tests. In microprocessor-based systems, functional testing techniques include software-based self-test (SBST), i.e., a piece of software (referred to as a test program) that is uploaded into the system's available memory and executed with the purpose of exciting a specific part of the system and observing the effects of possible defects affecting it. SBST has been widely studied by the research community for years, but its adoption by industry is quite recent. My research activities have mainly focused on the industrial perspective of SBST. The problem of providing an effective development flow and guidelines for integrating SBST into the available operating systems has been tackled, and results are provided for microprocessor-based systems in the automotive domain. New algorithms have also been introduced beyond state-of-the-art approaches; they can be systematically implemented to enrich SBST test-program suites for modern microprocessor-based systems. The proposed development flow and algorithms are currently employed in real electronic control units for automotive products. Moreover, a special hardware infrastructure purposely embedded in modern devices for interconnecting the numerous on-board instruments has also been part of my research. This solution is known as reconfigurable scan networks (RSNs), and its practical adoption is growing fast as new standards have been created. Test and diagnosis methodologies have been proposed targeting specific RSN features, aimed at checking whether the reconfigurability of such networks has been corrupted by defects and, if so, at identifying the defective elements of the network. The contribution of my work in this field has also been included in the first suite of public-domain benchmark networks.
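
    As a toy illustration of the RSN checks mentioned above (not the thesis's methodology or the public benchmark networks), the sketch below models a scan network whose ScanMux select bits choose between segments and asks whether every instrument can still be placed on the active scan path under some configuration.

```python
# Toy sketch, not the thesis's method or the benchmark networks: a reconfigurable scan
# network modelled as nested segments, where each ScanMux select bit chooses one branch.
# The check asks whether every instrument can be placed on the active scan path under
# at least one configuration, i.e. whether reconfigurability is intact.

from itertools import product

# A segment is either an instrument name (str) or ("mux", mux_name, [branch0, branch1]),
# where each branch is itself a list of segments. A hypothetical example network:
NETWORK = [
    "instr_A",
    ("mux", "m0", [["instr_B"], ["instr_C", ("mux", "m1", [["instr_D"], ["instr_E"]])]]),
    "instr_F",
]
MUXES = ["m0", "m1"]

def active_path(segments, cfg):
    """Instruments on the scan path for a given {mux: select} configuration."""
    out = set()
    for seg in segments:
        if isinstance(seg, str):
            out.add(seg)
        else:
            _, name, branches = seg
            out |= active_path(branches[cfg[name]], cfg)
    return out

def reachable_instruments(network, muxes):
    reached = set()
    for bits in product([0, 1], repeat=len(muxes)):
        reached |= active_path(network, dict(zip(muxes, bits)))
    return reached

all_instruments = {"instr_A", "instr_B", "instr_C", "instr_D", "instr_E", "instr_F"}
unreachable = all_instruments - reachable_instruments(NETWORK, MUXES)
print("unreachable instruments:", unreachable or "none")
```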

    Performance and area evaluations of processor-based benchmarks on FPGA devices

    Computing systems on SoCs have been a long-term research topic since FPGA technology emerged, thanks to its re-programmable fabric, reconfigurable computing capability, and fast development time to market. During the last decade, a single processor in an SoC has no longer been able to cope with the fast-growing market for complex applications such as mobile-phone audio and video encoding, image processing, and network processing. Because the number of transistors on a silicon die keeps increasing, recent FPGAs and embedded systems are moving toward multi-processor-based designs to meet the required performance, and the benefits of such systems are now attainable; the age of the MPSoC is approaching. In addition, most of the embedded processors used are soft cores, because they are flexible, can be configured for specific software functions, and make it easy to build homogeneous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming far more powerful, enabling datapaths of logic units to be created from high-level algorithms (C to HDL) and supporting a concurrent HW/SW partitioning methodology. A range of embedded processors can be implemented in an FPGA-based prototype, integrating the CPUs on a programmable device. This research firstly surveys the different computer architectures found in modern embedded processors and the types of software applications they target (e.g. multi-threaded operations or complex functions) on FPGA-based SoCs; secondly, it investigates their capability by executing a wide range of multimedia software codes (integer arithmetic only) on different processor-system models (uni-processor, multi-processor, or co-design); and finally it compares the results in terms of benchmark performance and resource utilization within the FPGAs. All the examined programs were written in standard C and executed on varying numbers of soft-core processors or hardware units to obtain execution times. However, the number of processors, their customizable configurations, and the generated hardware datapaths are limited by the target FPGA's resources, so designers need to understand the tradeoff considered here: speed versus area. For this experimental purpose, I grouped the benchmarks into DLP and HLS categories, which are "data"-intensive and "function"-intensive respectively. The DLP programs are executed on the LEON3 MP and LE1 CMP multi-processor systems, and the HLS programs on the LegUp co-design system, on the target FPGAs. As a preliminary step, the performance of the soft-core processors is examined by executing all the benchmarks. The thesis as a whole centres on execution times, speed-ups, and the area breakdown on FPGA devices across the different programs.
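
    A back-of-the-envelope sketch of the speed-versus-area comparison referred to above; the figures are invented placeholders, not measured LEON3, LE1, or LegUp results.

```python
# Back-of-the-envelope sketch of the speed-versus-area comparison discussed above.
# All figures are invented placeholders: for each candidate system, compute the speedup
# over the single-core baseline and the speedup obtained per unit of FPGA area.

# (name, execution time in ms, FPGA LUTs used)
CANDIDATES = [
    ("uni-processor",   120.0,  6_000),
    ("4-core soft MP",   38.0, 22_000),
    ("HW/SW co-design",  15.0, 14_000),
]

baseline_time = CANDIDATES[0][1]
for name, t_ms, luts in CANDIDATES:
    speedup = baseline_time / t_ms
    print(f"{name:16s} speedup {speedup:5.2f}x  "
          f"LUTs {luts:6d}  speedup/kLUT {speedup / (luts / 1000):.3f}")
```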

    Branch Prediction For Network Processors

    Originally designed to favour flexibility over packet-processing performance, the programmable network processor now faces the challenge of meeting ever-increasing line rates while also providing additional processing capabilities. To meet these requirements, trends within networking research have tended to focus on techniques such as offloading computation-intensive tasks to dedicated hardware logic or increasing parallelism. While parallelism retains flexibility, challenges such as load balancing limit its scope; hardware offloading, on the other hand, allows complex algorithms to be implemented at high speed but sacrifices flexibility. To this end, the work in this thesis focuses on a more fundamental aspect of the network processor: the data-plane processing engine. Through system modelling and analysis of packet-processing functions, the goal of this thesis is to identify and extract salient information regarding the performance of multi-processor workloads. Following on from a traditional software-based analysis of program workloads, we develop a method of modelling and analysing hardware accelerators when applied to network processors. Using this quantitative information, the thesis proposes an architecture that allows deeply pipelined micro-architectures to be implemented on the data plane while reducing the branch penalty associated with such architectures.
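
    As a standard textbook illustration (not the architecture proposed in the thesis), the sketch below implements a two-bit saturating-counter branch predictor and converts its mispredictions into flush-penalty cycles for a hypothetical deeply pipelined packet-processing engine.

```python
# Standard textbook sketch, not the thesis's proposed architecture: a two-bit
# saturating-counter branch predictor, and the cycle penalty its mispredictions would
# cost a deeply pipelined packet-processing engine that flushes on a wrong prediction.

FLUSH_PENALTY = 8   # hypothetical front-end depth (cycles lost per misprediction)

class TwoBitPredictor:
    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [2] * entries                   # start weakly taken (0..3)

    def predict(self, pc):
        return self.counters[pc % self.entries] >= 2    # True = predict taken

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

def penalty_cycles(trace, predictor):
    """`trace` is a list of (pc, taken) branch outcomes, e.g. from a packet-processing run."""
    mispredicts = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            mispredicts += 1
        predictor.update(pc, taken)
    return mispredicts, mispredicts * FLUSH_PENALTY

# A loop branch taken 9 times then falling through, repeated once per packet:
trace = [(0x40, t) for _ in range(100) for t in [True] * 9 + [False]]
misses, lost = penalty_cycles(trace, TwoBitPredictor())
print(f"{misses} mispredictions over {len(trace)} branches, ~{lost} penalty cycles")
```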

    RISPP: A Run-time Adaptive Reconfigurable Embedded Processor

    This Ph.D. thesis describes a new approach to adaptive processors that use a reconfigurable fabric (embedded FPGA) to implement application-specific accelerators. A novel modular composition of Special Instructions is presented, along with a run-time system that exploits the provided adaptivity. The approach was simulated and prototyped using an FPGA. Comparisons with state-of-the-art application-specific and reconfigurable processors demonstrate significant improvements in performance and efficiency.
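
    A hypothetical sketch, not RISPP's actual run-time algorithm: a greedy selection of which accelerator implementations of the profiled Special Instructions to load into a limited reconfigurable fabric, ranked by expected cycles saved per unit of area. All names and numbers are placeholders.

```python
# Hypothetical sketch, not RISPP's actual run-time system: greedily choose which
# accelerator implementation of each profiled Special Instruction to load into a
# limited reconfigurable fabric, ranked by expected cycles saved per unit of area.

# (special instruction, candidate implementation, cycles saved per use, area units)
CANDIDATES = [
    ("SAD16", "fully_parallel", 90, 40),
    ("SAD16", "serialised",     35, 12),
    ("DCT8",  "fully_parallel", 70, 35),
    ("DCT8",  "serialised",     30, 10),
]
USE_COUNTS = {"SAD16": 5_000, "DCT8": 2_000}   # profile of the running hot spot
FABRIC_AREA = 50

def select(candidates, uses, budget):
    """Pick at most one implementation per instruction, greedily by benefit density."""
    ranked = sorted(candidates, key=lambda c: uses[c[0]] * c[2] / c[3], reverse=True)
    chosen, used = {}, 0
    for instr, impl, saved, area in ranked:
        if instr not in chosen and used + area <= budget:
            chosen[instr] = impl
            used += area
    return chosen, used

chosen, used = select(CANDIDATES, USE_COUNTS, FABRIC_AREA)
print(f"loaded {chosen} using {used}/{FABRIC_AREA} area units")
```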