12 research outputs found
The DeSyRe runtime support for fault-tolerant embedded MPSoCs
Semiconductor technology scaling makes chips more sensitive to faults. This paper describes the DeSyRe design approach and its runtime management for future reliable embedded Multiprocessor Systems-on-Chip (MPSoCs). A lightweight runtime system is described for shared-memory MPSoCs to support fault-tolerant execution upon detection of transient and permanent faults. The DeSyRe runtime system offers re-execution of tasks that suffer from transient faults and task migration in cases where a worker processor is permanently faulty. In addition, a faulty worker can potentially remain usable, increasing the system's fault tolerance. This is achieved using alternative task implementations, which avoid the faulty circuit and are indicated in the application code via pragma annotations, as well as by repairing a faulty core via hardware reconfiguration. Thereby, the system can be dynamically adapted using one or more of the above mechanisms to mitigate faults. The DeSyRe runtime system is evaluated using micro-benchmarks running on a Virtex-6 FPGA MPSoC. Results suggest that our enhanced fault-tolerant runtime system can successfully and efficiently execute all application tasks under a variety of fault cases.
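The recovery policy in this abstract (re-execute on a transient fault, fall back to an alternative implementation that avoids the faulty circuit, migrate when the worker cannot run the task at all) can be illustrated with a small simulation. This is a minimal sketch, not the DeSyRe runtime: the names `Impl`, `Worker`, `run_task`, the `pending_transients` fault-injection counter, and the retry limit are all hypothetical, and repair via hardware reconfiguration is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable


class TransientFault(Exception):
    """A one-off glitch; the same implementation may succeed if re-run."""


class PermanentFault(Exception):
    """The implementation relies on a circuit that is permanently broken."""


@dataclass
class Impl:
    label: str
    units: frozenset            # hardware units this implementation relies on
    func: Callable[[], int]     # the actual task body


@dataclass
class Worker:
    name: str
    broken_units: set = field(default_factory=set)  # permanently faulty units
    pending_transients: int = 0  # injected faults: how many upcoming runs glitch

    def run(self, impl: Impl) -> int:
        if impl.units & self.broken_units:
            raise PermanentFault(impl.label)
        if self.pending_transients > 0:
            self.pending_transients -= 1
            raise TransientFault(impl.label)
        return impl.func()


def run_task(impls, workers, max_retries=2):
    """Try each worker in turn; on each, try each implementation with retries."""
    log = []
    for worker in workers:                    # next worker = task migration
        for impl in impls:                    # alternatives, in preference order
            for _ in range(1 + max_retries):  # first attempt plus re-executions
                try:
                    result = worker.run(impl)
                except TransientFault:
                    log.append(f"{worker.name}/{impl.label}: transient fault")
                    continue                  # re-execute the same implementation
                except PermanentFault:
                    log.append(f"{worker.name}/{impl.label}: permanent fault")
                    break                     # move on to the next alternative
                log.append(f"{worker.name}/{impl.label}: ok")
                return result, log
    raise RuntimeError("task could not be completed on any worker")


# Hypothetical scenario: w0 has a broken FPU and one pending transient fault.
hw = Impl("hw", frozenset({"fpu"}), lambda: 42)
sw = Impl("sw", frozenset(), lambda: 42)
w0 = Worker("w0", broken_units={"fpu"}, pending_transients=1)
w1 = Worker("w1")
result, log = run_task([hw, sw], [w0, w1])
```

In this scenario the faulty worker `w0` stays usable: the hardware variant is rejected, the software variant is re-executed once after the transient fault, and no migration to `w1` is needed.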
Design trade-offs in energy efficient NoC architectures
This paper studies design trade-offs in energy-efficient Networks-on-Chip by evaluating every network architecture that results from applying all possible variations of the following design-configuration parameters to a baseline 2D mesh: network separation (P), concentration (C), express channels (X), flit widths (W), and virtual channels (V). Our comparative analysis selects the network architecture configuration that gives the best energy-delay product (EDP) while allowing a maximum area margin of 15% over the most energy-efficient configuration of the baseline.
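The selection rule described here (lowest EDP among configurations whose area stays within a 15% margin of a reference configuration) can be sketched as follows. The `NocConfig` type, the `select_best` function, and all numbers are hypothetical illustrations, and reading the margin as "area at most 1.15 times the reference area" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NocConfig:
    name: str
    energy: float  # energy for a fixed workload (arbitrary units)
    delay: float   # delay for the same workload (arbitrary units)
    area: float    # silicon area (arbitrary units)


def edp(cfg: NocConfig) -> float:
    """Energy-delay product: lower is better."""
    return cfg.energy * cfg.delay


def select_best(candidates, reference, area_margin=0.15):
    """Return the lowest-EDP candidate within the area budget."""
    budget = reference.area * (1.0 + area_margin)
    feasible = [c for c in candidates if c.area <= budget]
    if not feasible:
        raise ValueError("no candidate fits within the area budget")
    return min(feasible, key=edp)


# Made-up numbers, for illustration only.
reference = NocConfig("baseline", energy=10.0, delay=10.0, area=100.0)
candidates = [
    NocConfig("A", energy=8.0, delay=9.0, area=110.0),  # EDP 72, fits budget
    NocConfig("B", energy=6.0, delay=8.0, area=130.0),  # EDP 48, too large
    NocConfig("C", energy=9.0, delay=9.0, area=100.0),  # EDP 81, fits budget
]
best = select_best(candidates, reference)
```

Configuration B has the lowest EDP overall but exceeds the area budget, so the rule picks A.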
Scalable Multigigabit Pattern Matching for Packet Inspection
In this paper, we consider hardware-based scanning and analysis of packet payloads in order to detect hazardous content. We present two pattern matching techniques to compare incoming packets against intrusion detection search patterns. The first approach, decoded partial CAM (DpCAM), predecodes incoming characters, aligns the decoded data, and performs a logical AND on them to produce the match signal for each pattern. The second approach, perfect hashing memory (PHmem), uses perfect hashing to determine a unique memory location that contains the search pattern, and a comparison between incoming data and memory output to determine the match. Both techniques are well suited for reconfigurable logic and match about 2200 intrusion detection patterns using a single Virtex2 field-programmable gate-array device. We show that DpCAM achieves a throughput between 2 and 8 Gb/s, requiring 0.58–2.57 logic cells per search character. On the other hand, PHmem designs can support 2–5.7 Gb/s using a few tens of block RAMs (630–1404 kb) and only 0.28–0.65 logic cells per character. We evaluate both approaches in terms of performance and area cost and analyze their efficiency, scalability, and tradeoffs. Finally, we show that our designs achieve at least 30% higher efficiency compared to previous work, measured in throughput per area required per search character.
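The DpCAM idea (predecode each incoming character once, delay the decoded signals so they line up, then AND them per pattern) has a simple software analogue. The sketch below is an illustration only, not the hardware design from the paper: `dpcam_match` is a hypothetical name, the per-character boolean lists stand in for decoded signal lines over time, and the example payload and patterns are made up.

```python
def dpcam_match(stream: bytes, patterns: list[bytes]) -> list[tuple[int, bytes]]:
    """Return sorted (start_offset, pattern) pairs for every match in stream."""
    if any(len(p) == 0 for p in patterns):
        raise ValueError("patterns must be non-empty")

    # Stage 1 (predecode): one shared "signal" per distinct character, where
    # decoded[c][t] is True when the byte arriving at time t equals c.
    needed = {c for p in patterns for c in p}
    decoded = {c: [b == c for b in stream] for c in needed}

    matches = []
    for p in patterns:
        n = len(p)
        for end in range(n - 1, len(stream)):
            # Stage 2 (align): character i of the pattern must have been seen
            # n-1-i cycles before the last one.
            # Stage 3 (AND): the pattern matches when all aligned signals are set.
            if all(decoded[p[i]][end - (n - 1 - i)] for i in range(n)):
                matches.append((end - n + 1, p))
    return sorted(matches)


# Illustrative payload and search patterns.
hits = dpcam_match(b"GET /etc/passwd HTTP", [b"/etc", b"passwd", b"root"])
```

Sharing the decoded signals across patterns mirrors the reason predecoding saves logic: each character is compared once, and each pattern only combines already-decoded bits.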
A software-defined architecture and prototype for disaggregated memory rack scale systems
Disaggregation and rack-scale systems have the potential of drastically decreasing TCO and increasing utilization of cloud datacenters, while maintaining performance. In this paper, we present a novel rack-scale system architecture featuring software-defined remote memory disaggregation. Our hardware design and operating system extensions enable unmodified applications to dynamically attach to memory segments residing on physically remote memory pools and use such remote segments in a byte-addressable manner, as if they were local to the application. Our system also features a control plane that automates software-defined dynamic matching of compute to memory resources, as driven by datacenter workload needs. We prototyped our system on the commercially available Zynq Ultrascale+ MPSoC platform. To our knowledge, this is the first time a software-defined disaggregated system has been prototyped on commercial hardware and evaluated through industry-standard software benchmarks. Our initial results, obtained using benchmarks that are artificially highly adversarial in terms of memory bandwidth, show that disaggregated memory access exhibits a round-trip latency of only 134 clock cycles and a throughput penalty as low as 55% relative to locally attached memory. We also discuss estimations as to how our findings may translate to applications with pragmatically milder memory aggressiveness levels, as well as innovation avenues across the stack opened up by our work. © 2017 IEEE
Software Passports for Automated Performance Anomaly Detection of Cyber-Physical Systems
Software performance anomaly detection is a major challenge in complex industrial cyber-physical systems. The automated comparison of runtime execution metrics to reference ones provides a potential solution. We introduce the concept of software passports, intended to act as a signature construct for the runtime performance behaviour of reference executions. Our software passport design is based on Extra-Functional Behaviour (EFB) metrics. Amongst such metrics, we focus especially on CPU time and on the read and write communication event counts of different processes. We also elaborate on the notion of phases for systems with repetitive tasks during their execution, and on its fundamental role in our software passports. We employ regression modelling of our collected data for comparative purposes. The comparison reveals inconsistencies between the execution at hand and the software passport, if present. Such inconsistencies are strong indicators of the presence of performance anomalies. Our design is capable of detecting performance anomalies synthetically introduced into real execution tracing data from a semiconductor photolithography machine.
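One way to picture the comparison step is a regression fitted on a reference execution and then used to check a new execution phase by phase. The abstract does not specify the model, so everything below is an assumption for illustration: a straight-line fit of CPU time against communication event count, a hypothetical relative tolerance of 20%, and made-up data points.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_passport(reference_phases):
    """reference_phases: list of (event_count, cpu_time) per phase."""
    xs = [events for events, _ in reference_phases]
    ys = [cpu for _, cpu in reference_phases]
    return fit_line(xs, ys)


def find_anomalies(passport, observed_phases, rel_tol=0.2):
    """Return indices of phases whose CPU time deviates from the passport."""
    slope, intercept = passport
    flagged = []
    for index, (events, cpu) in enumerate(observed_phases):
        predicted = slope * events + intercept
        if abs(cpu - predicted) > rel_tol * abs(predicted):
            flagged.append(index)
    return flagged


# Made-up reference execution and a new execution with one slow phase.
passport = build_passport([(100, 1.1), (200, 2.0), (300, 3.1), (400, 4.0)])
anomalies = find_anomalies(passport, [(150, 1.6), (250, 2.5), (350, 5.0)])
```

Here only the third observed phase is flagged, since its CPU time is far above what the reference relationship predicts for that event count.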
dReDBox: Demonstrating disaggregated memory in an optical data centre
This paper showcases the first experimental demonstration of disaggregated memory using the dReDBox optical data centre architecture. Experimental results demonstrate the 4-tier network scalability and performance of the system at the physical and application layers. © 2018 OSA
FPGA-based design using the FASTER toolchain: the case of STM Spear development board
Even though FPGAs are becoming increasingly popular and are used in many different scenarios, such as communications and HPC, the steep learning curve needed to work with this technology is still the major factor limiting their full success. Many works have proposed to mitigate this problem by creating a suite of companion tools to support the designer during the development phase. The EU FASTER project aims at realizing an integrated toolchain that assists the designer in the steps of the design flow that are necessary to port a given application onto an FPGA device. The novelty of the framework lies in the fact that partial dynamic reconfiguration, which FPGA devices can exploit, is treated as a first-class citizen throughout the whole design flow. This work reports a case study in which the FASTER toolchain has been used to port a raytracer application onto the STM Spear prototyping embedded platform. The paper discusses the steps taken to realize the prototype and the results obtained on the target device. Finally, it reports some improvements that can be exploited to increase the performance of the realized hardware implementation.
Effective reconfigurable design: The FASTER approach
While fine-grain reconfigurable devices have been available for years, they are mostly used in a fixed-functionality, "ASIC-replacement" manner. To exploit opportunities for flexible and adaptable run-time use of fine-grain reconfigurable resources (as implemented currently in dynamic, partial reconfiguration), better tool support is needed. The FASTER project aims to provide a methodology and a tool-chain that will enable designers to efficiently implement a reconfigurable system on a platform combining software and reconfigurable resources. Starting from a high-level application description and a target platform, our tools analyse the application, evaluate reconfiguration options, and implement the designer's choices on underlying vendor tools. In addition, FASTER addresses micro-reconfiguration, verification, and the run-time management of system resources. We use industrial applications to demonstrate the effectiveness of the proposed framework and identify new opportunities for reconfigurable technologies. © 2014 Springer International Publishing Switzerland