Fault-Tolerant FPGA-Based Systems
This paper presents a new approach to on-line fault tolerance via reconfiguration for systems mapped onto field programmable gate arrays (FPGAs). Fault detection, based on a self-checking technique, is introduced at the application level; our approach can therefore detect faults in configurable logic blocks (CLBs) and routing interconnections in the FPGA concurrently with normal system operation. A grid of tiles is projected onto the FPGA structure and a certain number of spare CLBs is reserved inside every tile. The number of spare CLBs per tile, which are used as a backup upon detecting any faulty CLB, is estimated in accordance with the probability of failure. After the faulty CLBs are located, the faulty tile is reconfigured to avoid them. Our proposed approach uses a combination of hardware and software redundancy. We assume that a module external to the FPGA automatically controls both the diagnosis and the reconfiguration process (DIRC); typically this is an embedded microprocessor with some storage for the various tile configurations. We have implemented our approach on a Xilinx Virtex FPGA. The DIRC code is written using the JBits software tools. In response to a component failure, this approach capitalizes on the unique reconfiguration capabilities of FPGAs and replaces the affected tile with a functionally equivalent one that does not rely on the faulty component. Unlike fixed-structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components.
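As a rough illustration of sizing spares "in accordance with the probability of failure", the sketch below assumes that each CLB in a tile fails independently with the same probability, so the number of failures follows a binomial distribution. The abstract does not state the paper's actual estimation model; the function name `spares_needed` and its parameters are hypothetical.

```python
from math import comb


def spares_needed(clbs_per_tile: int, p_fail: float, target: float) -> int:
    """Return the smallest spare count s such that a tile with
    `clbs_per_tile` CLBs sees at most s failures with probability
    >= `target`.

    Assumption (not from the paper): CLB failures are independent and
    identically distributed with probability `p_fail`.
    """
    cumulative = 0.0
    for s in range(clbs_per_tile + 1):
        # Binomial probability of exactly s failed CLBs in the tile.
        cumulative += (
            comb(clbs_per_tile, s)
            * p_fail**s
            * (1 - p_fail) ** (clbs_per_tile - s)
        )
        if cumulative >= target:
            return s
    # Rounding can leave the sum just below 1.0; fall back to all CLBs.
    return clbs_per_tile
```

Under these assumptions, a 16-CLB tile with a 1% per-CLB failure probability would need two spares to survive with at least 99% probability, and one spare for a 90% target.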
Efficient Synthesis of Network Updates
Software-defined networking (SDN) is revolutionizing the networking industry,
but current SDN programming platforms do not provide automated mechanisms for
updating global configurations on the fly. Implementing updates by hand is
challenging for SDN programmers because networks are distributed systems with
hundreds or thousands of interacting nodes. Even if initial and final
configurations are correct, naively updating individual nodes can lead to
incorrect transient behaviors, including loops, black holes, and access control
violations. This paper presents an approach for automatically synthesizing
updates that are guaranteed to preserve specified properties. We formalize
network updates as a distributed programming problem and develop a synthesis
algorithm based on counterexample-guided search and incremental model checking.
We describe a prototype implementation, and present results from experiments on
real-world topologies and properties demonstrating that our tool scales to
updates involving over one thousand nodes.
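To make the transient-behavior problem concrete, here is a minimal sketch that searches for an order of single-node updates in which every intermediate configuration stays free of loops and black holes. It uses brute-force enumeration of orderings, not the counterexample-guided search and incremental model checking described in the abstract, and it only handles forwarding toward one destination. All names (`reaches`, `find_safe_order`, the switch labels) are hypothetical.

```python
from itertools import permutations


def reaches(config, src, dst):
    """Follow next-hop rules from src; False on a loop or a missing rule."""
    seen = set()
    node = src
    while node != dst:
        if node in seen or node not in config:
            return False  # loop, or black hole (no forwarding rule)
        seen.add(node)
        node = config[node]
    return True


def is_safe(config, sources, dst):
    """Every source must still reach the destination."""
    return all(reaches(config, s, dst) for s in sources)


def find_safe_order(initial, final, sources, dst):
    """Return an update order whose intermediate configurations are all
    safe, or None if no such order exists. Configurations map each node
    to its next hop. Brute force: factorial in the number of changed nodes.
    """
    changed = [n for n in sorted(set(initial) | set(final))
               if initial.get(n) != final.get(n)]
    for order in permutations(changed):
        config = dict(initial)
        ok = is_safe(config, sources, dst)
        for node in order:
            if not ok:
                break
            if node in final:
                config[node] = final[node]
            else:
                config.pop(node, None)  # rule removed in the final config
            ok = is_safe(config, sources, dst)
        if ok:
            return list(order)
    return None


# Updating s1 first would create the transient loop s1 -> s2 -> s1.
initial = {"s1": "d", "s2": "s1"}
final = {"s1": "s2", "s2": "d"}
order = find_safe_order(initial, final, ["s1", "s2"], "d")
```

In this example both the initial and final configurations are correct, yet only the order that updates `s2` before `s1` avoids a transient loop.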
Investigations into the feasibility of an on-line test methodology
This thesis aims to understand how information coding and the protocol that it
supports can affect the characteristics of electronic circuits. More specifically, it
investigates an on-line test methodology called IFIS (If it Fails It Stops) and its
impact on the design, implementation and subsequent characteristics of circuits
intended for application specific IC (ASIC) technology.
The first study investigates the influences of information coding and protocol on the
characteristics of IFIS systems. The second study investigates methods of circuit
design applicable to IFIS cells and identifies the technique possessing the
characteristics most suitable for on-line testing. The third study investigates the
characteristics of a 'real-life' commercial UART re-engineered using the techniques
resulting from the previous two studies. The final study investigates the effects of the
halting properties endowed by the protocol on failure diagnosis within IFIS systems.
The outcome of this work is an identification and characterisation of the factors that
influence behaviour, implementation costs and the ability to test and diagnose IFIS
designs.
Revisiting Actor Programming in C++
The actor model of computation has gained significant popularity over the
last decade. Its high level of abstraction makes it appealing for concurrent
applications in parallel and distributed systems. However, designing a
real-world actor framework that subsumes full scalability, strong reliability,
and high resource efficiency requires many conceptual and algorithmic additives
to the original model.
In this paper, we report on designing and building CAF, the "C++ Actor
Framework". CAF aims to provide a concurrent and distributed native
environment for scaling up to very large, high-performance applications, and
equally well down to small constrained systems. We present the key
specifications and design concepts---in particular a message-transparent
architecture, type-safe message interfaces, and pattern matching
facilities---that make native actors a viable approach for many robust,
elastic, and highly distributed developments. We demonstrate the feasibility of
CAF in three scenarios: first for elastic, upscaling environments, second for
including heterogeneous hardware like GPGPUs, and third for distributed runtime
systems. Extensive performance evaluations indicate ideal runtime behaviour for
up to 64 cores at very low memory footprint, or in the presence of GPUs. In
these tests, CAF continuously outperforms the competing actor environments
Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even OpenMPI.
LSI/VLSI design for testability analysis and general approach
The incorporation of testability characteristics into large scale digital design is not only necessary for, but also pertinent to, effective device testing and enhancement of device reliability. There are at least three major DFT techniques, namely the self-checking, the LSSD, and the partitioning techniques, each of which can be incorporated into a logic design to achieve a specific set of testability and reliability requirements. A detailed analysis of the design theory, implementation, fault coverage, hardware requirements, application limitations, etc., of each of these techniques is also presented.
Developing a distributed electronic health-record store for India
The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the more than one billion citizens of India.
Response-Time Analysis of Limited-Preemptive Parallel DAG Tasks Under Global Scheduling
Most recurrent real-time applications can be modeled as a set of sequential code segments (or blocks) that must be (repeatedly) executed in a specific order. This paper provides a schedulability analysis for such systems modeled as a set of parallel DAG tasks executed under any limited-preemptive global job-level fixed priority scheduling policy. More precisely, we derive response-time bounds for a set of jobs subject to precedence constraints, release jitter, and execution-time uncertainty, which enables support for a wide variety of parallel, limited-preemptive execution models (e.g., periodic DAG tasks, transactional tasks, generalized multi-frame tasks, etc.). Our analysis explores the space of all possible schedules using a powerful new state abstraction and state-pruning technique. An empirical evaluation shows that the analysis identifies between 10 and 90 percentage points more schedulable task sets than the state-of-the-art schedulability test for limited-preemptive sporadic DAG tasks. It scales to systems of up to 64 cores with 20 DAG tasks. Moreover, while our analysis is almost as accurate as the state-of-the-art exact schedulability test based on model checking (for sequential non-preemptive tasks), it is three orders of magnitude faster and hence capable of analyzing task sets with more than 60 tasks on 8 cores in a few seconds.
CROSS-LAYER DESIGN, OPTIMIZATION AND PROTOTYPING OF NoCs FOR THE NEXT GENERATION OF HOMOGENEOUS MANY-CORE SYSTEMS
This thesis provides a whole set of design methods to enable and manage the
runtime heterogeneity of feature-rich, industry-ready Tile-Based Network-on-Chips
at different abstraction layers (Architecture Design, Network Assembling,
Testing of NoC, Runtime Operation). The key idea is to maintain
the functionalities of the original layers, and to improve the performance
of architectures by allowing joint optimization and layer coordination. In
general purpose systems, we address the microarchitectural challenges by co-designing
and co-optimizing feature-rich architectures. In application-specific
NoCs, we emphasize the event notification, so that the platform is continuously
under control. At the network assembly level, this thesis proposes a
Hold Time Robustness technique, to tackle the hold time issue in synchronous
NoCs. At the network architectural level, the choice of a suitable synchronization
paradigm requires a boost of synthesis flow as well as the coexistence
with the DVFS. On one hand this implies the coexistence of mesochronous
synchronizers in the network with dual-clock FIFOs at network boundaries.
On the other hand, dual-clock FIFOs may be placed across inter-switch links
hence removing the need for mesochronous synchronizers. This thesis will
study the implications of the above approaches both on the design flow and
on the performance and power quality metrics of the network. Once the many-core
system is assembled, the issue of testing it arises. This thesis
takes on this challenge and engineers various testing infrastructures. At the
upper abstraction layer, the thesis addresses the issue of managing the fully
operational system and proposes a congestion management technique named
HACS. Moreover, some of the ideas of this thesis will undergo FPGA
prototyping. Finally, we provide some features for emerging technology by
characterizing the power consumption of Optical NoC Interfaces.