3 research outputs found

    Power efficiency of switch architecture extensions for fault tolerant NoC design

    No full text
    The increasingly parallel landscape of embedded computing platforms is bringing the reliability concern for the on-chip interconnection network (NoC) to the forefront. While very few works in the open literature bring their error recovery mechanisms down to microarchitectural and physical implementation, this paper documents the effort of optimizing a baseline NoC switch architecture for different fault-tolerant strategies against single-event upsets. As key contributions achieved, we not only come up with a new efficient fault-tolerant flow control protocol, but also we contrast correction vs. retransmission oriented switch microarchitectures, each implementing both data and control path protection, with physical implementation awareness. The accuracy of the analysis methodology enables us to report counterintuitive power-reliability trade-offs between the design points, serving as guidelines for implementing fault-tolerant communication in a power-constrained environment

    Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems

    Get PDF
    NETWORK-ON-CHIP (NoC) design is today at a crossroad. On one hand, the design principles to efficiently implement interconnection networks in the resource-constrained on-chip setting have stabilized. On the other hand, the requirements on embedded system design are far from stabilizing. Embedded systems are composed by assembling together heterogeneous components featuring differentiated operating speeds and ad-hoc counter measures must be adopted to bridge frequency domains. Moreover, an unmistakable trend toward enhanced reconfigurability is clearly underway due to the increasing complexity of applications. At the same time, the technology effect is manyfold since it provides unprecedented levels of system integration but it also brings new severe constraints to the forefront: power budget restrictions, overheating concerns, circuit delay and power variability, permanent fault, increased probability of transient faults. Supporting different degrees of reconfigurability and flexibility in the parallel hardware platform cannot be however achieved with the incremental evolution of current design techniques, but requires a disruptive approach and a major increase in complexity. In addition, new reliability challenges cannot be solved by using traditional fault tolerance techniques alone but the reliability approach must be also part of the overall reconfiguration methodology. In this thesis we take on the challenge of engineering a NoC architectures for the next generation systems and we provide design methods able to overcome the conventional way of implementing multi-synchronous, reliable and reconfigurable NoC. Our analysis is not only limited to research novel approaches to the specific challenges of the NoC architecture but we also co-design the solutions in a single integrated framework. Interdependencies between different NoC features are detected ahead of time and we finally avoid the engineering of highly optimized solutions to specific problems that however coexist inefficiently together in the final NoC architecture. To conclude, a silicon implementation by means of a testchip tape-out and a prototype on a FPGA board validate the feasibility and effectivenes

    Cross-layer fault tolerance in networks-on-chip

    Get PDF
    The design of Networks-on-Chip follows the Open Systems Interconnection (OSI) reference model. The OSI model defines strictly separated network abstraction layers and specifies their functionality. Each layer has layer-specific information about the network that can be exclusively accessed by the methods of the layer. Adhering to the strict layer boundaries, however, leads to methods of the individual layers working in isolation from each other. This lack of interaction between methods is disadvantageous for fault diagnosis and fault tolerance in Networks-on-Chip as it results in solutions that have a high effort in terms of the time and implementation costs required to deal with faults. For Networks-on-Chip cross-layer design is considered as a promising method to remedy these shortcomings. It removes the strict layer boundaries by the exchange of information between layers. This interaction enables methods of different layers to cooperate, and thus, deal with faults more efficiently. Furthermore, providing lower layer information to the software allows hardware methods to be implemented as software tasks resulting in a reduction of the hardware complexity. The goal of this dissertation is the investigation of cross-layer design for fault diagnosis and fault tolerance in Networks-on-Chip. For fault diagnosis a scheme is proposed that allows the interaction of protocol-based diagnosis of the transport layer with functional diagnosis of the network layer and structural diagnosis of the physical layer by exchanging diagnostic information. The techniques use this information for optimizing their own diagnosis process. For protocol-based diagnosis on the transport layer, a diagnosis protocol is proposed that is able to locate faulty links, switches, and crossbar connections. For this purpose, the technique utilizes available information of lower layers. As proof of concept for the proposed interaction scheme, the diagnosis protocol is combined with a functional and a structural diagnosis approach and the performance and diagnosis quality of the resulting combinations is investigated. The results show that the combinations of the diagnosis protocol with one of the lower layer techniques have a considerably reduced fault localization latency compared to the functional and the structural standalone techniques. This reduction, however, comes at the expense of a reduced diagnosis quality. In terms of fault tolerance, the focus of this dissertation is on the design and implementation of cross-layer approaches utilizing software methods to provide fault tolerance for network layer routings. Two approaches for different routings are presented. The requirements to provide information of lower layers to the software using the available Network-on-Chip resources and interfaces for data communication are discussed. The concepts of two mechanisms of the data link layer are presented for converting status information into communicable units and for preventing communication resources from being blocked. In the first approach, software-based packet rerouting is proposed. By incorporating information from different layers, this approach provides fault tolerance for deterministic network layer routings. As specialization of software-based rerouting, dimension-order XY rerouting is presented. In the second approach, a reconfigurable routing for Networks-on-Chip with logical hierarchy is proposed in which cross-layer interaction is used to enable hierarchical units to manage themselves autonomously and to reconfigure the routing. Both approaches are evaluated regarding their performance as well as their implementation costs. In a final study, the cross-layer diagnosis technique and cross-layer fault tolerance approaches are combined. The information obtained by the diagnosis technique is used by the fault tolerance approaches for packet rerouting or for routing reconfiguration. The combinations are evaluated regarding their impact on Networks-on-Chip performance. The results show that the crosslayer information exchange with software has a considerable impact on performance when the amount of information becomes too large. In case of crosslayer diagnosis, however, the impact on Networks-on-Chip performance is significantly lower compared to functional and structural diagnosis
    corecore