228 research outputs found
Doctor of Philosophy
dissertationOver the last decade, cyber-physical systems (CPSs) have seen significant applications in many safety-critical areas, such as autonomous automotive systems, automatic pilot avionics, wireless sensor networks, etc. A Cps uses networked embedded computers to monitor and control physical processes. The motivating example for this dissertation is the use of fault- tolerant routing protocol for a Network-on-Chip (NoC) architecture that connects electronic control units (Ecus) to regulate sensors and actuators in a vehicle. With a network allowing Ecus to communicate with each other, it is possible for them to share processing power to improve performance. In addition, networked Ecus enable flexible mapping to physical processes (e.g., sensors, actuators), which increases resilience to Ecu failures by reassigning physical processes to spare Ecus. For the on-chip routing protocol, the ability to tolerate network faults is important for hardware reconfiguration to maintain the normal operation of a system. Adding a fault-tolerance feature in a routing protocol, however, increases its design complexity, making it prone to many functional problems. Formal verification techniques are therefore needed to verify its correctness. This dissertation proposes a link-fault-tolerant, multiflit wormhole routing algorithm, and its formal modeling and verification using two different methodologies. An improvement upon the previously published fault-tolerant routing algorithm, a link-fault routing algorithm is proposed to relax the unrealistic node-fault assumptions of these algorithms, while avoiding deadlock conservatively by appropriately dropping network packets. This routing algorithm, together with its routing architecture, is then modeled in a process-algebra language LNT, and compositional verification techniques are used to verify its key functional properties. As a comparison, it is modeled using channel-level VHDL which is compiled to labeled Petri-nets (LPNs). Algorithms for a partial order reduction method on LPNs are given. An optimal result is obtained from heuristics that trace back on LPNs to find causally related enabled predecessor transitions. Key observations are made from the comparison between these two verification methodologies
Formal Analysis of a Fault-Tolerant Routing Algorithm for a Network-on-Chip
International audienceA fault-tolerant routing algorithm in Network-on-Chip architectures provides adaptivity for on-chip communications. Adding fault-tolerance adaptivity to a routing algorithm increases its design complexity and makes it prone to deadlock and other problems if improperly implemented. Formal verification techniques are needed to check the correctness of the design. This paper performs formal analysis on an extension of the link-fault tolerant Network-on-Chip architecture introduced by Wu et al. that supports multiflit wormhole routing. This paper describes several lessons learned during the process of constructing a formal model of this routing architecture. Finally, this paper presents how the deadlock freedom and tolerance to a single-link fault is verified for a two-by-two mesh version of this routing architecture
Composability and Predictability for Independent Application Development, Verification and Execution
System-on-chip (SOC) design gets increasingly complex, as a growing number of applications are integrated in modern systems. Some of these applications have real-time requirements, such as a minimum throughput or a maximum latency. To reduce cost, system resources are shared between applications, making their timing behavior inter-dependent. Real-time requirements must hence be verified for all possible combinations of concurrently executing applications, which is not feasible with commonly used simulation-based techniques. This chapter addresses this problem using two complexity-reducing concepts: composability and predictability. Applications in a composable system are completely isolated and cannot affect each other’s behaviors, enabling them to be independently verified. Predictable systems, on the other hand, provide lower bounds on performance, allowing applications to be verified using formal performance analysis. Five techniques to achieve composability and/or predictability in SOC resources are presented and we explain their implementation for processors, interconnect, and memories in our platform
Exploration and Design of Power-Efficient Networked Many-Core Systems
Multiprocessing is a promising solution to meet the requirements of near future applications. To get full benefit from parallel processing, a manycore system needs efficient, on-chip communication architecture. Networkon- Chip (NoC) is a general purpose communication concept that offers highthroughput, reduced power consumption, and keeps complexity in check by a regular composition of basic building blocks. This thesis presents power efficient communication approaches for networked many-core systems. We address a range of issues being important for designing power-efficient manycore systems at two different levels: the network-level and the router-level.
From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/ Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today’s and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques.
From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented.
Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to latest NoC architectures. The thesis concludes that careful codesigned elements from different network levels enable considerable power savings for many-core systems.Siirretty Doriast
Recommended from our members
Design and performance optimization of asynchronous networks-on-chip
As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems.
In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-orient’ design style.
The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions.
First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains.
Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow.
Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel.
In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement
Spatial parallelism in the routers of asynchronous on-chip networks
State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead.This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found out that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed.SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than the virtual channel (VC) flow control method -- the most used TDM method. The major design problem of SDM is the area consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler.Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Spatial parallelism in the routers of asynchronous on-chip networks
State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead.This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found out that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed.SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than the virtual channel (VC) flow control method -- the most used TDM method. The major design problem of SDM is the area consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler.Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Design of variation-tolerant synchronizers for multiple clock and voltage domains
PhD ThesisParametric variability increasingly affects the performance of electronic circuits as
the fabrication technology has reached the level of 32nm and beyond. These
parameters may include transistor Process parameters (such as threshold
voltage), supply Voltage and Temperature (PVT), all of which could have a
significant impact on the speed and power consumption of the circuit, particularly
if the variations exceed the design margins. As systems are designed with more
asynchronous protocols, there is a need for highly robust synchronizers and
arbiters. These components are often used as interfaces between communication
links of different timing domains as well as sampling devices for asynchronous
inputs coming from external components. These applications have created a need
for new robust designs of synchronizers and arbiters that can tolerate process,
voltage and temperature variations.
The aim of this study was to investigate how synchronizers and arbiters should be
designed to tolerate parametric variations. All investigations focused mainly on
circuit-level and transistor level designs and were modeled and simulated in the
UMC90nm CMOS technology process. Analog simulations were used to measure
timing parameters and power consumption along with a “Monte Carlo” statistical
analysis to account for process variations.
Two main components of synchronizers and arbiters were primarily investigated:
flip-flop and mutual-exclusion element (MUTEX). Both components can violate the
input timing conditions, setup and hold window times, which could cause
metastability inside their bistable elements and possibly end in failures. The
mean-time between failures is an important reliability feature of any synchronizer
delay through the synchronizer.
The MUTEX study focused on the classical circuit, in addition to a number of
tolerance, based on increasing internal gain by adding current sources, reducing
the capacitive loading, boosting the transconductance of the latch, compensating
the existing Miller capacitance, and adding asymmetry to maneuver the metastable
point. The results showed that some circuits had little or almost no improvements,
while five techniques showed significant improvements by reducing τ and
maintaining high tolerance.
Three design approaches are proposed to provide variation-tolerant
synchronizers. wagging synchronizer proposed to First, the is significantly
increase reliability over that of the conventional two flip-flop synchronizer. The
robustness of the wagging technique can be enhanced by using robust τ latches or
adding one more cycle of synchronization. The second approach is the
Metastability Auto-Detection and Correction (MADAC) latch which relies on swiftly
detecting a metastable event and correcting it by enforcing the previously stored
logic value. This technique significantly reduces the resolution time down from
uncertain
synchronization technique is proposed to transfer signals between Multiple-
Voltage Multiple-Clock Domains (MVD/MCD) that do not require conventional
level-shifters between the domains or multiple power supplies within each
domain. This interface circuit uses a synchronous set and feedback reset protocol
which provides level-shifting and synchronization of all signals between the
domains, from a wide range of voltage-supplies and clock frequencies.
Overall, synchronizer circuits can tolerate variations to a greater extent by
employing the wagging technique or using a MADAC latch, while MUTEX tolerance
can suffice with small circuit modifications. Communication between MVD/MCD
can be achieved by an asynchronous handshake
without a need for adding level-shifters.The Saudi Arabian Embassy in London,
Umm Al-Qura University, Saudi Arabi
- …