DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect- and fault-free system would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. To reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.
Desynchronization: Synthesis of asynchronous circuits from synchronous specifications
Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time and constrain the clock cycle accordingly. Desynchronization is a new paradigm for automating the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity without requiring special design skills or tools. In this paper, we first study different protocols for desynchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.
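As a concrete illustration of the four-phase handshake protocols surveyed above, the following Python sketch steps one request/acknowledge cycle through its four return-to-zero phases; the signal names and flat `channel` dictionary are illustrative assumptions, not one of the latch controllers studied in the paper.

```python
# Minimal sketch of a four-phase (return-to-zero) handshake between a sender
# and a receiver latch controller. Signal names and sequencing are illustrative.

def four_phase_cycle(channel):
    """Drive one data transfer through the four handshake phases."""
    channel["req"] = 1            # phase 1: sender raises request (data valid)
    assert channel["req"] and not channel["ack"]
    channel["ack"] = 1            # phase 2: receiver latches data, raises ack
    channel["req"] = 0            # phase 3: sender withdraws request
    channel["ack"] = 0            # phase 4: receiver resets ack; cycle may repeat

channel = {"req": 0, "ack": 0}
four_phase_cycle(channel)
print(channel)                    # back to the idle state {'req': 0, 'ack': 0}
```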
Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack
Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low-motion-activity scenes to 12.5% for high-motion-activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.
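The bounded-reconfiguration guarantee can be illustrated with a generic divide-and-conquer sketch: with one faulty resource among N suspects, relocating half of the suspects per probe isolates the fault in at most ceil(log2 N) reconfigurations. The Python below is a hypothetical illustration of that bound, not the FaDReS isolation procedure itself; `health_ok` stands in for the runtime health metric.

```python
def isolate_fault(suspects, health_ok):
    """Divide-and-conquer isolation of a single faulty resource.

    `suspects` is a list of resource IDs; `health_ok(subset)` reconfigures
    the design so that only `subset` is exercised and returns True when the
    health metric passes. Hypothetical sketch; assumes exactly one fault."""
    reconfigs = 0
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        reconfigs += 1                       # one partial reconfiguration per probe
        if health_ok(half):                  # this half is healthy...
            suspects = suspects[len(half):]  # ...so the fault is in the other half
        else:
            suspects = half
    return suspects[0], reconfigs            # reconfigs <= ceil(log2(N))
```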
Validation and verification of the interconnection of hardware intellectual property blocks for FPGA-based packet processing systems
As networks become more versatile, the computational requirement for supporting additional functionality increases. The increasing demands of these networks can be met by Field Programmable Gate Arrays (FPGAs), an increasingly popular technology for implementing packet processing systems. The fine-grained parallelism and density of these devices can be exploited to meet the computational requirements and implement complex systems on a single chip. However, the increasing complexity of FPGA-based systems makes them susceptible to errors and difficult to test and debug. To tackle the complexity of modern designs, system-level languages have been developed to provide abstractions suited to the domain of the target system. Unfortunately, the lack of formality in these languages can give rise to errors that are not caught until late in the design cycle. This thesis presents three techniques for verifying and validating FPGA-based packet processing systems described in a system-level description language. First, a type system is applied to the system description language to detect errors before implementation. Second, system-level transaction monitoring is used to observe high-level events on-chip following implementation. Third, the high-level information embodied in the system description language is exploited to allow the system to be automatically instrumented for on-chip monitoring.
This thesis demonstrates that these techniques catch errors which are undetected by traditional verification and validation tools. The locations of faults are identified and errors are caught earlier in the design flow, which saves time by reducing synthesis iterations.
Modeling of a hardware VLSI placement system: Accelerating the Simulated Annealing algorithm
An essential step in the automation of electronic design is the placement of the physical components on the target semiconductor die. The placement step presents the opportunity to reduce costs in terms of wire length and performance degradation; however, it is compute-intensive, and obtaining an optimal solution is NP-complete. As designs have grown in complexity and gate count, obtaining an optimal solution is not feasible given time-to-market constraints or the sheer compute effort required. Heuristic algorithms allow efficient but sub-optimal placements to be produced with a reduction in processing time. A widely used algorithm is Simulated Annealing (SA).
The goal of this work was to develop a model enabling an analysis of the feasibility of a hardware-accelerated placement system with SA at its core. The SA heuristic was analyzed for possible improvements in efficiency, with a focus on targeting the system for hardware. A solution implementing parallel computing with specialized hardware configurations inside a field programmable gate array (FPGA) was investigated as a possible way to improve the efficiency of the SA-based algorithm. All supporting subsystems were also described for a hardware-accelerated model.
A large speedup was shown analytically, both from accelerating the critical path of the SA algorithm and from novel methods of improving SA's efficiency. As data throughput requirements were not included in this work, the results presented may be optimistic for the overall system speedup. However, the results clearly show that future work is warranted in studying the concept of a hardware-accelerated placement system.
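The core accept/reject loop that such an accelerator would implement can be sketched in a few lines of Python; the move set (pairwise cell swaps), cooling schedule, and parameter values below are illustrative assumptions, not the configuration studied in this work.

```python
import math, random

def simulated_annealing_place(cells, cost, moves=10_000, t0=1.0, alpha=0.999):
    """Minimal simulated-annealing placement loop (illustrative sketch).

    `cells` is a mutable placement (e.g. a list of (x, y) slot assignments)
    and `cost(cells)` returns the wire-length estimate to be minimised."""
    t, cur = t0, cost(cells)
    for _ in range(moves):
        i, j = random.sample(range(len(cells)), 2)
        cells[i], cells[j] = cells[j], cells[i]        # propose a swap
        new = cost(cells)
        # Always accept improvements; accept uphill moves with Boltzmann probability.
        if new <= cur or random.random() < math.exp((cur - new) / t):
            cur = new
        else:
            cells[i], cells[j] = cells[j], cells[i]    # reject: undo the swap
        t *= alpha                                     # geometric cooling schedule
    return cells, cur
```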
Simulating and prototyping software defined networking (SDN) using Mininet approach to optimise host communication in realistic programmable networking environment
In this project, two tests were performed. In the first test, Mininet-WiFi was used to simulate a Software Defined Network to demonstrate Mininet-WiFi's suitability as a Software Defined Network emulator that can also be integrated with an existing network through a Network Virtualized Function (NVF). A typical organization's computer network was simulated, consisting of a website hosted on a LAMP (Linux, Apache, MySQL, PHP) virtual machine and an F5 application delivery controller (ADC) that load-balanced requests sent to the web applications. A website page request was sent from the virtual stations inside Mininet-WiFi. The request was received by the application delivery controller, which then used a round-robin technique to send the request to one of the web servers on the LAMP virtual machine. The web server then returned the requested website to the requesting virtual stations over the simulated virtual network. The significance of these results is that they present Mininet-WiFi as an emulator that can be integrated into a real programmable networking environment, offering a portable, cost-effective, and easily deployable testing network that can be run on a single computer. These results also benefit modern network deployments, as live network devices can communicate with the testing environment for data center, cloud, and mobile providers.
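For concreteness, round-robin dispatch of the kind the ADC performed can be expressed in a few lines of Python; the backend addresses are placeholders, not the servers used in the test.

```python
from itertools import cycle

# Illustrative round-robin dispatch over a pool of backend web servers;
# the addresses below are placeholders.
servers = cycle(["10.0.0.11", "10.0.0.12", "10.0.0.13"])

def dispatch(request):
    """Return the next backend in strict rotation for this request."""
    return next(servers)

for r in range(5):
    print(dispatch(r))   # 10.0.0.11, 10.0.0.12, 10.0.0.13, 10.0.0.11, ...
```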
In the second test, a Software Defined Network was created in Mininet using a Python script. An external interface was added to enable communication with the network outside of Mininet. The Amazon Web Services Elastic Compute Cloud (EC2) was used to host an OpenDaylight controller, which served as the control-plane device for the virtual switch within Mininet. To test the network, a web server hosted on the Emulated Virtual Environment – Next Generation (EVE-NG) software was connected to Mininet. EVE-NG provides tools for modelling virtual devices and interconnecting them with other virtual or physical devices. The OpenDaylight controller was able to create the flows to facilitate communication between the hosts in Mininet and the web server in the real-life network.
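A minimal sketch of such a Mininet script is shown below, attaching a two-host topology to a remote controller. The controller address is a placeholder for the EC2 instance, and the topology is illustrative rather than the one used in the test.

```python
#!/usr/bin/env python
"""Minimal Mininet topology attached to a remote OpenDaylight controller."""
from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.cli import CLI

net = Mininet(controller=None, switch=OVSSwitch)
net.addController('c0', controller=RemoteController,
                  ip='203.0.113.10', port=6633)   # placeholder EC2 address
s1 = net.addSwitch('s1')
h1 = net.addHost('h1')
h2 = net.addHost('h2')
net.addLink(h1, s1)
net.addLink(h2, s1)
net.start()
CLI(net)      # interact; flows are installed by the remote controller
net.stop()
```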
Adiabatic Approach for Low-Power Passive Near Field Communication Systems
This thesis tackles the need for ultra-low-power electronics in power-limited passive Near Field Communication (NFC) systems. One technique that has proven its potential for delivering low-power operation is the adiabatic logic technique. However, the low-power benefits of adiabatic circuits come with challenges, owing to the absence of consensus on the most energy-efficient adiabatic logic family offering appropriate trade-offs between computation time, area, and complexity for a given circuit and power-clocking scheme. Therefore, five energy-efficient adiabatic logic families working in single-phase, 2-phase, and 4-phase power-clocking schemes were chosen.
Since flip-flops are the basic building blocks of any sequential circuit, and the existing flip-flops follow a MUX-based design (requiring more transistors), novel single-phase, 2-phase, and 4-phase reset-based flip-flops were proposed. The performance of the multi-phase adiabatic families was evaluated and compared using design examples such as a 2-bit ring counter, a 3-bit up-down counter, and a 16-bit Cyclic Redundancy Check (CRC) circuit (a benchmark circuit) based on the ISO 14443-3A standard. Several trade-offs, design rules, and an appropriate range for supply voltage scaling for multi-phase adiabatic logic are proposed.
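For reference, the checksum computed by a 16-bit CRC conforming to ISO/IEC 14443-3 Type A (CRC_A: polynomial 0x1021 processed LSB-first, i.e. reversed form 0x8408, initial value 0x6363, no final XOR) can be sketched in software as follows; this is a bit-serial Python reference for checking results, not the adiabatic hardware implementation evaluated in the thesis.

```python
def crc_a(data: bytes) -> int:
    """CRC_A per ISO/IEC 14443-3 Type A: poly 0x1021 processed LSB-first
    (reversed form 0x8408), initial value 0x6363, no final XOR."""
    crc = 0x6363
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc & 0xFFFF

# The two CRC bytes are appended to the frame least-significant byte first.
frame = bytes([0x12, 0x34])
checksum = crc_a(frame)
print(hex(checksum), [hex(checksum & 0xFF), hex(checksum >> 8)])
```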
Furthermore, under the NFC standard (ISO 14443-3A), data is frequently encoded using the Manchester coding technique before being transmitted to the reader. Therefore, if Manchester encoding can be implemented using the adiabatic logic technique, energy benefits are expected. However, the adiabatic implementation of Manchester encoding presents a challenge. Therefore, a novel method for implementing Manchester encoding using adiabatic logic is proposed, overcoming the challenges arising from the AC power-clock.
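As a functional reference for the encoding itself (independent of the adiabatic implementation proposed here), Manchester coding splits each bit period into two opposite half-bits; the convention chosen below (a 1 as high-then-low, per G. E. Thomas; IEEE 802.3 uses the inverse) is an assumption, not taken from the thesis.

```python
def manchester_encode(bits, convention="thomas"):
    """Encode a bit sequence as Manchester half-bit pairs.

    Under the G. E. Thomas convention a 1 is sent as high-then-low and a 0
    as low-then-high; IEEE 802.3 uses the opposite mapping."""
    one, zero = ((1, 0), (0, 1)) if convention == "thomas" else ((0, 1), (1, 0))
    encoded = []
    for b in bits:
        encoded.extend(one if b else zero)
    return encoded

print(manchester_encode([1, 0, 1, 1]))   # [1, 0, 0, 1, 1, 0, 1, 0]
```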
Other challenges that come with the dynamic nature of adiabatic gates and the complexity of the 4-phase power-clocking scheme lie in synchronizing the power-clock phases and in the time spent on design, validation, and debugging. This requires a specific modelling approach to describe adiabatic logic behaviour at a higher level of abstraction. However, describing adiabatic logic behaviour using Hardware Description Languages (HDLs) is a challenging problem because the AC power-clock and the dual-rail inputs and outputs must be modelled. Therefore, a VHDL-based modelling approach for the 4-phase adiabatic logic technique is developed for functional simulation, precise timing analysis, and as an improvement over previously described approaches.
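To illustrate the 4-phase power-clocking scheme whose synchronization the thesis addresses, the sketch below generates four 90-degree-shifted trapezoidal power-clock waveforms with evaluate, hold, recover, and idle intervals; the supply voltage and sample counts are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def trapezoidal_power_clocks(samples_per_period=400, n_phases=4, vdd=1.8):
    """Generate n 90-degree-shifted trapezoidal power-clock waveforms with
    equal evaluate / hold / recover / idle intervals (illustrative values)."""
    quarter = samples_per_period // 4
    ramp_up = np.linspace(0.0, vdd, quarter)       # evaluate: slow charging
    hold = np.full(quarter, vdd)                   # hold: output valid
    ramp_down = np.linspace(vdd, 0.0, quarter)     # recover: energy returned
    idle = np.zeros(quarter)                       # idle: wait for next phase
    period = np.concatenate([ramp_up, hold, ramp_down, idle])
    # Each successive phase lags the previous one by a quarter period.
    return [np.roll(period, k * quarter) for k in range(n_phases)]

clocks = trapezoidal_power_clocks()
print(len(clocks), clocks[0][:5])
```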
High-Speed Wide-Field Time-Correlated Single-Photon Counting Fluorescence Lifetime Imaging Microscopy
Fluorescence microscopy is a powerful imaging technique used in the biological sciences to identify labeled components of a sample with specificity. This is usually accomplished by labeling with fluorescent dyes, isolating these dyes by their spectral signatures with optical filters, and recording the intensity of the fluorescent response. Although these techniques are widely used, fluorescence intensity images can be negatively affected by a variety of factors that impact the fluorescence intensity. Fluorescence lifetime imaging microscopy (FLIM) is an imaging technique that is relatively immune to intensity fluctuations and also provides the unique ability to directly monitor the microenvironment surrounding a fluorophore. Despite the benefits associated with FLIM, its applications are fairly limited due to long image acquisition times and the high cost of traditional hardware. Recent advances in complementary metal-oxide-semiconductor (CMOS) single-photon avalanche diodes (SPADs) have enabled the design of low-cost imaging arrays capable of recording lifetime images with acquisition times more than an order of magnitude faster than those of existing systems. However, these SPAD arrays have yet to realize the full potential of the technology due to limitations in their ability to handle the vast amount of data generated during the commonly used time-correlated single-photon counting (TCSPC) lifetime imaging technique. This thesis presents the design, implementation, characterization, and demonstration of a high-speed FLIM imaging system. The components of this design include a CMOS imager chip in a standard 0.13 µm technology containing a custom CMOS SPAD, a 64-by-64 array of these SPADs, pixel control circuitry, independent time-to-digital converters (TDCs), a FLIM-specific datapath, and high-bandwidth output buffers. In addition to the CMOS imaging array, a complete system was designed and implemented using a printed circuit board (PCB) for capturing data from the imager, creating histograms of the photon arrival data using field-programmable gate arrays, and transferring the data to a computer over a cabled PCIe interface. Finally, software is used to communicate between the imaging system and a computer.
The dark count rate of the SPAD was measured to be only 231 Hz at room temperature while maintaining a photon detection probability of up to 30%. The TDCs included on the array have a 62.5 ps resolution and a 64 ns range, which is suitable for measuring the lifetime of most biological fluorophores. Additionally, the on-chip datapath was designed to handle continuous data transfers at rates capable of supporting TCSPC-based lifetime imaging at 100 frames per second. The system-level implementation also provides sufficient data throughput for transferring up to 750 frames per second from the imaging system to a computer. The lifetime imaging system was characterized using standard techniques for evaluating SPAD performance and an electrical delay signal for measuring the TDC performance. This thesis concludes with a demonstration of TCSPC-FLIM imaging at 100 frames per second -- the fastest 64-by-64 TCSPC FLIM that has been demonstrated. This system overcomes some of the limitations of existing FLIM systems and has the potential to enable new application domains in dynamic FLIM imaging.
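As a simple illustration of how a lifetime is recovered from a TCSPC histogram of the kind this system produces, the sketch below uses a first-moment (centroid) estimator for a mono-exponential decay; the background subtraction and peak-as-origin choices are simplifying assumptions, not the analysis used in the thesis.

```python
import numpy as np

def estimate_lifetime_ns(bin_centers_ns, counts):
    """Centroid estimate of a mono-exponential fluorescence lifetime from a
    TCSPC histogram. Assumes background-subtracted counts and takes the
    histogram peak as the excitation instant (simplifying assumptions)."""
    t0 = bin_centers_ns[np.argmax(counts)]     # approximate start of the decay
    tail = bin_centers_ns >= t0
    # For an exponential decay, the mean arrival time past t0 equals tau.
    return np.average(bin_centers_ns[tail] - t0, weights=counts[tail])

# Synthetic decay with tau = 2.5 ns, binned at 62.5 ps (the TDC resolution).
bins = np.arange(0, 64, 0.0625)
counts = 1000 * np.exp(-(bins - 5).clip(min=0) / 2.5) * (bins >= 5)
print(round(estimate_lifetime_ns(bins, counts), 2))   # roughly 2.5
```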