5,721 research outputs found
Asynchronous techniques for system-on-chip design
SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed
Recommended from our members
Active timing margin management to improve microprocessor power efficiency
Improving power/performance efficiency is critical for today’s micro- processors. From edge devices to datacenters, lower power or higher performance always produces better systems, measured by lower cost of ownership or longer battery time. This thesis studies improving microprocessor power/performance efficiency by optimizing the pipeline timing margin. In particular, this thesis focuses on improving the efficacy of Active Timing Margin, a young technology that dynamically adjusts the margin.
Active timing margin trims down the pipeline timing margin with a control loop that adjusts voltage and frequency based on real-time chip environment monitoring. The key insight of this thesis is that in order to maximize active timing margin’s efficiency enhancement benefits, synergistic management from processor architecture design and system software scheduling are needed. To that end, this thesis covers the major consumers of pipeline timing margin, including temperature, voltage, and process variation. For temperature variation, the thesis proposes a table-lookup based active timing margin mechanism, and an associated temperature management scheme to minimize power consumption. For voltage variation, the thesis characterizes the limiting factors of adaptive clocking’s power saving and proposes application scheduling to maximize total system power reduction. For process variation, the thesis proposes core-level adaptive clocking reconfiguration to automatically expose inter-core variation and discusses workload scheduling and throttling management to control critical application performance.
The author believes the optimization presented in this thesis can potentially benefit a variety of processor architectures as the conclusions are based on the solid measurement on state-of-the-art processors, and the research objective, active timing margin, already has wide applicability in the latest microprocessors by the time this thesis is written.Electrical and Computer Engineerin
Delay Measurements and Self Characterisation on FPGAs
This thesis examines new timing measurement methods for self delay characterisation of Field-Programmable Gate Arrays (FPGAs) components and delay measurement of complex circuits
on FPGAs. Two novel measurement techniques based on analysis of a circuit's output failure
rate and transition probability is proposed for accurate, precise and efficient measurement of
propagation delays. The transition probability based method is especially attractive, since
it requires no modifications in the circuit-under-test and requires little hardware resources,
making it an ideal method for physical delay analysis of FPGA circuits.
The relentless advancements in process technology has led to smaller and denser transistors
in integrated circuits. While FPGA users benefit from this in terms of increased hardware
resources for more complex designs, the actual productivity with FPGA in terms of timing
performance (operating frequency, latency and throughput) has lagged behind the potential
improvements from the improved technology due to delay variability in FPGA components
and the inaccuracy of timing models used in FPGA timing analysis. The ability to measure
delay of any arbitrary circuit on FPGA offers many opportunities for on-chip characterisation
and physical timing analysis, allowing delay variability to be accurately tracked and variation-aware optimisations to be developed, reducing the productivity gap observed in today's FPGA
designs.
The measurement techniques are developed into complete self measurement and characterisation platforms in this thesis, demonstrating their practical uses in actual FPGA hardware for
cross-chip delay characterisation and accurate delay measurement of both complex combinatorial and sequential circuits, further reinforcing their positions in solving the delay variability
problem in FPGAs
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.Comment: To appear at the DSN 2020 conferenc
Software Canaries: Software-based Path Delay Fault Testing for Variation-aware Energy-efficient Design
ABSTRACT Software-based path delay fault testing (SPDFT) has been used to identify faulty chips that cannot meet timing constraints due to gross delay defects. In this paper, we propose using SPDFT for a new purpose -aggressively selecting the operating point of a variation-affected design. In order to use SPDFT for this purpose, test routines must provide high coverage of potentially-critical paths and must have low dynamic performance overhead. We describe how to apply SPDFT for selecting an energy-efficient operating point for a variation-affected processor and demonstrate that our test routines achieve ample coverage and low overhead
Recommended from our members
Improving timing verification and delay testing methodologies for IC designs
textThe task of ensuring the correct temporal behavior of IC designs,
both before and after fabrication, is extremely important. It is becoming
even more imperative as the demand for performance increases and process
technology advances into the deep sub-micron region.
This dissertation tackles the key issues in the timing verification
and delay testing methodologies. An efficient methodology is presented to
identify false timing paths in the timing verification methodology which utilizes
ATPG technique and timing information from an ordered list of timing
paths according to the delay information. This dissertation also presents a
speed binning methodology which utilizes structural delay tests successfully
instead of functional tests. In addition, it establishes a methodology which
quantifies the correlation between the timing verification prediction and
actual silicon measurement of timing paths. This quantification methodology
lays the foundation for further research to study the impact of deep
submicron effects on design performanceElectrical and Computer Engineerin
Design of a Programmable Passive SoC for Biomedical Applications Using RFID ISO 15693/NFC5 Interface
Low power, low cost inductively powered passive biotelemetry system involving fully customized RFID/NFC interface base SoC has gained popularity in the last decades. However, most of the SoCs developed are application specific and lacks either on-chip computational or sensor readout capability. In this paper, we present design details of a programmable passive SoC in compliance with ISO 15693/NFC5 standard for biomedical applications. The integrated system consists of a 32-bit microcontroller, a sensor readout circuit, a 12-bit SAR type ADC, 16 kB RAM, 16 kB ROM and other digital peripherals. The design is implemented in a 0.18 μ m CMOS technology and used a die area of 1.52 mm × 3.24 mm. The simulated maximum power consumption of the analog block is 592 μ W. The number of external components required by the SoC is limited to an external memory device, sensors, antenna and some passive components. The external memory device contains the application specific firmware. Based on the application, the firmware can be modified accordingly. The SoC design is suitable for medical implants to measure physiological parameters like temperature, pressure or ECG. As an application example, the authors have proposed a bioimplant to measure arterial blood pressure for patients suffering from Peripheral Artery Disease (PAD)
Exploiting Application Behaviors for Resilient Static Random Access Memory Arrays in the Near-Threshold Computing Regime
Near-Threshold Computing embodies an intriguing choice for mobile processors due to the promise of superior energy efficiency, extending the battery life of these devices while reducing the peak power draw. However, process, voltage, and temperature variations cause a significantly high failure rate of Level One cache cells in the near-threshold regime a stark contrast to designs in the super-threshold regime, where fault sites are rare.
This thesis work shows that faulty cells in the near-threshold regime are highly clustered in certain regions of the cache. In addition, popular mobile benchmarks are studied to investigate the impact of run-time workloads on timing faults manifestation. A technique to mitigate the run-time faults is proposed. This scheme maps frequently used data to healthy cache regions by exploiting the application cache behaviors. The results show up to 78% gain in performance over two other state-of-the-art techniques
- …