A Comprehensive Analysis of Swarming-based Live Streaming to Leverage Client Heterogeneity
Due to missing IP multicast support on an Internet scale, over-the-top media
streams are delivered with the help of overlays as used by content delivery
networks and their peer-to-peer (P2P) extensions. In this context,
mesh/pull-based swarming plays an important role either as pure streaming
approach or in combination with tree/push mechanisms. However, the impact of
realistic client populations with heterogeneous resources is not yet fully
understood. In this technical report, we contribute to closing this gap by
mathematically analysing the most basic scheduling mechanisms, latest deadline
first (LDF) and earliest deadline first (EDF) in a continuous time Markov chain
framework and combining them into a simple, yet powerful, mixed strategy to
leverage inherent differences in client resources. The main contributions are
twofold: (1) a mathematical framework for swarming on random graphs is proposed
with a focus on LDF and EDF strategies in heterogeneous scenarios; (2) a mixed
strategy, named SchedMix, is proposed that leverages peer heterogeneity. The
proposed strategy, SchedMix, is shown to outperform the other two strategies
using different abstractions: a mean-field theoretic analysis of buffer
probabilities, simulations of a stochastic model on random graphs, and a
full-stack implementation of a P2P streaming system.
Comment: Technical report and supplementary material to
http://ieeexplore.ieee.org/document/7497234
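The two basic policies named in the abstract can be illustrated with a minimal sketch. It assumes that a peer keeps a buffer of chunk slots ordered by playback deadline and requests one missing chunk that a neighbour can supply. The function names and the boolean-list buffer representation are illustrative assumptions, not taken from the report, and SchedMix itself is not modelled here.

```python
from typing import List, Optional, Sequence


def candidates(own: Sequence[bool], neighbour: Sequence[bool]) -> List[int]:
    """Indices of chunks this peer is missing and the neighbour holds.

    Index 0 is the chunk with the earliest playback deadline.
    """
    return [i for i, (mine, theirs) in enumerate(zip(own, neighbour))
            if not mine and theirs]


def pick_edf(own: Sequence[bool], neighbour: Sequence[bool]) -> Optional[int]:
    """Earliest deadline first: request the most urgent missing chunk."""
    c = candidates(own, neighbour)
    return min(c) if c else None


def pick_ldf(own: Sequence[bool], neighbour: Sequence[bool]) -> Optional[int]:
    """Latest deadline first: request the newest missing chunk."""
    c = candidates(own, neighbour)
    return max(c) if c else None


own = [True, False, True, False, False]
neighbour = [True, True, True, False, True]
print(pick_edf(own, neighbour))  # 1
print(pick_ldf(own, neighbour))  # 4
```

In this toy buffer the peer misses chunks 1, 3 and 4, but the neighbour only holds 1 and 4, so EDF requests chunk 1 and LDF requests chunk 4.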
DLWUC: Distance and Load Weight Updated Clustering-Based Clock Distribution for SOC Architecture
High clock skew variations and degradation of the driving ability of buffers lead to additional power dissipation in the Clock Distribution Network (CDN), which increases the dimensionality of buffers and coordination among flip-flops. A manually set threshold level for predicting the Region of Interest (ROI) is not applicable in the clustering process due to the complexities of excessive wire length and critical delay. This paper proposes Distance and Load Weight Updated Clustering (DLWUC) to determine the suitable position of logical components. Initially, DLWUC uses the Hybrid Weighted Distance (HWD) to estimate the distance and construct the distance matrix. The weight value extracted from the sorted distance matrix facilitates the projection of buffers. The updated weight value serves as the base for clustering with labeled outputs. Placing each buffer at the suitable position obtained from load weight updated clustering provides the necessary trade-off between clock provision and load balance. The DLWUC discussed in this paper reduces the size of buffers, skew, power and latency compared to the existing topologies.
Exploration and Design of Power-Efficient Networked Many-Core Systems
Multiprocessing is a promising solution to meet the requirements of near-future applications. To get the full benefit from parallel processing, a many-core system needs an efficient on-chip communication architecture. Network-on-Chip (NoC) is a general-purpose communication concept that offers high throughput and reduced power consumption, and keeps complexity in check through a regular composition of basic building blocks. This thesis presents power-efficient communication approaches for networked many-core systems. We address a range of issues that are important for designing power-efficient many-core systems at two different levels: the network level and the router level.
From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today’s and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques.
From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented.
Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to the latest NoC architectures. The thesis concludes that carefully co-designed elements from different network levels enable considerable power savings for many-core systems.
High-performance and Low-power Clock Network Synthesis in the Presence of Variation.
Semiconductor technology scaling requires continuous evolution of all aspects of physical
design of integrated circuits. Among the major design steps, clock-network synthesis
has been greatly affected by technology scaling, rendering existing methodologies inadequate.
Clock routing was previously sufficient for smaller ICs, but design difficulty and
structural complexity have greatly increased as interconnect delay and clock frequency increased
in the 1990s. Since a clock network directly influences IC performance and often
consumes a substantial portion of total power, both academia and industry developed synthesis
methodologies to achieve low skew, low power and robustness to PVT variations.
Nevertheless, clock network synthesis under tight constraints is currently the least automated
step in physical design and requires significant manual intervention, undermining
turn-around-time. The need for multi-objective optimization over a large parameter space
and the increasing impact of process variation make clock network synthesis particularly
challenging.
Our work identifies new objectives, constraints and concerns in the clock-network synthesis
for systems-on-chips and microprocessors. To address them, we generate novel
clock-network structures and propose changes in traditional physical-design flows. We
develop new modeling techniques and algorithms for clock power optimization subject
to tight skew constraints in the presence of process variations. In particular, we offer
SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below
5 ps, satisfy slew constraints and trade off skew, insertion delay and power, while
tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we
propose new techniques and a methodology to reduce dynamic power consumption by
6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis
within global placement. We also present a novel non-tree topology that is 2.3x more
power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy
in a clock network to bridge the gap between tree-like and mesh-like topologies.
Integrated optimization techniques for high-quality clock networks described in this dissertation
achieve strong empirical results in experiments with recent industry-released benchmarks
in the presence of process variation. Our software implementations were recognized with
the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests
organized by IBM Research and Intel Research.
Ph.D., Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd
TRIX: Low-Skew Pulse Propagation for Fault-Tolerant Hardware
The vast majority of hardware architectures use a carefully timed reference
signal to clock their computational logic. However, standard distribution
solutions are not fault-tolerant. In this work, we present a simple grid
structure as a more reliable clock propagation method and study it by means of
simulation experiments. Fault-tolerance is achieved by forwarding clock pulses
on arrival of the second of three incoming signals from the previous layer.
A key question is how well neighboring grid nodes are synchronized, even
without faults. Analyzing the clock skew under typical-case conditions is
highly challenging. Because the forwarding mechanism involves taking the
median, standard probabilistic tools fail, even when modeling link delays just
by unbiased coin flips.
Our statistical approach provides substantial evidence that this system
performs surprisingly well. Specifically, in an "infinitely wide" grid of
height [...], the delay at a pre-selected node exhibits a standard deviation of
[...] ([...] link delay uncertainties for [...]) and skew
between adjacent nodes of [...] ([...] link delay
uncertainties for [...]). We conclude that the proposed system is a very
promising clock distribution method. This leads to the open problem of a
stochastic explanation of the tight concentration of delays and skews. More
generally, we believe that understanding our very simple abstraction of the
system is of mathematical interest in its own right.
Comment: 16 pages, 11 figures
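The forwarding rule described in the abstract (fire on the second of three incoming pulses, with link delays modelled as unbiased coin flips) can be explored with a small simulation sketch. Several details are assumptions made for illustration and are not stated in the abstract: each node is wired to predecessors i-1, i and i+1 of the previous layer, a wrap-around ring stands in for the "infinitely wide" grid, layer 0 fires simultaneously at time 0, and a link delay is either 0 or 1 time unit.

```python
import random
import statistics
from typing import List


def simulate_grid(width: int, height: int, rng: random.Random) -> List[int]:
    """Return the pulse times of the top layer of a width x height grid."""
    times = [0] * width  # assumption: layer 0 fires simultaneously at time 0
    for _ in range(height):
        next_times = []
        for i in range(width):
            # Assumed wiring: predecessors i-1, i, i+1 with wrap-around,
            # each link adding a coin-flip delay of 0 or 1.
            arrivals = sorted(
                times[(i + offset) % width] + rng.randint(0, 1)
                for offset in (-1, 0, 1)
            )
            next_times.append(arrivals[1])  # forward on the second arrival (median)
        times = next_times
    return times


rng = random.Random(0)
top = simulate_grid(width=64, height=200, rng=rng)
skews = [abs(top[i] - top[(i + 1) % len(top)]) for i in range(len(top))]
print("max skew between adjacent nodes:", max(skews))
print("std dev of pulse times across the top layer:", statistics.pstdev(top))
```

Note that the spread printed here is taken across the nodes of one run, whereas the abstract refers to the delay of a single pre-selected node; repeating the run with different seeds and recording one fixed node would approximate the latter.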
STUDY OF SINGLE-EVENT EFFECTS ON DIGITAL SYSTEMS
Microelectronic devices and systems have been extensively utilized in a variety of radiation
environments, ranging from the low-earth orbit to the ground level. A high-energy particle from
such an environment may cause voltage/current transients, thereby inducing Single Event Effect
(SEE) errors in an Integrated Circuit (IC). Ever since the first SEE error was reported in 1975,
this community has made tremendous progress in investigating the mechanisms of SEE and
exploring radiation tolerant techniques. However, as the IC technology advances, the existing
hardening techniques have been rendered less effective because of the reduced spacing and
charge sharing between devices. The Semiconductor Industry Association (SIA) roadmap has
identified radiation-induced soft errors as the major threat to the reliable operation of electronic
systems in the future. In digital systems, hardening techniques of their core components, such as
latches, logic, and clock network, need to be addressed.
Two single event tolerant latch designs taking advantage of feedback transistors are
presented and evaluated in both single event resilience and overhead. These feedback transistors
are turned OFF in the hold mode, thereby yielding a very large resistance. This, in turn, results in
a larger feedback delay and higher single event tolerance. On the other hand, these extra
transistors are turned ON when the cell is in the write mode. As a result, no significant write
delay is introduced. Both designs demonstrate higher upset threshold and lower cross-section
when compared to the reference cells.
Dynamic logic circuits have intrinsic single event issues in each stage of the operations. The
worst case occurs when the output is evaluated logic high, where the pull-up networks are turned
OFF. In this case, the circuit fails to recover the output by pulling the output up to the supply rail.
A capacitor added to the feedback path increases the node capacitance of the output and the
feedback delay, thereby increasing the single event critical charge. Another differential structure
that has two differential inputs and outputs eliminates single event upset issues at the expense of
an increased number of transistors.
Clock networks in advanced technology nodes may cause significant errors in an IC as the
devices are more sensitive to single event strikes. A clock mesh, a widely used clocking scheme
in digital systems, was fabricated in a 28nm technology and evaluated through
heavy-ion and laser irradiation experiments. Superior resistance to radiation strikes was
demonstrated during these tests.
In addition to mitigating single event issues by using hardened designs, built-in current
sensors can be used to detect single event induced currents in the n-well and, if implemented,
subsequently execute fault correction actions. These sensors were simulated and fabricated in a
28nm CMOS process. Simulation as well as experimental results substantiate the validity of
this sensor design, which offers an alternative to existing hardening techniques.
In conclusion, this work investigates single event effects in digital systems, especially those
in deep-submicron or advanced technology nodes. New hardened latch, dynamic logic, clock,
and current sensor designs have been presented and evaluated. Through the use of these designs,
the single event tolerance of a digital system can be achieved at the expense of varying overhead
in terms of area, power, and delay.
Implementation of a 4-bit Ripple Carry Full Adder of Mirror Design Style Using Synopsys Generic 90nm Technology on a Full-Custom and Semi-Custom Design
The adder is the most frequently used component in the datapath block and its speed-limiting element. Because of this, optimizing the adder is essential, as it has a large impact on overall system performance. Adders are also an important subsystem in digital designs, so their performance deserves careful attention. Speed can be optimized by adjusting the transistor sizes and the circuit topology. A CMOS (Complementary Metal Oxide Semiconductor) 4-bit RCA (Ripple Carry Adder) circuit is presented. The proposed adder cell belongs to the CMOS adder class and is implemented in the CMOS mirror design style, which has a smaller area and delay than the static implementation of the full adder. By simply cascading full-adder blocks, one obtains a Ripple-Carry Adder, which is perhaps the simplest of the carry adders to implement. Creating the schematic diagram of the full adder is part of pre-simulation; it involves constructing the CMOS transistors and connecting them with wires. The widths and lengths of the transistors are crucial in the design, so that connections can be placed and routed easily. The layout diagram is the equivalent of the schematic diagram in more detail, and it should match the transistor-based circuit. Verification processes such as DRC (Design Rule Check) and LVS (Layout versus Schematic) give assurance that the schematic and layout diagrams are equivalent and function properly.
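The cascading of full-adder blocks into a ripple-carry adder can be sketched at the logic level as follows. This is a behavioural illustration only: it says nothing about the transistor-level mirror design style, sizing or layout discussed in the abstract, and the function names and the least-significant-bit-first ordering are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits: Sequence[int], b_bits: Sequence[int],
                     cin: int = 0) -> Tuple[List[int], int]:
    """Cascade full adders; bit lists are least significant bit first."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value: int, width: int = 4) -> List[int]:
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


# 11 + 6 = 17: the 4-bit sum is 1 (0001) with carry-out 1.
print(ripple_carry_add(to_bits(11), to_bits(6)))  # ([1, 0, 0, 0], 1)
```

Because each stage waits for the carry of the previous one, the worst-case delay grows with the number of bits, which is why the adder is described as a speed-limiting element.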