74 research outputs found
High Peformance and Low Power On-Die Interconnect Fabrics.
Increasing power density with technology scaling has caused stagnation in operating frequency of modern day microprocessors. This has led designers to prefer multicore architectures over complex monolithic processors to keep up with the demand for rising computing throughput. Although processing units are getting smaller and simpler, the dramatic rise of their count on a single die has made the fabric that connects these processing units increasingly complex. These interconnect fabrics have become a bottleneck in improving overall system effciency. As a result, the design paradigm for multi-core chips is gradually shifting from a core-centric architecture towards an interconnect-centric architecture, where system efficiency is limited by the fabric rather than the processing ability of any individual core. This dissertation introduces three novel and synergistic circuit techniques to improve scalability of switch fabrics to make on-die integration of hundreds to thousands of cores feasible. 1) A matrix topology is proposed for designing a fully connected switch fabric that re-uses output buses for programming, and stores shue congurations at cross points. This significantly reduces routing congestion, lowers area/power, and improves per-
formance. Silicon measurements demonstrate 47% energy savings in a 64-lane SIMD processor fabricated in 65nm CMOS over a conventional implementation. 2) A novel approach to handle high radix arbitration along with data routing is proposed. It optimally uses existing cross-bar interconnect resources without requiring any additional overhead. Bandwidth exceeding 2Tb/s is recorded in a test prototype fabricated in 65nm. 3) Building on the later, a new circuit topology to manage and update priority adaptively within the switch fabric without incurring additional delay or area is then proposed. Several assist circuit techniques, such as a thyristor based sense amplifier and self regenerating bi-directional repeaters are proposed for high speed energy efficient signaling to and from the switch fabric to improve overall routing efficiency.
Using these techniques a 64 x 64 switch fabric with 128b data bus fabricated in 45nm achieves a throughput of 4.5Tb/s at single cycle latency while operating at 559MHz.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91506/1/sudhirks_1.pd
A configurable vector processor for accelerating speech coding algorithms
The growing demand for voice-over-packer (VoIP) services and multimedia-rich
applications has made increasingly important the efficient, real-time implementation of
low-bit rates speech coders on embedded VLSI platforms. Such speech coders are
designed to substantially reduce the bandwidth requirements thus enabling dense multichannel
gateways in small form factor. This however comes at a high computational cost
which mandates the use of very high performance embedded processors.
This thesis investigates the potential acceleration of two major ITU-T speech coding
algorithms, namely G.729A and G.723.1, through their efficient implementation on a
configurable extensible vector embedded CPU architecture. New scalar and vector ISAs
were introduced which resulted in up to 80% reduction in the dynamic instruction count
of both workloads. These instructions were subsequently encapsulated into a parametric,
hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research
and implementation of the vector datapath of this vector coprocessor which is tightly-coupled
to a Sparc-V8 compliant CPU, the optimization and simulation methodologies
employed and the use of Electronic System Level (ESL) techniques to rapidly design
SIMD datapaths
A CCD based curvature wavefront sensor for Adaptive Optics in Astronomy
Adaptive Optik (AO) ist eine Technik, um Störungen in der Abbildung von Objekten im Teleskop auszugleichen. Diese Störungen werden von Fluktuationen des Brechungsindexes in der Erdatmosphäre hervorgerufen. Zum Messen dieser Störungen gibt es eine Reihe verschiedener Wellenfrontsensoren. Einer davon ist der 'Curvature'- Wellenfrontsensor, was soviel wie 'Krümmungs'- Wellenfrontsensor bedeutet. An der Europäischen Südsternwarte (ESO) in Garching werden für das Very Large Telescope (VLT) und das VLT Interferometer (VLTI) eine Reihe von adaptiven Optiksystemen entwickelt, die den Curvature - Wellenfrontsensor verwenden. Bisher wurden in Curvature AO-Systemen Avalanche Photodioden (APDs) als Detektoren verwendet, da sehr kurze Belichtungszeiten (200 Mikrosekunden) und ein sehr niedriges Ausleserauschen des Detektors nötig sind. Aufgrund von Fortschritten in der Herstellungstechnologie von Charge Coupled Devices (CCDs) entwickelten wir ein spezielles CCD und untersuchten dessen Leistungsfähigkeit für ein 60 Element Curvature AO System, das mit sehr schwachen Lichtsignalen arbeitet. Hier wurde erstmals ein CCD als Wellenfrontsensor verwendet. Diese Dissertation zeigt, daß ein CCD annäherend die gleiche Leistungsfähigkeit wie APDs bietet, jedoch zu einem Bruchteil der Kosten und geringerer Komplexität. Weiter hat das CCD eine höhere Quantenausbeute und einen größeren dynamischen Bereich als APDs. Ein Ausleserauschen von weniger als 1,5 Elektronen bei 4000 Bildern pro Sekunde wurde erreicht. Für AO Systeme, die Wellenfrontkorrekturen mit vielen Elementen durchführen, können dünne, von hinten beleuchtete CCDs, APDs als bessere Detektoren ersetzen. Diese Dissertation präsentiert das Konzept, das Design und die Bestimmung der Leistungsfähigkeit dieses CCDs. Erste (von vorne beleuchtete) CCDs wurden erfolgreich getestet und die Leistungsfähigkeit in einem Laborexperiment nachgewiesen
Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems
With the increasing digital services demand, performance and power-efficiency
become vital requirements for digital circuits and systems. However, the
enabling CMOS technology scaling has been facing significant challenges of
device uncertainties, such as process, voltage, and temperature variations. To
ensure system reliability, worst-case corner assumptions are usually made in
each design level. However, the over-pessimistic worst-case margin leads to
unnecessary power waste and performance loss as high as 2.2x. Since
optimizations are traditionally confined to each specific level, those safe
margins can hardly be properly exploited.
To tackle the challenge, it is therefore advised in this Ph.D. thesis to
perform a cross-layer optimization for digital signal processing circuits and
systems, to achieve a global balance of power consumption and output quality.
To conclude, the traditional over-pessimistic worst-case approach leads to
huge power waste. In contrast, the adaptive voltage scaling approach saves
power (25% for the CORDIC application) by providing a just-needed supply
voltage. The power saving is maximized (46% for CORDIC) when a more aggressive
voltage over-scaling scheme is applied. These sparsely occurred circuit errors
produced by aggressive voltage over-scaling are mitigated by higher level error
resilient designs. For functions like FFT and CORDIC, smart error mitigation
schemes were proposed to enhance reliability (soft-errors and timing-errors,
respectively). Applications like Massive MIMO systems are robust against lower
level errors, thanks to the intrinsically redundant antennas. This property
makes it applicable to embrace digital hardware that trades quality for power
savings.Comment: 190 page
Fine-grained Energy and Thermal Management using Real-time Power Sensors
With extensive use of battery powered devices such as smartphones, laptops an
Social Insect-Inspired Adaptive Hardware
Modern VLSI transistor densities allow large systems to be implemented within a single chip. As technologies get smaller, fundamental limits of silicon devices are reached resulting in lower design yields and post-deployment failures. Many-core systems provide a platform for leveraging the computing resource on offer by deep sub-micron technologies and also offer high-level capabilities for mitigating the issues with small feature sizes. However, designing for many-core systems that can adapt to in-field failures and operation variability requires an extremely large multi-objective optimisation space. When a many-core reaches the size supported by the densities of modern technologies (thousands of processing cores), finding design solutions in this problem space becomes extremely difficult.
Many biological systems show properties that are adaptive and scalable. This thesis proposes a self-optimising and adaptive, yet scalable, design approach for many-core based on the emergent behaviours of social-insect colonies. In these colonies there are many thousands of individuals with low intelligence who contribute, without any centralised control, to complete a wide range of tasks to build and maintain the colony. The experiments presented translate biological models of social-insect intelligence into simple embedded intelligence circuits. These circuits sense low-level system events and use this manage the parameters of the many-core's Network-on-Chip (NoC) during runtime.
Centurion, a 128-node many-core, was created to investigate these models at large scale in hardware. The results show that, by monitoring a small number of signals within each NoC router, task allocation emerges from the social-insect intelligence models that can self-configure to support representative applications. It is demonstrated that emergent task allocation supports fault tolerance with no extra hardware overhead. The response-threshold decision making circuitry uses a negligible amount of hardware resources relative to the size of the many-core and is an ideal technology for implementing embedded intelligence for system runtime management of large-complexity single-chip systems
- …