74 research outputs found

    High Peformance and Low Power On-Die Interconnect Fabrics.

    Full text link
    Increasing power density with technology scaling has caused stagnation in operating frequency of modern day microprocessors. This has led designers to prefer multicore architectures over complex monolithic processors to keep up with the demand for rising computing throughput. Although processing units are getting smaller and simpler, the dramatic rise of their count on a single die has made the fabric that connects these processing units increasingly complex. These interconnect fabrics have become a bottleneck in improving overall system effciency. As a result, the design paradigm for multi-core chips is gradually shifting from a core-centric architecture towards an interconnect-centric architecture, where system efficiency is limited by the fabric rather than the processing ability of any individual core. This dissertation introduces three novel and synergistic circuit techniques to improve scalability of switch fabrics to make on-die integration of hundreds to thousands of cores feasible. 1) A matrix topology is proposed for designing a fully connected switch fabric that re-uses output buses for programming, and stores shue congurations at cross points. This significantly reduces routing congestion, lowers area/power, and improves per- formance. Silicon measurements demonstrate 47% energy savings in a 64-lane SIMD processor fabricated in 65nm CMOS over a conventional implementation. 2) A novel approach to handle high radix arbitration along with data routing is proposed. It optimally uses existing cross-bar interconnect resources without requiring any additional overhead. Bandwidth exceeding 2Tb/s is recorded in a test prototype fabricated in 65nm. 3) Building on the later, a new circuit topology to manage and update priority adaptively within the switch fabric without incurring additional delay or area is then proposed. Several assist circuit techniques, such as a thyristor based sense amplifier and self regenerating bi-directional repeaters are proposed for high speed energy efficient signaling to and from the switch fabric to improve overall routing efficiency. Using these techniques a 64 x 64 switch fabric with 128b data bus fabricated in 45nm achieves a throughput of 4.5Tb/s at single cycle latency while operating at 559MHz.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91506/1/sudhirks_1.pd

    A configurable vector processor for accelerating speech coding algorithms

    Get PDF
    The growing demand for voice-over-packer (VoIP) services and multimedia-rich applications has made increasingly important the efficient, real-time implementation of low-bit rates speech coders on embedded VLSI platforms. Such speech coders are designed to substantially reduce the bandwidth requirements thus enabling dense multichannel gateways in small form factor. This however comes at a high computational cost which mandates the use of very high performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced which resulted in up to 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor which is tightly-coupled to a Sparc-V8 compliant CPU, the optimization and simulation methodologies employed and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths

    A CCD based curvature wavefront sensor for Adaptive Optics in Astronomy

    Get PDF
    Adaptive Optik (AO) ist eine Technik, um Störungen in der Abbildung von Objekten im Teleskop auszugleichen. Diese Störungen werden von Fluktuationen des Brechungsindexes in der Erdatmosphäre hervorgerufen. Zum Messen dieser Störungen gibt es eine Reihe verschiedener Wellenfrontsensoren. Einer davon ist der 'Curvature'- Wellenfrontsensor, was soviel wie 'Krümmungs'- Wellenfrontsensor bedeutet. An der Europäischen Südsternwarte (ESO) in Garching werden für das Very Large Telescope (VLT) und das VLT Interferometer (VLTI) eine Reihe von adaptiven Optiksystemen entwickelt, die den Curvature - Wellenfrontsensor verwenden. Bisher wurden in Curvature AO-Systemen Avalanche Photodioden (APDs) als Detektoren verwendet, da sehr kurze Belichtungszeiten (200 Mikrosekunden) und ein sehr niedriges Ausleserauschen des Detektors nötig sind. Aufgrund von Fortschritten in der Herstellungstechnologie von Charge Coupled Devices (CCDs) entwickelten wir ein spezielles CCD und untersuchten dessen Leistungsfähigkeit für ein 60 Element Curvature AO System, das mit sehr schwachen Lichtsignalen arbeitet. Hier wurde erstmals ein CCD als Wellenfrontsensor verwendet. Diese Dissertation zeigt, daß ein CCD annäherend die gleiche Leistungsfähigkeit wie APDs bietet, jedoch zu einem Bruchteil der Kosten und geringerer Komplexität. Weiter hat das CCD eine höhere Quantenausbeute und einen größeren dynamischen Bereich als APDs. Ein Ausleserauschen von weniger als 1,5 Elektronen bei 4000 Bildern pro Sekunde wurde erreicht. Für AO Systeme, die Wellenfrontkorrekturen mit vielen Elementen durchführen, können dünne, von hinten beleuchtete CCDs, APDs als bessere Detektoren ersetzen. Diese Dissertation präsentiert das Konzept, das Design und die Bestimmung der Leistungsfähigkeit dieses CCDs. Erste (von vorne beleuchtete) CCDs wurden erfolgreich getestet und die Leistungsfähigkeit in einem Laborexperiment nachgewiesen

    Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems

    Full text link
    With the increasing digital services demand, performance and power-efficiency become vital requirements for digital circuits and systems. However, the enabling CMOS technology scaling has been facing significant challenges of device uncertainties, such as process, voltage, and temperature variations. To ensure system reliability, worst-case corner assumptions are usually made in each design level. However, the over-pessimistic worst-case margin leads to unnecessary power waste and performance loss as high as 2.2x. Since optimizations are traditionally confined to each specific level, those safe margins can hardly be properly exploited. To tackle the challenge, it is therefore advised in this Ph.D. thesis to perform a cross-layer optimization for digital signal processing circuits and systems, to achieve a global balance of power consumption and output quality. To conclude, the traditional over-pessimistic worst-case approach leads to huge power waste. In contrast, the adaptive voltage scaling approach saves power (25% for the CORDIC application) by providing a just-needed supply voltage. The power saving is maximized (46% for CORDIC) when a more aggressive voltage over-scaling scheme is applied. These sparsely occurred circuit errors produced by aggressive voltage over-scaling are mitigated by higher level error resilient designs. For functions like FFT and CORDIC, smart error mitigation schemes were proposed to enhance reliability (soft-errors and timing-errors, respectively). Applications like Massive MIMO systems are robust against lower level errors, thanks to the intrinsically redundant antennas. This property makes it applicable to embrace digital hardware that trades quality for power savings.Comment: 190 page

    Process-tolerant VLSI neural networks for applications in optimisation

    Get PDF

    Fine-grained Energy and Thermal Management using Real-time Power Sensors

    Get PDF
    With extensive use of battery powered devices such as smartphones, laptops an

    Social Insect-Inspired Adaptive Hardware

    Get PDF
    Modern VLSI transistor densities allow large systems to be implemented within a single chip. As technologies get smaller, fundamental limits of silicon devices are reached resulting in lower design yields and post-deployment failures. Many-core systems provide a platform for leveraging the computing resource on offer by deep sub-micron technologies and also offer high-level capabilities for mitigating the issues with small feature sizes. However, designing for many-core systems that can adapt to in-field failures and operation variability requires an extremely large multi-objective optimisation space. When a many-core reaches the size supported by the densities of modern technologies (thousands of processing cores), finding design solutions in this problem space becomes extremely difficult. Many biological systems show properties that are adaptive and scalable. This thesis proposes a self-optimising and adaptive, yet scalable, design approach for many-core based on the emergent behaviours of social-insect colonies. In these colonies there are many thousands of individuals with low intelligence who contribute, without any centralised control, to complete a wide range of tasks to build and maintain the colony. The experiments presented translate biological models of social-insect intelligence into simple embedded intelligence circuits. These circuits sense low-level system events and use this manage the parameters of the many-core's Network-on-Chip (NoC) during runtime. Centurion, a 128-node many-core, was created to investigate these models at large scale in hardware. The results show that, by monitoring a small number of signals within each NoC router, task allocation emerges from the social-insect intelligence models that can self-configure to support representative applications. It is demonstrated that emergent task allocation supports fault tolerance with no extra hardware overhead. The response-threshold decision making circuitry uses a negligible amount of hardware resources relative to the size of the many-core and is an ideal technology for implementing embedded intelligence for system runtime management of large-complexity single-chip systems
    corecore