1,334 research outputs found

    SWIFT: A Low-Power Network-On-Chip Implementing the Token Flow Control Router Architecture With Swing-Reduced Interconnects

    Get PDF
    A 64-bit, 8 × 8 mesh network-on-chip (NoC) is presented that uses both new architectural and circuit design techniques to improve on-chip network energy-efficiency, latency, and throughput. First, we propose token flow control, which enables bypassing of flit buffering in routers, thereby reducing buffer size and their power consumption. We also incorporate reduced-swing signaling in on-chip links and crossbars to minimize datapath interconnect energy. The 64-node NoC is experimentally validated with a 2 × 2 test chip in 90 nm, 1.2 V CMOS that incorporates traffic generators to emulate the traffic of the full network. Compared with a fully synthesized baseline 8 × 8 NoC architecture designed to meet the same peak throughput, the fabricated prototype reduces network latency by 20% under uniform random traffic, when both networks are run at their maximum operating frequencies. When operated at the same frequencies, the SWIFT NoC reduces network power by 38% and 25% at saturation and low loads, respectively

    Course grained low power design flow using UPF

    Get PDF
    Increased system complexity has led to the substitution of the traditional bottom-up design flow by systematic hierarchical design flow. The main motivation behind the evolution of such an approach is the increasing difficulty in hardware realization of complex systems. With decreasing channel lengths, few key problems such as timing closure, design sign-off, routing complexity, signal integrity, and power dissipation arise in the design flows. Specifically, minimizing power dissipation is critical in several high-end processors. In high-end processors, the design complexity contributes to the overall dynamic power while the decreasing transistor size results in static power dissipation. This research aims at optimizing the design flow for power and timing using the unified power format (UPF). UPF provides a strategic format to specify power-aware design information at every stage in the flow. The low power reduction techniques enforced in this research are multi-voltage, multi-threshold voltage (Vth), and power gating with state retention. An inherent design challenge addressed in this research is the choice of power optimization techniques as the flow advances from synthesis to physical design. A top-down digital design flow for a 32 bit MIPS RISC processor has been implemented with and without UPF synthesis flow for 65nm technology. The UPF synthesis is implemented with two voltages, 1.08V and 0.864V (Multi-VDD). Area, power and timing metrics are analyzed for the flows developed. Power savings of about 20 % are achieved in the design flow with \u27multi-threshold\u27 power technique compared to that of the design flow with no low power techniques employed. Similarly, 30 % power savings are achieved in the design flow with the UPF implemented when compared to that of the design flow with \u27multi-threshold\u27 power technique employed. Thus, a cumulative power savings of 42% has been achieved in a complete power efficient design flow (UPF) compared to that of the generic top-down standard flow with no power saving techniques employed. This is substantiated by the low voltage operation of modules in the design, reduction in clock switching power by gating clocks in the design and extensive use of HVT and LVT standard cells for implementation. The UPF synthesis flow saw the worst timing slack and more area when compared to those of the `multi-threshold\u27 or the generic flow. Percentage increase in the area with UPF is approximately 15%; a significant source for this increase being the additional power controlling logic added

    Sub-Picosecond Jitter Clock Generation for Time Interleaved Analog to Digital Converter

    Get PDF
    Nowadays, Multi-GHz analog-to-digital converters (ADCs) are becoming more and more popular in radar systems, software-defined radio (SDR) and wideband communications, because they can realize much higher operation speed through using many interleaved sub-ADCs to relax ADC sampling rates. Although the time interleaved ADC has some issues such as gain mismatch, offset mismatch and timing skew between each ADC channel, these deterministic errors can be solved by previous works such as digital calibration technique. However, time-interleaved ADCs require a precise sample clock to achieve an acceptable effective-numberof-bits (ENOB) which can be degraded by jitter in the sample clock. The clock generation circuits presented in this work achieves sub-picosecond jitter performance in 180nm CMOS which is suitable for time-interleaved ADC. Two different test chips were fabricated in 180nm CMOS to investigate the low jitter design technique. The low jitter delay line in two chips were designed in two different ways, but both of them utilized the low jitter design technique. In first test chip, the measured RMS jitter is 0.1061ps for each delay stage. The second chip uses the proposed low jitter Delay-Locked Loop can work from 80MHz to 120MHz, which means it can provide the time interleaved ADC with 2.4GHz to 3.6GHz low jitter sample clock, the measured delay stage jitter performance in second test chip is 0.1085ps

    Direct Digital Demultiplexing of Analog TDM Signals for Cable Reduction in Ultrasound Imaging Catheters.

    Get PDF
    In real-time catheter based 3D ultrasound imaging applications, gathering data from the transducer arrays is difficult as there is a restriction on cable count due to the diameter of the catheter. Although area and power hungry multiplexing circuits integrated at the catheter tip are used in some applications, these are unsuitable for use in small sized catheters for applications like intracardiac imaging. Furthermore, the length requirement for catheters and limited power available to on-chip cable drivers leads to limited signal strength at the receiver end. In this paper an alternative approach using Analog Time Division Multiplexing (TDM) is presented which addresses the cable restrictions of ultrasound catheters. A novel digital demultiplexing technique is also described which allows for a reduction in the number of analog signal processing stages required. The TDM and digital demultiplexing schemes are demonstrated for an intracardiac imaging system that would operate in the 4 MHz to 11 MHz range. A TDM integrated circuit (IC) with 8:1 multiplexer is interfaced with a fast ADC through a micro-coaxial catheter cable bundle, and processed with an FPGA RTL simulation. Input signals to the TDM IC are recovered with -40 dB crosstalk between channels on the same micro-coax, showing the feasibility of this system for ultrasound imaging applications

    Comprehensive and modular stochastic modeling framework for the variability-aware assessment of Signal Integrity in high-speed links

    Get PDF
    This paper presents a comprehensive and modular modeling framework for stochastic signal integrity analysis of complex high-speed links. Such systems are typically composed of passive linear networks and nonlinear, usually active, devices. The key idea of the proposed contribution is to express the signals at the ports of each of such system elements or subnetworks as a polynomial chaos expansion. This allows one to compute, for each block, equivalent deterministic models describing the stochastic variations of the network voltages and currents. Such models are synthesized into SPICE-compatible circuit equivalents, which are readily connected together and simulated in standard circuit simulators. Only a single circuit simulation of such an equivalent network is required to compute the pertinent statistical information of the entire system, without the need of running a large number of time-consuming electromagnetic circuit co-simulations. The accuracy and efficiency of the proposed approach, which is applicable to a large class of complex circuits, are verified by performing signal integrity investigations of two interconnect examples

    Static noise margin analysis for CMOS logic cells in near-threshold

    Get PDF
    The advancement of semiconductor technology enabled the fabrication of devices with faster switching activity and chips with higher integration density. However, these advances are facing new impediments related to energy and power dissipation. Besides, the increasing demand for portable devices leads the circuit design paradigm to prioritize energy efficiency instead of performance. Altogether, this scenario motivates engineers towards reducing the supply voltage to the near and subthreshold regime to increase the lifespan of battery-powered devices. Even though operating in these regime offer interesting energy-frequency trade-offs, it brings challenges concerning noise tolerance. As the supply voltage reduces, the available noise margins decrease, and circuits become more prone to functional failures. In addition, near and subthreshold circuits are more susceptible to manufacturing variability, hence further aggravating noise issues. Other issues, such as wire minimization and gate fan-out, also contribute to the relevance of evaluating the noise margin of circuits early in the design Accordingly, this work investigates how to improve the static noise margin of digital synchronous circuits that will operate at the near/subthreshold regime. This investigation produces a set of three original contributions. The first is an automated tool to estimate the static noise margin of CMOS combinational cells. The second contribution is a realistic static noise margin estimation methodology that considers process-voltage-temperature variations. Results show that the proposed methodology allows to reduce up to 70% of the static noise margin pessimism. Finally, the third contribution is the noise-aware cell design methodology and the inclusion of a noise evaluation of complex circuits during the logic synthesis. The resulting library achieved higher static noise margin (up to 24%) and less spread among different cells (up to 62%).Os avanços na tecnologia de semicondutores possibilitou que se fabricasse dispositivos com atividade de chaveamento mais rápida e com maior capacidade de integração de transistores. Estes avanços, todavia, impuseram novos empecilhos relacionados com a dissipação de potência e energia. Além disso, a crescente demanda por dispositivos portáteis levaram à uma mudança no paradigma de projeto de circuitos para que se priorize energia ao invés de desempenho. Este cenário motivou à reduzir a tensão de alimentação com qual os dispositivos operam para um regime próximo ou abaixo da tensão de limiar, com o objetivo de aumentar sua duração de bateria. Apesar desta abordagem balancear características de performance e energia, ela traz novos desafios com relação a tolerância à ruído. Ao reduzirmos a tensão de alimentação, também reduz-se a margem de ruído disponível e, assim, os circuitos tornam-se mais suscetíveis à falhas funcionais. Somado à este efeito, circuitos com tensões de alimentação nestes regimes são mais sensíveis à variações do processo de fabricação, logo agravando problemas com ruído. Existem também outros aspectos, tais como a miniaturização das interconexões e a relação de fan-out de uma célula digital, que incentivam a avaliação de ruído nas fases iniciais do projeto de circuitos integrados Por estes motivos, este trabalho investiga como aprimorar a margem de ruído estática de circuitos síncronos digitais que irão operar em tensões no regime de tensão próximo ou abaixo do limiar. Esta investigação produz um conjunto de três contribuições originais. A primeira é uma ferramenta capaz de avaliar automaticamente a margem de ruído estática de células CMOS combinacionais. A segunda contribuição é uma metodologia realista para estimar a margem de ruído estática considerando variações de processo, tensão e temperatura. Os resultados obtidos mostram que a metodologia proposta permitiu reduzir até 70% do pessimismo das margens de ruído estática, Por último, a terceira contribuição é um fluxo de projeto de células combinacionais digitais considerando ruído, e uma abordagem para avaliar a margem de ruído estática de circuitos complexos durante a etapa de síntese lógica. A biblioteca de células resultante deste fluxo obteve maior margem de ruído (até 24%) e menor variação entre diferentes células (até 62%)

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz
    corecore