
    Power-Efficient and Low-Latency Memory Access for CMP Systems with Heterogeneous Scratchpad On-Chip Memory

    The gradually widening speed disparity between CPU and memory has become an overwhelming bottleneck for the development of Chip Multiprocessor (CMP) systems. In addition, increasing penalties caused by frequent on-chip memory accesses have raised critical challenges in delivering high memory access performance within tight power and latency budgets. To overcome the daunting memory wall and energy wall issues, this thesis proposes a new heterogeneous scratchpad memory architecture composed of SRAM, MRAM, and Z-RAM. Based on this architecture, we propose two algorithms, a dynamic programming algorithm and a genetic algorithm, that allocate data to the different memory units, thereby reducing memory access cost in terms of power consumption and latency. Extensive experiments show the merits of the heterogeneous scratchpad architecture over the traditional pure memory system and the effectiveness of the proposed algorithms.
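
    The data-allocation idea can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the per-access costs in `COST`, the unlimited off-chip `"MAIN"` fallback, the block-count capacities, and the function name `allocate` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical per-access costs in arbitrary units (not taken from the thesis).
COST = {"SRAM": 1.0, "MRAM": 2.5, "ZRAM": 1.8, "MAIN": 10.0}
ON_CHIP = ("SRAM", "MRAM", "ZRAM")


def allocate(access_counts, capacity):
    """Assign each data block to one memory so the total access cost is minimal.

    access_counts: one access count per data block.
    capacity: number of blocks each on-chip memory can hold; "MAIN"
              (off-chip) is assumed to be unlimited.
    Returns (total_cost, list of memory names, one per block).
    """

    @lru_cache(maxsize=None)
    def best(i, free):
        # `free` holds the remaining slots, ordered like ON_CHIP.
        if i == len(access_counts):
            return 0.0, ()
        # Option 1: leave block i in main memory.
        sub_cost, sub_plan = best(i + 1, free)
        result = (access_counts[i] * COST["MAIN"] + sub_cost, ("MAIN",) + sub_plan)
        # Option 2: place block i in any on-chip memory that still has room.
        for k, name in enumerate(ON_CHIP):
            if free[k] > 0:
                rest = free[:k] + (free[k] - 1,) + free[k + 1:]
                sub_cost, sub_plan = best(i + 1, rest)
                cand = (access_counts[i] * COST[name] + sub_cost, (name,) + sub_plan)
                if cand[0] < result[0]:
                    result = cand
        return result

    cost, plan = best(0, tuple(capacity[name] for name in ON_CHIP))
    return cost, list(plan)


cost, plan = allocate([100, 10, 50, 1], {"SRAM": 1, "MRAM": 1, "ZRAM": 1})
print(cost, plan)
```

    With one slot per on-chip memory, the most frequently accessed block lands in the cheapest memory and the least accessed block stays off-chip. A real allocator would also have to model block sizes, write costs, and data movement, which this sketch ignores.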

    Extending the performance of hybrid NoCs beyond the limitations of network heterogeneity

    To meet the performance and scalability demands of fast-paced growth towards exascale and Big-Data processing, and given the performance bottleneck of conventional metal-based (wireline) interconnects, alternative interconnect fabrics such as inhomogeneous three-dimensional integrated Network-on-Chip (3D NoC) and hybrid wired-wireless Network-on-Chip (WiNoC) have emerged as cost-effective solutions for emerging System-on-Chip (SoC) design. However, these interconnects trade off optimized performance for cost by restricting the number of area- and power-hungry 3D routers and wireless nodes. Moreover, the non-uniformly distributed traffic in chip multiprocessors (CMPs) demands an on-chip communication infrastructure that avoids congestion under high traffic while keeping pipeline delay minimal at low load. To this end, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation caused by the slow 2D routers in such emerging hybrid NoCs. The proposed router transmits a flit using dimension-ordered routing (DoR) in the bypass datapath at low loads. When the output port required for intra-dimension bypassing is not available, the packet is routed adaptively to avoid congestion. The router also has a simplified virtual channel allocation (VA) scheme that yields a non-speculative low-latency pipeline. By combining the low-complexity bypassing technique with adaptive routing, the proposed router is able to balance the traffic in hybrid NoCs and achieve low-latency communication under various traffic loads. Simulations show that the proposed router can reduce applications’ execution time by an average of 16.9% compared to low-latency routers such as SWIFT. By reducing the latency between 2D routers (or wired nodes) and 3D routers (or wireless nodes), the proposed router can improve average packet delay by an average of 45% in 3D NoCs and 50% in WiNoCs.
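
    The routing policy described above (DoR on the bypass path, adaptive fallback when the preferred port is unavailable) can be sketched for a plain 2D mesh. This is an illustrative simplification, not the paper's router: the port names, the `busy` set, and the functions `dor_next_port`, `productive_ports`, and `route` are assumptions, and virtual channels, deadlock avoidance, and the 3D or wireless links are not modeled.

```python
def dor_next_port(cur, dst):
    """XY dimension-ordered routing on a 2D mesh: finish X before Y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return "E" if dx > cx else "W"
    if cy != dy:
        return "N" if dy > cy else "S"
    return "LOCAL"


def productive_ports(cur, dst):
    """All output ports that move the flit closer to its destination."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports


def route(cur, dst, busy):
    """Pick an output port; `busy` is the set of currently unavailable ports."""
    preferred = dor_next_port(cur, dst)
    if preferred == "LOCAL":
        return "LOCAL", "eject"
    if preferred not in busy:
        # Low-load case: take the dimension-ordered bypass path.
        return preferred, "bypass"
    # Preferred port is taken: fall back to any other productive port.
    for port in productive_ports(cur, dst):
        if port not in busy:
            return port, "adaptive"
    # No productive port is free: hold the flit and retry later.
    return preferred, "wait"


print(route((0, 0), (2, 1), set()))
print(route((0, 0), (2, 1), {"E"}))
```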

    Balanced truncation for time-delay systems via approximate gramians

    In circuit simulation, when a large RLC network is connected with delay elements, such as transmission lines, the resulting system is a time-delay system (TDS). This paper presents a new model order reduction (MOR) scheme for TDSs with state time delays. It is the first scheme to reduce a TDS using balanced truncation. The Lyapunov-type equations for TDSs are derived, and an analysis of their computational complexity is presented. To reduce the computational cost, we approximate the controllability and observability Gramians in the frequency domain. The reduced-order models (ROMs) are then obtained by balancing and truncating the approximate Gramians. Numerical examples are presented to verify the accuracy and efficiency of the proposed algorithm. © 2011 IEEE. The 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011), Yokohama, Japan, 25-28 January 2011. In Proceedings of the 16th ASP-DAC, 2011, p. 55-60, paper 1C-
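
    For orientation, the standard delay-free case is summarized below. These are the textbook relations for a stable linear system $\dot{x} = Ax + Bu$, $y = Cx$, not the paper's TDS equations; the symbols $A$, $B$, $C$, $P$, $Q$, and $\sigma_i$ are introduced here for illustration only.

```latex
% Controllability and observability Gramians of a stable delay-free system
A P + P A^{T} + B B^{T} = 0, \qquad A^{T} Q + Q A + C^{T} C = 0

% Equivalent frequency-domain forms (Parseval), which can be approximated by quadrature
P = \frac{1}{2\pi} \int_{-\infty}^{\infty} (j\omega I - A)^{-1} B B^{T} (j\omega I - A)^{-H} \, d\omega
Q = \frac{1}{2\pi} \int_{-\infty}^{\infty} (j\omega I - A)^{-H} C^{T} C (j\omega I - A)^{-1} \, d\omega

% Hankel singular values; states with small \sigma_i are truncated after balancing
\sigma_i = \sqrt{\lambda_i(P Q)}
```

    The paper's contribution is the delay-aware counterpart of these equations, which is not reproduced here.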

    A moment-matching scheme for the passivity-preserving model order reduction of indefinite descriptor systems with possible polynomial parts

    Passivity-preserving model order reduction (MOR) of descriptor systems (DSs) is highly desired in the simulation of VLSI interconnects and on-chip passives. One popular method is PRIMA, a Krylov-subspace projection approach which preserves the passivity of positive semidefinite (PSD) structured DSs. However, system passivity is not guaranteed by PRIMA when the system is indefinite. Furthermore, the possible polynomial parts of singular systems are normally not captured. For indefinite DSs, positive-real balanced truncation (PRBT) can generate passive reduced-order models (ROMs), but its main bottleneck lies in solving the expensive dual generalized algebraic Riccati equations (GAREs). This paper presents a novel moment-matching MOR scheme for indefinite DSs, which preserves both the system passivity and, if present, the improper polynomial part. The method requires solving only one GARE and is therefore cheaper than existing PRBT schemes. Moreover, the proposed algorithm is capable of preserving the passivity of indefinite DSs, which is not guaranteed by traditional moment-matching MORs. Examples are presented showing that our method is superior to PRIMA in terms of accuracy. © 2011 IEEE. The 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011), Yokohama, Japan, 25-28 January 2011. In Proceedings of the 16th ASP-DAC, 2011, p. 49-54, paper 1C-

    Scheduling and Fluid Routing for Flow-Based Microfluidic Laboratories-on-a-Chip

    Microfluidic laboratories-on-a-chip (LoCs) are replacing conventional biochemical analyzers and are able to integrate the functions necessary for biochemical analysis on-chip. There are several types of LoCs, each with its own advantages and limitations. In this paper we are interested in flow-based LoCs, in which a continuous flow of liquid is manipulated using integrated microvalves. By combining several microvalves, more complex units, such as micropumps, switches, mixers, and multiplexers, can be built. We assume that the architecture of the LoC is given, and we are interested in synthesizing an implementation, consisting of the binding of operations in the application to the functional units of the architecture, the scheduling of operations, and the routing and scheduling of the fluid flows, such that the application completion time is minimized. To solve this problem, we propose a list scheduling-based application mapping (LSAM) framework and evaluate it using real-life as well as synthetic benchmarks. When biochemical applications contain fluids that may adsorb on the substrate on which they are transported, rinsing operations are used to avoid contamination. Hence, we also propose a rinsing heuristic, which has been integrated into the LSAM framework.
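
    The binding and scheduling part of such a flow can be illustrated with a generic list scheduler. This is a minimal sketch of list scheduling in general, not the LSAM framework: the function `list_schedule`, the critical-path priority, the example operations, and their durations are all hypothetical, and fluid routing and rinsing are not modeled. The dependency graph is assumed to be acyclic.

```python
def list_schedule(ops, deps, units):
    """Greedy list scheduling with binding to functional units.

    ops:   {name: (unit_type, duration)}
    deps:  {name: [predecessor names]}
    units: {unit_type: number of available units}
    Returns {name: (start, finish, (unit_type, unit_index))}.
    """
    succs = {o: [] for o in ops}
    for o, preds in deps.items():
        for p in preds:
            succs[p].append(o)

    memo = {}

    def urgency(o):
        # Priority: length of the longest path from o to the end of the graph.
        if o not in memo:
            memo[o] = ops[o][1] + max((urgency(s) for s in succs[o]), default=0)
        return memo[o]

    free_at = {(t, i): 0 for t, n in units.items() for i in range(n)}
    schedule = {}
    pending = set(ops)
    while pending:
        # An operation is ready once all of its predecessors are scheduled.
        ready = [o for o in pending if all(p in schedule for p in deps.get(o, []))]
        op = max(ready, key=lambda o: (urgency(o), o))
        kind, duration = ops[op]
        earliest = max((schedule[p][1] for p in deps.get(op, [])), default=0)
        # Bind to the unit of the right type that allows the earliest start.
        unit = min(
            (u for u in free_at if u[0] == kind),
            key=lambda u: (max(free_at[u], earliest), u[1]),
        )
        start = max(free_at[unit], earliest)
        schedule[op] = (start, start + duration, unit)
        free_at[unit] = start + duration
        pending.remove(op)
    return schedule


ops = {"mix1": ("mixer", 4), "mix2": ("mixer", 3),
       "heat": ("heater", 2), "detect": ("detector", 1)}
deps = {"heat": ["mix1"], "detect": ["heat", "mix2"]}
units = {"mixer": 1, "heater": 1, "detector": 1}
for name, slot in sorted(list_schedule(ops, deps, units).items()):
    print(name, slot)
```

    With a single mixer, the two mixing operations are serialized and the operation on the longer dependency chain is scheduled first.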

    A Method for Increasing the Stability of an Arbiter-Type Physically Unclonable Function

    The paper presents a reliability enhancement method for an arbiter physically unclonable function (A-PUF). The proposed technique adds no hardware overhead and does not significantly increase the challenge-response generation time. A detailed parametric model of the A-PUF is developed, based on the time difference between the test pulse delays at the arbiter inputs. The proposed approach has been verified on real programmable logic devices.

    Review of Display Technologies Focusing on Power Consumption

    This paper provides an overview of the main manufacturing technologies of displays, focusing on those with low and ultra-low levels of power consumption, which make them suitable for current societal needs. Considering the typified value obtained from the manufacturer’s specifications, four technologies (Liquid Crystal Displays, electronic paper, Organic Light-Emitting Displays, and Electroluminescent Displays) were selected in a first iteration. For each of them, several features, including size and brightness, were assessed in order to ascertain possible proportional relationships with the rate of consumption. To normalize the comparison between different display types, relative units such as the surface power density and the display frontal intensity efficiency were proposed. Organic Light-Emitting Displays had the best results in terms of power density for small display sizes. For larger sizes, they perform less satisfactorily than Liquid Crystal Displays in terms of energy efficiency. Funded by Junta de Castilla y León (Programa de apoyo a proyectos de investigación, Ref. VA036U14 and Ref. VA013A12-2) and Ministerio de Economía, Industria y Competitividad (Grant DPI2014-56500-R).

    Elastic circuits

    Elasticity in circuits and systems provides tolerance to variations in computation and communication delays. This paper presents a comprehensive overview of elastic circuits for those designers who are mainly familiar with synchronous design. Elasticity can be implemented both synchronously and asynchronously, although it was traditionally more often associated with asynchronous circuits. This paper shows that synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques. Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.

    Design methodology and productivity improvement in high speed VLSI circuits

    2017 Spring. Includes bibliographical references. To view the abstract, please see the full text of the document.