73 research outputs found

    Clustering-Based Simultaneous Task and Voltage Scheduling for NoC Systems

    Get PDF
    Network-on-Chip (NoC) is emerging as a promising communication structure, which is scalable with respect to chip complexity. Meanwhile, latest chip designs are increasingly leveraging multiple voltage-frequency domains for energy-efficiency improvement. In this work, we propose a simultaneous task and voltage scheduling algorithm for energy minimization in NoC based designs. The energy-latency tradeoff is handled by Lagrangian relaxation. The core algorithm is a clustering based approach which not only assigns voltage levels and starting time to each task (or Processing Element) but also naturally finds voltage-frequency clusters. Compared to a recent previous work, which performs task scheduling and voltage assignment sequentially, our method leads to an average of 20 percent energy reduction

    FPGA dynamic and partial reconfiguration : a survey of architectures, methods, and applications

    Get PDF
    Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows, and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption

    Design Methods and Tools for Application-Specific Predictable Networks-on-Chip

    Get PDF
    As the complexity of applications grows with each new generation, so does the demand for computation power. To satisfy the computation demands at manageable power levels, we see a shift in the design paradigm from single processor systems to Multiprocessor Systems-on-Chip (MPSoCs). MPSoCs leverage the parallelism in applications to increase the performance at the same power levels. To further improve the computation to power consumption ratio, MPSoCs for embedded applications are heterogeneous and integrate cores that are specialized to perform the different functionalities of the application. With technology scaling, wire power consumption is increasing compared to logic, making communication as expensive as computation. Therefore customizing the interconnect is necessary to achieve energy efficiency. Designing an optimal application specific Network-on-Chip (NoC), that meets application demands, requires the exploration of a large design space. Automatic design and optimization of the NoC is required in order to achieve fast design closure, especially for heterogeneous MPSoCs. To continue to meet the computation requirements of future applications new technologies are emerging. Three dimensional integration promises to increase the number of transistors by stacking multiple silicon layers. This will lead to an increase in the number of cores of the MPSoCs resulting in increased communication demands. To compensate for the increase in the wire delay in new technology nodes as well as to reduce the power consumption further, multi-synchronous design is becoming popular. With multiple clock signals, different parts of the MPSoC can be clocked at different frequencies according to the current demands of the application and can even be shutdown when they are not used at all. This further complicates the design of the NoC.Many applications require different levels of guarantee from the NoC in order to perform their functionality correctly. As communication traffic patterns become more complex, the performance of the NoC can no longer be predicted statically. Therefore designing the interconnect network requires that such guarantees are provided during the dynamic operation of the system which includes the interaction with major subsystems (i.e., main memory) and not just the interconnect itself. In this thesis, I present novel methods to design application-specific NoCs that meet performance demands, under the constraints of new technologies. To provide different levels of Quality of Service, I integrate methods to estimate the NoC performance during the design phase of the interconnect topology. I present methods and architectures for NoCs to efficiently access memory systems, in order to achieve predictable operation of the systems from the point of view of the communication as well as the bottleneck target devices. Therefore the main contribution of the thesis is twofold: scientific as I propose new algorithms to perform topology synthesis and engineering by presenting extensive experiments and architectures for NoC design

    Clustering-Based Simultaneous Task and Voltage Scheduling for NoC Systems

    Get PDF
    Network-on-Chip (NoC) is emerging as a promising communication structure, which is scalable with respect to chip complexity. Meanwhile, latest chip designs are increasingly leveraging multiple voltage-frequency domains for energy-efficiency improvement. In this work, we propose a simultaneous task and voltage scheduling algorithm for energy minimization in NoC based designs. The energy-latency tradeoff is handled by Lagrangian relaxation. The core algorithm is a clustering based approach which not only assigns voltage levels and starting time to each task (or Processing Element) but also naturally finds voltage-frequency clusters. Compared to a recent previous work, which performs task scheduling and voltage assignment sequentially, our method leads to an average of 20 percent energy reduction

    Physical design methodologies for monolithic 3D ICs

    Get PDF
    The objective of this research is to develop physical design methodologies for monolithic 3D ICs and use them to evaluate the improvements in the power-performance envelope offered over 2D ICs. In addition, design-for-test (DfT) techniques essential for the adoption of shorter term through-silicon-via (TSV) based 3D ICs are explored. Testing of TSV-based 3D ICs is one of the last challenges facing their commercialization. First, a pre-bond testable 3D scan chain construction technique is developed. Next, a transition-delay-fault test architecture is presented, along with a study on how to mitigate IR-drop. Finally, to facilitate partitioning, a quick and accurate framework for test-TSV estimation is developed. Block-level monolithic 3D ICs will be the first to emerge, as significant IP can be reused. However, no physical design flows exist, and hence a monolithic 3D floorplanning framework is developed. Next, inter-tier performance differences that arise due to the not yet mature fabrication process are investigated and modeled. Finally, an inter-tier performance-difference aware floorplanner is presented, and it is demonstrated that high quality 3D floorplans are achievable even under these inter-tier differences. Monolithic 3D offers sufficient integration density to place individual gates in three dimensions and connect them together. However, no tools or techniques exist that can take advantage of the high integration density offered. Therefore, a gate-level framework that leverages existing 2D ICs tools is presented. This framework also provides congestion modeling and produces results that minimize routing congestion. Next, this framework is extended to commercial 2D IC tools, so that steps such as timing optimization and clock tree synthesis can be applied. Finally, a voltage-drop-aware partitioning technique is presented that can alleviate IR-drop issues, without any impact on the performance or maximum operating temperature of the chip.Ph.D

    Physical Planning and Uncore Power Management for Multi-Core Processors

    Get PDF
    For the microprocessor technology of today and the foreseeable future, multi-core is a key engine that drives performance growth under very tight power dissipation constraints. While previous research has been mostly focused on individual processor cores, there is a compelling need for studying how to efficiently manage shared resources among cores, including physical space, on-chip communication and on-chip storage. In managing physical space, floorplanning is the first and most critical step that largely affects communication efficiency and cost-effectiveness of chip designs. We consider floorplanning with regularity constraints that requires identical processing/memory cores to form an array. Such regularity can greatly facilitate design modularity and therefore shorten design turn-around time. Very little attention has been paid to automatic floorplanning considering regularity constraints because manual floorplanning has difficulty handling the complexity as chip core count increases. In this dissertation work, we investigate the regularity constraints in a simulated-annealing based floorplanner for multi/many core processor designs. A simple and effective technique is proposed to encode the regularity constraints in sequence-pair, which is a classic format of data representation in automatic floorplanning. To the best of our knowledge, this is the first work on regularity-constrained floorplanning in the context of multi/many core processor designs. On-chip communication and shared last level cache (LLC) play a role that is at least as equally important as processor cores in terms of chip performance and power. This dissertation research studies dynamic voltage and frequency scaling for on-chip network and LLC, which forms a single uncore domain of voltage and frequency. This is in contrast to most previous works where the network and LLC are partitioned and associated with processor cores based on physical proximity. The single shared domain can largely avoid the interfacing overhead across domain boundaries and is practical and very useful for industrial products. Our goal is to minimize uncore energy dissipation with little, e.g., 5% or less, performance degradation. The first part of this study is to identify a metric that can reflect the chip performance determined by uncore voltage/frequency. The second part is about how to monitor this metric with low overhead and high fidelity. The last part is the control policy that decides uncore voltage/frequency based on monitoring results. Our approach is validated through full system simulations on public architecture benchmarks

    Optimising and evaluating designs for reconfigurable hardware

    No full text
    Growing demand for computational performance, and the rising cost for chip design and manufacturing make reconfigurable hardware increasingly attractive for digital system implementation. Reconfigurable hardware, such as field-programmable gate arrays (FPGAs), can deliver performance through parallelism while also providing flexibility to enable application builders to reconfigure them. However, reconfigurable systems, particularly those involving run-time reconfiguration, are often developed in an ad-hoc manner. Such an approach usually results in low designer productivity and can lead to inefficient designs. This thesis covers three main achievements that address this situation. The first achievement is a model that captures design parameters of reconfigurable hardware and performance parameters of a given application domain. This model supports optimisations for several design metrics such as performance, area, and power consumption. The second achievement is a technique that enhances the relocatability of bitstreams for reconfigurable devices, taking into account heterogeneous resources. This method increases the flexibility of modules represented by these bitstreams while reducing configuration storage size and design compilation time. The third achievement is a technique to characterise the power consumption of FPGAs in different activity modes. This technique includes the evaluation of standby power and dedicated low-power modes, which are crucial in meeting the requirements for battery-based mobile devices

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Automatic synthesis and optimization of chip multiprocessors

    Get PDF
    The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed
    • …
    corecore