12 research outputs found
Mapping and Configuration Methods for Multi-Use-Case Networks on Chips
To provide a scalable communication infrastructure for Systems on Chips (SoCs), Networks on Chips (NoCs), a communication centric design paradigm is needed. To be cost effective, SoCs are often programmable and integrate several different applications or use-cases on to the same chip. For the SoC platform to support the different use-cases, the NoC architecture should satisfy the performance constraints of each individual use-case. In this work we motivate the need to consider multiple use-cases during the NoC design process. We present a method to ef ciently map the applications on to the NoC architecture, satisfying the design constraints of each individual use-case. We also present novel ways to dynamically recon- gure the network across the different use-cases and explore the possibility of integrating Dynamic Voltage and Frequency Scaling (DVS/DFS) techniques with the use-case centric NoC design methodology. We validate the performance of the design methodology on several SoC applications. The dynamic recon guration of the NoC integrated with DVS/DFS schemes results in large power savings for the resulting NoC systems
A Methodology for Mapping Multiple Use-Cases onto Networks on Chips
A communication-centric design approach, Networks on Chips (NoCs), has emerged as the design paradigm for designing a scalable communication infrastructure for future Systems on Chips (SoCs). As technology advances, the number of applications or use-cases integrated on a single chip increases rapidly. The different use-cases of the SoC have different communication requirements (such as different bandwidth, latency constraints) and traffic patterns. The underlying NoC architecture has to satisfy the constraints of all the use-cases. In this work, we present a methodology to map multiple use-cases onto the NoC architecture, satis- fying the constraints of each use-case. We present dynamic re-configuration mechanisms that match the NoC configura- tion to the communication characteristics of each use-case, also accounting for use-cases that can run in parallel. The methodology is applied to several real and synthetic SoC benchmarks, which result in a large reduction in NoC area (an average of 80%) and power consumption (an average of 54%) compared to traditional design approaches
Recommended from our members
Design Space Exploration of Accelerators for Warehouse Scale Computing
With Moore’s law grinding to a halt, accelerators are one of the ways that new silicon can improve performance, and they are already a key component in modern datacenters. Accelerators are integrated circuits that implement parts of an application with the objective of higher energy efficiency compared to execution on a standard general purpose CPU. Many accelerators can target any particular workload, generally with a wide range of performance, and costs such as area or power. Exploring these design choices, called Design Space Exploration (DSE), is a crucial step in trying to find the most efficient accelerator design, the one that produces the largest reduction of the total cost of ownership.
This work aims to improve this design space exploration phase for accelerators and to avoid pitfalls in the process. This dissertation supports the thesis that early design choices – including the level of specialization – are critical for accelerator development and therefore require benchmarks reflective of production workloads. We present three studies that support this thesis. First, we show how to benchmark datacenter applications by creating a benchmark for large video sharing infrastructures. Then, we present two studies focused on accelerators for analytical query processing. The first is an analysis on the impact of Network on Chip specialization while the second analyses the impact of the level of specialization.
The first part of this dissertation introduces vbench: a video transcoding benchmark tailored to the growing video-as-a-service market. Video transcoding is not accurately represented in current computer architecture benchmarks such as SPEC or PARSEC. Despite posing a big computational burden for cloud video providers, such as YouTube and Facebook, it is not included in cloud benchmarks such as CloudSuite. Using vbench, we found that the microarchitectural profile of video transcoding is highly dependent on the input video, that SIMD extensions provide limited benefits, and that commercial hardware transcoders impose tradeoffs that are not ideal for cloud video providers. Our benchmark should spur architectural innovations for this critical workload. This work shows how to benchmark a real world warehouse scale application and the possible pitfalls in case of a mischaracterization.
When considering accelerators for the different, but no less important, application of analytical query processing, design space exploration plays a critical role. We analyzed the Q100, a class of accelerators for this application domain, using TPC-H as the reference benchmark. We found that the hardware computational blocks have to be tailored to the requirements of the application, but also the Network on Chip (NoC) can be specialized. We developed an algorithm capable of producing more effective Q100 designs by tailoring the NoC to the communication requirements of the system. Our algorithm is capable of producing designs that are Pareto optimal compared to standard NoC topologies. This shows how NoC specialization is highly effective for accelerators and it should be an integral part of design space exploration for large accelerators’ designs.
The third part of this dissertation analyzes the impact of the level of specialization, e.g. using an ASIC or Coarse Grain Reconfigurable Architecture (CGRA) implementation, on an accelerator performance. We developed a CGRA architecture capable of executing SQL query plans. We compare this architecture against Q100, an ASIC that targets the same class of workloads. Despite being less specialized, this programmable architecture shows comparable performance to the Q100 given an area and power budget. Resource usage explains this counterintuitive result, since a well programmed, homogeneous array of resources is able to more effectively harness silicon for the workload at hand. This suggests that a balanced accelerator research portfolio must include alternative programmable architectures – and their software stacks
An Application-Specific Design Methodology for On-chip Crossbar Generation
Designing a power-efficient interconnection architec- ture for MultiProcessor Systems-on-Chips (MPSoCs) satisfying the application performance constraints is a nontrivial task. In order to meet the tight time-to-market constraints and to effec- tively handle the design complexity, it is essential to provide a computer-aided design tool support for automating this task. In this paper, we address the issue of “application-specific design of optimal crossbar architecture” satisfying the performance re- quirements of the application and optimal binding of the cores onto the crossbar resources. We present a simulation-based design approach that is based on the analysis of the actual traffic trace of the application, considering local variations in traffic rates, temporal overlap among traffic streams, and criticality of traffic streams. Our approach is physical design aware, where the wiring complexity of the crossbar architecture is also considered during the design process. This leads to detecting timing violations on the wires early in the design cycle and to having accurate estimates of the power consumption on the wires. We apply our methodology onto several MPSoC designs, and the synthesized crossbar plat- forms are validated for performance by cycle-accurate SystemC simulation of the designs. The crossbar matrix power consumption values are based on the synthesis of the register transfer level models of the designs, obtained using industry standard tools. The experimental case studies show large reduction in communication architecture power consumption (45.3% on average) and total wirelength (38% on average) for the MPSoC designs when com- pared with traditional design approaches. The synthesized cross- bar designs also lead to large reduction in transaction latencies (up to 7×) when compared with the existing design approaches
NoC Synthesis Flow for Customized Domain Specific Mutliprocessor Systems-on-Chip
The growing complexity of customizable single-chip multiprocessors is requiring communication resources that can only be provided by a highly-scalable communication infrastructure. This trend is exemplified by the growing number of network-on-chip (NoC) architectures that have been proposed recently for system-on-chip (SoC) integration. Developing NoC-based systems tailored to a particular application domain is crucial for achieving high-performance, energy-efficient customized solutions. The effectiveness of this approach largely depends on the availability of an ad hoc design methodology that, starting from a high-level application specification, derives an optimized NoC configuration with respect to different design objectives and instantiates the selected application specific on-chip micronetwork. Automatic execution of these design steps is highly desirable to increase SoC design productivity. This work illustrates a complete synthesis flow, called Netchip, for customized NoC architectures, that partitions the development work into major steps (topology mapping, selection, and generation) and provides proper tools for their automatic execution (SUNMAP, xpipescompiler). The entire flow leverages the flexibility of a fully reusable and scalable network components library called xpipes, consisting of highly-parameterizable network building blocks (network interface, switches, switch-to-switch links) that are design-time tunable and composable to achieve arbitrary topologies and customized domain-specific NoC architectures. Several experimental case studies are presented In the work, showing the powerful design space exploration capabilities of the proposed methodology and tools
Network-on-Chip
Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems
Constraint-Driven Communication Synthesis
Constraint-driven Communication Synthesis enables the automatic design of the communication architecture of a complex system from a library of pre-defined Intellectual Property (IP) components. The key communication parameters that govern all the point-to-point interactions among system modules are captured as a set of arc constraints in the communication constraint graph. Similarly, the communication features o#ered by each of the components available in the IP communication library are captured as a set of feature resources together with its cost figures. Then, every communication architecture that can be built using the available components while satisfying all constraints is implicitly considered (as an implementation graph matching the constraint graph) to derive the optimum design solution with respect to the desired cost figure. The corresponding constrained optimization problem is e#ciently solved by a novel algorithm that is presented here together with its rigorous theoretical foundations
Abstract Constraint-Driven Communication Synthesis
Constraint-driven Communication Synthesis enables the automatic design of the communication architecture of a complex system from a library of pre-defined Intellectual Property (IP) components. The key communication parameters that govern all the point-to-point interactions among system modules are captured as a set of arc constraints in the communication constraint graph. Similarly, the communication features offered by each of the components available in the IP communication library are captured as a set of feature resources together with its cost figures. Then, every communication architecture that can be built using the available components while satisfying all constraints is implicitly considered (as an implementation graph matching the constraint graph) to derive the optimum design solution with respect to the desired cost figure. The corresponding constrained optimization problem is efficiently solved by a novel algorithm that is presented here together with its rigorous theoretical foundations.