58 research outputs found

    Area-efficient snoopy-aware NoC design for high-performance chip multiprocessor systems

    Manycore CMP systems are expected to grow to tens or even hundreds of cores. In this paper we show that effective co-design of the network-on-chip and the coherence protocol improves performance and power while total area remains bounded. We propose a snoopy-aware network-on-chip topology made of two mesh-of-tree topologies. Reducing the complexity of the coherence protocol (and hence its resources) and moving this complexity to the network leads to a global decrease in power consumption while area is barely affected. The benefits of our proposal are due to the high throughput and low delay of the network, but also to the simplicity of the coherence protocol. The proposed network and protocol minimize communication among cores when compared to traditional solutions based either on 2D-mesh topologies or on directory-based protocols. (C) 2015 Elsevier Ltd. All rights reserved. Roca Pérez, A.; Hernández Luz, C.; Lodde, M.; Flich Cardo, J. (2015). Area-efficient snoopy-aware NoC design for high-performance chip multiprocessor systems. Computers and Electrical Engineering. 45:374-385. doi:10.1016/j.compeleceng.2015.04.020

    On the Potential of NoC Virtualization for Multicore Chips


    Scalability of broadcast performance in wireless network-on-chip

    Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC. Peer Reviewed. Postprint (published version)

    Design and Implementation of a Chip Multiprocessor with an Efficient Multilevel Cache System

    Computer designers use recent advances in Very Large Scale Integration (VLSI) to build Chip Multiprocessors (CMPs) by placing several processors on the same die. The CMP is the dominant architecture for improving the performance of current computing systems. However, access to shared data by several processors is a primary challenge in a CMP. Data consistency must be maintained across all levels of the memory hierarchy to ensure correct behavior and higher performance. This paper proposes a CMP with an efficient multilevel cache system, which improves miss rate and latency (penalty) by designing and implementing different write policies with two levels of cache. The proposed system is implemented and tested using the VHDL hardware description language on Altera's FPGA chip. The results show that a combination of write-through without a buffer for the first level and write-back for the second level offers a clear improvement in multilevel cache system performance.
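The difference between the two write policies named in this abstract can be illustrated with a minimal sketch. It is not the paper's VHDL design: the function `next_level_writes`, the direct-mapped write-allocate organization, and the access trace are all assumptions chosen for illustration. It only counts how many writes a single cache level forwards to the level below under each policy.

```python
def next_level_writes(trace, num_lines, policy):
    """Count writes forwarded to the next memory level.

    Hypothetical model: a direct-mapped, write-allocate cache.
    trace  : list of (op, block_address) with op in {"R", "W"}
    policy : "write-through" or "write-back"
    """
    tags = [None] * num_lines      # block currently held by each line
    dirty = [False] * num_lines    # only meaningful for write-back
    writes = 0
    for op, block in trace:
        idx = block % num_lines
        if tags[idx] != block:
            # Miss: a dirty victim must be written back before replacement.
            if policy == "write-back" and dirty[idx]:
                writes += 1
            tags[idx] = block
            dirty[idx] = False
        if op == "W":
            if policy == "write-through":
                writes += 1        # every store goes to the next level
            else:
                dirty[idx] = True  # store is deferred until eviction
    return writes


# Illustrative trace: three stores to block 0, a conflicting read, one more store.
trace = [("W", 0), ("W", 0), ("W", 0), ("R", 4), ("W", 1)]
wt = next_level_writes(trace, num_lines=4, policy="write-through")
wb = next_level_writes(trace, num_lines=4, policy="write-back")
```

In this sketch the write-back count excludes lines that are still dirty when the trace ends, so it measures traffic generated so far rather than total eventual traffic.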

    SIMD based multicore processor for image and video processing

    Degree system: new; Report number: Kō No. 3602; Degree type: Doctor of Engineering; Date conferred: 2012/3/15; Waseda University degree record number: Shin 595

    OrthoNoC: a broadcast-oriented dual-plane wireless network-on-chip architecture

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. On-chip communication remains a key research issue at the gates of the manycore era. In response to this, novel interconnect technologies have opened the door to new Network-on-Chip (NoC) solutions towards greater scalability and architectural flexibility. Particularly, wireless on-chip communication has garnered considerable attention due to its inherent broadcast capabilities, low latency, and system-level simplicity. This work presents ORTHONOC, a wired-wireless architecture that differs from existing proposals in that both network planes are decoupled and driven by traffic steering policies enforced at the network interfaces. With these and other design decisions, ORTHONOC seeks to emphasize the ordered broadcast advantage offered by the wireless technology. The performance and cost of ORTHONOC are first explored using synthetic traffic, showing substantial improvements with respect to other wired-wireless designs with a similar number of antennas. Then, the applicability of ORTHONOC in the multiprocessor scenario is demonstrated through the evaluation of a simple architecture that implements fast synchronization via ordered broadcast transmissions. Simulations reveal significant execution time speedups and communication energy savings for 64-threaded benchmarks, proving that the value of ORTHONOC goes beyond simply improving the performance of the on-chip interconnect. Peer Reviewed. Postprint (author's final draft)

    Analysis of opportunities for cache coherence in heterogeneous embedded systems

    New challenges arise in the context of heterogeneous embedded systems. This work focuses on coherence in those systems in order to analyze the possibility of applying techniques that best cope with such challenges. Prior to that, we explain what the coherence problem is and which solutions are currently proposed for it. Esteve García, A. (2012). Analysis of opportunities for cache coherence in heterogeneous embedded systems. http://hdl.handle.net/10251/29846 Delegated archive

    Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

    Simulators are very important in computer architecture research as they enable the exploration of new architectures and detailed performance evaluation without building costly physical hardware. Simulation is even more critical for studying future many-core architectures, as it provides the opportunity to assess computer systems that do not yet exist. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherence protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure, similar to the idea of a fat-tree, is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to those with lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. A hybrid network consisting of one packet-switching plane and multiple circuit-switching planes is constructed as the second enhancement. The circuit-switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate circuit establishment.
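The observation that all-to-all traffic loads central mesh links more heavily than peripheral ones can be checked with a short sketch. It is not the thesis's analytical model or its SESC-based simulator: the functions `xy_route` and `all_to_all_link_load`, the choice of dimension-order (XY) routing, and the one-packet-per-pair traffic are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def xy_route(src, dst):
    """Yield the directed links (node, next_node) used by XY routing.

    Assumption: packets travel fully along x first, then along y.
    """
    x, y = src
    dx, dy = dst
    while x != dx:
        nx = x + (1 if dx > x else -1)
        yield (x, y), (nx, y)
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        yield (x, y), (x, ny)
        y = ny


def all_to_all_link_load(n):
    """Count how many source-destination pairs cross each directed link
    of an n x n mesh when every node sends one packet to every other node."""
    nodes = list(product(range(n), repeat=2))
    load = Counter()
    for src in nodes:
        for dst in nodes:
            if src != dst:
                for link in xy_route(src, dst):
                    load[link] += 1
    return load


load = all_to_all_link_load(4)
edge_link = load[((0, 0), (1, 0))]    # horizontal link at the left edge of row 0
center_link = load[((1, 0), (2, 0))]  # horizontal link crossing the middle of row 0
```

Under these assumptions the middle link of a row carries more source-destination pairs than the edge link, which is the kind of imbalance a fat-mesh addresses by adding links where traffic is heaviest.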

    Jigsaw: Scalable software-defined caches

    Shared last-level caches, widely used in chip-multi-processors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from interference in shared cache accesses. Unfortunately, prior research addressing one issue either ignores or worsens the other: NUCA techniques reduce access latency but are prone to hotspots and interference, and cache partitioning techniques only provide isolation but do not reduce access latency. United States. Defense Advanced Research Projects Agency (DARPA PERFECT contract HR0011-13-2-0005); Quanta Computer (Firm)