253 research outputs found

    Floorplan-Aware High Performance NoC Design

    Full text link
    Las actuales arquitecturas de m�ltiples n�cleos como los chip multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) han adoptado a las redes dentro del chip (NoC) como elemento -ptimo para la inter-conexi-n de los diversos elementos de dichos sistemas. En este sentido, fabricantes de CMPs y MPSoCs han adoptado NoCs sencillas, generalmente con una topolog'a en malla o anillo, ya que son suficientes para satisfacer las necesidades de los sistemas actuales. Sin embargo a medida que los requerimientos del sistema -- baja latencia y alto rendimiento -- se hacen m�s exigentes, estas redes tan simples dejan de ser una soluci-n real. As', la comunidad investigadora ha propuesto y analizado NoCs m�s complejas. No obstante, estas soluciones son m�s dif'ciles de implementar -- especialmente los enlaces largos -- haciendo que este tipo de topolog'as complejas sean demasiado costosas o incluso inviables. En esta tesis, presentamos una metodolog'a de dise-o que minimiza la p�rdida de prestaciones de la red debido a su implementaci-n real. Los principales problemas que se encuentran al implementar una NoC son los conmutadores y los enlaces largos. En esta tesis, el conmutador se ha hecho modular, es decir, formado como uni-n de m-dulos m�s peque-os. En nuestro caso, los m-dulos son id�nticos, donde cada m-dulo es capaz de arbitrar, conmutar, y almacenar los mensajes que le llegan. Posteriormente, flexibilizamos la colocaci-n de estos m-dulos en el chip, permitiendo que m-dulos de un mismo conmutador est�n distribuidos por el chip. Esta metodolog'a de dise-o la hemos aplicado a diferentes escenarios. Primeramente, hemos introducido nuestro conmutador modular en NoCs con topolog'as conocidas como la malla 2D. Los resultados muestran como la modularidad y la distribuci-n del conmutador reducen la latencia y el consumo de potencia de la red. En segundo lugar, hemos utilizado nuestra metodolog'a de dise-o para implementar un crossbar distribuidRoca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844Palanci

    CD-Xbar : a converge-diverge crossbar network for high-performance GPUs

    Get PDF
    Modern GPUs feature an increasing number of streaming multiprocessors (SMs) to boost system throughput. How to construct an efficient and scalable network-on-chip (NoC) for future high-performance GPUs is particularly critical. Although a mesh network is a widely used NoC topology in manycore CPUs for scalability and simplicity reasons, it is ill-suited to GPUs because of the many-to-few-to-many traffic pattern observed in GPU-compute workloads. Although a crossbar NoC is a natural fit, it does not scale to large SM counts while operating at high frequency. In this paper, we propose the converge-diverge crossbar (CD-Xbar) network with round-robin routing and topology-aware concurrent thread array (CTA) scheduling. CD-Xbar consists of two types of crossbars, a local crossbar and a global crossbar. A local crossbar converges input ports from the SMs into so-called converged ports; the global crossbar diverges these converged ports to the last-level cache (LLC) slices and memory controllers. CD-Xbar provides routing path diversity through the converged ports. Round-robin routing and topology-aware CTA scheduling balance network traffic among the converged ports within a local crossbar and across crossbars, respectively. Compared to a mesh with the same bisection bandwidth, CD-Xbar reduces NoC active silicon area and power consumption by 52.5 and 48.5 percent, respectively, while at the same time improving performance by 13.9 percent on average. CD-Xbar performs within 2.9 percent of an idealized fully-connected crossbar. We further demonstrate CD-Xbar's scalability, flexibility and improved performance perWatt (by 17.1 percent) over state-of-the-art GPU NoCs which are highly customized and non-scalable

    Reconfigurable architecture for very large scale microelectronic systems

    Get PDF

    Design of Reconfigurable Crossbar Switch for BiNoC Router

    Get PDF
    this paper presents implementation of 10x10 reconfigurable crossbar switch (RCS) architecture for Dynamic Self-Reconfigurable BiNoC Architecture for Network On Chip. Its main purpose is to increase the performance, flexibility. This paper presents a VHDL based cycle accurate register transfer level model for evaluating the, Power and Area of reconfigurable cross bar switch in BiNoC architectures. We implemented a parameterized register transfer level design of reconfigurable crossbar switch (RCS) architecture. The design is parameterized on (i) size of packets, (ii) length and width of physical links, (iii) number, and depth of arbiters, and (iv) switching technique. The paper discusses in detail the architecture and characterization of the various reconfigurable crossbar switch (RCS) architecture components. The characterized values were integrated into the VHDL based RTL design to build the cycle accurate performance model. In this paper we show the result of simple 10x10 crossbar switch .The results include VHDL simulation of RCS on Xilinx ISE 13.1 software tool

    SMART: A Single-Cycle Reconfigurable NoC for SoC Applications

    Get PDF
    As technology scales, SoCs are increasing in core counts, leading to the need for scalable NoCs to interconnect the multiple cores on the chip. Given aggressive SoC design targets, NoCs have to deliver low latency, high bandwidth, at low power and area overheads. In this paper, we propose Single-cycle Multi-hop Asynchronous Repeated Traversal (SMART) NoC, a NoC that reconfigures and tailors a generic mesh topology for SoC applications at runtime. The heart of our SMART NoC is a novel low-swing clockless repeated link circuit embedded within the router crossbars, that allows packets to potentially bypass all the way from source to destination core within a single clock cycle, without being latched at any intermediate router. Our clockless repeater link has been proven in silicon in 45nm SOI. Results show that at 2GHz, we can traverse 8mm within a single cycle, i.e. 8 hops with 1mm cores. We implement the SMART NoC to layout and show that SMART NoC gives 60% latency savings, and 2.2X power savings compared to a baseline mesh NoC.United States. Defense Advanced Research Projects Agency. The Ubiquitous High-Performance Computing Progra
    corecore