2,120 research outputs found

    Addressing Manufacturing Challenges in NoC-based ULSI Designs

    Full text link
    Hernández Luz, C. (2012). Addressing Manufacturing Challenges in NoC-based ULSI Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1669

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    Get PDF
    The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions: • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the onchip domain. • Simulation and verification infrastructure must be put in place to explore, validate and optimize the NoC performance. • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions. • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to evaluate their suitability for next-generation designs and their area and power costs. This dissertation performs a design space exploration of network-on-chip architectures, in order to point-out the trade-offs associated with the design of each individual network building blocks and with the design of network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early networkon- chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and to point out the challenges that lie ahead in order to make this new interconnect technology come true. Among these latter, technologyrelated challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy. In particular, leakage power dissipation, containment of process variations and of their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycleaccurate modelling and simulation and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout

    Floorplan-Aware High Performance NoC Design

    Full text link
    Las actuales arquitecturas de m�ltiples n�cleos como los chip multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) han adoptado a las redes dentro del chip (NoC) como elemento -ptimo para la inter-conexi-n de los diversos elementos de dichos sistemas. En este sentido, fabricantes de CMPs y MPSoCs han adoptado NoCs sencillas, generalmente con una topolog'a en malla o anillo, ya que son suficientes para satisfacer las necesidades de los sistemas actuales. Sin embargo a medida que los requerimientos del sistema -- baja latencia y alto rendimiento -- se hacen m�s exigentes, estas redes tan simples dejan de ser una soluci-n real. As', la comunidad investigadora ha propuesto y analizado NoCs m�s complejas. No obstante, estas soluciones son m�s dif'ciles de implementar -- especialmente los enlaces largos -- haciendo que este tipo de topolog'as complejas sean demasiado costosas o incluso inviables. En esta tesis, presentamos una metodolog'a de dise-o que minimiza la p�rdida de prestaciones de la red debido a su implementaci-n real. Los principales problemas que se encuentran al implementar una NoC son los conmutadores y los enlaces largos. En esta tesis, el conmutador se ha hecho modular, es decir, formado como uni-n de m-dulos m�s peque-os. En nuestro caso, los m-dulos son id�nticos, donde cada m-dulo es capaz de arbitrar, conmutar, y almacenar los mensajes que le llegan. Posteriormente, flexibilizamos la colocaci-n de estos m-dulos en el chip, permitiendo que m-dulos de un mismo conmutador est�n distribuidos por el chip. Esta metodolog'a de dise-o la hemos aplicado a diferentes escenarios. Primeramente, hemos introducido nuestro conmutador modular en NoCs con topolog'as conocidas como la malla 2D. Los resultados muestran como la modularidad y la distribuci-n del conmutador reducen la latencia y el consumo de potencia de la red. En segundo lugar, hemos utilizado nuestra metodolog'a de dise-o para implementar un crossbar distribuidRoca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844Palanci

    Cost Effective Routing Implementations for On-chip Networks

    Full text link
    Arquitecturas de múltiples núcleos como multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) actuales se basan en la eficacia de las redes dentro del chip (NoC) para la comunicación entre los diversos núcleos. Un diseño eficiente de red dentro del chip debe ser escalable y al mismo tiempo obtener valores ajustados de área, latencia y consumo de energía. Para diseños de red dentro del chip de propósito general se suele usar topologías de malla 2D ya que se ajustan a la distribución del chip. Sin embargo, la aparición de nuevos retos debe ser abordada por los diseñadores. Una mayor probabilidad de defectos de fabricación, la necesidad de un uso optimizado de los recursos para aumentar el paralelismo a nivel de aplicación o la necesidad de técnicas eficaces de ahorro de energía, puede ocasionar patrones de irregularidad en las topologías. Además, el soporte para comunicación colectiva es una característica buscada para abordar con eficacia las necesidades de comunicación de los protocolos de coherencia de caché. En estas condiciones, un encaminamiento eficiente de los mensajes se convierte en un reto a superar. El objetivo de esta tesis es establecer las bases de una nueva arquitectura para encaminamiento distribuido basado en lógica que es capaz de adaptarse a cualquier topología irregular derivada de una estructura de malla 2D, proporcionando así una cobertura total para cualquier caso resultado de soportar los retos mencionados anteriormente. Para conseguirlo, en primer lugar, se parte desde una base, para luego analizar una evolución de varios mecanismos, y finalmente llegar a una implementación, que abarca varios módulos para alcanzar el objetivo mencionado anteriormente. De hecho, esta última implementación tiene por nombre eLBDR (effective Logic-Based Distributed Routing). Este trabajo cubre desde el primer mecanismo, LBDR, hasta el resto de mecanismos que han surgido progresivamente.Rodrigo Mocholí, S. (2010). Cost Effective Routing Implementations for On-chip Networks [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8962Palanci

    SpiNNaker: Fault tolerance in a power- and area- constrained large-scale neuromimetic architecture

    Get PDF
    AbstractSpiNNaker is a biologically-inspired massively-parallel computer designed to model up to a billion spiking neurons in real-time. A full-fledged implementation of a SpiNNaker system will comprise more than 105 integrated circuits (half of which are SDRAMs and half multi-core systems-on-chip). Given this scale, it is unavoidable that some components fail and, in consequence, fault-tolerance is a foundation of the system design. Although the target application can tolerate a certain, low level of failures, important efforts have been devoted to incorporate different techniques for fault tolerance. This paper is devoted to discussing how hardware and software mechanisms collaborate to make SpiNNaker operate properly even in the very likely scenario of component failures and how it can tolerate system-degradation levels well above those expected

    FOS: a low-power cache organization for multicores

    Get PDF
    [EN] The cache hierarchy of current multicore processors typically consists of one or two levels of private caches per core and a large shared last-level cache. This approach incurs area and energy wasting due to oversizing the private cache space, data replication through the inclusive cache levels, as well as the use of highly set-associative caches. In this paper, we claim that although this is the commonly adopted approach, it presents important design issues that can be addressed by a more energy efficient organization. This work proposes Flat On-chip Storage (FOS), a novel cache organization that, aimed at addressing energy and area on low-power processors, resolves the mentioned issues. For this purpose, FOS combines L2 and L3 cache levels into a single one, organized as a flat space, and composed of a pool of private small cache slices. These slices are initially powered off to save energy, and they are powered on and assigned to cores provided that the system performance is expected to improve. To provide fast and uniform access from the private L1 caches to the FOS's cache slices, multiple architectural challenges are overcome, which entails the design of a custom optical network-on-chip. Experimental results show that FOS achieves significant energy savings on both static and dynamic energy over conventional cache organizations with the same storage capacity. FOS static energy savings are as much as 60% over an electrically connected shared cache; these savings grow up to 75% compared to optically connected baselines. Moreover, despite deactivating part of the cache space, FOS achieves similar performance values as those achieved by conventional approaches.Puche-Lara, J.; Petit Martí, SV.; Sahuquillo Borrás, J.; Gómez Requena, ME. (2019). FOS: a low-power cache organization for multicores. The Journal of Supercomputing (Online). 75(10):6542-6573. https://doi.org/10.1007/s11227-019-02858-xS654265737510Awasthi M, Sudan K, Balasubramonian R, Carter J (2009) Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches. In: 2009 IEEE 15th International Symposium on High Performance Computer Architecture, pp 250–261. https://doi.org/10.1109/HPCA.2009.4798260Baer J, Low D, Crowley P, Sidhwaney N (2003) Memory hierarchy design for a multiprocessor look-up engine. In: 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003)Bahirat S, Pasricha S (2014) Meteor: hybrid photonic ring-mesh network-on-chip for multicore architectures. ACM Trans Embed Comput Syst 13(3s):116:1–116:33. https://doi.org/10.1145/2567940Bartolini S, Grani P (2012) A simple on-chip optical interconnection for improving performance of coherency traffic in CMPS. In: 15th Euromicro Conference on Digital System Design, pp 312–318. https://doi.org/10.1109/DSD.2012.13Beckmann BM, Marty MR, Wood DA (2006) ASR: adaptive selective replication for CMP caches. In: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39. IEEE Computer Society, Washington, DC, USA, pp 443–454. https://doi.org/10.1109/MICRO.2006.10Beckmann N, Sanchez D (2013) Jigsaw: scalable software-defined caches. In: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT ’13. IEEE Press, Piscataway, NJ, USA, pp 213–224. https://doi.org/10.1109/PACT.2013.6618818Bergman K, Carloni LP, Bibermani AC, Hendry G (2014) Photonic network-on-chip design, vol 68. Springer, New YorkChang J, Sohi GS (2006) Cooperative caching for chip multiprocessors. In: Proceedings 33rd Annual International Symposium on Computer Architecture, pp 264–276. https://doi.org/10.1109/ISCA.2006.17Chen G, Chen H, Haurylau M, Nelson N, Fauchet PM, Friedman EG, Albonesi D (2005) Predictions of CMOS compatible on-chip optical interconnect. In: Proceedings of International Workshop on System Level Interconnect Prediction, SLIP ’05, pp 13–20Chishti Z, Powell MD, Vijaykumar TN (2005) Optimizing replication, communication, and capacity allocation in cmps. SIGARCH Comput Archit News 33(2):357–368. https://doi.org/10.1145/1080695.1070001Cho S, Jin L (2006) Managing distributed, shared l2 caches through os-level page allocation. In: 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’06), pp 455–468. https://doi.org/10.1109/MICRO.2006.31Cianchetti MJ, Kerekes JC, Albonesi DH (2009) Phastlane: a rapid transit optical routing network. In: Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA’09, pp 441–450. https://doi.org/10.1145/1555754.1555809Demir Y, Hardavellas N (2015) Parka: thermally insulated nanophotonic interconnects. In: NOCS ’15, pp 1:1–1:8. https://doi.org/10.1145/2786572.2786597Duan GH, Fedeli JM, Keyvaninia S, Thomson D (2012) 10 gb/s integrated tunable hybrid iii-v/si laser and silicon mach-zehnder modulator. In: European Conference and Exhibition on Optical Communication. https://doi.org/10.1364/ECEOC.2012.Tu.4.E.2Dybdahl H, Stenstrom P (2007) An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors. In: 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pp 2–12. https://doi.org/10.1109/HPCA.2007.346180García A, Fernández R, Garca JM, Bartolini S (2014) Managing resources dynamically in hybrid photonic-electronic networks-on-chip. Concurr Comput Pract Exp 26(15):2530–2550. https://doi.org/10.1002/cpe.3332Hardavellas N, Ferdman M, Falsafi B, Ailamaki A (2009) Reactive NUCA: near-optimal block placement and replication in distributed caches. SIGARCH Comput Archit News 37(3):184–195. https://doi.org/10.1145/1555815.1555779Herrero E, González J, Canal R (2008) Distributed cooperative caching. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT ’08, pp 134–143. https://doi.org/10.1145/1454115.1454136Herrero E, González J, Canal R (2010) Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors. In: Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pp 419–428. https://doi.org/10.1145/1815961.1816018Huh J, Kim C, Shafi H, Zhang L, Burger D, Keckler SW (2005) A NUCA substrate for flexible CMP cache sharing. In: Proceedings of the 19th Annual International Conference on Supercomputing, ICS ’05. ACM, pp 31–40. https://doi.org/10.1145/1088149.1088154Kahng AB, Li B, Peh LS, Samadi K (2009) Orion 2.0: a fast and accurate NoC power and area model for early-stage design space exploration. In: DATE. European Design and Automation Association, pp 423–428Kaxiras S, Hu Z, Martonosi M (2001) Cache decay: exploiting generational behavior to reduce cache leakage power. In: Proceedings of the 28th Annual International Symposium on Computer Architecture, ISCA’01, pp 240–251Kim S, Chandra D, Solihin D (2004) Fair cache sharing and partitioning in a chip multiprocessor architecture. In: PACT, pp 111–122Merino J, Puente V, Gregorio JA (2010) ESP-NUCA: a low-cost adaptive non-uniform cache architecture. In: HPCA-16 2010 the Sixteenth International Symposium on High-performance Computer Architecture, pp 1–10. https://doi.org/10.1109/HPCA.2010.5416641Morris R, Kodi AK, Louri A (2012) Dynamic reconfiguration of 3d photonic networks-on-chip for maximizing performance and improving fault tolerance. In: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp 282–293. https://doi.org/10.1109/MICRO.2012.34Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0: a tool to model large caches. In: HP LaboratoriesPang J, Dwyer C, Lebeck AR (2013) Exploiting emerging technologies for nanoscale photonic networks-on-chip. In: Proceedings of 6th International Workshop on NoC Architectures, NoCArc ’13, pp 53–58Petit S, Sahuquillo J, Such JM, Kaeli DR (2005) Exploiting temporal locality in drowsy cache policies. In: Proceedings of the Second Conference on Computing Frontiers, Ischia, Italy, 4–6 May 2005, pp 371–377Pons L, Selfa V, Sahuquillo J, Petit S, Pons J (2018) Improving system turnaround time with intel CAT by identifying LLC critical applications. In: Euro-Par 2018—Parallel Processing—24th International Conference on Parallel and Distributed Computing, Turin, Italy, 27–31 Aug 2018, Proceedings, pp 603–615. https://doi.org/10.1007/978-3-319-96983-1_43Qureshi M, Patt Y (2006) Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches. In: MICRO, pp 423–432Rivers JA, Tam ES, Tyson GS, Davidson ES, Farrens MK (1998) Utilizing reuse information in data cache management. In: Proceedings of the 12th International Conference on Supercomputing, ICS 1998, Melbourne, Australia, 13–17 July 1998, pp 449–456. https://doi.org/10.1145/277830.277941Rosenfeld P, Cooper-Balis E, Jacob B (2011) Dramsim2: a cycle accurate memory system simulator. IEEE Comput Archit Lett 10:16–19. https://doi.org/10.1109/L-CA.2011.4Sahuquillo J, Pont A (1999) The filter cache: a run-time cache management approach1. In: 25th EUROMICRO ’99 Conference, Informatics: Theory and Practice for the New Millenium, 8–10 Sept 1999, Milan, Italy, pp 1424–1431. https://doi.org/10.1109/EURMIC.1999.794504Sahuquillo J, Pont A (2000) Splitting the data cache: a survey. IEEE Concurr 8(3):30–35. https://doi.org/10.1109/4434.865890Selfa V, Sahuquillo J, Eeckhout L, Petit S, Gómez ME (2017) Application clustering policies to address system fairness with intel’s cache allocation technology. In: 26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017, Portland, OR, USA, 9–13 Sept 2017, pp 194–205. https://doi.org/10.1109/PACT.2017.19Shacham A, Bergman K, Carloni L (2007) On the design of a photonic network-on-chip. In: Networks-on-Chip, NOCS 2007, pp 53–64Soref R, Bennett B (1987) Electrooptical effects in silicon. IEEE J Quantum Electron 23(1):123–129. https://doi.org/10.1109/JQE.1987.1073206Henning JL (2006) SPEC CPU2006 benchmark descriptions. SIGARCH Comput Archit News 34(4):1–17. https://doi.org/10.1145/1186736.1186737Tsai PA, Beckmann N, Sanchez D (2017) Jenga: software-defined cache hierarchies. SIGARCH Comput Archit News 45(2):652–665. https://doi.org/10.1145/3140659.3080214Ubal R, Sahuquillo J, Petit S, Lopez P (2007) Multi2sim: a simulation framework to evaluate multicore-multithreaded processors. In: International Symposium on Computer Architecture and High Performance Computing, pp 62–68. https://doi.org/10.1109/SBAC-PAD.2007.17Valero A, Sahuquillo J, Petit S, López P, Duato J (2012) Combining recency of information with selective random and a victim cache in last-level caches. ACM Trans Archit Code Optim 9(3):16:1–16:20. https://doi.org/10.1145/2355585.2355589Vantrease D, Binkert N, Schreiber R, Lipasti M (2009) Light speed arbitration and flow control for nanophotonic interconnects. In: Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium, pp 304–315Werner S, Navaridas J, Lujan M (2017) Designing low-power, low-latency networks-on-chip by optimally combining electrical and optical links. In: 2017 IEEE International Symposium of High Performance Computer Architectur

    A Temperature and Reliability Oriented Simulation Framework for Multi-core Architectures

    Get PDF
    The increasing complexity of multi-core architectures demands for a comprehensive evaluation of different solutions and alternatives at every stage of the design process, considering different aspects at the same time. Simulation frameworks are attractive tools to fulfil this requirement, due to their flexibility. Nevertheless, state-of-the-art simulation frameworks lack a joint analysis of power, performance, temperature profile and reliability projection at system-level, focusing only on a specific aspect. This paper presents a comprehensive estimation framework that jointly exploits these design metrics at system-level, considering processing cores, interconnect design and storage elements. We describe the framework in details, and provide a set of experiments that highlight its capability and flexibility, focusing on temperature and reliability analysis of multi-core architectures supported by Network-on-Chip interconnect

    Exploration and Design of High Performance Variation Tolerant On-Chip Interconnects

    Get PDF
    Siirretty Doriast
    • …
    corecore