1,065 research outputs found

    The impact of global communication latency at extreme scales on Krylov methods

    Get PDF
    Krylov Subspace Methods (KSMs) are popular numerical tools for solving large linear systems of equations. We consider their role in solving sparse systems on future massively parallel distributed memory machines, by estimating future performance of their constituent operations. To this end we construct a model that is simple, but which takes topology and network acceleration into account as they are important considerations. We show that, as the number of nodes of a parallel machine increases to very large numbers, the increasing latency cost of reductions may well become a problematic bottleneck for traditional formulations of these methods. Finally, we discuss how pipelined KSMs can be used to tackle the potential problem, and appropriate pipeline depths

    Evaluation of temperature-performance trade-offs in wireless network-on-chip architectures

    Get PDF
    Continued scaling of device geometries according to Moore\u27s Law is enabling complete end-user systems on a single chip. Massive multicore processors are enablers for many information and communication technology (ICT) innovations spanning various domains, including healthcare, defense, and entertainment. In the design of high-performance massive multicore chips, power and heat are dominant constraints. Temperature hotspots witnessed in multicore systems exacerbate the problem of reliability in deep submicron technologies. Hence, there is a great need to explore holistic power and thermal optimization and management strategies for the massive multicore chips. High power consumption not only raises chip temperature and cooling cost, but also decreases chip reliability and performance. Thus, addressing thermal concerns at different stages of the design and operation is critical to the success of future generation systems. The performance of a multicore chip is also influenced by its overall communication infrastructure, which is predominantly a Network-on-Chip (NoC). The existing method of implementing a NoC with planar metal interconnects is deficient due to high latency, significant power consumption, and temperature hotspots arising out of long, multi-hop wireline links used in data exchange. On-chip wireless networks are envisioned as an enabling technology to design low power and high bandwidth massive multicore architectures. However, optimizing wireless NoCs for best performance does not necessarily guarantee a thermally optimal interconnection architecture. The wireless links being highly efficient attract very high traffic densities which in turn results in temperature hotspots. Therefore, while the wireless links result in better performance and energy-efficiency, they can also cause temperature hotspots and undermine the reliability of the system. Consequently, the location and utilization of the wireless links is an important factor in thermal optimization of high performance wireless Networks-on-Chip. Architectural innovation in conjunction with suitable power and thermal management strategies is the key for designing high performance yet energy-efficient massive multicore chips. This work contributes to exploration of various the design methodologies for establishing wireless NoC architectures that achieve the best trade-offs between temperature, performance and energy-efficiency. It further demonstrates that incorporating Dynamic Thermal Management (DTM) on a multicore chip designed with such temperature and performance optimized Wireless Network-on-Chip architectures improves thermal profile while simultaneously providing lower latency and reduced network energy dissipation compared to its conventional counterparts

    On random wiring in practicable folded clos networks for modern datacenters

    Get PDF
    Big scale, high performance and fault-tolerance, low-cost and graceful expandability are pursued features in current datacenter networks (DCN). Although there have been many proposals for DCNs, most modern installations are equipped with classical folded Clos networks. Recently, regular random topologies, as the Jellyfish, have been proposed for DCNs. However, their completely unstructured nature entails serious design problems. In this paper we propose Random Folded Clos (RFC) and Hydra networks in which the interconnection between certain switches levels is made randomly. Both RFCs and Hydras preserve important properties of Clos networks that provide a straightforward deadlock-free multi-path routing. The proposed networks leverage randomness to be gracefully expandable, thereby allowing for fine grain upgrading. RFCs and Hydras are compared in the paper, in topological and cost terms, against fat-trees, orthogonal fat-trees and random regular networks. Also, experiments are carried out to simulate their performance under synthetic traffic patterns emulating common loads present in warehouse scale computers. These theoretical and empirical studies reveal the interest of these topologies, concluding that Hydra constitutes a practicable alternative to current datacenter networks since it appropriately balance all the main design requirements. Moreover, Hydras perform better than the fat-trees, their natural competitor, being able to connect the same or more computing nodes with significant lower cost and latency while exhibiting comparable throughput. © 1990-2012 IEEE

    Floorplan-Aware High Performance NoC Design

    Full text link
    Las actuales arquitecturas de m�ltiples n�cleos como los chip multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) han adoptado a las redes dentro del chip (NoC) como elemento -ptimo para la inter-conexi-n de los diversos elementos de dichos sistemas. En este sentido, fabricantes de CMPs y MPSoCs han adoptado NoCs sencillas, generalmente con una topolog'a en malla o anillo, ya que son suficientes para satisfacer las necesidades de los sistemas actuales. Sin embargo a medida que los requerimientos del sistema -- baja latencia y alto rendimiento -- se hacen m�s exigentes, estas redes tan simples dejan de ser una soluci-n real. As', la comunidad investigadora ha propuesto y analizado NoCs m�s complejas. No obstante, estas soluciones son m�s dif'ciles de implementar -- especialmente los enlaces largos -- haciendo que este tipo de topolog'as complejas sean demasiado costosas o incluso inviables. En esta tesis, presentamos una metodolog'a de dise-o que minimiza la p�rdida de prestaciones de la red debido a su implementaci-n real. Los principales problemas que se encuentran al implementar una NoC son los conmutadores y los enlaces largos. En esta tesis, el conmutador se ha hecho modular, es decir, formado como uni-n de m-dulos m�s peque-os. En nuestro caso, los m-dulos son id�nticos, donde cada m-dulo es capaz de arbitrar, conmutar, y almacenar los mensajes que le llegan. Posteriormente, flexibilizamos la colocaci-n de estos m-dulos en el chip, permitiendo que m-dulos de un mismo conmutador est�n distribuidos por el chip. Esta metodolog'a de dise-o la hemos aplicado a diferentes escenarios. Primeramente, hemos introducido nuestro conmutador modular en NoCs con topolog'as conocidas como la malla 2D. Los resultados muestran como la modularidad y la distribuci-n del conmutador reducen la latencia y el consumo de potencia de la red. En segundo lugar, hemos utilizado nuestra metodolog'a de dise-o para implementar un crossbar distribuidRoca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844Palanci
    • …
    corecore