929 research outputs found
Network unfairness in dragonfly topologies
Dragonfly networks arrange network routers in a two-level hierarchy, providing a competitive cost-performance solution for large systems. Non-minimal adaptive routing (adaptive misrouting) is employed to fully exploit the path diversity and increase the performance under adversarial traffic patterns. Network fairness issues arise in the dragonfly for several combinations of traffic pattern, global misrouting and traffic prioritization policy. Such unfairness prevents a balanced use of the resources across the network nodes and degrades severely the performance of any application running on an affected node. This paper reviews the main causes behind network unfairness in dragonflies, including a new adversarial traffic pattern which can easily occur in actual systems and congests all the global output links of a single router. A solution for the observed unfairness is evaluated using age-based arbitration. Results show that age-based arbitration mitigates fairness issues, especially when using in-transit adaptive routing. However, when using source adaptive routing, the saturation of the new traffic pattern interferes with the mechanisms employed to detect remote congestion, and the problem grows with the network size. This makes source adaptive routing in dragonflies based on remote notifications prone to reduced performance, even when using age-based arbitration.Peer ReviewedPostprint (author's final draft
OFAR-CM: Efficient Dragonfly networks with simple congestion management
Dragonfly networks are appealing topologies for large-scale Data center and HPC networks, that provide high throughput with low diameter and moderate cost. However, they are prone to congestion under certain frequent traffic patterns that saturate specific network links. Adaptive non-minimal routing can be used to avoid such congestion. That kind of routing employs longer paths to circumvent local or global congested links. However, if a distance-based deadlock avoidance mechanism is employed, more Virtual Channels (VCs) are required, what increases design complexity and cost. OFAR (On-the-Fly Adaptive Routing) is a previously proposed routing that decouples VCs from deadlock avoidance, making local and global misrouting affordable. However, the severity of congestion with OFAR is higher, as it relies on an escape sub network with low bisection bandwidth. Additionally, OFAR allows for unlimited misroutings on the escape sub network, leading to unbounded paths in the network and long latencies. In this paper we propose and evaluate OFAR-CM, a variant of OFAR combined with a simple congestion management (CM) mechanism which only relies on local information, specifically the credit count of the output ports in the local router. With simple escape sub networks such as a Hamiltonian ring or a tree, OFAR outperforms former proposals with distance-based deadlock avoidance. Additionally, although long paths are allowed in theory, in practice packets arrive at their destination in a small number of hops. Altogether, OFAR-CM constitutes the first practicable mechanism to the date that supports both local and global misrouting in Dragonfly networks.The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. ERC-2012-Adg-321253-
RoMoL, the Spanish Ministry of Science under contracts TIN2010-21291-C02-02, TIN2012-34557, and by the European
HiPEAC Network of Excellence. M. García participated in this work while affiliated with the University of Cantabria.Peer ReviewedPostprint (author's final draft
Urban transport and mobility, towards a sustainable and competitive urban dynamic
El presente artículo pretende aportar algunas ideas sobre este tema en función de la interdependencia que existe entre los usos del suelo y el transporte, considerando al primero como sinónimo de actividades urbanas y al segundo como el medio de intercambio físico entre ellas. El discurso propone un análisis de la dinámica urbana con base en tres dimensiones: la formal, la funcional y la moral, procurando asociarlas a los paradigmas de la sustentabilidad y la competitividad urbanas, tratando de relacionarlas con los procesos de globalización para satisfacer las necesidades de movilidad en la ciudad posmoderna. Finalmente, se mencionan las características que los servicios de transporte público urbano de pasajeros deberían observar para contribuir a los objetivos funcionales y ambientales en los procesos de administración de la ciudad.A lo largo de la historia el hombre ha buscado formas diversas de transporte para realizar sus desplazamientos de manera más eficiente y confortable; en tal sentido, la evolución de los medios de transporte y la infraestructura física sobre la que discurren juegan un papel fundamental en la eficiencia y la eficacia de la movilidad de la población y sus bienes. El transporte urbano es un componente de la dimensión funcional de la ciudad y, junto con los usos del suelo, condiciona la manera en que se llevan cabo las actividades urbanas. En su conjunto, este fenómeno es conocido como “dinámica urbana”
Towards fair, scalable, locking
Without care, Hardware Transactional Memory presents several
performance pathologies that can degrade its performance. Among
them, writers of commonly read variables can suffer from starvation. Though different solutions have been proposed for HTM systems, hybrid systems can still suffer from this performance problem, given that software transactions don’t interact with the mechanisms used by hardware to avoid starvation.
In this paper we introduce a new per-directory-line hardware contention management mechanism that allows fairer access between both software and hardware threads without the need to abort any transaction. Our mechanism is based on “reserving” directory lines, implementing a limited fair queue for the requests on that line. We adapt the mechanism to the LogTM conflict detection mechanism and show that the resulting proposal is deadlock free. Finally, we sketch how the idea could be applied more generally to reader-writer locks.Postprint (published version
Implicit transactional memory in chip multiprocessors
Chip Multiprocessors (CMPs) are an efficient way of designing and use the huge amount of transistors on a chip. Different cores on a chip can compose a shared memory system with a very low-latency interconnect at a very low cost. Unfortunately, consistency models and synchronization styles of popular programming models for multiprocessors impose severe performance losses. Known architectural approaches to combat these losses are too complex, too specialized, or not transparent to the software. In this article, we introduce “implicit transactional memory” as a generalized architectural concept to remove such performance losses. We show how the concept of implicit transactions can be implemented at a low complexity by leveraging the multi-checkpoint mechanism of the Kilo-Instruction Processor. By relying on a general speculation substrate, it supports even the strictest consistency model – sequential consistency – potentially as effectively as weaker models and it allows multiple threads to speculatively execute critical sections, beyond barriers and event synchronizations.Postprint (published version
Efficient routing mechanisms for Dragonfly networks
High-radix hierarchical networks are cost-effective topologies for large scale computers. In such networks, routers are organized in super nodes, with local and global interconnections. These networks, known as Dragonflies, outperform traditional topologies such as multi-trees or tori, in cost and scalability. However, depending on the traffic pattern, network congestion can lead to degraded performance. Misrouting (non-minimal routing) can be employed to avoid saturated global or local links. Nevertheless, with the current deadlock avoidance mechanisms used for these networks, supporting misrouting implies routers with a larger number of virtual channels. This exacerbates the buffer memory requirements that constitute one of the main constraints in high-radix switches. In this paper we introduce two novel deadlock-free routing mechanisms for Dragonfly networks that support on-the-fly adaptive routing. Using these schemes both global and local misrouting are allowed employing the same number of virtual channels as in previous proposals. Opportunistic Local Misrouting obtains the best performance by providing the highest routing freedom, and relying on a deadlock-free escape path to the destination for every packet. However, it requires Virtual Cut-Through flow-control. By contrast, Restricted Local Misrouting prevents the appearance of cycles thanks to a restriction of the possible routes within super nodes. This makes this mechanism suitable for both Virtual Cut-Through and Wormhole networks. Evaluations show that the proposed deadlock-free routing mechanisms prevent the most frequent pathological issues of Dragonfly networks. As a result, they provide higher performance than previous schemes, while requiring the same area devoted to router buffers.This work has been supported by the Spanish Ministry of Science under contracts TIN2010-21291-C02-02, TIN2012-34557, and by the European HiPEAC Network of Excellence. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. ERC-2012-Adg-321253-RoMoL. M. Garc´ıa and M. Odriozola participated
in this research work while they were affiliated
with the University of Cantabria.Peer ReviewedPostprint (author's final draft
Architectural support for task dependence management with flexible software scheduling
The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its programmability, portability and potential for optimizations. However, with the expected increase in core counts, finer-grained tasking will be required to exploit the available parallelism, which will increase the overheads introduced by the runtime system. This work presents Task Dependence Manager (TDM), a hardware/software co-designed mechanism to mitigate runtime system overheads. TDM introduces a hardware unit, denoted Dependence Management Unit (DMU), and minimal ISA extensions that allow the runtime system to offload costly dependence tracking operations to the DMU and to still perform task scheduling in software. With lower hardware cost, TDM outperforms hardware-based solutions and enhances the flexibility, adaptability and composability of the system. Results show that TDM improves performance by 12.3% and reduces EDP by 20.4% on average with respect to a software runtime system. Compared to a runtime system fully implemented in hardware, TDM achieves an average speedup of 4.2% with 7.3x less area requirements and significant EDP reductions. In addition, five different software schedulers are evaluated with TDM, illustrating its flexibility and performance gains.This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and
Innovation (contracts TIN2015-65316-P, TIN2016-76635-C2-2-R and TIN2016-81840-REDT), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671697 and No. 671610. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047.Peer ReviewedPostprint (author's final draft
FlexVC: Flexible virtual channel management in low-diameter networks
Deadlock avoidance mechanisms for lossless lowdistance networks typically increase the order of virtual channel (VC) index with each hop. This restricts the number of buffer resources depending on the routing mechanism and limits performance due to an inefficient use. Dynamic buffer organizations increase implementation complexity and only provide small gains in this context because a significant amount of buffering needs to be allocated statically to avoid congestion. We introduce FlexVC, a simple buffer management mechanism which permits a more flexible use of VCs. It combines statically partitioned buffers, opportunistic routing and a relaxed distancebased deadlock avoidance policy. FlexVC mitigates Head-of-Line blocking and reduces up to 50% the memory requirements. Simulation results in a Dragonfly network show congestion reduction and up to 37.8% throughput improvement, outperforming more complex dynamic approaches. FlexVC merges different flows of traffic in the same buffers, which in some cases makes more difficult to identify the traffic pattern in order to support nonminimal adaptive routing. An alternative denoted FlexVCminCred improves congestion sensing for adaptive routing by tracking separately packets routed minimally and nonminimally, rising throughput up to 20.4% with 25% savings in buffer area.This work has been supported by the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program), the Spanish Ministry of Economy, Industry and Competitiveness
(contracts TIN2015-65316), the Spanish Research Agency (AEI/FEDER, UE - TIN2016-76635-C2-2-R), the Spanish
Ministry of Education (FPU grant FPU13/00337), the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-
SGR-1272), the European Union FP7 programme (RoMoL ERC Advanced Grant GA 321253), the European HiPEAC Network of Excellence and the European Union’s Horizon
2020 research and innovation programme (Mont-Blanc project under grant agreement No 671697).Peer ReviewedPostprint (author's final draft
Transporte y movilidad en el marco de la sustentabilidad y competitividad de la ciudad posmoderna
La dinámica urbana depende de tres tipos de factores: físicos, funcionales y morales. Cada uno de ellos representa una dimensión de análisis de la ciudad. El transporte urbano en es un componente de la dimensión funcional, el comportamiento de los individuos se relaciona con la dimensión moral y las características locacionales de los usos del suelo y la infraestructura se asocian con la dimensión física. El propósito del presente trabajo es exponer las características y el papel del transporte urbano de pasajeros en la movilidad de la ciudad posmoderna en el marco de los discursos sobre la sustentabilidad y la competitividad urbanas bajo el análisis de los factores físicos, funcionales y morales mencionados anteriormente.La dinámica urbana depende de tres tipos de factores: físicos, funcionales y morales. Cada uno de ellos representa una dimensión de análisis de la ciudad. El transporte urbano en es un componente de la dimensión funcional, el comportamiento de los individuos se relaciona con la dimensión moral y las características locacionales de los usos del suelo y la infraestructura se asocian con la dimensión física. El propósito del presente trabajo es exponer las características y el papel del transporte urbano de pasajeros en la movilidad de la ciudad posmoderna en el marco de los discursos sobre la sustentabilidad y la competitividad urbanas bajo el análisis de los factores físicos, funcionales y morales mencionados anteriormente
- …