Search CORE

5,816 research outputs found

Parallel discrete event simulation: A shared memory approach

Author: Malony Allen D.
Mccredie Bradley D.
Reed Daniel A.
Publication venue
Publication date
Field of study

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models

NASA Technical Reports Server

OFAR-CM: Efficient Dragonfly networks with simple congestion management

Author: Beivide Palacio Ramon
García Marina
Rodríguez Germán
Valero Cortés Mateo
Vallejo Enrique
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

Dragonfly networks are appealing topologies for large-scale Data center and HPC networks, that provide high throughput with low diameter and moderate cost. However, they are prone to congestion under certain frequent traffic patterns that saturate specific network links. Adaptive non-minimal routing can be used to avoid such congestion. That kind of routing employs longer paths to circumvent local or global congested links. However, if a distance-based deadlock avoidance mechanism is employed, more Virtual Channels (VCs) are required, what increases design complexity and cost. OFAR (On-the-Fly Adaptive Routing) is a previously proposed routing that decouples VCs from deadlock avoidance, making local and global misrouting affordable. However, the severity of congestion with OFAR is higher, as it relies on an escape sub network with low bisection bandwidth. Additionally, OFAR allows for unlimited misroutings on the escape sub network, leading to unbounded paths in the network and long latencies. In this paper we propose and evaluate OFAR-CM, a variant of OFAR combined with a simple congestion management (CM) mechanism which only relies on local information, specifically the credit count of the output ports in the local router. With simple escape sub networks such as a Hamiltonian ring or a tree, OFAR outperforms former proposals with distance-based deadlock avoidance. Additionally, although long paths are allowed in theory, in practice packets arrive at their destination in a small number of hops. Altogether, OFAR-CM constitutes the first practicable mechanism to the date that supports both local and global misrouting in Dragonfly networks.The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. ERC-2012-Adg-321253- RoMoL, the Spanish Ministry of Science under contracts TIN2010-21291-C02-02, TIN2012-34557, and by the European HiPEAC Network of Excellence. M. García participated in this work while affiliated with the University of Cantabria.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

To boldly go:an occam-π mission to engineer emergence

Author: Klein Mark
Sampson Adam T.
Wallnau Kurt
Welch Peter H.
Publication venue
Publication date: 01/01/2012
Field of study

Future systems will be too complex to design and implement explicitly. Instead, we will have to learn to engineer complex behaviours indirectly: through the discovery and application of local rules of behaviour, applied to simple process components, from which desired behaviours predictably emerge through dynamic interactions between massive numbers of instances. This paper describes a process-oriented architecture for fine-grained concurrent systems that enables experiments with such indirect engineering. Examples are presented showing the differing complex behaviours that can arise from minor (non-linear) adjustments to low-level parameters, the difficulties in suppressing the emergence of unwanted (bad) behaviour, the unexpected relationships between apparently unrelated physical phenomena (shown up by their separate emergence from the same primordial process swamp) and the ability to explore and engineer completely new physics (such as force fields) by their emergence from low-level process interactions whose mechanisms can only be imagined, but not built, at the current time

Abertay Research Portal

Crossref

Kent Academic Repository

FlexVC: Flexible virtual channel management in low-diameter networks

Author: Beivide Palacio Ramon
Fuentes Pablo
Minkenberg Cyriel
Valero Cortés Mateo
Vallejo Enrique
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Deadlock avoidance mechanisms for lossless lowdistance networks typically increase the order of virtual channel (VC) index with each hop. This restricts the number of buffer resources depending on the routing mechanism and limits performance due to an inefficient use. Dynamic buffer organizations increase implementation complexity and only provide small gains in this context because a significant amount of buffering needs to be allocated statically to avoid congestion. We introduce FlexVC, a simple buffer management mechanism which permits a more flexible use of VCs. It combines statically partitioned buffers, opportunistic routing and a relaxed distancebased deadlock avoidance policy. FlexVC mitigates Head-of-Line blocking and reduces up to 50% the memory requirements. Simulation results in a Dragonfly network show congestion reduction and up to 37.8% throughput improvement, outperforming more complex dynamic approaches. FlexVC merges different flows of traffic in the same buffers, which in some cases makes more difficult to identify the traffic pattern in order to support nonminimal adaptive routing. An alternative denoted FlexVCminCred improves congestion sensing for adaptive routing by tracking separately packets routed minimally and nonminimally, rising throughput up to 20.4% with 25% savings in buffer area.This work has been supported by the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program), the Spanish Ministry of Economy, Industry and Competitiveness (contracts TIN2015-65316), the Spanish Research Agency (AEI/FEDER, UE - TIN2016-76635-C2-2-R), the Spanish Ministry of Education (FPU grant FPU13/00337), the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014- SGR-1272), the European Union FP7 programme (RoMoL ERC Advanced Grant GA 321253), the European HiPEAC Network of Excellence and the European Union’s Horizon 2020 research and innovation programme (Mont-Blanc project under grant agreement No 671697).Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

An energy-efficient memory unit for clustered microarchitectures

Author: Bieschewski Stefan
González Colás Antonio María
Parcerisa Bundó Joan Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4,7 percent better on SPECint and consumes 36 percent less energy and performs 7 percent better for SPECfp.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC