FlexVC: Flexible virtual channel management in low-diameter networks
Deadlock avoidance mechanisms for lossless low-distance networks typically increase the virtual channel (VC) index with each hop. This restricts the number of buffer resources depending on the routing mechanism and limits performance because those buffers are used inefficiently. Dynamic buffer organizations increase implementation complexity and provide only small gains in this context, because a significant amount of buffering needs to be allocated statically to avoid congestion. We introduce FlexVC, a simple buffer management mechanism that permits a more flexible use of VCs. It combines statically partitioned buffers, opportunistic routing and a relaxed distance-based deadlock avoidance policy. FlexVC mitigates Head-of-Line blocking and reduces memory requirements by up to 50%. Simulation results in a Dragonfly network show congestion reduction and up to 37.8% throughput improvement, outperforming more complex dynamic approaches. FlexVC merges different flows of traffic in the same buffers, which in some cases makes it more difficult to identify the traffic pattern in order to support nonminimal adaptive routing. An alternative denoted FlexVCminCred improves congestion sensing for adaptive routing by separately tracking packets routed minimally and nonminimally, raising throughput by up to 20.4% with 25% savings in buffer area.
This work has been supported by the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program), the Spanish Ministry of Economy, Industry and Competitiveness (contracts TIN2015-65316), the Spanish Research Agency (AEI/FEDER, UE - TIN2016-76635-C2-2-R), the Spanish Ministry of Education (FPU grant FPU13/00337), the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the European Union FP7 programme (RoMoL ERC Advanced Grant GA 321253), the European HiPEAC Network of Excellence and the European Union’s Horizon 2020 research and innovation programme (Mont-Blanc project under grant agreement No 671697).
Peer Reviewed. Postprint (author's final draft).
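As an illustration of the distance-based policy the FlexVC abstract takes as its baseline, the following minimal Python sketch assigns each hop the VC whose index equals the hop count. The second function shows one possible relaxation that keeps VC indices strictly increasing; it is a hypothetical rule written for this example, not FlexVC's actual policy, which the abstract does not specify. The function names, the single VC class (no local/global split) and the hop counts used below are all assumptions.

```python
from typing import List


def baseline_vc(hop_index: int) -> int:
    """Baseline distance-based policy: hop i (0-based) uses VC i."""
    return hop_index


def relaxed_vc_choices(prev_vc: int, remaining_hops: int, num_vcs: int) -> List[int]:
    """Hypothetical relaxed rule (an assumption, not FlexVC's policy).

    A packet may take any VC above the one it used on the previous hop
    (prev_vc, or -1 at injection), provided enough higher-indexed VCs
    remain to give each of the remaining_hops later hops its own
    strictly larger index.
    """
    lowest = prev_vc + 1
    highest = num_vcs - 1 - remaining_hops
    return list(range(lowest, highest + 1))


# Under the baseline, a path of 3 hops occupies VCs 0, 1 and 2 in order.
path_vcs = [baseline_vc(h) for h in range(3)]

# With 6 VCs available, the relaxed rule offers several choices for the
# first hop of the same 3-hop path (2 hops remain after it).
first_hop_choices = relaxed_vc_choices(prev_vc=-1, remaining_hops=2, num_vcs=6)
```

Under the baseline, a network sized for a longer nonminimal path leaves its higher-indexed VCs unused by shorter minimal paths; the relaxed variant lets those paths use the spare VCs while still never decreasing the index.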
Hardware Support for Efficient Packet Processing
Scalability is the key ingredient to further increase the performance of today’s supercomputers.
As other approaches like frequency scaling reach their limits, parallelization is the
only feasible way to further improve the performance. The time required for communication
needs to be kept as small as possible to increase the scalability, in order to be able to
further parallelize such systems.
In the first part of this thesis, ways to reduce the latency incurred in packet-based
interconnection networks are analyzed, and several new architectural solutions are
proposed to address these issues. These solutions have been tested and proven in a field
programmable gate array (FPGA) environment. In addition, a hardware (HW) structure is
presented that enables low-latency packet processing for financial markets.
The second part, and the main contribution of this thesis, is the newly designed crossbar
architecture. It introduces a novel way to integrate multicast capability into a crossbar
design. Furthermore, an efficient implementation of adaptive routing to reduce the
congestion vulnerability of packet-based interconnection networks is shown. The low
latency of the design is demonstrated through simulation and its scalability is proven with
synthesis results.
The third part concentrates on the improvements and modifications made to EXTOLL, a
high-performance interconnection network specifically designed for low-latency,
high-throughput applications. Contributions are modules enabling an efficient integration of
multiple host interfaces as well as the integration of the on-chip interconnect. Additionally,
some of the existing functionality has been revised and improved to achieve better
performance and lower latency. Micro-benchmark results are presented to underline the
contribution of these modifications.