6 research outputs found
System-level Prototyping with HyperTransport
The complexity of computer systems continues to increase. Emulating proposed subsystems is one way to manage this growing complexity when evaluating the performance of proposed architectures. HyperTransport allows researchers to connect FPGAs directly to microprocessors. This enables the emulation of novel memory hierarchies, non-volatile memory designs, coprocessors, and other architectural changes in combination with an existing system.
The BrainScaleS-2 Neuromorphic Platform — A Report on the Integration and Operation of an Open Science Hardware Platform within EBRAINS
This report presents the challenges encountered and the solutions created for the operation of the BrainScaleS neuromorphic platform, and the overall progress leading to this state at the end of the Human Brain Project (HBP).
Proceedings of the Second International Workshop on HyperTransport Research and Applications (WHTRA2011)
Proceedings of the Second International Workshop on HyperTransport Research and Applications (WHTRA2011), held on Feb. 9th, 2011 in Mannheim, Germany. The Second International Workshop on HyperTransport Research is an international, high-quality forum for scientists, researchers, and developers working in the area of HyperTransport. This includes not only developments and research on HyperTransport itself, but also work that is based on or enabled by HyperTransport. HyperTransport (HT) is an interconnect technology typically used as the system interconnect in modern computer systems, connecting the CPUs with each other and with the I/O bridges. Primarily designed as an interconnect between high-performance CPUs, it provides extremely low latency, high bandwidth, and excellent scalability. The definition of the HTX connector allows the use of HT even for add-in cards. Unlike other peripheral interconnect technologies such as PCI Express, no protocol conversion or intermediate bridging is necessary: HT is a direct connection between device and CPU with minimal latency. Another advantage is the possibility of cache-coherent devices. Because of these properties, HT is of high interest for high-performance I/O such as networking and storage, but also for co-processing and acceleration based on ASIC or FPGA technologies. Acceleration in particular sees a resurgence of interest today, one reason being the possibility to reduce power consumption through the use of accelerators. In the area of parallel computing, the low-latency communication allows for fine-grained communication schemes and is perfectly suited for scalable systems. In summary, HT technology offers key advantages and great performance to any research related to or based on interconnects. For more information, please consult the workshop website (http://whtra.uni-hd.de).
Efficient protocols
The increasing demand for ever more computing power drives steady advances in High Performance Computing (HPC) systems. The more powerful these systems become, the further the number of processing units grows. A particularly important point in this context is the latency of the communication among these units, which increases significantly with the distance between two communication partners. One approach to improving the latency behavior is to optimize the underlying protocol structures of the overall system. Nowadays, different protocols are used for different communication distances. The latency can be improved by changing the protocol structure in two ways: on the one hand, the protocols in use can be changed to optimize the latency; on the other hand, the protocol structure can be unified, so that time-consuming protocol translations are eliminated. To achieve this, a completely new protocol is required which unifies all features of the different protocol levels without compromising an efficient implementation.
This work is dedicated to the design of the new Unified Layer Protocol (ULP), which provides a unified communication scheme allowing communication among all processing units at the different levels of an HPC system. Initially, the main features of general protocols are analyzed in detail. Further, properties used by modern protocols are introduced and their function is explained. The two protocols deemed most relevant, HyperTransport (HT) and Peripheral Component Interconnect Express (PCIe), are analyzed in detail with regard to the previously specified aspects. The insight gained through this analysis is incorporated into the development of the ULP. During the development process, first the structure of the ULP is defined and various parameters are determined. Special attention is paid to the feasibility in hardware and the scalability to large systems. The subsequent comparison with HT and PCIe shows that the newly developed ULP usually provides superior performance, even when the effective communication distance moves close to the processor.
Further work is dedicated to the hardware development that originally inspired the design of the ULP. The insights gained during the development of the ULP were integrated back into the hardware.
The results show that the ULP fulfills the demands on a protocol used in the field of HPC. This holds both for processor-near communication and for communication among different nodes. With the ULP, the need for time- and energy-consuming protocol conversions is eliminated while feasibility in hardware is retained.
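The latency argument above can be made concrete with a toy model: a path that crosses several protocol domains pays a conversion cost at every boundary, while a unified protocol pays none. The numbers below are purely hypothetical and only illustrate the structure of the claim, not measurements from this work.

```python
def path_latency(hop_latencies_ns, conversion_cost_ns, conversions):
    """Total latency = per-hop link/switch latencies + protocol conversions.

    All values are illustrative nanosecond figures, not measured data.
    """
    return sum(hop_latencies_ns) + conversion_cost_ns * conversions

hops = [40, 120, 40]  # e.g. CPU-local link, network hop, CPU-local link

# Layered stack: two protocol translations on the path (e.g. at the NIC).
layered = path_latency(hops, conversion_cost_ns=60, conversions=2)
# Unified protocol: the same path without any translation steps.
unified = path_latency(hops, conversion_cost_ns=60, conversions=0)

print(layered)  # 320
print(unified)  # 200
```

The model shows why unification pays off most on short paths: the fixed conversion cost dominates when the wire latency itself is small.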
Hardware Support for Efficient Packet Processing
Scalability is the key ingredient for further increasing the performance of today’s supercomputers. As other approaches such as frequency scaling reach their limits, parallelization is the only feasible way to further improve performance. To increase scalability, and thus to be able to parallelize such systems further, the time required for communication needs to be kept as small as possible.
In the first part of this thesis, ways to reduce the latency incurred in packet-based interconnection networks are analyzed, and several new architectural solutions are proposed to address these issues. These solutions have been tested and proven in a field-programmable gate array (FPGA) environment. In addition, a hardware (HW) structure is presented that enables low-latency packet processing for financial markets.
The second part, and the main contribution of this thesis, is the newly designed crossbar architecture. It introduces a novel way to integrate multicast capability into a crossbar design. Furthermore, an efficient implementation of adaptive routing that reduces the congestion vulnerability of packet-based interconnection networks is shown. The low latency of the design is demonstrated through simulation, and its scalability is proven with synthesis results.
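One way to picture multicast in a crossbar is to let each input request a *set* of output ports rather than a single one. The sketch below is my own illustration of that idea, not the thesis design: a greedy, input-ordered scheduler grants a (possibly multicast) request only if every requested output is still free in this cycle.

```python
def schedule(requests):
    """Greedy single-cycle crossbar arbitration with multicast.

    requests: list of (input_id, output_bitmask), served in input order.
    A request is granted only if all outputs in its bitmask are free.
    Returns the list of granted input ids.
    """
    busy = 0        # bitmask of outputs already claimed this cycle
    granted = []
    for inp, mask in requests:
        if mask and (mask & busy) == 0:  # every requested output free?
            busy |= mask                 # claim all of them atomically
            granted.append(inp)
    return granted

# Input 0 multicasts to outputs 0 and 2 (0b101); input 1 wants output 2,
# which is now taken; input 2 wants output 1, which is still free.
print(schedule([(0, 0b101), (1, 0b100), (2, 0b010)]))  # [0, 2]
```

Granting all requested outputs atomically avoids partial multicast delivery; a real design would add fairness (e.g. rotating priority) instead of fixed input order.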
The third part concentrates on the improvements and modifications made to EXTOLL, a high-performance interconnection network specifically designed for low-latency and high-throughput applications. The contributions are modules enabling an efficient integration of multiple host interfaces, as well as the integration of the on-chip interconnect. Additionally, some of the already existing functionality has been revised and improved to reach better performance and lower latency. Micro-benchmark results are presented to underline the contribution of these modifications.
Efficient hardware for low latency applications
The design and development of application-specific hardware structures has a high degree of complexity. Nowadays, the limit is often no longer logic resources, but development time.
The first part presents a generator which allows control and status structures for hardware designs to be defined in an abstract high-level language.
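A minimal sketch of this generator idea, with an input format and output invented here for illustration: an abstract register list is turned into C `#define` offsets that software and hardware can share, so the register map is written once instead of being maintained by hand in several places.

```python
def generate_header(block_name, registers, word_bytes=8):
    """Emit C #define lines for a register block.

    registers: list of (name, access) tuples in address order;
    each register occupies one word of `word_bytes` bytes.
    The spec format is hypothetical, not the thesis language.
    """
    lines = [f"/* auto-generated register map for {block_name} */"]
    for offset, (name, access) in enumerate(registers):
        lines.append(
            f"#define {block_name.upper()}_{name.upper()}_OFFSET "
            f"0x{offset * word_bytes:04X} /* {access} */"
        )
    return "\n".join(lines)

spec = [("control", "rw"), ("status", "ro"), ("irq_mask", "rw")]
print(generate_header("dma", spec))
```

From the same abstract description, a real generator would also emit the HDL register file and documentation, which is where the development-time savings come from.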
A novel method to inform host systems very efficiently about changes in the register files is presented in the second part. It makes use of a microcode-programmable hardware unit.
In the third part, a fully pipelined address translation mechanism for remote memory access in HPC interconnection networks is presented, which features a new concept for resolving dependency problems.
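The dependency problem such a unit faces can be shown with a simplified software model (my own sketch, not the actual design): when several in-flight requests miss on the same page, only the first should start a table fetch, and the later ones should reuse the pending result instead of issuing duplicate fetches.

```python
PAGE = 4096  # illustrative page size

class Translator:
    """Toy virtual-to-physical translation with miss deduplication."""

    def __init__(self, table):
        self.table = table     # virtual page -> physical page
        self.cache = {}        # resolved translations
        self.pending = set()   # pages whose table fetch is "in flight"
        self.fetches = 0       # table fetches actually issued

    def translate(self, vaddr):
        vpage, off = divmod(vaddr, PAGE)
        if vpage not in self.cache:
            if vpage not in self.pending:  # first miss starts the fetch
                self.pending.add(vpage)
                self.fetches += 1
            # In this synchronous model the fetch completes immediately;
            # dependent requests to the same page hit the cache instead.
            self.cache[vpage] = self.table[vpage]
            self.pending.discard(vpage)
        return self.cache[vpage] * PAGE + off

t = Translator({0: 7, 1: 3})
print(t.translate(100), t.translate(200), t.fetches)  # 28772 28872 1
```

In a pipelined hardware unit the same bookkeeping keeps dependent requests ordered behind the outstanding fetch rather than stalling the whole pipeline.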
The last part of this thesis addresses the problem of sending TCP messages for a low-latency trading application using a hybrid TCP stack implementation that consists of hardware and software components. Furthermore, a simulation environment for the TCP stack is presented.
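A common way to split such a hybrid stack, sketched here as my own schematic and not the thesis implementation, is that software establishes the connection and owns the state, while the latency-critical send path only patches a sequence number and payload into a prebuilt segment template, as a hardware engine could.

```python
class HybridSender:
    """Schematic hybrid TCP send path (illustration only).

    `template` stands in for prebuilt Ethernet/IP/TCP header bytes that
    slow-path software prepared once; `fast_send` models the minimal
    per-message work a hardware fast path would do.
    """

    def __init__(self, template, initial_seq):
        self.template = template   # prebuilt header bytes (placeholder)
        self.seq = initial_seq     # connection state set up by software

    def fast_send(self, payload):
        """Fast path: splice seq number + payload, advance the state."""
        seg = self.template + self.seq.to_bytes(4, "big") + payload
        self.seq += len(payload)   # TCP advances seq by payload length
        return seg

tx = HybridSender(template=b"HDR", initial_seq=1000)
seg = tx.fast_send(b"order")
print(len(seg), tx.seq)  # 12 1005
```

Keeping retransmission and connection teardown in software is what makes the hardware part small; the simulation environment mentioned above is what lets such a split be validated without live market traffic.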