1,944 research outputs found
APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters
Many scientific computations need multi-node parallelism to meet ever-increasing
requirements in both space (memory) and time (speed). The use of GPUs
as accelerators introduces yet another level of complexity for the programmer
and may potentially result in large overheads due to the complex memory
hierarchy. Additionally, leading-edge problems may easily demand more than a
Petaflops of sustained computing power, requiring thousands of GPUs
orchestrated with some parallel programming model. Here we describe APEnet+,
the new generation of our interconnect, which scales up to tens of thousands of
nodes with linear cost, thus improving the price/performance ratio on large
clusters. The project target is the development of the Apelink+ host adapter
featuring a low latency, high bandwidth direct network, state-of-the-art wire
speeds on the links and a PCIe x8 Gen2 host interface. It features hardware
support for the RDMA programming model and experimental acceleration of GPU
networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI
library driver are available, allowing for painless porting of standard
applications. Finally, we give an insight into future work and intended
developments.
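The 3D toroidal topology named in the title can be illustrated with a minimal sketch. It assumes nodes are addressed by integer (x, y, z) coordinates with wrap-around links on every axis; the function names `torus_neighbors` and `torus_hops` are hypothetical and are not part of the APEnet+ software described in the abstract.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes reached over the six links (+/- on each axis) of a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # The modulo implements the wrap-around link that closes each ring.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count between two nodes, taking the shorter way around each ring."""
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, n - d)
    return total
```

Each node has a fixed number of links regardless of machine size, which is consistent with the abstract's claim that cost grows linearly with the number of nodes.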
DFT and BIST of a multichip module for high-energy physics experiments
Engineers at Politecnico di Torino designed a multichip module for high-energy physics experiments conducted on the Large Hadron Collider. An array of these MCMs handles multichannel data acquisition and signal processing. Testing the MCM from board to die level required a combination of DFT strategie
Circuit design and analysis for on-FPGA communication systems
On-chip communication systems have emerged as a prominent subject in Very-Large-Scale-Integration
(VLSI) design, as the trend of technology scaling favours logic more than interconnects.
Interconnects often dictate the system performance, and, therefore, research into new
methodologies and system architectures that deliver high-performance communication services
across the chip is mandatory. The interconnect challenge is exacerbated in Field-Programmable
Gate Arrays (FPGAs), a type of ASIC where the hardware can be programmed post-fabrication.
Communication across an FPGA will deteriorate as a result of interconnect scaling. The programmable
fabrics, switches and the specific routing architecture also introduce additional latency
and bandwidth degradation, further hindering intra-chip communication performance.
Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs.
Communication with programmable interconnect received little attention and is inadequately understood.
This thesis is among the first to research on-chip communication systems that are built on
top of programmable fabrics and proposes methodologies to maximize the interconnect throughput
performance. There are three major contributions in this thesis: (i) an analysis of on-chip
interconnect fringing, which degrades the bandwidth of communication channels due to routing
congestion in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly
improves the interconnect throughput by exploiting the fundamental electrical characteristics
of the reconfigurable interconnect structures, and that can potentially mitigate
the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP)-network that provides
adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime
optimization for route planning and dynamic routing, which effectively utilizes the in-silicon
bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new
methodologies and concepts are proposed to enhance the on-FPGA communication throughput
performance that is of vital importance in new technology processes.
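The idea of dynamic-programming-based adaptive routing can be sketched in software. The following is a minimal illustration only, assuming a 2D mesh NoC and non-negative per-link costs (for example, a congestion estimate); it is not the DP-network hardware architecture of the thesis, and the names `mesh_neighbors`, `dp_costs` and `next_hop` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]
LinkCost = Callable[[Node, Node], float]


def mesh_neighbors(node: Node, width: int, height: int) -> List[Node]:
    """Routers directly connected to `node` in a width x height mesh."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates if 0 <= nx < width and 0 <= ny < height]


def dp_costs(width: int, height: int, dest: Node, link_cost: LinkCost) -> Dict[Node, float]:
    """Relax cost-to-destination until no router can improve (Bellman-style update)."""
    cost = {(x, y): float("inf") for x in range(width) for y in range(height)}
    cost[dest] = 0.0
    changed = True
    while changed:
        changed = False
        for node in cost:
            if node == dest:
                continue
            # Each router only needs its neighbours' current cost estimates.
            best = min(
                (link_cost(node, nb) + cost[nb] for nb in mesh_neighbors(node, width, height)),
                default=float("inf"),
            )
            if best < cost[node]:
                cost[node] = best
                changed = True
    return cost


def next_hop(node: Node, width: int, height: int,
             cost: Dict[Node, float], link_cost: LinkCost) -> Node:
    """Pick the neighbour that minimises link cost plus remaining cost."""
    return min(mesh_neighbors(node, width, height),
               key=lambda nb: link_cost(node, nb) + cost[nb])
```

Re-running `dp_costs` with updated link costs changes the chosen next hops, which is the sense in which such a scheme adapts routes at runtime.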
From FPGA to ASIC: A RISC-V processor experience
This work documents a correct design flow using these tools in the Lagarto RISC-V processor, together with the RTL design considerations that must be taken into account to move from a design for FPGA to a design for ASIC
A global wire planning scheme for Network-on-Chip.
As technology scales down, the interconnect for on-chip global communication becomes the delay bottleneck. To provide well-controlled global wire delay and efficient global communication, a packet-switched Network-on-Chip (NoC) architecture has been proposed by different authors. In this paper, the NoC system parameters constrained by the interconnections are studied. Predictions on scaled system parameters such as clock frequency, resource size, global communication bandwidth and inter-resource delay are made for future technologies. Based on these parameters, a global wire planning scheme is proposed.
Adaptation of High Performance and High Capacity Reconfigurable Systems to OpenCL Programming Environments
[EN] In this work, we adapt a reconfigurable computer system based on FPGA
technologies to OpenCL programming environments. The reconfigurable system
is part of a compute prototype of the MANGO European project that includes 96
FPGAs. To optimize its use and obtain maximum performance, it is essential to adapt it to heterogeneous-system programming environments such as
OpenCL, which simplifies its programming. In this work, all the activities necessary for a correct implementation of the software and hardware layers required for
its use in OpenCL are carried out, together with an evaluation of the performance
obtained and the flexibility offered by the solution provided.
This work was carried out during a five-month internship at the Universitat Politècnica de València (UPV). The internship is linked to an agreement between UPV and UniNa (Università degli Studi
di Napoli Federico II).
Russo, D. (2020). Adaptation of High Performance and High Capacity Reconfigurable Systems to OpenCL Programming Environments. http://hdl.handle.net/10251/150393 (TFG)
Implementing High-Speed String Matching Hardware for Network Intrusion Detection Systems
This paper presents high-throughput techniques for implementing FSM-based string matching hardware on FPGAs. By taking advantage of the fact that string matching operations for different packets are independent, a novel multi-threading FSM design is presented, which dramatically increases the FSM frequency and the throughput of string matching operations. In addition, design techniques for high-speed interconnect and interface circuits for the proposed FSM are also presented. Experimental results obtained on FPGA platforms are presented to study the effectiveness of the proposed techniques and explore the trade-offs between system performance, string partition granularity and hardware resource cost.
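The independence property that the abstract relies on can be modelled in software. The sketch below is an assumption-laden illustration, not the paper's hardware design: it builds a single-pattern DFA and keeps one state register per packet ("thread"), feeding one character from each packet in round-robin time slots. The names `build_dfa` and `match_interleaved` are hypothetical.

```python
from typing import Dict, List


def build_dfa(pattern: str) -> List[Dict[str, int]]:
    """DFA whose state is the length of the longest pattern prefix matched so far."""
    alphabet = set(pattern)
    dfa: List[Dict[str, int]] = [dict() for _ in range(len(pattern) + 1)]
    for state in range(len(pattern) + 1):
        for ch in alphabet:
            # Next state: longest pattern prefix that is a suffix of the text seen.
            candidate = pattern[:state] + ch
            k = min(len(pattern), len(candidate))
            while k > 0 and candidate[-k:] != pattern[:k]:
                k -= 1
            dfa[state][ch] = k
    return dfa


def match_interleaved(packets: List[str], pattern: str) -> List[List[int]]:
    """Run one shared DFA over several packets, one character per packet per round."""
    dfa = build_dfa(pattern)
    accept = len(pattern)
    states = [0] * len(packets)          # one state register per thread
    hits: List[List[int]] = [[] for _ in packets]
    longest = max((len(p) for p in packets), default=0)
    for pos in range(longest):
        for t, packet in enumerate(packets):   # time slot t serves thread t
            if pos >= len(packet):
                continue
            # Characters outside the pattern alphabet reset the thread to state 0.
            states[t] = dfa[states[t]].get(packet[pos], 0)
            if states[t] == accept:
                hits[t].append(pos - accept + 1)
    return hits
```

Because no thread's next state depends on another thread's input, a hardware version could in principle pipeline the transition logic across the time slots; the software model only demonstrates that interleaving does not change the match results.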
PCIe IP Validation Process Across Process Corner, Voltage And Temperature Conditions
IP validation has become more challenging for FPGA devices as they support higher operating speeds. Peripheral Component Interconnect Express (PCIe) is an IP used for high-speed data transfer that is supported by Intel FPGAs. The PCIe 3.0 base specification supports 8.0 GT/s, 5.0 GT/s and 2.5 GT/s. Link training and initialization take place at the physical layer to establish the link width and link data rate. The physical layer becomes more complex as it supports higher speeds. The operational state is reached only when the Link Training and Status State Machine (LTSSM) reaches the L0 state after the device has been configured. The stability of link training is improved by optimizing the soft logic design in the application layer. Two protocol tests commonly used for validation in industry are link-up testing and link and higher-layer testing. Debugging tools supported by Quartus are fully utilized to detect any failure during link training. Link performance characterized across process corners, voltage and temperature conditions is hard to analyze. Using hypothesis testing, the collected data give a clear trend in PCIe link performance. The H0 statement addresses whether passing and failing cases differ significantly. In this research, the worst case occurred at low voltage and low temperature, regardless of process corner. The p-value is greater than 0.05, so the H0 statement is accepted. The difference between passing and failing percentages has an insignificant impact on overall PCIe link performance. The study concludes that the bug is random and not caused by any defect in the silicon layout of the FPGA device. Thus, IP validation shows the robustness of the device and its ability to comply with the PCIe base specification.
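The abstract does not say which statistical test was applied. As one possible illustration of comparing a p-value against the 0.05 threshold, the sketch below uses a two-sided two-proportion z-test on link-training failure counts from two operating conditions. The choice of test, the function name `two_proportion_z_test` and any counts passed to it are assumptions, not data or methods from the study.

```python
import math
from typing import Tuple


def two_proportion_z_test(fail_a: int, n_a: int, fail_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for H0: both conditions have the same failure rate."""
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (all pass or all fail): no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With this convention, a p-value above 0.05 means the data give no grounds to reject H0 at the 5% level, which matches the way the abstract reports its result.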