
    Study and Optimization of Particle Track Detection via Hough Transform Hardware Implementation for the ATLAS Phase-II Trigger Upgrade

    At CERN in Geneva, the Large Hadron Collider (LHC) will undergo several deep upgrades in the coming years. The instantaneous and integrated luminosity will be increased to 5-7·10^34 cm^-2 s^-1 and 3000 fb^-1, respectively. Alongside the collider, the experiments at the LHC will undergo upgrades crucial to fulfilling their HEP goals. The ATLAS upgrades are divided into two phases, Phase-I and Phase-II. Part of the ATLAS upgrade concerns the Trigger and Data Acquisition (TDAQ) system; in particular, a major technological update of the ATLAS trigger is planned for Phase-II. My contribution to the Phase-I and Phase-II plans has focused on the electronics update of the TDAQ system. For the Phase-I upgrade, I worked on the commissioning of the new FELIX readout cards (FLX-712), which will be installed in part of the TDAQ system. These FPGA-based cards have a bandwidth of up to 480 Gb/s and use PCI Express Gen 3 technology. My work focused on preparing and following up part of the quality-control tests of the cards. The ATLAS Phase-II trigger aims to increase its output data stream to Tier 0 by one order of magnitude. For the Phase-II upgrade, I developed an implementation of a tracking algorithm to meet the new trigger requirements. This algorithm, the Hough Transform, reconstructs particle trajectories and has already been shown to be suited to the ATLAS specifications. In this thesis I present the study, the simulations and the hardware implementation of a preliminary version of the Hough Transform algorithm on a Xilinx UltraScale+ FPGA device.
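
The Hough Transform idea described above maps each hit to every set of track parameters it is compatible with, then looks for parameter cells where hits from many detector layers agree. The following is a minimal Python sketch, not the FPGA firmware from the thesis. It assumes a small-angle track model in the transverse plane, phi = phi0 - K·r·(q/pT), an assumed 2 T field, and hypothetical binning and layer radii.

```python
import math
from collections import defaultdict

# Assumption: small-angle track model in the transverse plane,
#   phi(r) = phi0 - K * r * qpt,
# with r in metres, qpt = q/pT in e/GeV and an assumed 2 T solenoid field.
B_FIELD_T = 2.0
K = 0.3 * B_FIELD_T / 2.0


def hough_accumulate(hits, qpt_bins, phi0_bin_width):
    """Vote each (layer, r, phi) hit into every (q/pT, phi0) cell it is compatible with.

    Returns {(qpt_index, phi0_index): set_of_layers}. Counting distinct layers
    instead of raw hits keeps several hits on one layer from faking a track.
    """
    acc = defaultdict(set)
    for layer, r, phi in hits:
        for i, qpt in enumerate(qpt_bins):
            phi0 = phi + K * r * qpt  # invert the track model for this q/pT
            j = math.floor(phi0 / phi0_bin_width)
            acc[(i, j)].add(layer)
    return acc


def find_candidates(acc, min_layers):
    """Return, sorted, the cells whose hits span at least min_layers layers."""
    return sorted(cell for cell, layers in acc.items() if len(layers) >= min_layers)


if __name__ == "__main__":
    qpt_bins = [-1.0 + 0.25 * i for i in range(9)]  # hypothetical binning
    radii = [0.3 + 0.1 * n for n in range(8)]       # eight hypothetical layers
    # One simulated track with q/pT = 0.5 and phi0 = 1.005 rad.
    hits = [(n, r, 1.005 - K * r * 0.5) for n, r in enumerate(radii)]
    print(find_candidates(hough_accumulate(hits, qpt_bins, 0.01), min_layers=7))
    # prints [(6, 100)]: q/pT bin 0.5 and the phi0 bin containing 1.005
```

Requiring a minimum number of distinct layers, rather than a raw hit count, is one common way to make the peak search robust against several hits on a single layer.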

    A high speed serializer/deserializer design

    A Serializer/Deserializer (SerDes) is a circuit that converts parallel data into a serial stream and vice versa. It helps solve clock/data skew problems, simplifies data transmission, lowers power consumption and reduces chip cost. The goal of this project was to address the challenges of high-speed SerDes design: low jitter, wide bandwidth and low power. A quarter-rate multiplexer/demultiplexer (MUX/DEMUX) was implemented. The quarter-rate structure lowers the required clock frequency from one half to one quarter of the data rate, which is shown to significantly relax the VCO design at high speed and to reduce power consumption. A novel multi-phase LC-ring oscillator was developed to supply a low-noise clock to the SerDes. The proposed VCO combines an LC tank with a ring structure to achieve both a wide tuning range (11%) and low phase noise (-110 dBc/Hz at 1 MHz offset). With this structure, a data rate of 36 Gb/s was achieved with a measured peak-to-peak jitter of 10 ps in 0.18 µm SiGe BiCMOS technology, with a power consumption of 3.6 W from a 3.4 V supply. At 60 Gb/s, the simulated peak-to-peak jitter was 4.8 ps in 65 nm CMOS technology, with a power consumption of 92 mW from a 2 V supply. A time-to-digital converter (TDC) calibration circuit was designed in a three-dimensional fully depleted silicon-on-insulator (3D FDSOI) CMOS process to compensate for the phase mismatches among the multiple phases of the PLL clock. The 3D process places the analog PLL and the digital calibration circuitry on different tiers, which eliminates the noise coupling through the common substrate found in a 2D process. Mismatches caused by the vertical tier-to-tier interconnections and by temperature in the 3D process were attenuated by the proposed calibration circuit.
The design strategy and circuits developed in this dissertation provide significant benefits to both wired and wireless applications.
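
The quarter-rate principle above can be illustrated with a purely behavioral model. Four parallel lanes are interleaved so that each lane drives the output during one of four clock phases spaced 90 degrees apart, and each lane therefore toggles at only a quarter of the serial rate. This Python sketch ignores all circuit-level effects (jitter, phase mismatch) and uses hypothetical function names. For the 36 Gb/s design point, the required clock drops from 18 GHz (half-rate) to 9 GHz (quarter-rate).

```python
def required_clock_ghz(data_rate_gbps, phases):
    """Clock frequency when `phases` evenly spaced clock phases each launch one bit per cycle.

    phases=2 corresponds to a half-rate design, phases=4 to a quarter-rate design.
    """
    return data_rate_gbps / phases


def quarter_rate_mux(lanes):
    """Interleave four equal-length bit lanes into one serial bit stream.

    During each quarter-rate clock cycle, lane k drives the output in phase k
    (k * 90 degrees), so every lane runs at a quarter of the serial data rate.
    """
    if len(lanes) != 4 or len({len(lane) for lane in lanes}) != 1:
        raise ValueError("expected four lanes of equal length")
    serial = []
    for t in range(len(lanes[0])):  # one quarter-rate clock cycle
        for k in range(4):          # four clock phases within that cycle
            serial.append(lanes[k][t])
    return serial


def quarter_rate_demux(serial):
    """Split a serial stream back into four lanes (inverse of quarter_rate_mux)."""
    if len(serial) % 4:
        raise ValueError("stream length must be a multiple of 4")
    return [serial[k::4] for k in range(4)]


if __name__ == "__main__":
    for rate in (36, 60):
        print(rate, "Gb/s:", required_clock_ghz(rate, 2), "GHz half-rate,",
              required_clock_ghz(rate, 4), "GHz quarter-rate")
```

The cost of the lower clock frequency is that four accurately spaced clock phases are needed, which is why the dissertation pairs the quarter-rate MUX with a multi-phase oscillator and a phase-mismatch calibration circuit.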

    Von Neumann bottlenecks in non-von Neumann computing architectures

    The term "neuromorphic" refers to a broad class of computational devices that mimic various aspects of cortical information processing. In particular, they instantiate neurons, either physically or virtually, which communicate through time-singular events called spikes. This thesis presents a generic RTL implementation of a point-to-point chip interconnect protocol that is well suited to the unique I/O requirements of event-based communication, especially for accelerated mixed-signal neuromorphic devices. A physical realization of this interconnect was implemented on the most recent version of the BrainScaleS-2 architecture, the HICANN-X system, to provide a high-speed bidirectional connection to a host FPGA. Event rates of up to 250 MHz full-duplex, as well as several stream-secured configuration and memory interface channels, are transported via 8 × 1 Gbit/s LVDS DDR serializers. Because the approach is entirely independent of the serializer implementation, it has applications beyond neuromorphic computing, such as enforcing a separation of concerns and aiding the development of serializer-independent protocol bridges for system design.
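
One way to picture a protocol layer that stays independent of the serializer is to treat the eight links as opaque byte lanes. Frames are striped across the lanes on transmit and reassembled on receive, without knowing how each lane is serialized. The round-robin byte striping below is a hypothetical scheme chosen for illustration, not the HICANN-X link format. The aggregate rate follows from the figures in the text, and the DDR clock follows from sending one bit on each clock edge.

```python
from __future__ import annotations

LANES = 8              # from the text: 8 x 1 Gbit/s LVDS DDR serializers
LANE_RATE_GBPS = 1.0


def stripe(frame: bytes, lanes: int = LANES) -> list[bytes]:
    """Distribute frame bytes round-robin over `lanes` links (hypothetical scheme)."""
    return [frame[k::lanes] for k in range(lanes)]


def unstripe(per_lane: list[bytes]) -> bytes:
    """Reassemble a frame from its per-lane byte streams (inverse of stripe)."""
    lanes = len(per_lane)
    out = bytearray(sum(len(chunk) for chunk in per_lane))
    for k, chunk in enumerate(per_lane):
        out[k::lanes] = chunk  # lane k carried bytes k, k+lanes, k+2*lanes, ...
    return bytes(out)


aggregate_gbps = LANES * LANE_RATE_GBPS  # 8 Gbit/s over the eight lanes
ddr_bit_clock_ghz = LANE_RATE_GBPS / 2   # DDR: one bit on each clock edge -> 0.5 GHz
```

Because only `stripe` and `unstripe` touch the lane structure, the same framing logic could sit on top of a different serializer, in the spirit of the serializer-independent protocol bridges mentioned above.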

    ATLAS Pixel Detector and readout upgrades for the improved LHC performance

    Since it was first started in 2008, the LHC particle accelerator at CERN has steadily increased its center-of-mass energy and luminosity. Over the next years, the LHC will undergo two major series of upgrades: after the first (Phase-I), it will reach the design energy of 14 TeV and a luminosity of 2-3·10^34 cm^-2 s^-1, and in the last phase (Phase-II) the luminosity will be increased to ~7·10^34 cm^-2 s^-1. To keep up with the improved LHC performance, the LHC detectors were (and will be) upgraded as well. This work focuses on the ATLAS detector, one of the four main LHC experiments, and in particular on its Pixel Detector. The ATLAS Pixel Detector was first upgraded in 2015 with the introduction of a new pixel layer, called IBL, to compensate for B-layer inefficiencies and to improve tracking performance for Phase-0 and Phase-I. The new detector layout, combined with the higher LHC luminosity, led to an increased amount of data to be transmitted and analyzed, posing a challenge for the readout system. For this reason, the previous readout chain was renovated and two new boards, called IBL-ROD and IBL-BOC, were designed to interface with IBL. The second major upgrade involving the ATLAS Pixel Detector will take place in 2024-2026, when the Inner Detector will be completely replaced by ITk, made entirely of silicon sensors. Sustaining these more demanding conditions will require another readout upgrade, whose final design has not yet been decided and is still under consideration. This work gives an overview of the ATLAS Pixel Detector and analyzes the motivations that led to its upgrades. The current and future DAQ systems are also discussed, focusing on the technologies adopted, the detector requirements and the results obtained.

    Tightly-Coupled and Fault-Tolerant Communication in Parallel Systems

    The demand for processing power is increasing steadily. In the past, single-processor architectures clearly dominated the market. Because instruction-level parallelism is limited in most applications, significant performance gains in the future can only be achieved by exploiting parallelism at the higher levels of threads or processes. As a consequence, modern “processors” incorporate multiple cores that form a single shared-memory multiprocessor. In such systems, high-performance devices such as network interface controllers are connected to processors and memory, like every other input/output device, over a hierarchy of peripheral interconnects. One goal must therefore be to couple coprocessors physically closer to the main memory and processors of a computing node, removing the overhead of today’s peripheral interconnect structures. One such step, presented in this thesis, is the direct connection of HyperTransport (HT) devices to Opteron processors. This work also analyzes how communication from a device to processors can be optimized at the protocol level. Because today’s computing nodes are shared-memory systems, the cache coherence protocol is the central protocol for data exchange between processors and devices; consequently, the analysis extends to classes of devices that are aware of the cache coherence protocol. This thesis also proposes the concept of a transfer cache, which reduces latency significantly even for non-coherent devices. The trend toward exploiting process- and thread-level parallelism leads to a steady increase in system sizes. Networks used in such large systems are very susceptible to both hard and transient faults. Most transient fault rates are constant per bit stored or transmitted, so with increasing system sizes and higher clock frequencies, the number of faults per unit time increases drastically.
In the end, the error rate may rise to a level where high-level error recovery becomes too costly unless the lower layers perform error correction that is transparent to the layers above. The second part of this thesis describes a direct interconnection network that provides a reliable transport service even without end-to-end protocols. This thesis also develops a novel hardware-based solution for intermediate routing, which allows efficient, deadlock-free routing around faulty links.
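
Two points in this abstract can be made concrete. First, a constant per-bit fault rate means the number of faults per second scales with link speed and link count. Second, correcting errors on each link, transparently to higher layers, removes the need for end-to-end recovery. The Python sketch below uses CRC-32 from the standard `zlib` module with simple stop-and-wait retransmission on a single hop. The frame layout, the retry policy and all function names are assumptions for illustration. The network in the thesis may use a different detection and retry scheme.

```python
import random
import zlib


def expected_faults_per_second(ber, link_gbps, links):
    """Constant per-bit fault rate: faults scale with bit rate and number of links."""
    return ber * link_gbps * 1e9 * links


def make_frame(seq, payload):
    """Hypothetical frame: 2-byte sequence number + payload + CRC-32 over both."""
    body = seq.to_bytes(2, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_frame(frame):
    """Return (seq, payload) if the CRC matches, otherwise None."""
    body, crc = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return int.from_bytes(body[:2], "big"), body[2:]


def noisy_link(frame, ber, rng):
    """Model transient faults: flip each bit independently with probability ber."""
    out = bytearray(frame)
    for i in range(len(out) * 8):
        if rng.random() < ber:
            out[i // 8] ^= 1 << (i % 8)
    return bytes(out)


def send_over_link(payloads, ber, seed=0, max_tries=100):
    """Stop-and-wait link-level retransmission: resend a frame until its CRC passes."""
    rng = random.Random(seed)
    delivered, retries = [], 0
    for seq, payload in enumerate(payloads):
        frame = make_frame(seq % 65536, payload)
        for _ in range(max_tries):
            result = check_frame(noisy_link(frame, ber, rng))
            if result is not None and result[0] == seq % 65536:
                delivered.append(result[1])
                break
            retries += 1  # receiver rejects the frame; sender retransmits on this hop
        else:
            raise RuntimeError(f"frame {seq} not delivered after {max_tries} tries")
    return delivered, retries
```

For example, `expected_faults_per_second(1e-12, 10, 1000)` gives about 10 faults per second for a thousand 10 Gb/s links at a bit error rate of 1e-12. This illustrates why the abstract argues that lower layers must handle errors as systems grow.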

    A 64b/66b line encoding for high speed serializers

    by Satyajit Mohapatra, Hari Shanker Gupta, Jatindeep Singh and Nihar Ranjan Mohapatr

    Efficient protocols

    The increasing demand for computing power drives steady advancements in High Performance Computing (HPC) systems. The more powerful these systems become, the more processing units they contain. A particularly important factor in this context is the latency of communication among those units, which increases significantly with the distance between two communication partners. One approach to improving latency is to optimize the underlying protocol structures of the overall system. Today, different protocols are used for different communication distances. The latency can be improved by changing the protocol structure in two ways: on the one hand, the protocols themselves can be optimized for latency; on the other hand, the protocol structure can be unified, eliminating time-consuming protocol translations. Achieving this requires a completely new protocol that unifies all features of the different protocol levels without compromising efficient implementation. This work is dedicated to the design of the new Unified Layer Protocol (ULP), which provides a unified communication scheme allowing communication among all processing units at different levels of an HPC system. First, the main features of general protocols are analyzed in detail. Next, properties used by modern protocols are introduced and their function is explained. The two protocols deemed most relevant, HyperTransport (HT) and Peripheral Component Interconnect Express (PCIe), are analyzed in detail with regard to the previously specified aspects. The insight gained through this analysis is incorporated into the development of the ULP. During development, the structure of the ULP is defined first and various parameters are determined, with special attention paid to hardware feasibility and scalability for large systems.
A subsequent comparison with HT and PCIe shows that the newly developed ULP usually provides superior performance, even when the effective communication distance moves close to the processor. Further work is dedicated to the hardware development that originally inspired the ULP, and the insights gained during the development of the ULP were integrated into that hardware. The results show that the ULP fulfills the demands on a protocol for HPC, both for processor-near communication and for communication among different nodes. With the ULP, time- and energy-consuming protocol conversions are no longer needed, and the protocol remains feasible in hardware.

    Research and design of high-speed advanced analogue front-ends for fibre-optic transmission systems

    In the last decade, we have witnessed the emergence of large, warehouse-scale data centres, which have enabled new internet-based software applications such as cloud computing, search engines, social media and e-government. Such data centres consist of large collections of servers interconnected by short-reach (up to a few hundred meters) optical links. Today, transceivers for these applications achieve up to 100 Gb/s by multiplexing 10 × 10 Gb/s or 4 × 25 Gb/s channels. In the near future, however, data centre operators have expressed a need for optical links that support 400 Gb/s up to 1 Tb/s. The crucial challenge is to achieve this in the same footprint (the same transceiver module) and with similar power consumption as today’s technology. Straightforward scaling of the currently used space- or wavelength-division multiplexing may be difficult: a 1 Tb/s transceiver would require integrating 40 VCSELs (vertical-cavity surface-emitting laser diodes, widely used for short-reach optical interconnects), 40 photodiodes and electronics operating at 25 Gb/s in the same module as today’s 100 Gb/s transceiver. Pushing the bit rate on such links beyond today’s commercially available 100 Gb/s per fibre will require new generations of VCSELs and of their driver and receiver electronics. This work examines a number of state-of-the-art technologies, investigates their performance limits and recommends a set of designs specifically targeting multilevel modulation formats. Several methods to extend the bandwidth in deep-submicron (65 nm and 28 nm) CMOS technology are explored, while maintaining a focus on reducing power consumption and chip area. The techniques used were pre-emphasis on the rising and falling edges of the signal, and bandwidth extension by inductive peaking and various local feedback techniques.
These techniques have been applied to a transmitter and receiver developed for advanced modulation formats such as PAM-4 (4-level pulse amplitude modulation). Such a modulation format increases the throughput per channel, which helps overcome the challenges mentioned above in realizing 400 Gb/s to 1 Tb/s transceivers.
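
The throughput argument for PAM-4 is simple arithmetic. NRZ carries one bit per symbol and PAM-4 carries two, so at the same symbol rate PAM-4 halves the lane count. The 40 lanes at 25 Gb/s quoted above for a 1 Tb/s NRZ transceiver become 20 lanes at 25 GBd. The Python sketch below shows this calculation together with a bit-to-level mapping. Gray coding is a common choice for PAM-4 and is assumed here. The abstract does not specify the mapping used in this work.

```python
import math

# Gray-coded PAM-4 (assumed mapping): neighbouring amplitude levels differ in one
# bit, so the most likely symbol error (to an adjacent level) costs a single bit.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
LEVEL_TO_BITS = {level: bits for bits, level in GRAY_PAM4.items()}


def pam4_encode(bits):
    """Map an even-length bit sequence to PAM-4 levels 0..3, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(levels):
    """Inverse of pam4_encode."""
    return [bit for level in levels for bit in LEVEL_TO_BITS[level]]


def lanes_needed(target_gbps, symbol_rate_gbd, bits_per_symbol):
    """Parallel lanes needed to reach an aggregate bit rate."""
    return math.ceil(target_gbps / (symbol_rate_gbd * bits_per_symbol))


if __name__ == "__main__":
    print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))  # [0, 1, 2, 3]
    print(lanes_needed(1000, 25, 1), lanes_needed(1000, 25, 2))  # 40 20
```

The trade-off, which motivates the bandwidth-extension and pre-emphasis techniques in the abstract, is that the four levels are packed into the same voltage swing, so each eye is smaller and the front-end must be more linear and less noisy.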

    Topical Workshop on Electronics for Particle Physics

    The purpose of the workshop was to present results and original concepts for electronics research and development relevant to particle physics experiments, as well as to accelerator and beam instrumentation at future facilities; to review the status of electronics for the LHC experiments; to identify and encourage common efforts for the development of electronics; and to promote information exchange and collaboration in the relevant engineering and physics communities.