14 research outputs found

    Development of FPGA-based High-Speed serial links for High Energy Physics Experiments

    Get PDF
    High Energy Physics (HEP) experiments generate high volumes of data which need to be transferred over long distance. Then, for data read out, reliable and high-speed links are necessary. Over the years, due to their extreme high bandwidth, serial links (especially optical) have been preferred over the parallel ones. So that, now, high-speed serial links are commonly used in Trigger and Data Acquisition (TDAQ) systems of HEP experiments, not only for data transfer, but also for the distribution of trigger and control systems. Examples of their wide use can be found at CERN, where each of the four big experiments mounted on the Large Hadron Collider (LHC) uses a huge amount of serial links in its read out system. Again at LHC, the Timing, Trigger and Control system (TTC), which broadcasts the timing signals, from the LHC machine to the experiments, uses optical serial link to distribute signals over kilometers of distance (diameter of LHC is 27 Km). Also for upgrades of LHC, physical layer components and protocol chips (ASIC) have been designed and are now under development: the Versatile Link and the GBT protocol (and ASICs) whose peculiarity relies in their radiation hardness. This PhD project is intended to respond to the requests of HEP experiments, developing: - a high-speed self-adapting serial link, which can be easily used in different application fields; - the serial interface of a read out board in the end-cap region of ATLAS Experiment at LHC; - the interface board for the barrel read out system of the ATLAS Experiments. Both the two last projects have required the development of fixed latency, high-speed serial links. In order to take advantage of flexibility, re-programmability and system integration of SRAM-based Field Programmable Gate Array devices (FPGAs), their serializer-deserializer (SERDES) embedded modules have been chosen for the development of the links. However, as a drawback, FPGA embedded SERDESes are typically designed for applications that do not require a deterministic latenc. Then, an accurate study of their architecture has been necessary, in order to find a configuration and a clocking scheme to guarantee a deterministic transmission delay in data transfers. The frequency agile, auto-adaptive serial link is capable to analyze the incoming data stream, by scanning the Unit Interval, and to find the highest transmission line rate, according to a given tolerated Bit Error Ratio (BER). It uses a new feature (RX eye margin analysis) of the RX side of the Xilinx 7 series FPGAs high-speed transceivers (GTX/GTH), in order to measure and display the receiver eye margin after the equalizer. When the new eye scan functionality is running, an additional sampler is activated in the GTX. It acquires a new sample (Offset Sample), with programmable (horizontal and vertical) offsets from the data sample point (Data Sample) used in standard operation. An eye scan measurement run is performed by acquiring a large number of Data Samples (which can range from tens of thousands to 1014 or more) and by counting the number of times the Offset Sample has a different value with respect to the Data Sample; the latter number is often called Error Count. The BER at a specific vertical and horizontal offset is given by the ratio between the Error Count and the Sample Count. By repeating the eye scan measurement for each horizontal and vertical offset in the Unit Interval (or in a part of the U.I.) a 2-D BER map can be produced which is usually called Statistical Eye. The auto-adaptive derail ink is designed around an FPGA-embedded microprocessor, which drives the programmable ports of the GTX, in order to perform a 2-D eye-scan, and takes care of the reconfiguration of the GTX parameters, in order to fully benefit from the available link bandwidth. Xilinx provides a standalone tool that allows performing the Eye Scan Analysis on the receiver side of the GTX/GTH transceiver, using the MicroBlaze Micro Controller System macro; the toolkit also includes the Eye Scan algorithm (providing the C code). Moreover, Xilinx supplies the hardware sources files for the implementation of a link based on the XAUI protocol, in which the GTXs are arranged in a loopback configuration. The original contribution of this work consists in the build-up, design and optimization of a full architecture, on top of the basic Xilinx tool, which: - drives the programmable ports of the GTX in order to modify the line rate of the link; - runs consecutive eye scans for various line rate; - analyses the results of the different scans, in order to find the maximum line rate sustainable by the link; - manages the synchronization between the transmitter and the receiver of the link, that will be needed at each line rate change. The application can be deployed as a monitoring tool in HEP experiments, in order to remotely monitor a transmission system or detect issues in the serial link physical layer. An application example could be some of the many experiments at Large Hadron Collider (LHC) at CERN, which have been intensively using different serial links, both for transmission of TTC signals and for trigger and data readout. Besides, this solution could be easily adapted in wide, different frameworks, as it can be used on top of any user’s existing link, as it has no specific requirement about link specification or protocol. The other two serial interface developed in this project are in the framework of the ATLAS experiment. ATLAS is one of the four detectors installed on the LHC proton-proton collider built at CERN. It was designed to collide two opposing particle beams at an energy of 14 TeV and to reach a luminosity of 1034 cm-2/s. In order to reach the design parameters, the LHC system will be upgraded in several phases. In order to take advantage of the improved LHC operation, the ATLAS detector must be upgraded following the same schedule as the LHC upgrade. The main focus of the Phase-I ATLAS upgrade (to be completed by 2018) is on the Level-1 trigger where upgrades are planned for both the muon and the calorimeter trigger systems. In particular, for the end-cap region of the muon spectrometer, the installation of a new set of precision tracking and trigger detectors was approved, called the ‘New Small Wheels’ (NSW). It will be instrumented with micro-mesh gaseous structure detectors (MM) and small-strip Thin Gap Chambers (sTGC). These detectors will solve two points of particular importance at high luminosity: high rate of fake high-pt level-1 muon triggers, and high L1 muon rate with the current momentum threshold. With the introduction of new detectors, new electronics need to be developed, in particular new trigger electronics for both the MM and sTGC. I was involved in the development of serial interface of the FPGA-based sTGC trigger board that uses information from the coarse sTGC readout pads. The sTGC pad trigger board receives serial information coming from 24 front-end chips at 4.8 Gb/s. On the board, data are deserialised, aligned and analyzed by the trigger algorithm. The trigger logic processes the data and choses two candidates at each Bunch Crossing. The result is then serialised and used for selective fine-grained strip readout. I developed the pad trigger board interface logic. The data format from the front-end chips has been agreed upon, and defines the requirements on the receiver and decoding logic. The number of output lines is 24 and the data are 8B/10B formatted. While the receiver uses the Xilinx Kintex-7 GTX transceivers, the output lines are driven by double data rate (DDR) shift registers at 640 Mb/s. A fixed latency in the sTGC trigger chain was guaranteed through the implementation and configuration of all serialisers and deserialisers. In order to test the project, I also developed a simple microprocessor-based protocol for accessing the board via terminal (rs232). A demonstrator board is now being developed. Another Phase-I Level-1 trigger upgrade consists of a new Muon to Central Trigger Processor Interface (MUCTPI). The MUCTPI receives muon candidate information from each of the muon detectors, selects muon candidates and sends them to the Central Trigger Processor (CTP). In the first runs of ATLAS, the L1 Barrel trigger candidate data were transferred to the MuCTPI via copper cables. In order to cope with the trigger upgrade, serial optical links are necessary. The optical links will provide a much higher bandwidth (up to 6.4 Gb/s) which will be used to transfer additional information from the sector logic modules, for example data for more than two muon candidates. They will also provide a lower transmission latency. I developed the interface board between the new MUCTPI and the Resistive Plate Chambers (RPC) muon trigger, using the Xilinx Artix-7 FPGA GTP transceivers. I took care of the study of feasibility of the new serial optical transmitter and the logic for the new data format. Also in this case, the fixed latency has been a requirement to be fulfilled

    Trigger and Timing Distributions using the TTC-PON and GBT Bridge Connection in ALICE for the LHC Run 3 Upgrade

    Full text link
    The ALICE experiment at CERN is preparing for a major upgrade for the third phase of data taking run (Run 3), when the high luminosity phase of the Large Hadron Collider (LHC) starts. The increase in the beam luminosity will result in high interaction rate causing the data acquisition rate to exceed 3 TB/sec. In order to acquire data for all the events and to handle the increased data rate, a transition in the readout electronics architecture from the triggered to the trigger-less acquisition mode is required. In this new architecture, a dedicated electronics block called the Common Readout Unit (CRU) is defined to act as a nodal communication point for detector data aggregation and as a distribution point for timing, trigger and control (TTC) information. TTC information in the upgraded triggerless readout architecture uses two asynchronous high-speed serial links connections: the TTC-PON and the GBT. We have carried out a study to evaluate the quality of the embedded timing signals forwarded by the CRU to the connected electronics using the TTC-PON and GBT bridge connection. We have used four performance metrics to characterize the communication bridge: (a)the latency added by the firmware logic, (b)the jitter cleaning effect of the PLL on the timing signal, (c)BER analysis for quantitative measurement of signal quality, and (d)the effect of optical transceivers parameter settings on the signal strength. Reliability study of the bridge connection in maintaining the phase consistency of timing signals is conducted by performing multiple iterations of power on/off cycle, firmware upgrade and reset assertion/de-assertion cycle (PFR cycle). The test results are presented and discussed concerning the performance of the TTC-PON and GBT bridge communication chain using the CRU prototype and its compliance with the ALICE timing requirements

    Digital Design of New Chaotic Ciphers for Ethernet Traffic

    Get PDF
    Durante los últimos años, ha habido un gran desarrollo en el campo de la criptografía, y muchos algoritmos de encriptado así como otras funciones criptográficas han sido propuestos.Sin embargo, a pesar de este desarrollo, hoy en día todavía existe un gran interés en crear nuevas primitivas criptográficas o mejorar las ya existentes. Algunas de las razones son las siguientes:• Primero, debido el desarrollo de las tecnologías de la comunicación, la cantidad de información que se transmite está constantemente incrementándose. En este contexto, existen numerosas aplicaciones que requieren encriptar una gran cantidad de datos en tiempo real o en un intervalo de tiempo muy reducido. Un ejemplo de ello puede ser el encriptado de videos de alta resolución en tiempo real. Desafortunadamente, la mayoría de los algoritmos de encriptado usados hoy en día no son capaces de encriptar una gran cantidad de datos a alta velocidad mientras mantienen altos estándares de seguridad.• Debido al gran aumento de la potencia de cálculo de los ordenadores, muchos algoritmos que tradicionalmente se consideraban seguros, actualmente pueden ser atacados por métodos de “fuerza bruta” en una cantidad de tiempo razonable. Por ejemplo, cuando el algoritmo de encriptado DES (Data Encryption Standard) fue lanzado por primera vez, el tamaño de la clave era sólo de 56 bits mientras que, hoy en día, el NIST (National Institute of Standards and Technology) recomienda que los algoritmos de encriptado simétricos tengan una clave de, al menos, 112 bits. Por otro lado, actualmente se está investigando y logrando avances significativos en el campo de la computación cuántica y se espera que, en el futuro, se desarrollen ordenadores cuánticos a gran escala. De ser así, se ha demostrado que algunos algoritmos que se usan actualmente como el RSA (Rivest Shamir Adleman) podrían ser atacados con éxito.• Junto al desarrollo en el campo de la criptografía, también ha habido un gran desarrollo en el campo del criptoanálisis. Por tanto, se están encontrando nuevas vulnerabilidades y proponiendo nuevos ataques constantemente. Por consiguiente, es necesario buscar nuevos algoritmos que sean robustos frente a todos los ataques conocidos para sustituir a los algoritmos en los que se han encontrado vulnerabilidades. En este aspecto, cabe destacar que algunos algoritmos como el RSA y ElGamal están basados en la suposición de que algunos problemas como la factorización del producto de dos números primos o el cálculo de logaritmos discretos son difíciles de resolver. Sin embargo, no se ha descartado que, en el futuro, se puedan desarrollar algoritmos que resuelvan estos problemas de manera rápida (en tiempo polinomial).• Idealmente, las claves usadas para encriptar los datos deberían ser generadas de manera aleatoria para ser completamente impredecibles. Dado que las secuencias generadas por generadores pseudoaleatorios, PRNGs (Pseudo Random Number Generators) son predecibles, son potencialmente vulnerables al criptoanálisis. Por tanto, las claves suelen ser generadas usando generadores de números aleatorios verdaderos, TRNGs (True Random Number Generators). Desafortunadamente, los TRNGs normalmente generan los bits a menor velocidad que los PRNGs y, además, las secuencias generadas suelen tener peores propiedades estadísticas, lo que hace necesario que pasen por una etapa de post-procesado. El usar un TRNG de baja calidad para generar claves, puede comprometer la seguridad de todo el sistema de encriptado, como ya ha ocurrido en algunas ocasiones. Por tanto, el diseño de nuevos TRNGs con buenas propiedades estadísticas es un tema de gran interés.En resumen, es claro que existen numerosas líneas de investigación en el ámbito de la criptografía de gran importancia. Dado que el campo de la criptografía es muy amplio, esta tesis se ha centra en tres líneas de investigación: el diseño de nuevos TRNGs, el diseño de nuevos cifradores de flujo caóticos rápidos y seguros y, finalmente, la implementación de nuevos criptosistemas para comunicaciones ópticas Gigabit Ethernet a velocidades de 1 Gbps y 10 Gbps. Dichos criptosistemas han estado basados en los algoritmos caóticos propuestos, pero se han adaptado para poder realizar el encriptado en la capa física, manteniendo el formato de la codificación. De esta forma, se ha logrado que estos sistemas sean capaces no sólo de encriptar los datos sino que, además, un atacante no pueda saber si se está produciendo una comunicación o no. Los principales aspectos cubiertos en esta tesis son los siguientes:• Estudio del estado del arte, incluyendo los algoritmos de encriptado que se usan actualmente. En esta parte se analizan los principales problemas que presentan los algoritmos de encriptado standard actuales y qué soluciones han sido propuestas. Este estudio es necesario para poder diseñar nuevos algoritmos que resuelvan estos problemas.• Propuesta de nuevos TRNGs adecuados para la generación de claves. Se exploran dos diferentes posibilidades: el uso del ruido generado por un acelerómetro MEMS (Microelectromechanical Systems) y el ruido generado por DNOs (Digital Nonlinear Oscillators). Ambos casos se analizan en detalle realizando varios análisis estadísticos a secuencias obtenidas a distintas frecuencias de muestreo. También se propone y se implementa un algoritmo de post-procesado simple para mejorar la aleatoriedad de las secuencias generadas. Finalmente, se discute la posibilidad de usar estos TRNGs como generadores de claves. • Se proponen nuevos algoritmos de encriptado que son rápidos, seguros y que pueden implementarse usando una cantidad reducida de recursos. De entre todas las posibilidades, esta tesis se centra en los sistemas caóticos ya que, gracias a sus propiedades intrínsecas como la ergodicidad o su comportamiento similar al comportamiento aleatorio, pueden ser una buena alternativa a los sistemas de encriptado clásicos. Para superar los problemas que surgen cuando estos sistemas son digitalizados, se proponen y estudian diversas estrategias: usar un sistema de multi-encriptado, cambiar los parámetros de control de los sistemas caóticos y perturbar las órbitas caóticas.• Se implementan los algoritmos propuestos. Para ello, se usa una FPGA Virtex 7. Las distintas implementaciones son analizadas y comparadas, teniendo en cuenta diversos aspectos tales como el consumo de potencia, uso de área, velocidad de encriptado y nivel de seguridad obtenido. Uno de estos diseños, se elige para ser implementado en un ASIC (Application Specific Integrate Circuit) usando una tecnología de 0,18 um. En cualquier caso, las soluciones propuestas pueden ser también implementadas en otras plataformas y otras tecnologías.• Finalmente, los algoritmos propuestos se adaptan y aplican a comunicaciones ópticas Gigabit Ethernet. En particular, se implementan criptosistemas que realizan el encriptado al nivel de la capa física para velocidades de 1 Gbps y 10 Gbps. Para realizar el encriptado en la capa física, los algoritmos propuestos en las secciones anteriores se adaptan para que preserven el formato de la codificación, 8b/10b en el caso de 1 Gb Ethernet y 64b/10b en el caso de 10 Gb Ethernet. En ambos casos, los criptosistemas se implementan en una FPGA Virtex 7 y se diseña un set experimental, que incluye dos módulos SFP (Small Form-factor Pluggable) capaces de transmitir a una velocidad de hasta 10.3125 Gbps sobre una fibra multimodo de 850 nm. Con este set experimental, se comprueba que los sistemas de encriptado funcionan correctamente y de manera síncrona. Además, se comprueba que el encriptado es bueno (pasa todos los test de seguridad) y que el patrón del tráfico de datos está oculto.<br /

    Implementation of Ultra-Low Latency and High-Speed Communication Channels for an FPGA-Based HPC Cluster

    Get PDF
    RÉSUMÉ Les clusters basés sur les FPGA bénéficient de leur flexibilité et de leurs performances en termes de puissance de calcul et de faible consommation. Et puisque la consommation de puissance devient un élément de plus en plus importants sur le marché des superordinateurs, le domaine d’exploration multi-FPGA devient chaque année plus populaire. Les performances des ordinateurs n’ont jamais cessé d’augmenter mais la latence des réseaux d’interconnexion n’a pas suivi leur taux d’amélioration. Dans le but d’augmenter le niveau d’abstraction et les fonctionnalités des interconnexions, la complexité des piles de communication atteinte à nos jours engendre des coûts et affecte la latence des communications, ce qui rend ces piles de communication très souvent inefficaces, voire inutiles. Les protocoles de communication commerciaux existants et les contrôleurs d’interfaces réseau FPGA-FPGA n’ont la performance pour supporter ni les applications à temps critique ni un partitionnement étroitement couplé des systèmes sur puce. Au lieu de cela, les approches de communication personnalisées sont souvent préférées. Dans ce travail, nous proposons une implémentation de canaux de communication à haut débit et à faible latence pour une grappe de FPGA. Le système est constitué de deux BEE3, chacun contenant 4 FPGA de la famille Virtex-5 interconnectés par une topologie en anneau. Notre approche exploite la technologie à transducteur à plusieurs gigabits par seconde pour l’obtention d’une bande passante fiable de 8Gbps. Le module de propriété intellectuelle (IP) de communication proposé permet le transfert de données entre des milliers de coprocesseurs sur le réseau, grâce à l’implémentation d’un réseau direct avec capacité de routage de paquets. Les résultats expérimentaux ont montré une latence de seulement 34 cycles d’horloge entre deux noeuds voisins, ce qui est un des plus bas parmi ceux rapportés dans la littérature. En outre, nous proposons une architecture adaptée au calcul à haute performance qui comporte un traitement extensible, parallèle et distribué. Pour une plateforme à 8 FPGA, l’architecture fournit 35.6Go/s de bande passante effective pour la mémoire externe, une bande passante globale de réseau de 128Gbps et une puissance de calcul de 8.9GFLOPS. Un solveur matrice-vecteur de grande taille est partitionné et mis en oeuvre à travers le cluster. Nous avons obtenu une performance et une efficacité de calcul concurrentielles grâce à la faible empreinte du protocole de communication entre les éléments de traitement distribués. Ce travail contribue à soutenir de nouvelles recherches dans le domaine du calcul parallèle intensif et permet le partitionnement de système sur puce à grande taille sur des clusters à base de FPGA.----------ABSTRACT An FPGA-based cluster profits from the flexibility and the performance potential FPGA technology provides. Since price and power consumption are becoming increasingly important elements in the High-Performance Computing market, the multi-FPGA exploration field is getting more popular each year. Network latency has failed to keep up with other improvements in computer performance. Complex communication stacks have sacrificed latency and increased overhead to achieve other goals, being in most of the time inefficient and unnecessary. The existing commercial offthe- shelf communication protocols and Network Interfaces Controllers for FPGA-to-FPGA interconnection lack of performance to support time-critical applications and tightly coupled System-on-Chip partitioning. Instead, custom communication approaches are preferred. In this work, ultra-low latency and high-speed communication channels for an FPGA-based cluster are presented. Two BEE3s grouping 8 FPGAs Virtex-5 interconnected in a ring topology, compose the targeting platform. Our approach exploits Multi-Gigabit Transceiver technology to achieve reliable 8Gbps channel bandwidth. The proposed communication IP supports data transfer from coprocessors over the network, by means of a direct network implementation with hop-by-hop packet routing capability. Experimental results showed a latency of only 34 clock cycles between two neighboring nodes, being one of the lowest in the literature. In addition, it is proposed an architecture suitable for High-Performance Computing which includes performing scalable, parallel, and distributed processing. For an 8 FPGAs platform, the architecture provides 35.6GB/s off-chip memory throughput, 128Gbps network aggregate bandwidth, and 8.9GFLOPS computing power. A large and dense matrix-vector solver is partitioned and implemented across the cluster. We achieved competitive performance and computational efficiency as a result of the low communication overhead among the distributed processing elements. This work contributes to support new researches on the intense parallel computing fields, and enables large System-on-Chip partitioning and scaling on FPGA-based clusters

    A time-based approach for multi-GHz embedded mixed-signal characterization and measurement /

    Get PDF
    The increasingly more sophisticated systems that are nowadays implemented on a single chip are placing stringent requirements on the test industry. New test strategies, equipment, and methodologies need to be developed to sustain the constant increase in demand for consumer and communication electronics. Techniques for built-in-self-test (BIST) and design-for-test (DFT) strategies have been proven to offer more feasible and economical testing solutions.Previous works have been conducted to perform on-chip testing, characterization, and measurement of signals and components. The current thesis advances those techniques on many levels. In terms of performance, an increase of more than an order of magnitude in speed is achieved. 70-GHz (effective sampling) on-chip oscilloscope is reported, compared to 4-GHz and 10-GHz ones in previous state-of-the-art implementations. Power dissipation is another area where the proposed work offer a superior solution compared to previous alternatives. All the proposed circuits do not exceed a few milliWatts of power dissipation, while performing multi-GHz high-speed signal capture at a medium resolution. Finally, and possibly most importantly, all the proposed circuits for test rely on a different form of signal processing; the time-based approach. It is believed that this approach paves the path to a lot of new techniques and circuit design skills that can be investigated more deeply. As an integral part of the time-based processing approach for GHz signal capture, this thesis verifies the advantages of using time amplification. The use of such amplification in the time domain is materialized with experimental results from three specific integrated circuits achieving different tasks in GHz high-speed in-situ signal measurement and characterization. Advantages of using such time-based approach techniques, when combined with the use of a front-end time amplifier, include noise immunity, the use of synthesizable digital cells, and circuit building blocks that track the technology scaling in terms of area and speed

    Energy-efficient wireline transceivers

    Get PDF
    Power-efficient wireline transceivers are highly demanded by many applications in high performance computation and communication systems. Apart from transferring a wide range of data rates to satisfy the interconnect bandwidth requirement, the transceivers have very tight power budget and are expected to be fully integrated. This thesis explores enabling techniques to implement such transceivers in both circuit and system levels. Specifically, three prototypes will be presented: (1) a 5Gb/s reference-less clock and data recovery circuit (CDR) using phase-rotating phase-locked loop (PRPLL) to conduct phase control so as to break several fundamental trade-offs in conventional receivers; (2) a 4-10.5Gb/s continuous-rate CDR with novel frequency acquisition scheme based on bang-bang phase detector (BBPD) and a ring oscillator-based fractional-N PLL as the low noise wide range DCO in the CDR loop; (3) a source-synchronous energy-proportional link with dynamic voltage and frequency scaling (DVFS) and rapid on/off (ROO) techniques to cut the link power wastage at system level. The receiver/transceiver architectures are highly digital and address the requirements of new receiver architecture development, wide operating range, and low power/area consumption while being fully integrated. Experimental results obtained from the prototypes attest the effectiveness of the proposed techniques

    A SPECIAL PURPOSEPROCESSORFOR IC TESTING AND SPEED CHARACTERIZATION

    Get PDF
    corecore