11 research outputs found

    Programmable active memories in real-time tasks: implementing data-driven triggers for LHC experiments

    Full text link
    The future Large Hadron Collider (LHC), to be built at CERN, presents, among other technological challenges, a formidable problem of real-time data analysis. At a primary event rate of 40 MHz, a multi-stage trigger system has to analyze data to decide which fraction of the events should be preserved on permanent storage for further analysis. We report on implementations of local algorithms for feature extraction as part of triggering, using the detectors of the proposed ATLAS experiment as a model. The algorithms were implemented for a decision frequency of 100 kHz on different data-driven programmable devices based on structures of field-programmable gate arrays and memories. The implementations were demonstrated at full speed with emulated input, and were also integrated into a prototype detector running in a test beam at CERN in June 1994.
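
    As a minimal, hedged illustration of the rate figures quoted above: only the 40 MHz primary rate and the 100 kHz decision frequency are taken from the abstract; the derived numbers below are simple arithmetic.

        # Back-of-the-envelope sketch of the rate reduction implied by the abstract:
        # a 40 MHz primary event rate and a 100 kHz decision frequency for the
        # FPGA-based feature-extraction stage.

        primary_rate_hz = 40.0e6     # primary event (bunch-crossing) rate, from the abstract
        decision_rate_hz = 100.0e3   # rate at which the feature-extraction stage must decide

        reduction_factor = primary_rate_hz / decision_rate_hz  # rejection done upstream of this stage
        time_budget_us = 1.0e6 / decision_rate_hz              # average time available per decision

        print(f"upstream reduction factor: {reduction_factor:.0f}x")   # 400x
        print(f"per-decision time budget : {time_budget_us:.1f} us")   # 10.0 us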

    Enable++ : a second generation FPGA processor

    Get PDF
    In the computing community, field-programmable processors are set to fill the niche for special-purpose computing devices. A typical example is ultra-fast pattern recognition in experimental particle physics, a task for which we constructed Enable-1 two years ago: an FPGA processor specialized for pattern-recognition algorithms in the microsecond domain, but also provided with modest features for coping with more general applications. This paper presents the follow-up model Enable++, a second-generation FPGA processor that offers several substantial enhancements over the previous system for a wider range of applications. Enable++ is structured into three different state-of-the-art modules providing computing power, flexible high-speed I/O communication, and powerful inter-module communication with a raw bandwidth of 3.2 GByte/s via an active backplane. The technical realization of all three modules is guided by maximum use of field-programmable logic. The actual demand for computing and I/O power can be satisfied by the number of modules plugged into the crate. Enhanced features of Enable++ include a configurable processor topology provided by programmable crossbar switches. In combination with the 4 x 4 FPGA array and 12 MByte of distributed RAM, the Enable++ computing core offers strongly increased and scalable computing power. For building new applications, the system offers a comfortable programming and debugging environment consisting of a compiler for the C-like hardware description language spC, a simulator, and a source-level debugger for hardware design. The goal in planning the hardware design environment for Enable++ from scratch is to transfer established methodologies of software design to the design of digital logic. For pattern-recognition tasks, we estimate that Enable++ surpasses modern RISC processors by a factor of 100 to 1000.
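
    The class of task Enable++ targets can be sketched in ordinary software. The following is a hypothetical mask-and-compare hit-pattern match (the patterns, masks and match function are invented for illustration, not taken from the Enable++ tool chain); on an FPGA all comparisons could be evaluated in parallel within a single clock cycle, whereas here they are simply looped.

        # Hypothetical bit-parallel pattern matching over an 8-bit detector hit word.
        # PATTERNS holds predefined hit patterns, MASKS marks the "care" bits.

        PATTERNS = [0b1011_0001, 0b0110_1100, 0b1110_0010]   # invented hit patterns
        MASKS    = [0b1111_0011, 0b0111_1100, 0b1111_0011]   # invented care/don't-care masks

        def match(hits):
            """Return the indices of all patterns matched by the hit word."""
            return [i for i, (p, m) in enumerate(zip(PATTERNS, MASKS))
                    if (hits & m) == (p & m)]

        print(match(0b1011_0101))   # -> [0]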

    The hardware track finder processor in CMS at CERN

    Get PDF
    The work covers the design of the Track Finder Processor for the high-energy experiment CMS (Compact Muon Solenoid, planned for 2005) at CERN/Geneva. The task of this processor is to identify muons and measure their transverse momentum. The Track Finder Processor makes it possible to determine the physical relevance of each high-energy collision and to forward only interesting data to the data analysis units. Data from more than two hundred thousand detector cells are used to determine the location of muons and to measure their transverse momentum. A new data set is generated every 25 ns. Measurement of the location and transverse momentum of the muons can be completed within 350 ns by using an ASIC (Application Specific Integrated Circuit). A pipelined architecture processes new data sets at the required rate of 40 MHz to ensure dead-time-free operation. In the framework of this study, the specifications and the overall concept of the Track Finder Processor were worked out in detail. Simulations were performed in order to select the most appropriate measurement method and implementation technology. Already existing systems were evaluated and their specifications compared with those of the Track Finder Processor. The classic method in high-energy physics experiments is to search for predefined tracks or bit patterns in the measurement data and to determine their properties; the predefined patterns are compared to the found patterns. The large number of data channels of the Track Finder Processor and the complex requirements on the spatial detector resolution do not permit a pattern-comparison method. A so-called track-following algorithm was therefore designed, which is able to assemble complete tracks through the whole detector starting from single track segments. Instead of storing a large number of track patterns, an algorithm for track finding and momentum measurement is employed directly. This makes it possible to realize a hardware implementation within the requirements set by the experiment. The algorithm was translated to the level of digital electronics. Comprehensive simulations, employing the hardware description language VHDL, were conducted in order to optimize the algorithm and its hardware implementation. An FPGA (field programmable gate array) prototype and a test system were designed. A feasibility study for implementing the Track Finder Processor with ASICs was conducted. The study proves that the Track Finder Processor can be implemented using today's technology.
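
    A highly simplified, hypothetical sketch of the track-following idea described above: the positions, the linking window and the single linking step are invented, and the real algorithm assembles tracks through many stations and also measures momentum.

        # Invented example: link each track segment in station 1 to the closest
        # compatible segment in station 2, instead of comparing against a stored
        # library of full track patterns.

        station_1 = [10.2, 35.0]          # segment positions in station 1 (invented units)
        station_2 = [11.0, 34.1, 50.3]    # candidate segment positions in station 2

        LINK_WINDOW = 1.5                 # maximum allowed residual (invented)

        def link(seed):
            """Return the nearest station-2 segment within the window, or None."""
            best = min(station_2, key=lambda x: abs(x - seed))
            return best if abs(best - seed) <= LINK_WINDOW else None

        print([(s, link(s)) for s in station_1])   # -> [(10.2, 11.0), (35.0, 34.1)]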

    The ATLAS LVL2 trigger with FPGA processors: development, construction and proof of functionality of the hybrid FPGA/CPU-based processor system ATLANTIS

    Get PDF
    This thesis describes the design and realization of the hybrid FPGA/CPU-based processor system ATLANTIS as a trigger processor for the planned ATLAS experiment at CERN. Based on CompactPCI, a tight coupling between a multi-FPGA system and a standard CPU is implemented. The system is scalable in computing power and can be used flexibly. This is achieved by partitioning it into dedicated FPGA boards for algorithm execution and for I/O functionality, and by an integrated private bus. The investigations with the ATLANTIS system address two key parts of the second trigger level (LVL2). First, the execution of time-critical B-physics trigger algorithms is to be accelerated. The benchmark of the Full-Scan TRT algorithm, carried out in this work as a proof of functionality, has shown that execution can be accelerated by a factor of 5.6 compared with a standard CPU. As a second ATLAS application, studies of the readout systems are carried out with the ATLANTIS system. A permanent installation of the ATLANTIS system at CERN is planned for investigations in the LVL2 prototype system. The universal character of ATLANTIS is evident in further applications that are being developed for the system and whose implementation was supported within this work: trigger tasks for experiments at GSI/Darmstadt, the accelerated execution of 2D/3D image processing applications, and the simulation of N-body systems in astrophysics. Application development can be carried out with the standardized hardware description language VHDL; alternatively, the language CHDL developed in Mannheim can be used. The development tools are complemented by the ATLANTIS operating system.
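
    To put the quoted factor of 5.6 into context, here is a minimal Amdahl-style sketch; the model form and the offloaded fractions are assumptions made for illustration, and only the kernel speedup of 5.6 is taken from the abstract.

        # Overall speedup of a hybrid FPGA/CPU system when only a fraction of the
        # work is offloaded to a kernel that runs 5.6x faster on the FPGA boards.

        KERNEL_SPEEDUP = 5.6                        # Full-Scan TRT figure from the abstract
        for offloaded in (0.5, 0.8, 0.95):          # invented offload fractions
            overall = 1.0 / ((1.0 - offloaded) + offloaded / KERNEL_SPEEDUP)
            print(f"offloaded {offloaded:.0%}: overall speedup {overall:.2f}x")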

    Applications of reprogrammability in algorithm acceleration

    Get PDF
    This doctoral thesis consists of an introductory part and eight appended publications, which deal with hardware-based reprogrammability in algorithm acceleration, with a specific emphasis on the possibilities offered by modern large-scale Field Programmable Gate Arrays (FPGAs) in computationally demanding applications. The historical evolution of both the theoretical and technological paths culminating in the introduction of reprogrammable logic devices is first outlined. This is followed by definitions of the terms commonly used in the thesis. The reprogrammable logic market is surveyed, and the architectural structures and the technological reasoning behind them are described in detail. As reprogrammable logic lies between Application Specific Integrated Circuits (ASICs) and general-purpose microprocessors in the implementation spectrum of electronics systems, special attention has been paid to differentiating these three implementation approaches. This has been done to emphasize that reprogrammable logic offers much more than just a low-volume replacement for ASICs. Design systems for reprogrammable logic are investigated, as the learning curve associated with them is the main hurdle for software-oriented designers in using reprogrammable logic devices. The theoretically important topic of partial reprogrammability is described in detail, but it is concluded that the practical problems in designing viable development platforms for partially reprogrammable systems will hinder its widespread adoption. The main technical, design-oriented, and economic applicability factors of reprogrammable logic are laid out. The main advantages of reprogrammable logic are its suitability for fine-grained, bit-level parallelizable computing, short time-to-market, and low upfront costs. It is also concluded that the main opportunities for reprogrammable logic lie in the potential of high-level design systems and the ever-growing ASIC design gap. On the other hand, most power-conscious mass-market portable products do not seem to offer major new market potential for reprogrammable logic. The appended publications are examined and compared to contemporaneous research at other research institutions. The conclusion is that for relatively wide classes of well-defined computation problems, reprogrammable logic offers a more efficient solution than a software-centered approach, with a much shorter production cycle than is the case with ASICs.

    Analytical Modeling of High Performance Reconfigurable Computers: Prediction and Analysis of System Performance.

    Get PDF
    The use of a network of shared, heterogeneous workstations, each harboring a Reconfigurable Computing (RC) system, offers high-performance users an inexpensive platform for a wide range of computationally demanding problems. However, effectively using the full potential of these systems can be challenging without knowledge of the system's performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far account for the addition of Reconfigurable Computing systems. This dissertation develops and validates an analytic performance modeling methodology for a class of fork-join algorithms executing on a High Performance Reconfigurable Computing (HPRC) platform. The model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message-passing communication, and processor heterogeneity. Three fork-join applications, a Boolean Satisfiability Solver, a Matrix-Vector Multiplication algorithm, and an Advanced Encryption Standard algorithm, are used to validate the model with homogeneous and simulated heterogeneous workstations. A synthetic load is used to validate the model under various loading conditions, including simulated heterogeneity in which background loading makes some workstations appear slower than others. The performance modeling methodology proves to be accurate in characterizing the effects of reconfigurable devices, application load imbalance, background user load and heterogeneity for applications running on shared, homogeneous and heterogeneous HPRC resources. The model error in all cases was found to be less than five percent for application runtimes greater than thirty seconds and less than fifteen percent for runtimes less than thirty seconds. The performance modeling methodology enables us to characterize applications running on shared HPRC resources. Cost functions are used to impose system usage policies, and the results of the modeling methodology are utilized to find the optimal (or near-optimal) set of workstations to use for a given application. The usage policies investigated include determining the computational costs for the workstations and balancing the priority of the background user load with the parallel application. The applications studied fall within the Master-Worker paradigm and are well suited for a grid computing approach. A method for using NetSolve, a grid middleware, with the model and cost functions is introduced, whereby users can produce optimal workstation sets and schedules for Master-Worker applications running on shared HPRC resources.
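
    The abstract does not give the model equations, so the following is only a generic fork-join runtime sketch under assumed terms (per-worker compute time stretched by background load, a join that waits for the slowest worker, plus a lumped communication cost), meant to illustrate the kind of prediction being validated.

        # Assumed fork-join runtime model: runtime = max over workers of the
        # load-adjusted compute time, plus a communication overhead term.

        def predicted_runtime(work, speed, background_load, comm_overhead):
            per_worker = [w / (s / (1.0 + b))          # background load slows a node down
                          for w, s, b in zip(work, speed, background_load)]
            return max(per_worker) + comm_overhead     # the join waits for the slowest worker

        # Invented example: worker 0 has the RC accelerator, worker 2 carries background load.
        work  = [100.0, 100.0, 100.0]   # work units per worker
        speed = [50.0, 10.0, 10.0]      # units per second
        load  = [0.0, 0.0, 0.6]         # background user load factors
        print(f"{predicted_runtime(work, speed, load, comm_overhead=1.5):.1f} s")   # 17.5 s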

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology, available through SRAM-based FPGA/SoC devices, aimed at contributing to enhancing people's quality of life. The work investigates the conception of the system architecture and of the reconfiguration engine that gives the FPGA the capability of dynamic partial reconfiguration, in order to synthesize, by means of hardware/software co-design, a given application partitioned into processing tasks which are multiplexed in time and space, thereby optimizing its physical implementation (silicon area, processing time, complexity, flexibility, functional density, cost and power consumption) in comparison with alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of this technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a level of maturity high enough for industrial exploitation.
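
    A purely illustrative sketch of the time-multiplexing idea described above, assuming a single partially reconfigurable region whose contents are swapped between tasks; the class, the bitstream names and the delay are invented and do not correspond to the thesis' reconfiguration engine.

        import time

        class ReconfigurableRegion:
            """Stand-in for one partially reconfigurable FPGA region."""

            def load(self, bitstream_name):
                # A real engine would write the partial bitstream to the device's
                # configuration port; here only the reconfiguration delay is modelled.
                time.sleep(0.001)
                self.current = bitstream_name

            def run(self, task):
                return task()

        region = ReconfigurableRegion()
        hardware_tasks = {                       # invented task names and software stand-ins
            "fir_filter.bit": lambda: sum(i * 0.5 for i in range(8)),
            "edge_detect.bit": lambda: "edges",
        }
        for bitstream, task in hardware_tasks.items():
            region.load(bitstream)               # time-multiplex: swap in the next task
            print(region.current, "->", region.run(task))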

    High-Energy Physics on DECPeRLe-1 Programmable Active Memory

    No full text
    The future Large Hadron Collider (LHC), to be built at CERN by the turn of the millennium, provides an ample source of challenging real-time computational problems. We report here some results from a collaboration between the CERN EAST (RD-11) group and the DEC PRL PAM team. We present the implementations of the three foremost LHC algorithms on DECPeRLe-1 [2]. Our machine is the only one which presently meets the requirements from CERN (100 kHz event rate), except for another dedicated FPGA-based board built for just one of the algorithms [3]. All other implementations based on single- and multiprocessor general-purpose computing systems fall short either of computing power, or of I/O resources, or both. 1 Introduction. 1.1 High-Energy Physics. The community of High-Energy Physics is about to decide to go forward with the next-generation collider to be built at CERN, the LHC. With this new instrument, it will be possible to observe proton-proton collisions of 8000 GeV, an energy not at..