197 research outputs found

    Massively Parallel Computing and the Search for Jets and Black Holes at the LHC

    Full text link
    Massively parallel computing could be the next leap needed to reach an era of new discoveries at the LHC after the Higgs discovery. Scientific computing is a critical component of the LHC experiments, including operations, the trigger, the LHC Computing Grid, simulation, and analysis. One way to improve the physics reach of the LHC is to exploit the flexibility of the trigger system by integrating coprocessors based on Graphics Processing Units (GPUs) or the Many Integrated Core (MIC) architecture into its server farm. This cutting-edge technology provides not only the means to accelerate existing algorithms, but also the opportunity to develop new algorithms that select events in the trigger that would previously have evaded detection. In this article we describe new algorithms that make it possible to select in the trigger new topological signatures, including non-prompt jets and black-hole-like objects in the silicon tracker.
    Comment: 15 pages, 11 figures, submitted to NIM

    First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    Full text link
    Recent innovations focused on parallel processing, whether through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's Xeon Phi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy-efficient event selection, opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation of multicore CPUs using the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform is evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi 7120 coprocessor. Preliminary timing performance is presented.
    Comment: 13 pages, 4 figures, accepted to JINST
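
    Since this abstract centers on Hough-transform voting as the kernel to parallelize, a minimal C sketch of that step may help fix ideas. The constants (image size, bin counts) and the function name are illustrative assumptions, not taken from the paper; the paper's actual implementations target CUDA and vectorized CPU/MIC code.

    /* Sketch of the Hough-transform voting step for straight-line
       detection. Every set pixel votes for all (theta, rho) pairs
       consistent with a line through it; peaks in the accumulator
       correspond to detected lines. */
    #include <math.h>
    #include <string.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define W 256        /* image width (assumed)             */
    #define H 256        /* image height (assumed)            */
    #define N_THETA 180  /* angular bins, 1-degree resolution */
    #define N_RHO 512    /* radial bins                       */

    void hough_vote(const unsigned char img[H][W],
                    unsigned acc[N_THETA][N_RHO])
    {
        const double rho_max = sqrt((double)(W * W + H * H));
        memset(acc, 0, sizeof(unsigned) * N_THETA * N_RHO);
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                if (!img[y][x]) continue;      /* only edge pixels vote */
                for (int t = 0; t < N_THETA; ++t) {
                    double theta = t * M_PI / N_THETA;
                    double rho = x * cos(theta) + y * sin(theta);
                    /* map rho from [-rho_max, rho_max] onto [0, N_RHO) */
                    int r = (int)((rho + rho_max) * (N_RHO - 1) / (2.0 * rho_max));
                    ++acc[t][r];
                }
            }
    }

    Because each pixel's votes are independent, the loop over pixels parallelizes directly, which is what makes the transform attractive for GPUs and many-core coprocessors; the serialization point is the concurrent update of the shared accumulator (atomic increments on a GPU).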

    Improving GPU performance: reducing memory conflicts and latency

    Get PDF

    Computer vision algorithms on reconfigurable logic arrays

    Full text link

    Image processing library optimized for the Arm Cortex-M7

    Get PDF
    Most modern vehicles are equipped with systems that assist the driver by automating difficult and repetitive tasks, such as reducing the vehicle speed in a school zone. Some of these systems require an onboard computer capable of real-time processing of the road images captured by a camera. The goal of this project is to implement an optimized image processing library for the ARM® Cortex®-M7 architecture. The library provides routines for image spatial filtering, subtraction, binarization, and extraction of directional information, along with parameterized pattern recognition of a predefined template using the Generalized Hough Transform (GHT). The routines are written in the C programming language, leveraging GNU ARM C compiler optimizations to obtain maximum performance and minimum object size. Their performance was benchmarked against an existing implementation for a different microcontroller, the Freescale® MPC5561. To prove the usability of the library in a real-time application, a Traffic Sign Recognition (TSR) system was implemented. The results show that on average the execution time is 18% faster and the binary object size is 25% smaller than in the reference implementation, enabling the TSR application to process up to 24 fps. In conclusion, these results demonstrate that the image processing library implemented in this project is suitable for real-time applications.
    Funding and support: ITESO, A.C.; Consejo Nacional de Ciencia y Tecnología; Continental Automotive
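
    As the library's actual API is not given in the abstract, the following C sketch only illustrates the kind of routines it describes: an integer-only 3x3 spatial filter and a fixed-threshold binarization, written in the allocation-free style typical of Cortex-M7 image code. All names, dimensions and the shift-based normalization are hypothetical.

    #include <stdint.h>

    #define IMG_W 320
    #define IMG_H 240

    /* 3x3 spatial filter; 'shift' normalizes power-of-two kernel sums
       (e.g. a 16-sum smoothing kernel with shift = 4). Border pixels
       are left untouched for brevity. */
    void img_filter3x3(const uint8_t src[IMG_H][IMG_W],
                       uint8_t dst[IMG_H][IMG_W],
                       const int8_t k[3][3], unsigned shift)
    {
        for (int y = 1; y < IMG_H - 1; ++y)
            for (int x = 1; x < IMG_W - 1; ++x) {
                int32_t s = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        s += k[dy + 1][dx + 1] * src[y + dy][x + dx];
                if (s < 0) s = 0;              /* clamp before shifting */
                s >>= shift;                   /* fixed-point normalize */
                dst[y][x] = (s > 255) ? 255 : (uint8_t)s;
            }
    }

    /* Fixed-threshold binarization: the step that produces the edge
       map on which the Generalized Hough Transform votes. */
    void img_binarize(const uint8_t src[IMG_H][IMG_W],
                      uint8_t dst[IMG_H][IMG_W], uint8_t thr)
    {
        for (int y = 0; y < IMG_H; ++y)
            for (int x = 0; x < IMG_W; ++x)
                dst[y][x] = (src[y][x] >= thr) ? 255 : 0;
    }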

    Parallel algorithms for Hough transform

    Get PDF

    Implementation of a real time Hough transform using FPGA technology

    Get PDF
    This thesis is concerned with the modelling, design and implementation of efficient architectures for performing the Hough Transform (HT) on mega-pixel resolution real-time images using Field Programmable Gate Array (FPGA) technology. Although the HT has been around for many years and a number of algorithms have been developed, it remains a significant bottleneck in many image processing applications. While the basic idea of the HT is to locate curves in an image that can be parameterized in a suitable parameter space (e.g. straight lines, polynomials or circles), the research presented in this thesis focuses only on locating straight lines in binary images. The HT algorithm uses an accumulator array (accumulator bins) to detect the existence of a straight line in an image. As the image must first be binarized, a novel generic synchronization circuit for windowing operations was designed to perform edge detection. An edge detection method of special interest, the Canny method, is used, and its design and implementation in hardware are presented in this thesis. As each image pixel can be processed independently, parallel processing can be performed. However, the main disadvantage of the HT is its large storage and computational requirements. This thesis presents new, state-of-the-art hardware implementations that minimize the computational cost by using the Hybrid Logarithmic Number System (Hybrid-LNS) to calculate the HT for fixed bit-width architectures. It is shown that using the Hybrid-LNS minimizes the computational cost while maintaining the precision of the HT algorithm. Advances in FPGA technology now make it possible to implement functions such as the HT in reconfigurable fabrics. Methods for storing large arrays on FPGAs are presented, allowing data from a 1024 x 1024 pixel camera to be processed at rates of up to 25 frames per second.
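
    For reference, the two relations behind this design can be written out; both are textbook definitions rather than the thesis's own formulation. The accumulator votes with the normal parameterization of a line, and the Hybrid-LNS saves hardware because carrying magnitudes as base-2 logarithms turns each of the vote's multiplications into an addition:

    \rho = x\cos\theta + y\sin\theta, \qquad \theta \in [0, \pi)

    \log_2\lvert x\cos\theta \rvert = \log_2\lvert x \rvert + \log_2\lvert \cos\theta \rvert

    Each edge pixel (x, y) thus increments the accumulator bin (theta, rho) for every quantized theta, and a peak in the accumulator identifies a line shared by many pixels.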

    Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems

    Get PDF
    Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up: as knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck can be avoided by distributing the processing power across the memory of the computer; in this scheme the memory becomes the processor (a "smart memory"). This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD Butterfly-switch machine).