4,103 research outputs found
Recommended from our members
Hardware accelerator for ALICE ITS Cluster Finder
An integral part of the upgrade to the Inner Tracking System (ITS) of the ALICE detector is to support increased readout rates of the charged particles resulting due to increased interaction rate of 50kHz in Pb-Pb collisions at the Large Hadron Collider (LHC). A major task of the ITS readout system is to compress the data and store it in the mass storage system for later analysis. The first step of data compression involves cluster finding on the pixel data received from ALPIDE sensors followed by Huffman compression. In this Thesis, we evaluate the resource requirements for implementing cluster finding on the Arria 10 FPGAs which are an integral part of the ITS readout system, in an attempt to reduce the computing nodes needed on the First Level Processors (FLPs) and also to speed up the processing. We present a hardware implementation of a single pass Connected Component Labeling algorithm. A special linked list based merger table that ensures a constant worst case latency for chained label mergers independent of their length is proposed. For retrieving the shapeIDs, pixels are segregated into clusters on-the-fly without the need to store labeled pixels in memory. Verilog code implementing this design has been written, a testbench for functional verification has been developed, and the design has been synthesized.Electrical and Computer Engineerin
Integrated Development and Parallelization of Automated Dicentric Chromosome Identification Software to Expedite Biodosimetry Analysis
Manual cytogenetic biodosimetry lacks the ability to handle mass casualty events. We present an automated dicentric chromosome identification (ADCI) software utilizing parallel computing technology. A parallelization strategy combining data and task parallelism, as well as optimization of I/O operations, has been designed, implemented, and incorporated in ADCI. Experiments on an eight-core desktop show that our algorithm can expedite the process of ADCI by at least four folds. Experiments on Symmetric Computing, SHARCNET, Blue Gene/Q multi-processor computers demonstrate the capability of parallelized ADCI to process thousands of samples for cytogenetic biodosimetry in a few hours. This increase in speed underscores the effectiveness of parallelization in accelerating ADCI. Our software will be an important tool to handle the magnitude of mass casualty ionizing radiation events by expediting accurate detection of dicentric chromosomes
A State-of-the-Art Review with Code about Connected Components Labeling on GPUs
This article is about Connected Components Labeling (CCL) algorithms developed for GPU accelerators. The task itself is employed in many modern image-processing pipelines and represents a fundamental step in different scenarios, whenever object recognition is required. For this reason, a strong effort in the development of many different proposals devoted to improving algorithm performance using different kinds of hardware accelerators has been made. This paper focuses on GPU-based algorithmic solutions published in the last two decades, highlighting their distinctive traits and the improvements they leverage. The state-of-the-art review proposed is equipped with the source code, which allows to straightforwardly reproduce all the algorithms in different experimental settings. A comprehensive evaluation on multiple environments is also provided, including different operating systems, compilers, and GPUs.
Our assessments are performed by means of several tests, including real-case images and synthetically generated ones, highlighting the strengths and weaknesses of each proposal. Overall, the experimental results revealed that block-based oriented algorithms outperform all the other algorithmic solutions on both 2D images and 3D volumes, regardless of the selected environment
Optimizing GPU-Based Connected Components Labeling Algorithms
Connected Components Labeling (CCL) is a fundamental image processing technique, widely used in various application areas. Computational throughput of Graphical Processing Units (GPUs) makes them eligible for such a kind of algorithms. In the last decade, many approaches to compute CCL on GPUs have been proposed. Unfortunately, most of them have focused on 4-way connectivity neglecting the importance of 8-way connectivity. This paper aims to extend state-of-the-art GPU-based algorithms from 4 to 8-way connectivity and to improve them with additional optimizations. Experimental results revealed the effectiveness of the proposed strategies
Un algoritmo en tiempo real para etiquetado de componentes conectados en imágenes
Esta comunicación presenta un algoritmo de dos pasadas para el etiquetado en tiempo real de los componentes conexos en una imagen. El algoritmo propuesto es una buena opción frente a otras alternativas de dos y múltiples pasadas ya que ha sido diseñado considerando que su implementación en FPGAs ofrezca un buen compromiso entre recursos ocupados y velocidad de operación. Se describen dos implementaciones hardware de este algoritmo, cuyo desarrollo se ha llevado a cabo siguiendo un flujo de diseño basado en la herramienta System Generator de Xilinx.Comunidad Económica Europea MOBY-DIC FP7-IST- 248858Ministerio de Ciencia e Innovación (España) TEC2008-04920Junta de AndalucÃa P08- TIC-03674Fondos Feder P08- TIC-0367
On the Hardware/Software Design and Implementation of a High Definition Multiview Video Surveillance System
published_or_final_versio
Fast and efficient FPGA implementation of connected operators
International audienceThe Connected Component Tree (CCT)-based operators play a central role in the development of new algorithms related to image processing applications such as pattern recognition, video-surveillance or motion extraction. The CCT construction, being a time consuming task (about 80% of the application time), these applications remain far-off mobile embedded systems. This paper presents its efficient FPGA implementation suited for embedded systems. Three main contributions are discussed: an efficient data structure proposal adapted to representing the CCT in embedded systems, a memory organization suitable for FPGA implementation by using on-chip memory and a customizable hardware accelerator architecture for CCT-based applications
- …