Reconfigurable FPGA Architecture for Computer Vision Applications in Smart Camera Networks
Smart Camera Networks (SCNs) are an emerging research field that represents the
natural evolution of centralized computer vision applications towards fully distributed and
pervasive systems. In this vision, one of the biggest efforts is the definition of a flexible and
reconfigurable SCN node architecture able to remotely update both the application parameters and the
computer vision application itself at runtime. In this respect, we present a novel SCN node
architecture based on a device in which a microcontroller manages all the network functionality as
well as the remote configuration, while an FPGA implements all the modules of a full
computer vision pipeline. The envisioned architecture is first detailed in general
terms; then a real implementation is presented to show the feasibility and the benefits of the
proposed solution. Finally, performance evaluation results underline the potential of a hardware/software
co-design approach for achieving flexibility and reduced processing time.
About the development of visual search algorithms and their hardware implementations
2015 - 2016
The main goal of my work is to exploit the benefits of a hardware implementation
of a 3D visual search pipeline. The term visual search refers
to the task of searching for objects in the environment starting from a representation
of the real world. Object recognition today is mainly based on scene
descriptors, i.e. a unique description of salient points in the data.
This task has traditionally been implemented using only plain
images: an image descriptor is a feature vector used to describe a position
in the image. Matching the descriptors found in different views of the
same scene should allow the same point to be found from different angles;
therefore a good descriptor should be robust to changes in
scene luminosity, camera affine transformations (rotation, scale and translation),
camera noise and object affine transformations. Clearly, when using
2D images it is not possible to be robust to changes in the
projective space: e.g. if the object is rotated about the camera's up
axis, its 2D projection will change dramatically. For this reason, alongside
2D descriptors, many techniques have been proposed to solve the projective
transformation problem using 3D descriptors, which capture the shape of
the objects and consequently their real surface appearance. This category of
descriptors relies on 3D point clouds and disparity maps to build a reliable
feature vector that is invariant to projective transformations. More
sophisticated techniques are needed to obtain the 3D representation of the
scene and, if necessary, the texture of the 3D model, and these
techniques are obviously also more computationally intensive than simple image
capture. The field of 3D model acquisition is very broad, but two main categories
can be distinguished: active and passive methods. The active category includes
special devices able to obtain 3D information by projecting special light patterns.
Generally an infrared projector is coupled with a camera: while the projector casts
a well-known, fixed pattern, the camera captures the pattern's
reflection on the surfaces in the scene, and the distortion of the pattern gives
the precise depth of every point in the scene. These kinds of sensors are, of
course, expensive and not very efficient from the power consumption point of
view, since a lot of power is spent projecting light, and the use of lasers also
imposes eye-safety limits on frame rate and transmitted power. Another way
to obtain 3D models is to use passive stereo vision techniques, which require two
(or more) cameras that only acquire the scene appearance.
Using the two (or more) images as input to a stereo matching algorithm, it
is possible to reconstruct the 3D world. Since this task needs more computational
resources, hardware acceleration can give a large performance boost over a pure
software approach.
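As a rough illustration of the passive stereo approach described above, the following Python sketch computes a disparity map with plain winner-take-all block matching on a rectified image pair, using the sum of absolute differences (SAD) as matching cost, and converts a disparity d into depth with the standard triangulation relation Z = f·B/d. This is a generic technique chosen for illustration, not the bio-informatics-inspired algorithm of the thesis; the window size `win`, the disparity range `max_disp`, the focal length `focal_px` and the baseline `baseline_m` are hypothetical parameters.

```python
import math
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def sad_disparity(left: Image, right: Image, max_disp: int = 16, win: int = 5) -> Image:
    """Winner-take-all SAD block matching on a rectified grayscale pair.

    For each left-image pixel, a win x win window is compared with windows
    in the right image shifted left by d = 0 .. max_disp-1 pixels; the d with
    the smallest sum of absolute differences becomes the disparity.
    Border pixels (closer than win//2 to the edge) are left at 0.
    """
    h, w = len(left), len(left[0])
    half = win // 2
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_cost, best_d = None, 0
            # Only disparities whose right-image window stays inside the image.
            for d in range(min(max_disp, x - half + 1)):
                cost = 0
                for dy in range(-half, half + 1):
                    row_l, row_r = left[y + dy], right[y + dy]
                    for dx in range(-half, half + 1):
                        cost += abs(row_l[x + dx] - row_r[x - d + dx])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp


def depth_from_disparity(disp: Image, focal_px: float, baseline_m: float) -> List[List[float]]:
    """Triangulate depth Z = f * B / d; zero disparity maps to infinity."""
    return [[focal_px * baseline_m / d if d > 0 else math.inf for d in row] for row in disp]
```

The nested loops over pixels, candidate disparities and window offsets are what make stereo matching expensive in software; they are also why it maps well onto parallel hardware, since the cost for every candidate disparity is independent and can be evaluated concurrently.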
In this work I explore the principal steps of a visual search pipeline
composed of a 3D vision system and a 3D description system. Both systems
take advantage of a parallelized architecture prototyped in RTL and
implemented on an FPGA platform. This is a huge research field, and in
this work I try to explain the reasons for all the choices I made in my
implementation, e.g. the chosen algorithms, the heuristics applied to accelerate
performance, and the selected device. The first chapter explains the
visual search issues, showing the main components required by a visual
search pipeline. Then I show the implemented architecture of a stereo
vision system based on a bio-informatics-inspired approach, where the final
system can process up to 30 fps at 1024 × 768 pixels. After that, a
method for boosting the performance of 3D descriptors is presented, and in the
last chapter the final architecture of the SHOT descriptor on FPGA will
be presented. [edited by author]
The main goal of this work is to explore the benefits of a hardware
implementation of a 3D visual search pipeline. The term visual search
refers to the problem of searching for objects in the environment.
Object recognition nowadays is mainly based on the use of scene
descriptors, a unique description of salient points. This task has been
implemented for years using images: the descriptor of an image point is
a simple feature vector. Matching the descriptors found in different
views of the same scene makes it possible to find points in space that
are visible from both views. Clearly, with 2D images it is not possible
to obtain descriptors that are robust to changes of perspective; for this
reason, many techniques have been proposed to solve this problem using
3D descriptors. This category of descriptors relies on 3D point clouds
and disparity maps. Obviously, more sophisticated techniques are needed
to obtain the 3D representation of the scene. The field of 3D acquisition
is very broad, and two categories of sensors can be distinguished: active
and passive sensors. Active sensors include devices able to project an
infrared light pattern onto the scene; this known pattern is distorted by
the objects in the scene. An infrared camera receives the distorted image
of the pattern and infers the scene geometry. These devices are not very
energy efficient, since a lot of power is consumed projecting the pattern.
Another way to obtain a 3D model is to use passive sensors: a pair of
cameras can be used to obtain depth information through triangulation.
These methods, however, require a lot of computational power in real-time
applications, so ad-hoc devices such as dedicated hardware architectures
implemented on FPGAs and ASICs are needed.
In this work I explored the main steps of a visual search pipeline
composed of a 3D vision system and a system for describing points. Both
systems rely on dedicated hardware architectures prototyped in RTL and
implemented on FPGA. This is a large field of work, and I try to explore
the benefits of a hardware implementation for accelerating the algorithms
themselves and for saving electrical energy. [edited by author] XV n.s.
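The thesis abstract above relies on matching descriptors across different views of the same scene. A minimal sketch of that step is shown below, assuming Euclidean distance and brute-force search. The ratio test (keep a match only if the nearest neighbour is clearly closer than the second nearest) is Lowe's heuristic from SIFT matching, swapped in here for illustration and not taken from the thesis; the function names and the `ratio` default are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query: Sequence[Sequence[float]],
                      train: Sequence[Sequence[float]],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Brute-force nearest-neighbour matching with a ratio test.

    Returns (query_index, train_index) pairs. A query descriptor is matched
    to its nearest train descriptor only if that distance is smaller than
    `ratio` times the distance to the second-nearest one.
    """
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, q in enumerate(query):
        dists = sorted((l2(q, t), j) for j, t in enumerate(train))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

The same routine applies to 2D feature vectors and to 3D descriptors such as SHOT, since both are fixed-length vectors; only the dimensionality changes. A query point that lies halfway between two candidates is rejected as ambiguous rather than matched arbitrarily.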
Real-time near replica detection over massive streams of shared photos
This work addresses the real-time detection of image replicas in distributed environments by indexing local feature vectors.
Extreme Acceleration of Graph Neural Network-based Prediction Models for Quantum Chemistry
Molecular property calculations are the bedrock of chemical physics.
High-fidelity ab initio modeling techniques for computing the
molecular properties can be prohibitively expensive, and motivate the
development of machine-learning models that make the same predictions more
efficiently. Training graph neural networks over large molecular databases
introduces unique computational challenges, such as the need to process millions
of small graphs of variable size and to support communication patterns that are
distinct from those of learning over large graphs such as social networks. This paper
demonstrates a novel hardware-software co-design approach to scale up the
training of graph neural networks for molecular property prediction. We
introduce an algorithm to coalesce batches of molecular graphs into fixed-size
packs, eliminating the redundant computation and memory associated with
alternative padding techniques, and improving throughput by minimizing
communication. We demonstrate the effectiveness of our co-design approach by
providing an implementation of a well-established molecular property prediction
model on the Graphcore Intelligence Processing Units (IPU). We evaluate the
training performance on multiple molecular graph databases with varying
graph counts, sizes and sparsity. We demonstrate that such a co-design
approach can reduce the training time of such molecular property prediction
models from days to less than two hours, opening new possibilities for
AI-driven scientific discovery.
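The packing idea can be illustrated with a simple bin-packing heuristic. The sketch below uses first-fit decreasing on per-graph node counts, with a hypothetical per-pack node budget `max_nodes` and graph limit `max_graphs`; the paper's actual packing algorithm may differ (for example, it may also budget edges). Every pack has the same fixed capacity, so the only padding left is the unused node slots in each pack, which `padding_fraction` reports.

```python
from typing import List


def pack_graphs(node_counts: List[int], max_nodes: int, max_graphs: int) -> List[List[int]]:
    """First-fit-decreasing packing of graphs into fixed-capacity packs.

    Each pack holds at most `max_nodes` nodes and `max_graphs` graphs.
    Returns the packs as lists of graph indices.
    """
    # Largest graphs first; sorted() is stable, so equal sizes keep input order.
    order = sorted(range(len(node_counts)), key=lambda i: node_counts[i], reverse=True)
    packs: List[List[int]] = []
    used: List[int] = []  # nodes already placed in each pack
    for i in order:
        n = node_counts[i]
        if n > max_nodes:
            raise ValueError(f"graph {i} has {n} nodes, more than max_nodes={max_nodes}")
        for p, pack in enumerate(packs):
            if used[p] + n <= max_nodes and len(pack) < max_graphs:
                pack.append(i)
                used[p] += n
                break
        else:  # no existing pack has room: open a new one
            packs.append([i])
            used.append(n)
    return packs


def padding_fraction(node_counts: List[int], packs: List[List[int]], max_nodes: int) -> float:
    """Fraction of allocated node slots that are padding."""
    return 1 - sum(node_counts) / (len(packs) * max_nodes)
```

For example, graphs with node counts `[9, 7, 5, 3, 3, 2, 1]` and `max_nodes=10`, `max_graphs=4` fit into three packs `[[0, 6], [1, 3], [2, 4, 5]]` that hold all 30 nodes with no padding, whereas padding each of the 7 graphs to the largest one (9 nodes) would allocate 63 slots for the same 30 real nodes.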
Digital FPGA Circuits Design for Real-Time Video Processing with Reference to Two Application Scenarios
In the present days of the digital revolution, image and video processing have become ubiquitous tasks: from mobile devices to specialized environments, the need for a real-time approach is more evident every day. Whatever the reason, whether user experience in recreational or internet-based applications or safety-related timeliness in hard-real-time scenarios, exploring technologies and techniques that satisfy this requirement is crucial. Software implementations of these applications on general-purpose CPUs or GPUs are quite simple and widespread, but commonly fail to reach high performance because of the many abstraction layers that separate high-level languages and libraries, which implement complicated procedures and algorithms, from the underlying CPU architecture, which offers only limited and basic (although rapidly executed) arithmetic operations. The most common approach nowadays is based on the use of Very-Large-Scale Integrated (VLSI) digital electronic circuits.
Field Programmable Gate Arrays (FPGAs) are integrated digital circuits designed to be configured after manufacturing, "in the field". They typically provide lower performance than Application Specific Integrated Circuits (ASICs), but at a lower cost, especially for limited production volumes. Of course, in-field programmability itself (and re-programmability, in the vast majority of cases) is also a characteristic feature that makes FPGAs more suitable for applications with changing specifications, where updating capabilities may be a desirable benefit. Moreover, the time needed to complete the design cycle of FPGA-based circuits (including testing and debugging) is much shorter than the design flow and time-to-market of ASICs.
In this thesis, we will see (Chapter 1) some common problems and strategies involved in the use of FPGAs and FPGA-based systems for Real-Time Image Processing and Real-Time Video Processing (hereafter also indicated interchangeably by the acronym RTVP); we will then focus on two applications in particular.
First, Chapter 2 will cover the implementation of a novel algorithm for visual search, known as CDVS, which has recently been standardised as part of the MPEG-7 standard. Visual search is an emerging field in mobile applications that is rapidly becoming ubiquitous. However, algorithms for this kind of application typically demand substantial computational power and complex processing: as a consequence, implementation efficiency is crucial, and this generally results in the need for custom-designed hardware.
Chapter 3 will cover the implementation of an algorithm for the compression of hyperspectral images that is bit-true compatible with the CCSDS-123.0 standard algorithm. Hyperspectral images are three-dimensional matrices in which each 2D plane represents the image, as captured by the sensor, in a given spectral band: their size may range from several million pixels up to billions of pixels. Typical use scenarios for hyperspectral images include airborne and satellite-borne remote sensing. As a consequence, the major concerns are limited processing power and limited communication link bandwidth: thus, a proper compression algorithm, as well as an efficient implementation of it, is crucial.
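Predictive lossless compression of a hyperspectral cube can be sketched as follows. CCSDS-123.0 specifies an adaptive linear predictor that uses neighbouring samples in the current and previous bands, followed by residual mapping and entropy coding. The sketch below replaces that predictor with a deliberately simple one (the co-located sample in the previous band, or the left neighbour in the first band) and uses a plain zigzag mapping, so it is not bit-true to the standard; the `cube[band][row][col]` layout and all function names are hypothetical. It only shows the principle: the decoder repeats the encoder's predictions in the same order, so prediction residuals are enough to rebuild the cube exactly.

```python
from typing import List

Cube = List[List[List[int]]]  # cube[band][row][col], integer samples


def zigzag(r: int) -> int:
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * r if r >= 0 else -2 * r - 1


def unzigzag(m: int) -> int:
    """Inverse of zigzag()."""
    return m // 2 if m % 2 == 0 else -(m + 1) // 2


def predict(cube: Cube, z: int, y: int, x: int) -> int:
    """Toy predictor: previous band at the same pixel, else left neighbour in band 0."""
    if z > 0:
        return cube[z - 1][y][x]
    return cube[0][y][x - 1] if x > 0 else 0


def encode(cube: Cube) -> List[int]:
    """Return the mapped prediction residuals in band, row, column order."""
    return [zigzag(cube[z][y][x] - predict(cube, z, y, x))
            for z in range(len(cube))
            for y in range(len(cube[0]))
            for x in range(len(cube[0][0]))]


def decode(mapped: List[int], bands: int, rows: int, cols: int) -> Cube:
    """Rebuild the cube in the encoder's order, so every sample that
    predict() reads has already been reconstructed."""
    cube: Cube = [[[0] * cols for _ in range(rows)] for _ in range(bands)]
    it = iter(mapped)
    for z in range(bands):
        for y in range(rows):
            for x in range(cols):
                cube[z][y][x] = predict(cube, z, y, x) + unzigzag(next(it))
    return cube
```

The entropy coder that would turn the small mapped residuals into a compact bitstream is omitted here; in the standard, both the predictor and the coder are considerably more elaborate.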
In both cases we will first examine the scope of the work with reference to the current state of the art. We will then describe the main characteristics of the proposed implementations and, to conclude, consider the primary experimental results.
Dynamically reconfigurable architecture for embedded computer vision systems
The objective of this research work is to design, develop and implement a new architecture that integrates on the same chip all the processing levels of a complete Computer Vision system, so that execution is efficient without compromising power consumption, while keeping costs low. For this purpose, the mathematical operations and algorithms commonly used in Computer Vision are analysed and classified, and an in-depth review of the image processing capabilities of current-generation hardware devices is carried out. This makes it possible to determine the requirements and the key aspects of an efficient architecture. A representative set of algorithms is used as a benchmark to evaluate the proposed architecture, which is implemented on an FPGA-based system-on-chip. Finally, the prototype is compared with other related approaches in order to determine its advantages and weaknesses.
Software for Embedded Module for Image Processing
Department of Cybernetics
- …