4 research outputs found

    Memory-efficient belief propagation for high-definition real-time stereo matching systems

    Tele-presence systems aim to make participants feel as if they are physically together. To strengthen this feeling, these systems are starting to include depth-estimation capabilities. Typical requirements for such systems include high definition, good-quality results and low latency. Benchmarks demonstrate that stereo-matching algorithms using Belief Propagation (BP) produce the best results. However, the execution time of the BP algorithm on a CPU cannot satisfy real-time requirements with high-definition images, and GPU-based implementations of BP work in real time only with small-to-medium-sized images because memory traffic limits their applicability. The inherent parallelism of the BP algorithm makes FPGA-based solutions a good choice. Yet even though the memory bandwidth of a commercial FPGA-based ASIC-prototyping board is high, it is still not enough to meet the real-time, high-definition and immersive-quality requirements. The work presented estimates depth maps in less than 40 milliseconds for high-definition images at 30 fps with 80 disparity levels. The proposed double-BP topology and the new data-cost estimation improve overall performance relative to classical BP while reducing memory traffic by about 21%. Moreover, the adaptive message-compression method and the message distribution in memory reduce the number of memory accesses by more than 70% with an almost negligible loss of performance. The total memory-traffic reduction is about 90%, with sufficient quality to rank within the first 40 positions of the Middlebury benchmark. This work has been partially supported by the CDTI under project CENIT-VISION 2007-1007 and the CICYT under TEC2008-04107.
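    As a point of reference for the data-cost term such BP stereo pipelines start from (the paper's own improved data-cost estimation is not reproduced here), a minimal truncated absolute-difference cost volume can be sketched in NumPy. The function name, truncation value and border handling are illustrative assumptions:

```python
import numpy as np

def data_cost_volume(left, right, num_disp, trunc=20.0):
    """Truncated absolute-difference data cost per pixel and disparity.

    left, right: 2-D grayscale arrays (float), same shape.
    Returns a cost volume of shape (num_disp, H, W).
    """
    h, w = left.shape
    cost = np.empty((num_disp, h, w), dtype=np.float32)
    for d in range(num_disp):
        # shifted(x) = right(x - d): candidate match at disparity d
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, : w - d] if d else right
        shifted[:, :d] = right[:, :1]  # replicate border for the occluded columns
        cost[d] = np.minimum(np.abs(left - shifted), trunc)
    return cost
```

    Truncating the cost makes the data term robust to occlusions and sensor noise, which matters when the volume is later smoothed by message passing.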

    Stereo Matching Using a Modified Efficient Belief Propagation in a Level Set Framework

    Stereo matching determines correspondences between pixels in two or more images of the same scene taken from different angles; it can be handled either locally or globally. The two most common global approaches are belief propagation (BP) and graph cuts. Efficient belief propagation (EBP), the most widely used BP approach, uses a multi-scale message-passing strategy, an O(k) smoothness-cost algorithm, and a bipartite message-passing strategy to speed up the convergence of standard BP. As in standard belief propagation, every pixel in EBP sends messages to and receives messages from its four neighboring pixels. Each outgoing message is the sum of the data cost, the incoming messages from all neighbors except the intended receiver, and the smoothness cost. Upon convergence, the location of the minimum of the final belief vector is taken as the current pixel’s disparity. The present effort makes three main contributions: (a) it incorporates level set concepts, (b) it develops a modified data cost to encourage matching of intervals, and (c) it adjusts the location of the minimum of outgoing messages for select pixels so that it is consistent with the level set method. When the results of the current work are compared with those of standard EBP, the disparity results are very similar, as they should be.
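    The O(k) smoothness-cost algorithm mentioned above is the distance-transform trick used in EBP for the truncated linear smoothness model: the min over all disparities is computed with two linear passes instead of a k×k comparison. A minimal sketch of one min-sum message update, with illustrative weight and truncation parameters:

```python
import numpy as np

def message_update(data_cost, msgs_in, lam=1.0, trunc=2.0):
    """One min-sum BP message using the O(k) truncated-linear trick.

    data_cost: (k,) data term of the sending pixel.
    msgs_in: incoming (k,) messages from all neighbors except the receiver.
    Returns the outgoing (k,) message, normalized so its minimum is 0.
    """
    h = data_cost + np.sum(msgs_in, axis=0)  # aggregate before the smoothness pass
    m = h.copy()
    k = len(m)
    # forward/backward passes compute min_q h[q] + lam*|p - q| in O(k)
    for p in range(1, k):
        m[p] = min(m[p], m[p - 1] + lam)
    for p in range(k - 2, -1, -1):
        m[p] = min(m[p], m[p + 1] + lam)
    m = np.minimum(m, h.min() + trunc)  # truncation of the linear model
    return m - m.min()  # normalize for numerical stability
```

    The result equals the brute-force min over all disparity pairs, but in linear rather than quadratic time per message.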

    Guidance and reactive trajectory planning for a monocular drone controlled by artificial intelligence

    RÉSUMÉ The autonomous guidance problem is a constantly evolving research field, and the popularization of drones has expanded it in recent years. The nature of this type of vehicle raises several new challenges, notably the variety of environments it may face. Unlike autonomous cars, drones often operate in unknown, unmapped environments without a GPS signal, so new methods have been developed to mitigate these challenges. In this master's thesis, the solutions to the autonomous guidance problem found in the literature are classified into two categories: locally reactive guidance for exploration, and oriented guidance. The first category groups local guidance solutions for vehicles navigating without a precise destination, while the second groups those attempting to reach a destination. For guidance in unknown environments, both categories mostly rely on reinforcement learning and imitation learning. However, few studies address the oriented guidance problem in complex, full-scale environments. The objective of this research project is therefore to design an intelligent agent capable of imitating a human's guidance logic in a complex, unknown environment, based on depth vision and an estimate of its destination. An imitation learning approach is used to minimize cost and computation time. A sophisticated simulation environment was set up to create a dataset for imitation training. The resulting dataset comprises 624 trajectories across 9 different environments, performed by a suboptimal expert, for a total of 296,466 training pairs.
The expert is qualified as suboptimal because the human to be imitated lays out the paths to the best of their ability without resorting to optimal trajectory-planning algorithms. A classification model capable of predicting the next guidance command, given the current and previous observations, was implemented. The model is trained to encode a representation of the depth image obtained from the RGB image together with a representation of the coordinates relative to its destination. These representations are processed by a long short-term memory (LSTM) recurrent network and a multilayer perceptron (MLP) to predict the direction to take. A loss function adapted to the problem, together with dataset augmentation techniques, is used during training to improve the model's accuracy on the validation and test sets. A grid search over hyperparameters was performed to select the best model according to the accuracy obtained on the test set. Accuracies between 77.10% and 82.59% were reached, indicating a significant impact of the dataset augmentation methods.----------ABSTRACT The autonomous guidance field is a continuously evolving research topic. The popularization of micro aerial vehicles such as quadcopters has contributed to the expansion of this research topic. Because of the wide range of environments they can navigate, quadcopters face many challenges of their own. In contrast with autonomous cars, quadcopters most often navigate in unknown environments with limited or no GPS service, so new autonomous guidance methods were needed for them.
The literature review reveals two main categories relevant to the autonomous guidance problem: locally passive-reactive guidance and oriented guidance. The former includes forms of guidance that do not aim for a specific target, while the latter focuses on reaching a destination. Both categories consider guidance in unknown environments and mostly use reinforcement learning or imitation learning as the solving method. However, most studies on autonomous oriented guidance are not carried out in a full-size, complex environment. The objective of this research project is to create an intelligent agent capable of imitating a human guidance policy in a complex, unknown environment based on a depth-map image and relative-goal inputs. Considering its lower development cost and computation time, the imitation learning approach was chosen. A sophisticated simulation environment was set up to create an imitation learning dataset. A total of 624 suboptimal demonstration paths from 9 different 3D environments were gathered, representing 296,466 learning pairs. The demonstrations are qualified as suboptimal since the expert is a human trying their best to solve the guidance problem without any optimal planners. A classification model was introduced to predict the appropriate guidance command based on the observations over time. The model learns a meaningful representation of its inputs, which is processed by a long short-term memory network (LSTM) followed by a fully connected network. In this way, the depth image obtained from the original RGB image, along with the coordinates relative to the destination, is converted into a guidance command at each time step. To improve the classification accuracy on the test set, a custom loss function and data augmentation techniques were implemented. A grid search over possible combinations of dataset augmentation proportions was conducted to optimize the hyperparameters.
Accuracies ranging between 77.10% and 82.59% were obtained for this experiment, revealing a significant dependency on the augmentation techniques.
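The LSTM-plus-MLP pipeline described above can be sketched in NumPy at the level of a single forward pass. The layer sizes, command set, random weights and the stand-in depth features are all illustrative assumptions, not the thesis's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gates stacked as [input, forget, cell, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i, f = sigmoid(z[:n]), sigmoid(z[n:2 * n])
    g, o = np.tanh(z[2 * n:3 * n]), sigmoid(z[3 * n:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
feat_dim, goal_dim, hidden, n_cmds = 32, 2, 16, 5  # illustrative sizes
W = rng.normal(scale=0.1, size=(4 * hidden, feat_dim + goal_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
W_out = rng.normal(scale=0.1, size=(n_cmds, hidden))  # single-layer head

h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(8):  # a short observation sequence
    depth_feat = rng.normal(size=feat_dim)  # stand-in for an encoded depth image
    rel_goal = rng.normal(size=goal_dim)    # coordinates relative to the goal
    h, c = lstm_step(np.concatenate([depth_feat, rel_goal]), h, c, W, U, b)
logits = W_out @ h
command = int(np.argmax(logits))  # predicted guidance command index
```

The point of the recurrent cell is that the hidden state carries information across time steps, so the predicted command can depend on previous observations, as the abstract requires.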

    Efficient stereo matching and obstacle detection using edges in images from a moving vehicle

    Fast and robust obstacle detection is a crucial task for autonomous mobile robots. Current approaches to obstacle detection in autonomous cars are based on LIDAR or computer vision. In this thesis, computer vision is selected due to its low power consumption and passive nature. This thesis proposes the use of edges in images to reduce the required storage and processing. Most current approaches are based on dense maps, in which all the pixels in the image are used, but this places a heavy load on the storage and processing capacity of the system and makes dense approaches unsuitable for embedded systems, where only limited amounts of memory and processing power are available. This motivates the use of sparse maps based on the edges in an image: edge pixels typically represent a small percentage of the input image, yet they capture most of the image semantics. This thesis proposes two approaches for using edges to obtain disparity maps and one approach for identifying obstacles given edge-based disparities. The first approach modifies the Census Transform to incorporate a similarity measure. This similarity measure behaves as a threshold on the gradient, resulting in the identification of high-gradient areas, which helps to reduce the search space in an area-based stereo-matching approach. Additionally, the Complete Rank Transform is evaluated for the first time in the context of stereo matching. An area-based local stereo-matching approach is used to evaluate and compare the performance of these pixel descriptors. The second approach computes edge disparities directly: instead of first detecting the edges and then reducing the search space, it detects the edges and computes the disparities at the same time, extending the fast and robust Edge Drawing edge detector to run simultaneously across the stereo pair.
By doing this, the number of matched pixels and the required operations are reduced, as the descriptors and costs are computed only for a fraction of the edge pixels (the anchor points). The image gradient is then used to propagate the disparities from the matched anchor points along the gradients, resulting in one-voxel-wide chains of 3D points with connectivity information. The third proposed algorithm takes as input edge-based disparity maps, which are compact yet retain the semantic representation of the captured scene. It estimates the ground plane, clusters the edges into individual obstacles, and then computes the image stixels, which identify the free and occupied space in the captured stereo views. Previous approaches to stixel computation use dense disparity maps or occupancy grids, and they are unable to identify more than one stixel per column, whereas the proposed approach can, allowing it to identify partially occluded objects. The proposed approach is tested on a public-domain dataset, and results for accuracy and performance are presented. The results show that by using image edges it is possible to reduce the required processing and storage while obtaining accuracies comparable to those of dense approaches.
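    The thesis's exact Census modification is not detailed in this abstract, but the general idea of folding a similarity threshold into a census-style descriptor can be sketched as follows. The 3×3 window, the two-bit per-neighbor encoding and the threshold value are illustrative assumptions:

```python
import numpy as np

def modified_census(img, eps=4):
    """3x3 census-style descriptor with a similarity threshold (illustrative).

    Each neighbor contributes 2 bits: 'similar' (0) if within eps of the center
    pixel, otherwise 'greater' (1) or 'smaller' (2). Pixels whose descriptor
    contains any non-similar code lie in a high-gradient area, which is how the
    threshold prunes low-texture regions from the matching search space.
    Returns (descriptors, high_gradient_mask) for the interior pixels.
    """
    h, w = img.shape
    center = img[1:-1, 1:-1].astype(np.int32)
    desc = np.zeros((h - 2, w - 2), dtype=np.uint32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int32)
            similar = np.abs(nb - center) <= eps
            bits = np.where(similar, 0, np.where(nb > center, 1, 2))
            desc = (desc << 2) | bits.astype(np.uint32)
    return desc, desc != 0
```

    On a flat region every neighbor is "similar", so the descriptor is zero and the pixel is skipped; near an intensity step the descriptor becomes non-zero, marking a high-gradient candidate for matching.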