4 research outputs found

    Memory-efficient belief propagation for high-definition real-time stereo matching systems

    Tele-presence systems aim to make participants feel as if they are physically together. To strengthen this feeling, these systems are starting to include depth-estimation capabilities. Typical requirements for such systems include high definition, good-quality results and low latency. Benchmarks demonstrate that stereo-matching algorithms using Belief Propagation (BP) produce the best results. However, the execution time of the BP algorithm on a CPU cannot satisfy real-time requirements with high-definition images, and GPU-based implementations of BP work in real time only with small-to-medium-sized images because memory traffic limits their applicability. The inherent parallelism of the BP algorithm makes FPGA-based solutions a good choice. Yet even though the memory bandwidth of a commercial FPGA-based ASIC-prototyping board is high, it is still not enough to meet the real-time, high-definition and immersive-quality requirements. The work presented estimates depth maps in less than 40 milliseconds for high-definition images at 30 fps with 80 disparity levels. The proposed double-BP topology and the new data-cost estimation improve overall performance relative to classical BP while reducing memory traffic by about 21%. Moreover, the adaptive message-compression method and the message distribution in memory reduce the number of memory accesses by more than 70% with an almost negligible loss of performance. The total memory-traffic reduction is about 90%, with sufficient quality to rank within the first 40 positions of the Middlebury benchmark. This work has been partially supported by the CDTI under project CENIT-VISION 2007-1007 and the CICYT under TEC2008-04107.
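    As a point of reference for the data-cost term such BP stereo pipelines start from (the paper's own improved data-cost estimation is not reproduced here), a minimal truncated absolute-difference cost volume can be sketched in NumPy. The function name, truncation value and border handling are illustrative assumptions:

```python
import numpy as np

def data_cost_volume(left, right, num_disp, trunc=20.0):
    """Truncated absolute-difference data cost per pixel and disparity.

    left, right: 2-D grayscale arrays (float), same shape.
    Returns a cost volume of shape (num_disp, H, W).
    """
    h, w = left.shape
    cost = np.empty((num_disp, h, w), dtype=np.float32)
    for d in range(num_disp):
        # shifted(x) = right(x - d): candidate match at disparity d
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, : w - d] if d else right
        shifted[:, :d] = right[:, :1]  # replicate border for the occluded columns
        cost[d] = np.minimum(np.abs(left - shifted), trunc)
    return cost
```

    Truncating the cost makes the data term robust to occlusions and sensor noise, which matters when the volume is later smoothed by message passing.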

    Stereo Matching Using a Modified Efficient Belief Propagation in a Level Set Framework

    Stereo matching determines correspondences between pixels in two or more images of the same scene taken from different angles; it can be handled either locally or globally. The two most common global approaches are belief propagation (BP) and graph cuts. Efficient belief propagation (EBP), the most widely used BP approach, uses a multi-scale message-passing strategy, an O(k) smoothness-cost algorithm, and a bipartite message-passing strategy to speed up the convergence of standard BP. As in standard belief propagation, every pixel in EBP sends messages to and receives messages from its four neighboring pixels. Each outgoing message is the sum of the data cost, the incoming messages from all neighbors except the intended receiver, and the smoothness cost. Upon convergence, the location of the minimum of the final belief vector is taken as the current pixel’s disparity. The present effort makes three main contributions: (a) it incorporates level set concepts, (b) it develops a modified data cost to encourage matching of intervals, and (c) it adjusts the location of the minimum of outgoing messages for select pixels so that it is consistent with the level set method. When the results of the current work are compared with those of standard EBP, the disparity results are very similar, as they should be.
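    The O(k) smoothness-cost algorithm mentioned above is the distance-transform trick used in EBP for the truncated linear smoothness model: the min over all disparities is computed with two linear passes instead of a k×k comparison. A minimal sketch of one min-sum message update, with illustrative weight and truncation parameters:

```python
import numpy as np

def message_update(data_cost, msgs_in, lam=1.0, trunc=2.0):
    """One min-sum BP message using the O(k) truncated-linear trick.

    data_cost: (k,) data term of the sending pixel.
    msgs_in: incoming (k,) messages from all neighbors except the receiver.
    Returns the outgoing (k,) message, normalized so its minimum is 0.
    """
    h = data_cost + np.sum(msgs_in, axis=0)  # aggregate before the smoothness pass
    m = h.copy()
    k = len(m)
    # forward/backward passes compute min_q h[q] + lam*|p - q| in O(k)
    for p in range(1, k):
        m[p] = min(m[p], m[p - 1] + lam)
    for p in range(k - 2, -1, -1):
        m[p] = min(m[p], m[p + 1] + lam)
    m = np.minimum(m, h.min() + trunc)  # truncation of the linear model
    return m - m.min()  # normalize for numerical stability
```

    The result equals the brute-force min over all disparity pairs, but in linear rather than quadratic time per message.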

    Guidance and reactive trajectory planning for a monocular drone controlled by artificial intelligence

    RÉSUMÉ The autonomous guidance problem is a constantly evolving research field, and the popularization of drones has expanded it in recent years. The nature of this type of vehicle raises several new challenges, notably the variety of environments it may face. Unlike autonomous cars, drones often operate in unknown, unmapped environments without a GPS signal, so new methods have been developed to mitigate these challenges. In this master's thesis, the solutions to the autonomous guidance problem found in the literature are classified into two categories: locally reactive guidance for exploration, and oriented guidance. The first category groups local guidance solutions for vehicles navigating without a precise destination, while the second groups those attempting to reach a destination. For guidance in unknown environments, both categories mostly rely on reinforcement learning and imitation learning. However, few studies address the oriented guidance problem in complex, full-scale environments. The objective of this research project is therefore to design an intelligent agent capable of imitating a human's guidance logic in a complex, unknown environment, based on depth vision and an estimate of its destination. An imitation learning approach is used to minimize cost and computation time. A sophisticated simulation environment was set up to create a dataset for imitation training. The resulting dataset comprises 624 trajectories across 9 different environments, performed by a suboptimal expert, for a total of 296,466 training pairs.
The expert is qualified as suboptimal because the human to be imitated lays out the paths to the best of their ability without resorting to optimal trajectory-planning algorithms. A classification model capable of predicting the next guidance command, given the current and previous observations, was implemented. The model is trained to encode a representation of the depth image obtained from the RGB image together with a representation of the coordinates relative to its destination. These representations are processed by a long short-term memory (LSTM) recurrent network and a multilayer perceptron (MLP) to predict the direction to take. A loss function adapted to the problem, together with dataset augmentation techniques, is used during training to improve the model's accuracy on the validation and test sets. A grid search over hyperparameters was performed to select the best model according to the accuracy obtained on the test set. Accuracies between 77.10% and 82.59% were reached, indicating a significant impact of the dataset augmentation methods.----------ABSTRACT The autonomous guidance field is a continuously evolving research topic. The popularization of micro aerial vehicles such as quadcopters has contributed to the expansion of this research topic. Because of the wide range of environments they can navigate, quadcopters face many challenges of their own. In contrast with autonomous cars, quadcopters most often navigate in unknown environments with limited or no GPS service, so new autonomous guidance methods were needed for them.
The literature review reveals two main categories relevant to the autonomous guidance problem: locally passive-reactive guidance and oriented guidance. The former includes forms of guidance that do not aim for a specific target, while the latter focuses on reaching a destination. Both categories consider guidance in unknown environments and mostly use reinforcement learning or imitation learning as the solving method. However, most studies on autonomous oriented guidance are not carried out in a full-size, complex environment. The objective of this research project is to create an intelligent agent capable of imitating a human guidance policy in a complex, unknown environment based on a depth-map image and relative-goal inputs. Considering its lower development cost and computation time, the imitation learning approach was chosen. A sophisticated simulation environment was set up to create an imitation learning dataset. A total of 624 suboptimal demonstration paths from 9 different 3D environments were gathered, representing 296,466 learning pairs. The demonstrations are qualified as suboptimal since the expert is a human trying their best to solve the guidance problem without any optimal planners. A classification model was introduced to predict the appropriate guidance command based on the observations over time. The model learns a meaningful representation of its inputs, which is processed by a long short-term memory network (LSTM) followed by a fully connected network. In this way, the depth image obtained from the original RGB image, along with the coordinates relative to the destination, is converted into a guidance command at each time step. To improve the classification accuracy on the test set, a custom loss function and data augmentation techniques were implemented. A grid search over possible combinations of dataset augmentation proportions was conducted to optimize the hyperparameters.
Accuracies ranging between 77.10% and 82.59% were obtained for this experiment, revealing a significant dependency on the augmentation techniques.
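The LSTM-plus-MLP pipeline described above can be sketched in NumPy at the level of a single forward pass. The layer sizes, command set, random weights and the stand-in depth features are all illustrative assumptions, not the thesis's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gates stacked as [input, forget, cell, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i, f = sigmoid(z[:n]), sigmoid(z[n:2 * n])
    g, o = np.tanh(z[2 * n:3 * n]), sigmoid(z[3 * n:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
feat_dim, goal_dim, hidden, n_cmds = 32, 2, 16, 5  # illustrative sizes
W = rng.normal(scale=0.1, size=(4 * hidden, feat_dim + goal_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
W_out = rng.normal(scale=0.1, size=(n_cmds, hidden))  # single-layer head

h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(8):  # a short observation sequence
    depth_feat = rng.normal(size=feat_dim)  # stand-in for an encoded depth image
    rel_goal = rng.normal(size=goal_dim)    # coordinates relative to the goal
    h, c = lstm_step(np.concatenate([depth_feat, rel_goal]), h, c, W, U, b)
logits = W_out @ h
command = int(np.argmax(logits))  # predicted guidance command index
```

The point of the recurrent cell is that the hidden state carries information across time steps, so the predicted command can depend on previous observations, as the abstract requires.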

    Efficient stereo matching and obstacle detection using edges in images from a moving vehicle

    Fast and robust obstacle detection is a crucial task for autonomous mobile robots. Current approaches to obstacle detection in autonomous cars are based on LIDAR or computer vision. In this thesis, computer vision is selected due to its low power consumption and passive nature. This thesis proposes the use of edges in images to reduce the required storage and processing. Most current approaches are based on dense maps, in which all the pixels in the image are used, but this places a heavy load on the storage and processing capacity of the system and makes dense approaches unsuitable for embedded systems, where only limited amounts of memory and processing power are available. This motivates the use of sparse maps based on the edges in an image: edge pixels typically represent a small percentage of the input image, yet they capture most of the image semantics. This thesis proposes two approaches for using edges to obtain disparity maps and one approach for identifying obstacles given edge-based disparities. The first approach modifies the Census Transform to incorporate a similarity measure. This similarity measure behaves as a threshold on the gradient, resulting in the identification of high-gradient areas, which helps to reduce the search space in an area-based stereo-matching approach. Additionally, the Complete Rank Transform is evaluated for the first time in the context of stereo matching. An area-based local stereo-matching approach is used to evaluate and compare the performance of these pixel descriptors. The second approach computes edge disparities directly: instead of first detecting the edges and then reducing the search space, it detects the edges and computes the disparities at the same time, extending the fast and robust Edge Drawing edge detector to run simultaneously across the stereo pair.
By doing this, the number of matched pixels and the required operations are reduced, as the descriptors and costs are computed only for a fraction of the edge pixels (the anchor points). The image gradient is then used to propagate the disparities from the matched anchor points along the gradients, resulting in one-voxel-wide chains of 3D points with connectivity information. The third proposed algorithm takes as input edge-based disparity maps, which are compact yet retain the semantic representation of the captured scene. It estimates the ground plane, clusters the edges into individual obstacles, and then computes the image stixels, which identify the free and occupied space in the captured stereo views. Previous approaches to stixel computation use dense disparity maps or occupancy grids, and they are unable to identify more than one stixel per column, whereas the proposed approach can, allowing it to identify partially occluded objects. The proposed approach is tested on a public-domain dataset, and results for accuracy and performance are presented. The results show that by using image edges it is possible to reduce the required processing and storage while obtaining accuracies comparable to those of dense approaches.
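    The thesis's exact Census modification is not detailed in this abstract, but the general idea of folding a similarity threshold into a census-style descriptor can be sketched as follows. The 3×3 window, the two-bit per-neighbor encoding and the threshold value are illustrative assumptions:

```python
import numpy as np

def modified_census(img, eps=4):
    """3x3 census-style descriptor with a similarity threshold (illustrative).

    Each neighbor contributes 2 bits: 'similar' (0) if within eps of the center
    pixel, otherwise 'greater' (1) or 'smaller' (2). Pixels whose descriptor
    contains any non-similar code lie in a high-gradient area, which is how the
    threshold prunes low-texture regions from the matching search space.
    Returns (descriptors, high_gradient_mask) for the interior pixels.
    """
    h, w = img.shape
    center = img[1:-1, 1:-1].astype(np.int32)
    desc = np.zeros((h - 2, w - 2), dtype=np.uint32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int32)
            similar = np.abs(nb - center) <= eps
            bits = np.where(similar, 0, np.where(nb > center, 1, 2))
            desc = (desc << 2) | bits.astype(np.uint32)
    return desc, desc != 0
```

    On a flat region every neighbor is "similar", so the descriptor is zero and the pixel is skipped; near an intensity step the descriptor becomes non-zero, marking a high-gradient candidate for matching.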