
    Dense Piecewise Planar RGB-D SLAM for Indoor Environments

    The paper exploits weak Manhattan constraints to parse the structure of indoor environments from RGB-D video sequences in an online setting. We extend a previous approach for single-view parsing of indoor scenes to video sequences and formulate the problem of recovering the floor plan of the environment as an optimal labeling problem solved using dynamic programming. Temporal continuity is enforced in a recursive setting, where the labeling from previous frames is used as a prior term in the objective function. In addition to recovering the piecewise planar weak Manhattan structure of the extended environment, the orthogonality constraints are also exploited by visual odometry and pose graph optimization. This yields reliable estimates in the presence of large motions and in the absence of distinctive features to track. We evaluate our method on several challenging indoor sequences, demonstrating accurate SLAM and dense mapping of low-texture environments. On the existing TUM benchmark we achieve results competitive with alternative approaches, which fail in our environments.
    Comment: International Conference on Intelligent Robots and Systems (IROS) 201
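    The floor-plan recovery described above is, at its core, a 1D optimal labeling problem with a per-column data term, a label-transition cost, and a temporal prior taken from the previous frame's labeling. The sketch below is a minimal Viterbi-style illustration of that formulation, not the paper's implementation; the cost arrays and the `prior_weight` parameter are assumptions for illustration.

```python
import numpy as np

def label_floorplan(unary, pairwise, prior, prior_weight=1.0):
    """Viterbi-style dynamic program over a 1D sequence of scan columns.

    unary[t, l]    -- cost of assigning plane label l at column t
    pairwise[m, l] -- transition cost between labels of adjacent columns
    prior[t, l]    -- cost derived from the previous frame's labeling
    Returns the minimum-cost label sequence.
    """
    T, L = unary.shape
    cost = unary + prior_weight * prior  # fold the temporal prior into the data term
    best = np.zeros((T, L))              # best[t, l]: cheapest labeling of 0..t ending in l
    back = np.zeros((T, L), dtype=int)
    best[0] = cost[0]
    for t in range(1, T):
        total = best[t - 1][:, None] + pairwise  # total[m, l]: come from m, switch to l
        back[t] = np.argmin(total, axis=0)
        best[t] = cost[t] + np.min(total, axis=0)
    labels = np.empty(T, dtype=int)
    labels[-1] = int(np.argmin(best[-1]))
    for t in range(T - 1, 0, -1):                # backtrack the optimal path
        labels[t - 1] = back[t, labels[t]]
    return labels
```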

    Robust pedestrian detection and tracking in crowded scenes

    In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated using both indoor and outdoor sequences, captured using a variety of camera placements and orientations, that feature significant challenges in terms of the number of pedestrians present, their interactions and the scene lighting conditions. The evaluation is performed against a manually generated ground truth for all sequences. Results demonstrate the highly accurate performance of the proposed approach in all cases.
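    To make the track-matching formulation concrete, here is a minimal sketch of bipartite association between existing tracks and new detections. It uses SciPy's Hungarian algorithm with a distance gate as a stand-in for the paper's Weighted Maximum Cardinality Matching scheme; the `gate` threshold and the plan-view position inputs are illustrative assumptions.

```python
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_tracks(track_pos, det_pos, gate=1.5):
    """Associate tracks with detections on the ground plane.

    track_pos, det_pos -- (M, 2) and (N, 2) plan-view positions in metres
    gate               -- reject pairs farther apart than this distance
    Returns (matched pairs, unmatched track indices, unmatched detection indices).
    """
    cost = cdist(track_pos, det_pos)          # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # minimum-cost assignment
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    lost = [r for r in range(len(track_pos)) if r not in matched_r]
    new = [c for c in range(len(det_pos)) if c not in matched_c]
    return matches, lost, new
```

    In a full tracker, tracks left unmatched for several consecutive frames would be terminated, while unmatched detections would spawn new tracks.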

    Advanced Restoration Techniques for Images and Disparity Maps

    With the increasing popularity of digital cameras, the field of Computational Photography has emerged as one of the most demanding areas of research. In this thesis we study and develop novel priors and optimization techniques to solve inverse problems, including disparity estimation and image restoration.
    The disparity map estimation method proposed in this thesis incorporates multiple frames of a stereo video sequence to ensure temporal coherency. To enforce smoothness, we use spatio-temporal connections between the pixels of the disparity map to constrain our solution. Apart from smoothness, we enforce a consistency constraint on the disparity assignments by using connections between the left and right views. These constraints are then formulated in a graphical model, which we solve using mean-field approximation. We use a filter-based mean-field optimization that performs efficiently by updating the disparity variables in parallel. The parallel update scheme, however, is not guaranteed to converge to a stationary point. To compare and demonstrate the effectiveness of our approach, we developed a new optimization technique that uses sequential updates, which runs efficiently and guarantees convergence. Our empirical results indicate that with proper initialization, we can employ the parallel update scheme and efficiently optimize our disparity maps without loss of quality. Our method ranks amongst the state of the art in common benchmarks, and significantly reduces the temporal flickering artifacts in the disparity maps.
    In the second part of this thesis, we address several image restoration problems such as image deblurring, demosaicing and super-resolution. We propose to use denoising autoencoders to learn an approximation of the true natural image distribution. We parametrize our denoisers using deep neural networks and show that they learn the gradient of the smoothed density of natural images. Based on this analysis, we propose a restoration technique that moves the solution towards the local extrema of this distribution by minimizing the difference between the input and output of our denoiser. We demonstrate the effectiveness of our approach using a single trained neural network in several restoration tasks such as deblurring and super-resolution. In a more general framework, we define a new Bayes formulation for the restoration problem, which leads to a more efficient and robust estimator. The proposed framework achieves state-of-the-art performance in various restoration tasks such as deblurring and demosaicing, and also in more challenging tasks such as noise- and kernel-blind image deblurring.
    Keywords: disparity map estimation, stereo matching, mean-field optimization, graphical models, image processing, linear inverse problems, image restoration, image deblurring, image denoising, single image super-resolution, image demosaicing, deep neural networks, denoising autoencoder
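    The denoising-autoencoder prior lends itself to a very compact restoration loop: since the denoiser residual approximates the gradient of the smoothed natural-image density, each iteration can mix a data-fidelity gradient with a step toward the denoiser's output. The sketch below illustrates that idea under assumed placeholder operators (`forward_op`, `adjoint_op`, `denoiser`) and step sizes; it is not the thesis implementation.

```python
import numpy as np

def restore(y, forward_op, adjoint_op, denoiser, steps=200, lr=0.1, lam=0.05):
    """Gradient-descent restoration with a denoising-autoencoder prior.

    y           -- degraded observation (e.g. a blurry image)
    forward_op  -- degradation operator A (e.g. blur); adjoint_op is its adjoint
    denoiser    -- trained denoising autoencoder D; the residual D(x) - x
                   approximates the gradient of the smoothed log-density
    """
    x = y.copy()
    for _ in range(steps):
        data_grad = adjoint_op(forward_op(x) - y)  # gradient of 0.5 * ||A x - y||^2
        prior_grad = x - denoiser(x)               # step away from low-density images
        x = x - lr * (data_grad + lam * prior_grad)
    return x
```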

    Recursive Inference for Prediction of Objects in Urban Environments

    Future advancements in robotic navigation and mapping rest to a large extent on robust, efficient and more advanced semantic understanding of the surrounding environment. Existing semantic mapping approaches typically consider a small number of semantic categories and require complex inference or a large number of training examples to achieve desirable performance. In the proposed work we present an efficient approach for predicting locations of generic objects in urban environments by means of semantic segmentation of a video into object and non-object categories. We exploit widely available exemplars of non-object categories (such as road, buildings and vegetation) and use geometric cues that are indicative of the presence of object boundaries to gather evidence about objects regardless of their category. We formulate the object/non-object semantic segmentation problem in the Conditional Random Field framework, where the structure of the graph is induced by a minimum spanning tree computed over a 3D point cloud, yielding an efficient algorithm for exact inference. The chosen 3D representation naturally lends itself to online recursive belief updates with a simple soft data association mechanism. We carry out extensive experiments on videos of urban environments acquired by a moving vehicle, and show quantitatively and qualitatively the benefits of our proposal.
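    Because the CRF graph is a tree (the minimum spanning tree of the point cloud), MAP inference is exact: a single leaf-to-root min-sum pass followed by backtracking suffices. The sketch below illustrates this for a binary object/non-object labeling; the shared pairwise cost and the use of plain Euclidean distances for the MST are simplifying assumptions, not the paper's exact potentials.

```python
import numpy as np
from collections import defaultdict
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def map_labels(points, unary, pairwise):
    """Exact MAP inference on a tree-structured binary CRF.

    points   -- (N, 3) array of 3D point positions
    unary    -- (N, 2) object / non-object costs per point
    pairwise -- (2, 2) label-transition cost shared by all edges
    """
    mst = minimum_spanning_tree(squareform(pdist(points))).tocoo()
    adj = defaultdict(list)
    for i, j in zip(mst.row, mst.col):
        adj[i].append(j); adj[j].append(i)

    n, root = len(points), 0
    order, parent = [root], {root: -1}
    for v in order:                       # BFS; reversed order visits leaves first
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                order.append(u)

    msg = np.zeros((n, 2))                # min-sum messages accumulated at each node
    best_child = {}
    for v in reversed(order[1:]):         # upward pass: leaves towards the root
        total = unary[v] + msg[v]         # cost of v's subtree given v's label
        cand = pairwise + total[None, :]  # cand[lp, lv]: parent label lp, child label lv
        best_child[v] = np.argmin(cand, axis=1)
        msg[parent[v]] += np.min(cand, axis=1)

    labels = np.zeros(n, dtype=int)
    labels[root] = int(np.argmin(unary[root] + msg[root]))
    for v in order[1:]:                   # downward pass: decode the optimal labels
        labels[v] = best_child[v][labels[parent[v]]]
    return labels
```

    For the recursive setting, the beliefs from previous frames would be folded into the unary costs after soft data association, so each new frame repeats the same exact inference.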

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optical flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
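    As a concrete illustration of the event data model described above, the sketch below accumulates a time slice of an event stream into a signed per-pixel histogram, one of the simplest ways to bridge event data and frame-based processing. The structured-array field names (`t`, `x`, `y`, `p`) are assumptions for illustration; real datasets use varying conventions.

```python
import numpy as np

def accumulate_events(events, height, width, t_start, t_end):
    """Accumulate a time slice of an event stream into a signed 2D histogram.

    events -- structured array with fields t (seconds), x, y (pixel coords),
              p (polarity: +1 for brightness increase, -1 for decrease)
    Returns an image counting signed brightness changes per pixel.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    sel = events[(events["t"] >= t_start) & (events["t"] < t_end)]
    np.add.at(frame, (sel["y"], sel["x"]), sel["p"])  # correct for repeated pixels
    return frame
```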

    Coherent spatial and temporal occlusion generation
