17 research outputs found

    Metrically Scaled Monocular Depth Estimation through Sparse Priors for Underwater Robots

    Full text link
    In this work, we address the problem of real-time dense depth estimation from monocular images for mobile underwater vehicles. We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions and solve the problem of scale ambiguity. To allow prior inputs of arbitrary sparsity, we apply a dense parameterization method. Our model extends recent state-of-the-art approaches to monocular image based depth estimation, using an efficient encoder-decoder backbone and modern lightweight transformer optimization stage to encode global context. The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea. Evaluation results on this dataset demonstrate significant improvement in depth prediction accuracy by the fusion of the sparse feature priors. In addition, without any retraining, our method achieves similar depth prediction accuracy on a downward looking dataset we collected with a diver operated camera rig, conducting a survey of a coral reef. The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core and is suitable for direct deployment on embedded systems. The implementation of this work is made publicly available at https://github.com/ebnerluca/uw_depth.Comment: Submitted to ICRA 202

    IEEE Access Special Section Editorial: Big Data Learning and Discovery

    Full text link

    Mapping and Deep Analysis of Image Dehazing: Coherent Taxonomy, Datasets, Open Challenges, Motivations, and Recommendations

    Get PDF
    Our study aims to review and analyze the most relevant studies in the image dehazing field. Many aspects have been deemed necessary to provide a broad understanding of various studies that have been examined through surveying the existing literature. These aspects are as follows: datasets that have been used in the literature, challenges that other researchers have faced, motivations, and recommendations for diminishing the obstacles in the reported literature. A systematic protocol is employed to search all relevant articles on image dehazing, with variations in keywords, in addition to searching for evaluation and benchmark studies. The search process is established on three online databases, namely, IEEE Xplore, Web of Science (WOS), and ScienceDirect (SD), from 2008 to 2021. These indices are selected because they are sufficient in terms of coverage. Along with definition of the inclusion and exclusion criteria, we include 152 articles to the final set. A total of 55 out of 152 articles focused on various studies that conducted image dehazing, and 13 out 152 studies covered most of the review papers based on scenarios and general overviews. Finally, most of the included articles centered on the development of image dehazing algorithms based on real-time scenario (84/152) articles. Image dehazing removes unwanted visual effects and is often considered an image enhancement technique, which requires a fully automated algorithm to work under real-time outdoor applications, a reliable evaluation method, and datasets based on different weather conditions. Many relevant studies have been conducted to meet these critical requirements. We conducted objective image quality assessment experimental comparison of various image dehazing algorithms. In conclusions unlike other review papers, our study distinctly reflects different observations on image dehazing areas. We believe that the result of this study can serve as a useful guideline for practitioners who are looking for a comprehensive view on image dehazing

    Fusion of computed point clouds and integral-imaging concepts for full-parallax 3D display

    Get PDF
    During the last century, various technologies of 3D image capturing and visualization have spotlighted, due to both their pioneering nature and the aspiration to extend the applications of conventional 2D imaging technology to 3D scenes. Besides, thanks to advances in opto-electronic imaging technologies, the possibilities of capturing and transmitting 2D images in real-time have progressed significantly, and boosted the growth of 3D image capturing, processing, transmission and as well as display techniques. Among the latter, integral-imaging technology has been considered as one of the promising ones to restore real 3D scenes through the use of a multi-view visualization system that provides to observers with a sense of immersive depth. Many research groups and companies have researched this novel technique with different approaches, and occasions for various complements. In this work, we followed this trend, but processed through our novel strategies and algorithms. Thus, we may say that our approach is innovative, when compared to conventional proposals. The main objective of our research is to develop techniques that allow recording and simulating the natural scene in 3D by using several cameras which have different types and characteristics. Then, we compose a dense 3D scene from the computed 3D data by using various methods and techniques. Finally, we provide a volumetric scene which is restored with great similarity to the original shape, through a comprehensive 3D monitor and/or display system. Our Proposed integral-imaging monitor shows an immersive experience to multiple observers. In this thesis we address the challenges of integral image production techniques based on the computerized 3D information, and we focus in particular on the implementation of full-parallax 3D display system. We have also made progress in overcoming the limitations of the conventional integral-imaging technique. In addition, we have developed different refinement methodologies and restoration strategies for the composed depth information. Finally, we have applied an adequate solution that reduces the computation times significantly, associated with the repetitive calculation phase in the generation of an integral image. All these results are presented by the corresponding images and proposed display experiments

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    Get PDF
    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography to propose regions of interest where to find objects, and recursive Bayesian filtering to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼ 7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, as well as it achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).</p

    Probabilistic modeling for single-photon lidar

    Full text link
    Lidar is an increasingly prevalent technology for depth sensing, with applications including scientific measurement and autonomous navigation systems. While conventional systems require hundreds or thousands of photon detections per pixel to form accurate depth and reflectivity images, recent results for single-photon lidar (SPL) systems using single-photon avalanche diode (SPAD) detectors have shown accurate images formed from as little as one photon detection per pixel, even when half of those detections are due to uninformative ambient light. The keys to such photon-efficient image formation are two-fold: (i) a precise model of the probability distribution of photon detection times, and (ii) prior beliefs about the structure of natural scenes. Reducing the number of photons needed for accurate image formation enables faster, farther, and safer acquisition. Still, such photon-efficient systems are often limited to laboratory conditions more favorable than the real-world settings in which they would be deployed. This thesis focuses on expanding the photon detection time models to address challenging imaging scenarios and the effects of non-ideal acquisition equipment. The processing derived from these enhanced models, sometimes modified jointly with the acquisition hardware, surpasses the performance of state-of-the-art photon counting systems. We first address the problem of high levels of ambient light, which causes traditional depth and reflectivity estimators to fail. We achieve robustness to strong ambient light through a rigorously derived window-based censoring method that separates signal and background light detections. Spatial correlations both within and between depth and reflectivity images are encoded in superpixel constructions, which fill in holes caused by the censoring. Accurate depth and reflectivity images can then be formed with an average of 2 signal photons and 50 background photons per pixel, outperforming methods previously demonstrated at a signal-to-background ratio of 1. We next approach the problem of coarse temporal resolution for photon detection time measurements, which limits the precision of depth estimates. To achieve sub-bin depth precision, we propose a subtractively-dithered lidar implementation, which uses changing synchronization delays to shift the time-quantization bin edges. We examine the generic noise model resulting from dithering Gaussian-distributed signals and introduce a generalized Gaussian approximation to the noise distribution and simple order statistics-based depth estimators that take advantage of this model. Additional analysis of the generalized Gaussian approximation yields rules of thumb for determining when and how to apply dither to quantized measurements. We implement a dithered SPL system and propose a modification for non-Gaussian pulse shapes that outperforms the Gaussian assumption in practical experiments. The resulting dithered-lidar architecture could be used to design SPAD array detectors that can form precise depth estimates despite relaxed temporal quantization constraints. Finally, SPAD dead time effects have been considered a major limitation for fast data acquisition in SPL, since a commonly adopted approach for dead time mitigation is to operate in the low-flux regime where dead time effects can be ignored. We show that the empirical distribution of detection times converges to the stationary distribution of a Markov chain and demonstrate improvements in depth estimation and histogram correction using our Markov chain model. An example simulation shows that correctly compensating for dead times in a high-flux measurement can yield a 20-times speed up of data acquisition. The resulting accuracy at high photon flux could enable real-time applications such as autonomous navigation

    Physics-based Reconstruction and Animation of Humans

    Get PDF
    Creating digital representations of humans is of utmost importance for applications ranging from entertainment (video games, movies) to human-computer interaction and even psychiatrical treatments. What makes building credible digital doubles difficult is the fact that the human vision system is very sensitive to perceiving the complex expressivity and potential anomalies in body structures and motion. This thesis will present several projects that tackle these problems from two different perspectives: lightweight acquisition and physics-based simulation. It starts by describing a complete pipeline that allows users to reconstruct fully rigged 3D facial avatars using video data coming from a handheld device (e.g., smartphone). The avatars use a novel two-scale representation composed of blendshapes and dynamic detail maps. They are constructed through an optimization that integrates feature tracking, optical flow, and shape from shading. Continuing along the lines of accessible acquisition systems, we discuss a framework for simultaneous tracking and modeling of articulated human bodies from RGB-D data. We show how semantic information can be extracted from the scanned body shapes. In the second half of the thesis, we will deviate from using standard linear reconstruction and animation models, and rather focus on exploiting physics-based techniques that are able to incorporate complex phenomena such as dynamics, collision response and incompressibility of the materials. The first approach we propose assumes that each 3D scan of an actor records his body in a physical steady state and uses a process called inverse physics to extract a volumetric physics-ready anatomical model of him. By using biologically-inspired growth models for the bones, muscles and fat, our method can obtain realistic anatomical reconstructions that can be later on animated using external tracking data such as the one resulting from tracking motion capture markers. This is then extended to a novel physics-based approach for facial reconstruction and animation. We propose a facial animation model which simulates biomechanical muscle contractions in a volumetric head model in order to create the facial expressions seen in the input scans. We then show how this approach allows for new avenues of dynamic artistic control, simulation of corrective facial surgery, and interaction with external forces and objects
    corecore