12 research outputs found

    Evaluation of CNN-based Single-Image Depth Estimation Methods

    While interest in deep models for single-image depth estimation is increasing, established schemes for their evaluation are still limited. We propose a set of novel quality criteria that allow for a more detailed analysis by focusing on specific characteristics of depth maps. In particular, we address the preservation of edges and planar regions, depth consistency, and absolute distance accuracy. To employ these metrics to evaluate and compare state-of-the-art single-image depth estimation approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera together with a laser scanner to acquire high-resolution images and highly accurate depth maps. Experimental results demonstrate the validity of our proposed evaluation protocol.
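
    For illustration, a minimal Python sketch of the standard depth-map error metrics that such an evaluation protocol typically builds on (absolute relative error, RMSE, and threshold accuracy). The paper's novel edge- and planarity-based criteria are not reproduced here; the function below is an assumption, not the authors' protocol.

        import numpy as np

        def depth_metrics(pred, gt):
            """Common depth-map error metrics over valid ground-truth pixels."""
            mask = gt > 0                    # laser scans leave holes; evaluate valid pixels only
            pred, gt = pred[mask], gt[mask]
            abs_rel = np.mean(np.abs(pred - gt) / gt)    # absolute relative error
            rmse = np.sqrt(np.mean((pred - gt) ** 2))    # root mean squared error
            ratio = np.maximum(pred / gt, gt / pred)
            delta1 = np.mean(ratio < 1.25)               # fraction within 25% of ground truth
            return abs_rel, rmse, delta1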

    SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments

    Different environments pose a great challenge to robust outdoor visual perception for long-term autonomous driving, and the generalization of learning-based algorithms across environmental effects remains an open problem. Although monocular depth prediction has been well studied recently, little work has focused on robust learning-based depth prediction across different environments, e.g., changing illumination and seasons, owing to the lack of such a multi-environment real-world dataset and benchmark. To this end, we build SeasonDepth, the first cross-season monocular depth prediction dataset and benchmark, based on the CMU Visual Localization dataset. To benchmark depth estimation performance under different environments, we investigate representative and recent state-of-the-art open-source supervised, self-supervised, and domain-adaptation depth prediction methods from the KITTI benchmark using several newly formulated metrics. Through extensive experimental evaluation on the proposed dataset, we analyze the influence of multiple environments on performance and robustness, both qualitatively and quantitatively, showing that long-term monocular depth prediction remains challenging even with fine-tuning. We further identify promising avenues: self-supervised training and stereo geometry constraints help enhance robustness to changing environments. The dataset is available at https://seasondepth.github.io, and the benchmark toolkit is available at https://github.com/SeasonDepth/SeasonDepth. (Comment: 19 pages, 13 figures.)
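
    For illustration, a hedged sketch of cross-environment benchmarking in the spirit described: score a model per environment, then summarize both average accuracy and stability. The aggregation below (mean and relative standard deviation of AbsRel) is an illustrative assumption, not the benchmark's exact newly formulated metrics.

        import numpy as np

        def cross_env_summary(per_env_abs_rel):
            """per_env_abs_rel: dict mapping environment name -> AbsRel score."""
            scores = np.array(list(per_env_abs_rel.values()))
            mean_err = scores.mean()                   # average accuracy across environments
            rel_std = scores.std() / scores.mean()     # lower = more robust to environment change
            return mean_err, rel_std

        print(cross_env_summary({"sunny": 0.12, "snow": 0.19, "overcast": 0.14}))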

    Partially Supervised Multi-Task Network for Single-View Dietary Assessment

    Food volume estimation is an essential step in the pipeline of dietary assessment and demands precise depth estimation of the food surface and table plane. Existing methods based on computer vision require either multi-image input or additional depth maps, reducing their convenience of implementation and practical significance. Despite recent advances in unsupervised depth estimation from a single image, performance in the case of large texture-less areas still needs to be improved. In this paper, we propose a network architecture that jointly performs geometric understanding (i.e., depth prediction and 3D plane estimation) and semantic prediction on a single food image, enabling robust and accurate food volume estimation regardless of the texture characteristics of the target plane. For training the network, only monocular videos with semantic ground truth are required; depth-map and 3D-plane ground truth are no longer needed. Experimental results on two separate food image databases demonstrate that our method performs robustly in texture-less scenarios, is superior to unsupervised networks and structure-from-motion based approaches, and achieves performance comparable to fully supervised methods.
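
    For illustration, a minimal sketch of the volume-from-geometry step such a pipeline relies on: back-project food pixels to 3D using a predicted metric depth map and camera intrinsics, then integrate their height above the estimated table plane. All names and the per-pixel column approximation are illustrative assumptions, not the paper's exact procedure.

        import numpy as np

        def food_volume(depth, K, plane_n, plane_d, food_mask):
            """depth: HxW metric depth; K: 3x3 intrinsics; table plane: n.x + d = 0
            with plane_n a unit normal; food_mask: HxW boolean segmentation."""
            h, w = depth.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            # Back-project each pixel to a 3D point in camera coordinates.
            x = (u - K[0, 2]) * depth / K[0, 0]
            y = (v - K[1, 2]) * depth / K[1, 1]
            pts = np.stack([x, y, depth], axis=-1)
            height = pts @ plane_n + plane_d       # signed distance to the table plane
            # Approximate each pixel as a small column; its footprint grows with depth^2.
            pixel_area = (depth / K[0, 0]) * (depth / K[1, 1])
            return np.sum(np.clip(height[food_mask], 0, None) * pixel_area[food_mask])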

    Self-supervised monocular depth estimation from oblique UAV videos

    UAVs have become an essential photogrammetric measurement platform, as they are affordable, easily accessible, and versatile. Aerial images captured from UAVs have applications in small- and large-scale texture mapping, 3D modelling, object detection tasks, DTM and DSM generation, etc. Photogrammetric techniques are routinely used for 3D reconstruction from UAV images, where multiple images of the same scene are acquired. Developments in computer vision and deep learning have made Single Image Depth Estimation (SIDE) a field of intense research. Using SIDE techniques on UAV images can remove the need for multiple images for 3D reconstruction. This paper aims to estimate depth from a single UAV aerial image using deep learning. We follow a self-supervised learning approach, Self-Supervised Monocular Depth Estimation (SMDE), which does not need ground-truth depth or any extra information other than images for learning to estimate depth. Monocular video frames are used for training the deep learning model, which learns depth and pose information jointly through two different networks, one each for depth and pose. The predicted depth and pose are used to reconstruct one image from the viewpoint of another, utilising the temporal information from videos. We propose a novel architecture with two 2D CNN encoders and a 3D CNN decoder for extracting information from consecutive temporal frames, and we introduce a contrastive loss term to improve the quality of image generation. Our experiments, carried out on the public UAVid video dataset, demonstrate that our model outperforms state-of-the-art methods in depth estimation. (Comment: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing.)
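
    For illustration, a hedged PyTorch sketch of the core self-supervised signal: warp one frame into another view using predicted depth and relative pose, then penalize the photometric difference. The paper's 2D/3D CNN architecture and contrastive term are omitted; tensor conventions and the plain L1 penalty are assumptions, not the authors' exact loss.

        import torch
        import torch.nn.functional as F

        def photometric_loss(src_img, tgt_img, tgt_depth, T_tgt_to_src, K):
            """src/tgt_img: Bx3xHxW; tgt_depth: Bx1xHxW; T: Bx4x4; K: Bx3x3."""
            b, _, h, w = tgt_img.shape
            v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
            pix = torch.stack([u.float(), v.float(), torch.ones(h, w)]).view(1, 3, -1)
            # Back-project target pixels to 3D and move them into the source frame.
            cam = torch.linalg.inv(K) @ pix * tgt_depth.view(b, 1, -1)   # Bx3xHW
            cam = torch.cat([cam, torch.ones(b, 1, h * w)], dim=1)       # Bx4xHW
            src_pix = K @ (T_tgt_to_src @ cam)[:, :3]                    # Bx3xHW
            src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
            # Normalize to [-1, 1] and bilinearly sample the source image there.
            gx = 2 * src_pix[:, 0] / (w - 1) - 1
            gy = 2 * src_pix[:, 1] / (h - 1) - 1
            grid = torch.stack([gx, gy], dim=-1).view(b, h, w, 2)
            warped = F.grid_sample(src_img, grid, align_corners=True)
            return (warped - tgt_img).abs().mean()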

    G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data

    Monocular depth inference is a fundamental problem for the scene perception of robots. A specific robot may be equipped with a camera plus an optional depth sensor of any type and be located in scenes of various scales, whereas recent advances have addressed these configurations as multiple individual sub-tasks. This creates an additional burden of fine-tuning models for specific robots, and thereby high-cost customization in large-scale industrialization. This paper investigates a unified task of monocular depth inference, which infers high-quality depth maps from all kinds of raw input data from various robots in unseen scenes. A basic benchmark, G2-MonoDepth, is developed for this task, comprising four components: (a) a unified data representation, RGB+X, to accommodate RGB plus raw depth with diverse scene scales/semantics, depth sparsity ([0%, 100%]), and errors (holes/noise/blur); (b) a novel unified loss to adapt to the diverse depth sparsity/errors of input raw data and diverse scales of output scenes; (c) an improved network to propagate diverse scene scales well from input to output; and (d) a data augmentation pipeline to simulate all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied to three sub-tasks, including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and it consistently outperforms SOTA baselines on both real-world and synthetic data. (Comment: 18 pages, 16 figures.)
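
    For illustration, a hedged sketch of the RGB+X idea: pack RGB with whatever raw depth is available plus a validity mask, so one network can serve depth estimation, completion, and enhancement. The artifact simulation below (random holes and additive noise) merely illustrates the kind of augmentation described; names and parameters are assumptions, not the paper's pipeline.

        import numpy as np

        def make_rgbx(rgb, raw_depth=None, hole_prob=0.3, noise_std=0.05):
            """rgb: HxWx3 in [0,1]; raw_depth: HxW metric depth or None."""
            h, w, _ = rgb.shape
            if raw_depth is None:              # pure depth estimation: X channel is empty
                depth = np.zeros((h, w), np.float32)
                valid = np.zeros((h, w), np.float32)
            else:                              # completion/enhancement: simulate raw artifacts
                depth = raw_depth + np.random.normal(0, noise_std, (h, w))
                valid = (np.random.rand(h, w) > hole_prob).astype(np.float32)
                depth = depth * valid          # punch random holes
            return np.dstack([rgb, depth[..., None], valid[..., None]])  # HxWx5 input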

    Non-Contact Height Estimation for Material Extrusion Additive Systems via Monocular Imagery

    Additive manufacturing is a dynamic technology with compelling potential to advance the manufacturing industry. Despite its capacity to produce intricate designs efficiently, industry has not widely adopted additive manufacturing since its commercialization, largely because of its many challenges related to quality control. The Air Force Research Laboratory (AFRL), Materials and Manufacturing Directorate, Functional Materials Division, Soft Matter Materials Branch (RXAS) requires a practical and reliable method for maintaining quality control in the production of printed flexible electronics. Height estimation is a crucial component of quality control in Material Extrusion Additive Manufacturing (MEAM), as the fundamental process for constructing any structure relies on the consecutive layering of precise extrusions. This work presents a computer vision solution to the problem of height estimation using monocular imagery as applicable to MEAM.

    ATHENA Research Book

    The ATHENA European University is an alliance of nine Higher Education Institutions with the mission of fostering excellence in research and innovation by facilitating international cooperation. The ATHENA acronym stands for Advanced Technologies in Higher Education Alliance. The partner institutions are from France, Germany, Greece, Italy, Lithuania, Portugal, and Slovenia: the University of Orléans, the University of Siegen, the Hellenic Mediterranean University, the Niccolò Cusano University, the Vilnius Gediminas Technical University, the Polytechnic Institute of Porto, and the University of Maribor. In 2022, institutions from Poland and Spain joined the alliance: the Maria Curie-Skłodowska University and the University of Vigo. This research book presents a selection of the ATHENA university partners' research activities. It incorporates peer-reviewed original articles, reprints, and student contributions. The ATHENA Research Book provides a platform that promotes joint and interdisciplinary research projects of both advanced and early-career researchers.