Evaluation of CNN-based Single-Image Depth Estimation Methods
While interest in deep models for single-image depth estimation is growing,
established schemes for their evaluation remain limited. We propose a set of
novel quality criteria that allow a more detailed analysis by focusing on
specific characteristics of depth maps. In
particular, we address the preservation of edges and planar regions, depth
consistency, and absolute distance accuracy. In order to employ these metrics
to evaluate and compare state-of-the-art single-image depth estimation
approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera
together with a laser scanner to acquire high-resolution images and highly
accurate depth maps. Experimental results show the validity of our proposed
evaluation protocol.
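Criteria of this kind are computed against dense ground-truth depth. Purely as an illustration of the general setup (these are two standard accuracy metrics, not the paper's novel criteria), absolute relative error and RMSE over valid pixels can be sketched as:

```python
import numpy as np

def abs_rel(pred, gt):
    # Mean absolute relative error over valid (positive) ground-truth pixels.
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def rmse(pred, gt):
    # Root-mean-square error over the same valid-pixel mask.
    mask = gt > 0
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))
```

Edge- and planarity-oriented criteria like those proposed here additionally restrict such errors to specific image regions (e.g., pixels near depth edges or inside planar segments).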
SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments
Changing environments pose a great challenge to robust outdoor visual
perception for long-term autonomous driving, and the generalization of
learning-based algorithms across different environmental effects is still an
open problem. Although monocular depth prediction has been well studied
recently, little work focuses on robust learning-based depth prediction across
different environments, e.g., changing illumination and seasons, owing to the
lack of such a multi-environment real-world dataset and benchmark. To this end,
we build SeasonDepth, the first cross-season monocular depth prediction dataset
and benchmark, based on the CMU Visual Localization dataset. To benchmark the
depth estimation performance under different environments, we investigate
representative and recent state-of-the-art open-source supervised,
self-supervised and domain adaptation depth prediction methods from KITTI
benchmark using several newly-formulated metrics. Through extensive
experimental evaluation on the proposed dataset, the influence of multiple
environments on performance and robustness is analyzed qualitatively and
quantitatively, showing that long-term monocular depth prediction remains
challenging even with fine-tuning. We further identify promising directions,
showing that self-supervised training and stereo geometry constraints help
enhance robustness to changing environments. The dataset is available at
https://seasondepth.github.io, and the benchmark toolkit is available at
https://github.com/SeasonDepth/SeasonDepth. (19 pages, 13 figures)
Partially Supervised Multi-Task Network for Single-View Dietary Assessment
Food volume estimation is an essential step in the pipeline of dietary
assessment and demands the precise depth estimation of the food surface and
table plane. Existing computer-vision methods require either multi-image input
or additional depth maps, which reduces their convenience and practical value.
Despite recent advances in unsupervised depth estimation from a single image,
performance on large texture-less areas still needs improvement. In this paper, we
propose a network architecture that jointly performs geometric understanding
(i.e., depth prediction and 3D plane estimation) and semantic prediction on a
single food image, enabling a robust and accurate food volume estimation
regardless of the texture characteristics of the target plane. For the training
of the network, only monocular videos with semantic ground truth are required,
while the depth map and 3D plane ground truth are no longer needed.
Experimental results on two separate food image databases demonstrate that our
method performs robustly on texture-less scenarios and is superior to
unsupervised networks and structure from motion based approaches, while it
achieves comparable performance to fully-supervised methods.
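Once the food-surface depth and the table plane are estimated, volume follows by integrating the height of the food above the plane over the segmented food region. A minimal sketch of this final step, under simplifying assumptions not made by the paper (a fronto-parallel table at constant depth and a fixed per-pixel ground area, rather than an arbitrary 3D plane):

```python
import numpy as np

def food_volume(depth, plane_depth, food_mask, pixel_area):
    # Height of the food above the table at each pixel: the camera looks toward
    # the table, so the table plane is farther from the camera than the food.
    height = np.clip(plane_depth - depth, 0.0, None)
    # Integrate height over the semantically segmented food region.
    return float(np.sum(height[food_mask]) * pixel_area)
```

This is why the method needs both geometric outputs (depth and plane) and the semantic output (the food mask) from a single image.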
Self-supervised monocular depth estimation from oblique UAV videos
UAVs have become essential photogrammetric measurement platforms, as they are
affordable, easily accessible and versatile. Aerial images captured from UAVs
have applications in small and large scale texture mapping, 3D modelling,
object detection tasks, DTM and DSM generation etc. Photogrammetric techniques
are routinely used for 3D reconstruction from UAV images where multiple images
of the same scene are acquired. Developments in computer vision and deep
learning techniques have made Single Image Depth Estimation (SIDE) a field of
intense research. Using SIDE techniques on UAV images can overcome the need for
multiple images for 3D reconstruction. This paper aims to estimate depth from a
single UAV aerial image using deep learning. We follow a self-supervised
learning approach, Self-Supervised Monocular Depth Estimation (SMDE), which
does not need ground truth depth or any extra information other than images for
learning to estimate depth. Monocular video frames are used for training the
deep learning model which learns depth and pose information jointly through two
different networks, one each for depth and pose. The predicted depth and pose
are used to reconstruct one image from the viewpoint of another image utilising
the temporal information from videos. We propose a novel architecture with two
2D CNN encoders and a 3D CNN decoder for extracting information from
consecutive temporal frames. A contrastive loss term is introduced for
improving the quality of image generation. Our experiments are carried out on
the public UAVid video dataset. The experimental results demonstrate that our
model outperforms state-of-the-art methods in depth estimation. (Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing)
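The core of this kind of self-supervised training is view synthesis: pixels of one frame are back-projected to 3D using the predicted depth, transformed by the predicted relative pose, and re-projected into the neighbouring frame, after which a photometric loss penalizes the appearance difference. A minimal sketch of the reprojection step for a pinhole camera (a generic illustration, not the paper's two-encoder 2D/3D CNN architecture):

```python
import numpy as np

def reproject(depth, K, R, t):
    # Back-project every pixel of the target view to 3D with predicted depth,
    # move the points by the predicted relative pose (R, t), and project them
    # into the source view; the result gives sampling coordinates for warping.
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)  # 3D in target frame
    cam2 = R @ cam + t.reshape(3, 1)                     # 3D in source frame
    proj = K @ cam2
    uv = proj[:2] / proj[2:]                             # source pixel coords
    return uv.reshape(2, h, w)
```

Sampling the source frame at these coordinates reconstructs the target frame; gradients of the photometric (and, in this paper, contrastive) loss then flow back into both the depth and pose networks.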
G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data
Monocular depth inference is a fundamental problem for scene perception of
robots. Specific robots may be equipped with a camera plus an optional depth
sensor of any type, and may operate in scenes of widely different scales,
whereas recent advances have split depth inference into multiple individual
sub-tasks. This leads to the additional burden of fine-tuning models for each
specific robot, and thereby high-cost
customization in large-scale industrialization. This paper investigates a
unified task of monocular depth inference, which infers high-quality depth maps
from all kinds of input raw data from various robots in unseen scenes. A basic
benchmark G2-MonoDepth is developed for this task, which comprises four
components: (a) a unified data representation RGB+X to accommodate RGB plus raw
depth with diverse scene scale/semantics, depth sparsity ([0%, 100%]) and
errors (holes/noises/blurs), (b) a novel unified loss to adapt to diverse depth
sparsity/errors of input raw data and diverse scales of output scenes, (c) an
improved network that properly propagates diverse scene scales from input to output,
and (d) a data augmentation pipeline to simulate all types of real artifacts in
raw depth maps for training. G2-MonoDepth is applied in three sub-tasks
including depth estimation, depth completion with different sparsity, and depth
enhancement in unseen scenes, and it always outperforms SOTA baselines on both
real-world data and synthetic data. (18 pages, 16 figures)
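A central piece is the unified RGB+X representation: RGB is concatenated with whatever raw depth is available, from fully dense down to entirely absent, plus a validity mask, so one network serves depth estimation, completion and enhancement alike. A rough sketch under assumed conventions (missing depth encoded as 0 and per-image max normalization; the paper's actual encoding may differ):

```python
import numpy as np

def make_rgbx_input(rgb, raw_depth=None):
    # rgb: (H, W, 3) float image in [0, 1]; raw_depth: optional (H, W) map
    # whose missing pixels are 0 (sparsity may range from 0% to 100%).
    h, w, _ = rgb.shape
    if raw_depth is None:
        raw_depth = np.zeros((h, w), dtype=rgb.dtype)
    valid = (raw_depth > 0).astype(rgb.dtype)
    # Normalize valid depths so scenes of different scales look alike.
    d = raw_depth / raw_depth.max() if valid.any() else raw_depth
    # Channels: RGB + normalized raw depth + validity mask -> (H, W, 5).
    return np.concatenate([rgb, d[..., None], valid[..., None]], axis=-1)
```

With no depth sensor the X channels degenerate to zeros (pure depth estimation); with a sparse LiDAR scan the same input format expresses depth completion.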
Non-Contact Height Estimation for Material Extrusion Additive Systems via Monocular Imagery
Additive manufacturing is a dynamic technology with compelling potential to advance the manufacturing industry. Despite its capacity to produce intricate designs efficiently, industry has not widely adopted additive manufacturing since its commercialization, owing to its many quality-control challenges. The Air Force Research Laboratory (AFRL), Materials and Manufacturing Directorate, Functional Materials Division, Soft Matter Materials Branch (RXAS) requires a practical and reliable method for maintaining quality control in the production of printed flexible electronics. Height estimation is a crucial component of quality control in Material Extrusion Additive Manufacturing (MEAM), as the fundamental process for constructing any structure relies on the consecutive layering of precise extrusions. This work presents a computer vision solution to the problem of height estimation from monocular imagery as applicable to MEAM.
ATHENA Research Book
The ATHENA European University is an alliance of nine Higher Education Institutions with the mission of fostering excellence in research and innovation by facilitating international cooperation. The ATHENA acronym stands for Advanced Technologies in Higher Education Alliance. The partner institutions are from France, Germany, Greece, Italy, Lithuania, Portugal, and Slovenia: the University of Orléans, the University of Siegen, the Hellenic Mediterranean University, the Niccolò Cusano University, the Vilnius Gediminas Technical University, the Polytechnic Institute of Porto, and the University of Maribor. In 2022 institutions from Poland and Spain joined the alliance: the Maria Curie-Skłodowska University and the University of Vigo.
This research book presents a selection of the ATHENA university partners' research activities. It incorporates peer-reviewed original articles, reprints and student contributions. The ATHENA Research Book provides a platform that promotes joint and interdisciplinary research projects of both advanced and early-career researchers.