Evaluation of CNN-based Single-Image Depth Estimation Methods
While interest in deep models for single-image depth estimation is
increasing, established schemes for their evaluation remain
limited. We propose a set of novel quality criteria, allowing for a more
detailed analysis by focusing on specific characteristics of depth maps. In
particular, we address the preservation of edges and planar regions, depth
consistency, and absolute distance accuracy. In order to employ these metrics
to evaluate and compare state-of-the-art single-image depth estimation
approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera
together with a laser scanner to acquire high-resolution images and highly
accurate depth maps. Experimental results show the validity of our proposed
evaluation protocol.
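To make the flavor of such criteria concrete, the following is a minimal sketch of one possible edge-preservation score, assuming Canny edges on normalized depth maps and a small localization tolerance; it is an illustration, not the authors' exact protocol.

```python
# Minimal sketch of an edge-preservation criterion in the spirit of the
# abstract; the Canny thresholds and pixel tolerance are assumptions.
import numpy as np
import cv2

def depth_edge_f1(pred, gt, canny_lo=50, canny_hi=150, tol_px=1):
    """F1 score between depth-edge maps of prediction and ground truth."""
    # Normalize depths to 8-bit so the Canny thresholds are comparable.
    to_u8 = lambda d: cv2.normalize(d, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    e_pred = cv2.Canny(to_u8(pred), canny_lo, canny_hi) > 0
    e_gt = cv2.Canny(to_u8(gt), canny_lo, canny_hi) > 0
    # Dilate to allow a small localization tolerance around each edge.
    kernel = np.ones((2 * tol_px + 1, 2 * tol_px + 1), np.uint8)
    e_gt_d = cv2.dilate(e_gt.astype(np.uint8), kernel) > 0
    e_pred_d = cv2.dilate(e_pred.astype(np.uint8), kernel) > 0
    precision = (e_pred & e_gt_d).sum() / max(e_pred.sum(), 1)
    recall = (e_gt & e_pred_d).sum() / max(e_gt.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```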
Cosmological Density and Power Spectrum from Peculiar Velocities: Nonlinear Corrections and PCA
We allow for nonlinear effects in the likelihood analysis of galaxy peculiar
velocities, and obtain ~35% lower values for the cosmological density parameter
Om and the amplitude of mass-density fluctuations. The power spectrum in the
linear regime is assumed to be a flat LCDM model (h=0.65, n=1, COBE) with only
Om as a free parameter. Since the likelihood is driven by the nonlinear regime,
we "break" the power spectrum at k_b=0.2 h/Mpc and fit a power law at k>k_b.
This allows for independent matching of the nonlinear behavior and an unbiased
fit in the linear regime. The analysis assumes Gaussian fluctuations and
errors, and a linear relation between velocity and density. Tests using proper
mock catalogs demonstrate a reduced bias and a better fit. We find for the
Mark3 and SFI data Om=0.32+-0.06 and 0.37+-0.09 respectively, with
sigma_8*Om^0.6 = 0.49+-0.06 and 0.63+-0.08, in agreement with constraints from
other data. The quoted 90% errors include cosmic variance. The improvement in
likelihood due to the nonlinear correction is very significant for Mark3 and
moderately so for SFI. When allowing deviations from LCDM, we find an
indication for a wiggle in the power spectrum: an excess near k=0.05 and a
deficiency at k=0.1 (cold flow). This may be related to the wiggle seen in the
power spectrum from redshift surveys and the second peak in the CMB anisotropy.
A chi^2 test applied to modes of a Principal Component Analysis (PCA) shows
that the nonlinear procedure improves the goodness of fit and reduces a spatial
gradient of concern in the linear analysis. The PCA allows addressing spatial
features of the data and fine-tuning the theoretical and error models. It shows
that the models used are appropriate for the cosmological parameter estimation
performed. We address the potential for optimal data compression using PCA.
Comment: 18 pages, LaTeX, uses emulateapj.sty, ApJ in press (August 10, 2001); improvements to text and figures, updated references.
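For concreteness, here is one way to write down the broken spectrum described above; the continuity condition at k_b and the single free slope mu are assumptions about the parametrization, not taken from the paper.

```latex
% Sketch of a "broken" power spectrum: LCDM below k_b, a power law above,
% matched at the break for continuity (an assumption).
P(k) =
  \begin{cases}
    P_{\Lambda\mathrm{CDM}}(k;\,\Omega_m) & k \le k_b,\\[4pt]
    P_{\Lambda\mathrm{CDM}}(k_b;\,\Omega_m)\,\left(k/k_b\right)^{\mu} & k > k_b,
  \end{cases}
  \qquad k_b = 0.2\,h\,\mathrm{Mpc}^{-1}.
```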
iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects
We address the task of 6D pose estimation of known rigid objects from single
input images in scenarios where the objects are partly occluded. Recent
RGB-D-based methods are robust to moderate degrees of occlusion. For RGB
inputs, no previous method works well for partly occluded objects. Our main
contribution is to present the first deep learning-based system that estimates
accurate poses for partly occluded objects from RGB-D and RGB input. We achieve
this with a new instance-aware pipeline that decomposes 6D object pose
estimation into a sequence of simpler steps, where each step removes specific
aspects of the problem. The first step localizes all known objects in the image
using an instance segmentation network, and hence eliminates surrounding
clutter and occluders. The second step densely maps pixels to 3D object surface
positions, so called object coordinates, using an encoder-decoder network, and
hence eliminates object appearance. The third, and final, step predicts the 6D
pose using geometric optimization. We demonstrate that we significantly
outperform the state-of-the-art for pose estimation of partly occluded objects
for both RGB and RGB-D input.
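A minimal sketch of the three-step decomposition, assuming hypothetical seg_net and coord_net stand-ins for the instance segmentation and encoder-decoder networks; only the final geometric step uses a concrete solver (OpenCV's PnP-RANSAC).

```python
# Sketch of the instance-aware pipeline described above; seg_net and
# coord_net are hypothetical stand-ins, not the paper's exact models.
import numpy as np
import cv2

def estimate_poses(rgb, seg_net, coord_net, K):
    # Step 1: instance segmentation isolates each known object and
    # eliminates surrounding clutter and occluders.
    for mask in seg_net(rgb):
        crop = rgb * mask[..., None]
        # Step 2: dense 2D->3D mapping ("object coordinates") eliminates
        # object appearance; coords has shape HxWx3 in the model frame.
        coords = coord_net(crop)
        ys, xs = np.nonzero(mask)
        pts_3d = coords[ys, xs].astype(np.float64)
        pts_2d = np.stack([xs, ys], axis=1).astype(np.float64)
        if len(pts_2d) < 4:
            continue  # PnP needs at least 4 correspondences
        # Step 3: geometric optimization; RANSAC rejects outlier
        # correspondences that survive the first two steps.
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None)
        if ok:
            yield rvec, tvec
```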
Estimating Depth from RGB and Sparse Sensing
We present a deep model that can accurately produce dense depth maps given an
RGB image with known depth at a very sparse set of pixels. The model works
simultaneously for both indoor/outdoor scenes and produces state-of-the-art
dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI
datasets. We surpass the state-of-the-art for monocular depth estimation even
with depth values for only 1 out of every ~10000 image pixels, and we
outperform other sparse-to-dense depth methods at all sparsity levels. With
depth values for 1/256 of the image pixels, we achieve a mean absolute error of
less than 1% of actual depth on indoor scenes, comparable to the performance of
consumer-grade depth sensor hardware. Our experiments demonstrate that it would
indeed be possible to efficiently transform sparse depth measurements obtained
using e.g. lower-power depth sensors or SLAM systems into high-quality dense
depth maps.
Comment: European Conference on Computer Vision (ECCV) 2018. Updated to camera-ready version with additional experiments.
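As a rough illustration of the input side of such a model, the following sketch packs an RGB image together with a sparse depth channel and a validity mask; the channel layout and the zero-fill convention are assumptions, not the paper's specification.

```python
# Hypothetical packing of RGB + sparse depth into a network input, with
# sparsity ~1/256 as mentioned in the abstract.
import numpy as np

def make_sparse_input(rgb, dense_depth, keep_ratio=1 / 256, rng=None):
    rng = rng or np.random.default_rng(0)
    h, w = dense_depth.shape
    mask = rng.random((h, w)) < keep_ratio      # which pixels keep depth
    sparse = np.where(mask, dense_depth, 0.0)   # zeros elsewhere
    # Stack RGB, the sparse depth channel, and its validity mask so the
    # model can tell "depth 0" apart from "no measurement".
    return np.concatenate(
        [rgb.astype(np.float32) / 255.0,
         sparse[..., None].astype(np.float32),
         mask[..., None].astype(np.float32)], axis=-1)
```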
The devil is in the decoder
Many machine vision applications require predictions for every pixel of the input image (for example, semantic segmentation and boundary detection). Models for such problems usually consist of encoders, which decrease spatial resolution while learning a high-dimensional representation, followed by decoders, which recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. This paper therefore presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) decoders matter: we observe significant variance in results between different types of decoders on various problems; (2) we introduce a novel decoder: bilinear additive upsampling; (3) we introduce new residual-like connections for decoders; (4) we identify two decoder types which give consistently high performance.
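A minimal sketch of the bilinear additive upsampling idea in PyTorch: bilinearly upsample, then sum fixed-size groups of channels as a parameter-free channel reduction. The 2x scale and group size of 4 should be treated as assumed settings, not the paper's only configuration.

```python
# Sketch of bilinear additive upsampling: spatial bilinear upsampling
# followed by summing groups of channels (parameter-free reduction).
import torch
import torch.nn.functional as F

def bilinear_additive_upsample(x, scale=2, group=4):
    """x: (N, C, H, W) with C divisible by `group`."""
    n, c, h, w = x.shape
    x = F.interpolate(x, scale_factor=scale, mode="bilinear",
                      align_corners=False)
    # Sum every `group` consecutive channels: C -> C // group channels
    # without introducing any learned parameters.
    x = x.view(n, c // group, group, h * scale, w * scale).sum(dim=2)
    return x
```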
Accurate and linear time pose estimation from points and lines
The Perspective-n-Point (PnP) problem seeks to estimate the pose of a calibrated camera from n 3D-to-2D point correspondences. There are situations, though, where PnP solutions are prone to fail because feature point correspondences cannot be reliably estimated (e.g. scenes with repetitive patterns or with low texture). In such scenarios, one can still exploit alternative geometric entities, such as lines, yielding the so-called Perspective-n-Line (PnL) algorithms. Unfortunately, existing PnL solutions are not as accurate and efficient as their point-based counterparts. In this paper we propose a novel approach to introduce 3D-to-2D line correspondences into a PnP formulation, allowing points and lines to be processed simultaneously. For this purpose we introduce an algebraic line error that can be formulated as linear constraints on the line endpoints, even when these are not directly observable. These constraints can then be naturally integrated within the linear formulations of two state-of-the-art point-based algorithms, the OPnP and the EPnP, allowing them to indistinctly handle points, lines, or a combination of them. Exhaustive experiments show that the proposed formulation brings a remarkable boost in performance compared to point-only or line-only solutions, with negligible computational overhead relative to the original OPnP and EPnP.
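For the point-based side, OpenCV exposes EPnP directly; the sketch below pairs it with the geometric quantity the paper's algebraic line error encodes (distance of projected 3D endpoints to an observed 2D image line). The paper's exact linear formulation is not reproduced here, and the helper names are illustrative.

```python
# Sketch: EPnP for points, plus a line residual (point-line distance of
# projected 3D endpoints), the quantity the algebraic constraints encode.
import numpy as np
import cv2

def epnp_pose(pts_3d, pts_2d, K):
    ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None,
                                  flags=cv2.SOLVEPNP_EPNP)
    return rvec, tvec

def line_residuals(endpoints_3d, image_lines, rvec, tvec, K):
    """endpoints_3d: (L, 2, 3); image_lines: (L, 3) homogeneous 2D lines,
    normalized so line[:2] has unit norm (then l.x is point-line distance)."""
    proj, _ = cv2.projectPoints(endpoints_3d.reshape(-1, 3), rvec, tvec, K, None)
    proj_h = np.concatenate([proj.reshape(-1, 2),
                             np.ones((len(endpoints_3d) * 2, 1))], axis=1)
    proj_h = proj_h.reshape(-1, 2, 3)                     # (L, 2, 3)
    return np.einsum('lij,lj->li', proj_h, image_lines)   # signed distances
```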
Some open questions in "wave chaos"
The subject area referred to as "wave chaos", "quantum chaos" or "quantum
chaology" has been investigated mostly by the theoretical physics community in
the last 30 years. The questions it raises have more recently also attracted
the attention of mathematicians and mathematical physicists, due to connections
with number theory, graph theory, Riemannian, hyperbolic or complex geometry,
classical dynamical systems, probability, etc. After giving a rough account
of "what is quantum chaos?", I intend to list some pending questions, some of
them having been raised a long time ago, some others more recent.
Prozone Masks Elevated SARS-CoV-2 Antibody Level Measurements
We report a prozone effect in measurement of SARS-CoV-2 spike protein antibody levels from an antibody surveillance program. Briefly, the prozone effect occurs in immunoassays when an excessively high antibody concentration disrupts immune complex formation, resulting in a spuriously low reported result. Following participant inquiries, we observed anomalously low measurements of SARS-CoV-2 spike protein antibody levels using the Roche Elecsys® Anti-SARS-CoV-2 S immunoassay from participants in the Texas Coronavirus Antibody Research survey (Texas CARES), an ongoing prospective, longitudinal antibody surveillance program. In July 2022, samples were collected from ten participants with anomalously low results for serial dilution studies, and a prozone effect was confirmed. From October 2022 to March 2023, serial dilution of samples detected 74 additional cases of prozone out of 1,720 participants' samples. The prozone effect may affect clinical management of at-risk populations repeatedly exposed to SARS-CoV-2 spike protein through multiple immunizations or serial infections, making awareness and mitigation of this issue paramount.
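A hypothetical illustration of the serial-dilution check described above: under prozone, a diluted sample, corrected for its dilution factor, reads far higher than the neat sample. The flagging ratio below is an assumption, not a clinical cutoff.

```python
# Hypothetical prozone flag: if the dilution-corrected result greatly
# exceeds the neat (undiluted) result, suspect a prozone effect.
def flags_prozone(neat_result, diluted_result, dilution_factor, ratio=2.0):
    corrected = diluted_result * dilution_factor
    return corrected > ratio * neat_result

# Example: a 1:10 dilution reading 150 U/mL implies ~1500 U/mL, far above
# the neat reading of 80 U/mL -> prozone suspected.
print(flags_prozone(neat_result=80.0, diluted_result=150.0, dilution_factor=10))
```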
Can ground truth label propagation from video help semantic segmentation?
For state-of-the-art semantic segmentation task, training convolutional
neural networks (CNNs) requires dense pixelwise ground truth (GT) labeling,
which is expensive and involves extensive human effort. In this work, we study
the possibility of using auxiliary ground truth, so-called pseudo ground
truth (PGT), to improve performance. The PGT is obtained by
propagating the labels of a GT frame to its subsequent frames in the video
using a simple CRF-based cue integration framework. Our main contribution is
to demonstrate the use of noisy PGT along with GT to improve the performance of
a CNN. We perform a systematic analysis to find the right kind of PGT that
needs to be added along with the GT for training a CNN. In this regard, we
explore three aspects of PGT which influence the learning of a CNN: i) the PGT
labeling has to be of good quality; ii) the PGT images have to be different
compared to the GT images; iii) the PGT has to be trusted differently than GT.
We conclude that PGT which is diverse from GT images and has good quality of
labeling can indeed help improve the performance of a CNN. Also, when the PGT
is several times larger than the GT, down-weighting the trust on PGT helps in
improving the accuracy. Finally, we show that using PGT along with GT improves
the IoU accuracy of a Fully Convolutional Network (FCN) on the CamVid
data. We believe such an approach can be used to train CNNs
for semantic video segmentation where sequentially labeled image frames are
needed. To this end, we provide recommendations for using PGT strategically for
semantic segmentation and hence bypass the need for extensive human efforts in
labeling.
Comment: To appear at ECCV 2016 Workshop on Video Segmentation.
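One simple way to "trust PGT differently than GT" is a down-weighted loss term; the sketch below assumes per-pixel cross-entropy and a weight of 0.5, both illustrative choices rather than the paper's exact scheme.

```python
# Sketch of down-weighting noisy pseudo ground truth relative to real GT.
import torch
import torch.nn.functional as F

def mixed_loss(logits_gt, labels_gt, logits_pgt, labels_pgt, pgt_weight=0.5):
    """logits: (N, C, H, W); labels: (N, H, W) with 255 = unlabeled."""
    loss_gt = F.cross_entropy(logits_gt, labels_gt, ignore_index=255)
    loss_pgt = F.cross_entropy(logits_pgt, labels_pgt, ignore_index=255)
    # The PGT term contributes, but with reduced trust.
    return loss_gt + pgt_weight * loss_pgt
```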
NODIS: Neural Ordinary Differential Scene Understanding
Semantic image understanding is a challenging topic in computer vision. It
requires not only detecting all objects in an image, but also identifying all
the relations between them. Detected objects, their labels, and the discovered
relations can be used to construct a scene graph which provides an abstract
semantic interpretation of an image. In previous works, relations were
identified by solving an assignment problem formulated as a Mixed-Integer
Linear Program. In this work, we interpret that formulation as an Ordinary
Differential Equation (ODE). The proposed architecture performs scene graph
inference by solving a neural variant of an ODE by end-to-end learning. It
achieves state-of-the-art results on all three benchmark tasks: scene graph
generation (SGGen), classification (SGCls), and visual relationship detection
(PredCls) on the Visual Genome benchmark.
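A minimal sketch of what a neural-ODE relation module can look like, using the torchdiffeq library; the feature dimension and the idea of integrating pairwise object features into a relation embedding are assumptions, not the NODIS architecture itself.

```python
# Sketch of a neural ODE block in the spirit of the abstract, using
# torchdiffeq (pip install torchdiffeq); untrained, for illustration only.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                 nn.Linear(dim, dim))

    def forward(self, t, h):
        # Parametrizes the dynamics dh/dt = f(h, t).
        return self.net(h)

def relation_embedding(pair_features, dim=256):
    """Integrate pairwise object features from t=0 to t=1."""
    func = ODEFunc(dim)
    t = torch.tensor([0.0, 1.0])
    h = odeint(func, pair_features, t)  # (2, batch, dim)
    return h[-1]                        # state at t=1
```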