Evaluation of CNN-based Single-Image Depth Estimation Methods
While an increasing interest in deep models for single-image depth estimation
can be observed, established schemes for their evaluation are still
limited. We propose a set of novel quality criteria, allowing for a more
detailed analysis by focusing on specific characteristics of depth maps. In
particular, we address the preservation of edges and planar regions, depth
consistency, and absolute distance accuracy. In order to employ these metrics
to evaluate and compare state-of-the-art single-image depth estimation
approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera
together with a laser scanner to acquire high-resolution images and highly
accurate depth maps. Experimental results show the validity of our proposed
evaluation protocol.
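As a rough illustration of such criteria, the Python sketch below computes two depth-map quality measures: the standard absolute relative distance error and a crude edge-preservation F1 score. The function names, the gradient-based edge definition, and the threshold are illustrative assumptions, not the authors' protocol.

```python
import numpy as np

def abs_rel_error(pred, gt):
    """Mean absolute relative depth error over valid (gt > 0) pixels."""
    mask = gt > 0
    return np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask])

def edge_f1(pred, gt, grad_thresh=0.5):
    """Crude edge-preservation score: F1 overlap between depth-gradient
    edges of the prediction and of the ground truth (illustrative only)."""
    def edges(d):
        gy, gx = np.gradient(d)
        return np.hypot(gx, gy) > grad_thresh
    e_pred, e_gt = edges(pred), edges(gt)
    tp = np.logical_and(e_pred, e_gt).sum()
    precision = tp / max(e_pred.sum(), 1)
    recall = tp / max(e_gt.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```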
VConv-DAE: Deep Volumetric Shape Learning Without Object Labels
With the advent of affordable depth sensors, 3D capture has become more and
more ubiquitous and has already made its way into commercial products. Yet,
capturing the geometry or complete shapes of everyday objects using scanning
devices (e.g. Kinect) still comes with several challenges that result in noise
or even incomplete shapes. Recent success in deep learning has shown how to
learn complex shape distributions in a data-driven way from large-scale 3D CAD
model collections and to utilize them for 3D processing on volumetric
representations, thereby circumventing problems of topology and
tessellation. Prior work has shown encouraging results on problems ranging from
shape completion to recognition. We provide an analysis of such approaches and
discover that training as well as the resulting representation are strongly and
unnecessarily tied to the notion of object labels. Thus, we propose a fully
convolutional volumetric autoencoder that learns volumetric representations
from noisy data by estimating voxel occupancy grids. The proposed method
outperforms prior work on challenging tasks like denoising and shape
completion. We also show that the obtained deep embedding gives competitive
performance when used for classification and promising results for shape
interpolation.
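To make the proposed approach concrete, here is a minimal PyTorch sketch of a label-free volumetric autoencoder over voxel occupancy grids, trained denoising-style; the grid resolution, layer sizes, and loss are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class VolumetricAE(nn.Module):
    """Sketch: encode a 32^3 occupancy grid, decode voxel occupancy logits."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 64, 4, stride=2, padding=1),    # 32^3 -> 16^3
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 128, 4, stride=2, padding=1),  # 16^3 -> 8^3
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),  # 8^3 -> 16^3
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1),    # 16^3 -> 32^3
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Denoising-style step: reconstruct clean occupancy from a corrupted grid.
model = VolumetricAE()
noisy = torch.rand(2, 1, 32, 32, 32).round()  # stand-in corrupted grids
clean = torch.rand(2, 1, 32, 32, 32).round()  # stand-in targets
loss = nn.functional.binary_cross_entropy_with_logits(model(noisy), clean)
loss.backward()
```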
Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network
With more and more household objects built on planned obsolescence and
consumed by a fast-growing population, hazardous waste recycling has become a
critical challenge. Given the large variability of household waste, current
recycling platforms mostly rely on human operators to analyze the scene,
typically composed of many object instances piled up in bulk. Helping them by
robotizing the unitary extraction of objects is a key challenge for speeding up
this tedious process. While supervised deep learning has proven very efficient
for such object-level scene understanding, e.g., generic object detection and
segmentation in everyday scenes, it requires large sets of per-pixel
labeled images that are hardly available for numerous application contexts,
including industrial robotics. We thus propose a step towards a practical
interactive application for generating an object-oriented robotic grasp,
requiring as inputs only one depth map of the scene and one user click on the
next object to extract. More precisely, we address in this paper the
intermediate issue of object segmentation in top views of piles of bulk objects
given a pixel location, namely a seed, provided interactively by a human
operator. We
propose a twofold framework for generating edge-driven instance segments.
First, we repurpose a state-of-the-art fully convolutional object contour
detector for seed-based instance segmentation by introducing the notion of
edge-mask duality with a novel patch-free and contour-oriented loss function.
Second, we train one model using only synthetic scenes, instead of manually
labeled training data. Our experimental results show that considering edge-mask
duality for training an encoder-decoder network, as we suggest, outperforms a
state-of-the-art patch-based network in the present application context.
Comment: This is a pre-print of an article published in Human Friendly Robotics, 10th International Workshop (Springer Proceedings in Advanced Robotics, vol. 7, eds. Bruno Siciliano and Oussama Khatib). The final authenticated version is available online at https://doi.org/10.1007/978-3-319-89327-3_16.
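To illustrate the mask half of the edge-mask duality at inference time, here is a minimal sketch, assuming a predicted contour probability map: threshold the contours and return the connected contour-free component under the operator's click. The function name, threshold, and connected-component decoding are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy import ndimage

def mask_from_edges(edge_prob, seed, edge_thresh=0.5):
    """Derive an instance mask from a predicted contour map and a seed click.
    edge_prob: (H, W) contour probabilities; seed: (row, col) of the click."""
    free = edge_prob < edge_thresh      # pixels not lying on a contour
    labels, _ = ndimage.label(free)     # connected contour-free regions
    seed_label = labels[seed]
    if seed_label == 0:                 # click landed on a contour pixel
        return np.zeros_like(free)
    return labels == seed_label         # the object under the click
```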
Ariel - Volume 2 Number 6
Editors
Richard J. Bonanno
Robin A. Edwards
Associate Editors
Steven Ager
Stephen Flynn
Shep Dickman
Tom Williams
Lay-out Editor
Eugenia Miller
Contributing Editors
Michael J. Blecker
W. Cherry Light
James J. Nocon
Lynne Porter
Editors Emeritus
Delvyn C. Case, Jr.
Paul M. Fernhof
Some open questions in "wave chaos"
The subject area referred to as "wave chaos", "quantum chaos" or "quantum
chaology" has been investigated mostly by the theoretical physics community in
the last 30 years. The questions it raises have more recently also attracted
the attention of mathematicians and mathematical physicists, due to connections
with number theory, graph theory, Riemannian, hyperbolic or complex geometry,
classical dynamical systems, probability, etc. After giving a rough account of
"what is quantum chaos?", I intend to list some pending questions, some raised
long ago, others more recent.
Prozone Masks Elevated Sars-Cov-2 Antibody Level Measurements
We report a prozone effect in measurement of SARS-CoV-2 spike protein antibody levels from an antibody surveillance program. Briefly, the prozone effect occurs in immunoassays when an excessively high antibody concentration disrupts immune complex formation, resulting in a spuriously low reported result. Following participant inquiries, we observed anomalously low measurements of SARS-CoV-2 spike protein antibody levels using the Roche Elecsys® Anti-SARS-CoV-2 S immunoassay from participants in the Texas Coronavirus Antibody Research survey (Texas CARES), an ongoing prospective, longitudinal antibody surveillance program. In July 2022, samples were collected from ten participants with anomalously low results for serial dilution studies, and a prozone effect was confirmed. From October 2022 to March 2023, serial dilution of samples detected 74 additional cases of prozone out of 1,720 participants' samples. The prozone effect may affect clinical management of at-risk populations repeatedly exposed to SARS-CoV-2 spike protein through multiple immunizations or serial infections, making awareness and mitigation of this issue paramount.
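As a concrete illustration of how serial dilution exposes a prozone effect, consider the sketch below: if the dilution-corrected value greatly exceeds the neat (undiluted) reading, antibody excess was suppressing the neat measurement. The flagging rule and tolerance are hypothetical, not the thresholds used in Texas CARES.

```python
def flags_prozone(neat_result, diluted_result, dilution_factor, tol=1.5):
    """Flag a possible prozone effect from a serial dilution study.
    tol is a hypothetical tolerance, not a clinically validated cutoff."""
    corrected = diluted_result * dilution_factor
    return corrected > tol * neat_result

# A 1:10 dilution reading 400 U/mL implies ~4000 U/mL, far above a
# neat reading of 250 U/mL, so a prozone effect would be suspected.
print(flags_prozone(neat_result=250, diluted_result=400, dilution_factor=10))
```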
Generic 3D Representation via Pose Estimation and Matching
Though a large body of computer vision research has investigated developing
generic semantic representations, efforts towards developing a similar
representation for 3D have been limited. In this paper, we learn a generic 3D
representation through solving a set of foundational proxy 3D tasks:
object-centric camera pose estimation and wide baseline feature matching. Our
method is based upon the premise that by providing supervision over a set of
carefully selected foundational tasks, generalization to novel tasks and
abstraction capabilities can be achieved. We empirically show that the internal
representation of a multi-task ConvNet trained to solve the above core problems
generalizes to novel 3D tasks (e.g., scene layout estimation, object pose
estimation, surface normal estimation) without the need for fine-tuning and
shows traits of abstraction abilities (e.g., cross-modality pose estimation).
In the context of the core supervised tasks, we demonstrate that our
representation achieves state-of-the-art wide-baseline feature matching results
without requiring a priori rectification (unlike SIFT and the majority of
learned features). We also show 6DOF camera pose estimation given a pair of
local image patches. The accuracy of both supervised tasks is comparable to
that of humans.
Finally, we contribute a large-scale dataset composed of object-centric street
view scenes along with point correspondences and camera pose information, and
conclude with a discussion on the learned representation and open research
questions.
Comment: Published in ECCV16. See the project website http://3drepresentation.stanford.edu/ and dataset website https://github.com/amir32002/3D_Street_Vie
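A minimal PyTorch sketch of the two-task setup the abstract describes follows: a shared siamese trunk embeds two image patches, one head classifies match/non-match, and another regresses relative camera pose. The layer sizes and the 6-parameter pose output are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Siamese trunk with matching and relative-pose heads (sketch)."""
    def __init__(self, dim=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        self.match_head = nn.Linear(2 * dim, 1)  # match / non-match logit
        self.pose_head = nn.Linear(2 * dim, 6)   # relative pose parameters

    def forward(self, patch_a, patch_b):
        f = torch.cat([self.trunk(patch_a), self.trunk(patch_b)], dim=1)
        return self.match_head(f), self.pose_head(f)

match_logit, rel_pose = MultiTaskNet()(torch.rand(2, 3, 64, 64),
                                       torch.rand(2, 3, 64, 64))
```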
Can ground truth label propagation from video help semantic segmentation?
For the state-of-the-art semantic segmentation task, training convolutional
neural networks (CNNs) requires dense pixelwise ground truth (GT) labeling,
which is expensive and involves extensive human effort. In this work, we study
the possibility of using auxiliary ground truth, so-called "pseudo ground
truth" (PGT), to improve performance. The PGT is obtained by
propagating the labels of a GT frame to its subsequent frames in the video
using a simple CRF-based cue integration framework. Our main contribution is
to demonstrate the use of noisy PGT along with GT to improve the performance of
a CNN. We perform a systematic analysis to find the right kind of PGT that
needs to be added along with the GT for training a CNN. In this regard, we
explore three aspects of PGT which influence the learning of a CNN: i) the PGT
labeling has to be of good quality; ii) the PGT images have to be different
compared to the GT images; iii) the PGT has to be trusted differently than GT.
We conclude that PGT which is diverse from GT images and has good quality of
labeling can indeed help improve the performance of a CNN. Also, when the PGT
is several times larger than the GT, weighing down the trust on the PGT helps
in improving the accuracy. Finally, we show that using PGT along with GT
increases the IoU accuracy of a Fully Convolutional Network (FCN) on the
CamVid data. We believe such an approach can be used to train CNNs
for semantic video segmentation where sequentially labeled image frames are
needed. To this end, we provide recommendations for using PGT strategically for
semantic segmentation and hence bypass the need for extensive human efforts in
labeling.
Comment: To appear at ECCV 2016 Workshop on Video Segmentation.
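As a sketch of "trusting PGT differently than GT", the loss below down-weights per-pixel cross-entropy on frames whose labels were propagated rather than hand-annotated; the 0.3 weight and tensor shapes are assumptions for illustration, not the paper's setting.

```python
import torch
import torch.nn.functional as F

def trust_weighted_loss(logits, labels, is_pgt, pgt_weight=0.3):
    """logits: (B, C, H, W); labels: (B, H, W); is_pgt: (B,) bool mask
    marking frames with propagated (pseudo) ground truth."""
    per_pixel = F.cross_entropy(logits, labels, reduction='none')  # (B, H, W)
    weights = torch.where(is_pgt.view(-1, 1, 1),
                          torch.tensor(pgt_weight),   # trust PGT less
                          torch.tensor(1.0))          # full trust in GT
    return (weights * per_pixel).mean()
```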
Reputation Agent: Prompting Fair Reviews in Gig Markets
Our study presents a new tool, Reputation Agent, to promote fairer reviews
from requesters (employers or customers) on gig markets. Unfair reviews,
created when requesters consider factors outside of a worker's control, are
known to plague gig workers and can result in lost job opportunities and even
termination from the marketplace. Our tool leverages machine learning to
implement an intelligent interface that: (1) uses deep learning to
automatically detect when an individual has included unfair factors in her
review (factors outside the worker's control per the policies of the market);
and (2) prompts the individual to reconsider her review if she has incorporated
unfair factors. To study the effectiveness of Reputation Agent, we conducted a
controlled experiment over different gig markets. Our experiment illustrates
that across markets, Reputation Agent, in contrast with traditional approaches,
motivates requesters to review gig workers' performance more fairly. We discuss
how tools that bring more transparency to employers about the policies of a gig
market can help build empathy, resulting in reasoned discussions around
potential injustices towards workers generated by these interfaces. Our vision
is that with tools that promote truth and transparency we can bring fairer
treatment to gig workers.
Comment: 12 pages, 5 figures, The Web Conference 2020, ACM WWW 2020.
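To make the interaction loop concrete, here is a minimal Python sketch in which a keyword lookup stands in for the paper's deep-learning detector; the policy phrases and messages are hypothetical.

```python
# Hypothetical market policies; the real system learns these from data.
UNFAIR_FACTORS = {
    "traffic": "delivery delays from traffic are outside the worker's control",
    "weather": "weather conditions are outside the worker's control",
    "app crash": "platform outages are the market's responsibility",
}

def review_prompt(review_text):
    """Return a reconsideration prompt if the review cites unfair factors,
    else None (i.e., the review is posted unchanged)."""
    hits = [msg for phrase, msg in UNFAIR_FACTORS.items()
            if phrase in review_text.lower()]
    if hits:
        return "Before posting, consider: " + "; ".join(hits) + "."
    return None

print(review_prompt("Two stars - the driver was late because of traffic."))
```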