The RGB-D Triathlon: Towards Agile Visual Toolboxes for Robots
Deep networks have brought significant advances in robot perception, improving
the capabilities of robots in several visual tasks, ranging from
object detection and recognition to pose estimation, semantic scene
segmentation and many others. Still, most approaches typically address visual
tasks in isolation, resulting in overspecialized models which achieve strong
performance in specific applications but work poorly in other (often related)
tasks. This is clearly sub-optimal for a robot, which is often required to
perform multiple visual recognition tasks simultaneously in order to properly
act and interact with the environment. This problem is exacerbated by the
limited computational and memory resources typically available onboard a
robotic platform. The problem of learning flexible models which can handle
multiple tasks in a lightweight manner has recently gained attention in the
computer vision community and benchmarks supporting this research have been
proposed. In this work we study this problem in the robot vision context,
proposing a new benchmark, the RGB-D Triathlon, and evaluating state-of-the-art
algorithms in this novel, challenging scenario. We also define a new evaluation
protocol, better suited to the robot vision setting. Results shed light on the
strengths and weaknesses of existing approaches and on open issues, suggesting
directions for future research.
Comment: This work has been submitted to IROS/RAL 201
When Regression Meets Manifold Learning for Object Recognition and Pose Estimation
In this work, we propose a method for object recognition and pose estimation
from depth images using convolutional neural networks. Previous methods
addressing this problem rely on manifold learning to learn low dimensional
viewpoint descriptors and employ them in a nearest neighbor search on an
estimated descriptor space. In contrast, we create an efficient multi-task
learning framework combining manifold descriptor learning and pose regression.
By combining the strengths of manifold learning using triplet loss and pose
regression, we can either estimate the pose directly, reducing the complexity
compared to nearest neighbor (NN) search, or use the learned descriptor for NN
descriptor matching. Through in-depth experimental evaluation of the novel loss
function, we observed that the view descriptors learned by the network are much
more discriminative, resulting in an almost 30% increase in relative pose
accuracy compared to related works. In addition, for directly regressed poses
we obtained an important improvement compared to simple pose regression. By
leveraging the advantages of both manifold learning and regression tasks, we
are able to improve the current state of the art for object recognition and
pose retrieval, as we demonstrate through in-depth experimental evaluation.
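
As an illustration of the idea described in this abstract, the following is a minimal sketch of a multi-task objective that adds a triplet term on view descriptors to a pose regression term. It is not the authors' loss function: the function names, the quaternion pose representation, the `margin` value, and the `pose_weight` balancing factor are all assumptions made for this example.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet term: the positive descriptor should be closer
    to the anchor than the negative one by at least `margin` (assumed value)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def pose_regression_loss(predicted_quat, target_quat):
    """Squared error between the normalized predicted quaternion and the
    target. Simplification: ignores the q / -q sign ambiguity of rotations."""
    norm = math.sqrt(sum(q * q for q in predicted_quat))
    unit = [q / norm for q in predicted_quat]
    return squared_distance(unit, target_quat)


def multi_task_loss(anchor, positive, negative, predicted_quat, target_quat,
                    margin=0.2, pose_weight=1.0):
    """Weighted sum of the descriptor term and the pose term.
    `pose_weight` is a hypothetical balancing factor."""
    return (triplet_loss(anchor, positive, negative, margin)
            + pose_weight * pose_regression_loss(predicted_quat, target_quat))
```

With a shared network producing both the descriptor and the pose, minimizing such a sum trains both outputs jointly, so that at test time either the regressed pose or the descriptor (via NN search) can be used.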
Content-Based Medical Image Retrieval with Opponent Class Adaptive Margin Loss
Widespread use of medical imaging devices with digital storage has paved the
way for curation of substantial data repositories. Fast access to image samples
with similar appearance to suspected cases can help establish a consulting
system for healthcare professionals, and improve diagnostic procedures while
minimizing processing delays. However, manual querying of large data
repositories is labor intensive. Content-based image retrieval (CBIR) offers an
automated solution based on dense embedding vectors that represent image
features to allow quantitative similarity assessments. Triplet learning has
emerged as a powerful approach to recover embeddings in CBIR, although
traditional loss functions ignore the dynamic relationship between opponent
image classes. Here, we introduce a triplet-learning method for automated
querying of medical image repositories based on a novel Opponent Class Adaptive
Margin (OCAM) loss. OCAM uses a variable margin value that is updated
continually during the course of training to maintain optimally discriminative
representations. CBIR performance of OCAM is compared against state-of-the-art
loss functions for representational learning on three public databases
(gastrointestinal disease, skin lesion, lung disease). Comprehensive
experiments in each application domain demonstrate the superior performance of
OCAM against baselines.
Comment: 10 pages, 6 figures
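
To make the retrieval setting in this abstract concrete, here is a small sketch of content-based retrieval over embedding vectors, together with a triplet term whose margin is supplied by the caller at each training step. The abstract does not give the OCAM update rule, so none is shown here; the function names, the choice of cosine similarity, and the toy `repository` of three-dimensional embeddings are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, repository, top_k=3):
    """Return the ids of the `top_k` stored embeddings most similar to `query`."""
    scored = [(cosine_similarity(query, emb), image_id)
              for image_id, emb in repository.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


def triplet_loss(dist_pos, dist_neg, margin):
    """Triplet hinge on precomputed distances. The margin is an argument so
    that a training loop could pass a different value at every step."""
    return max(0.0, dist_pos - dist_neg + margin)


# Hypothetical toy repository: image id -> embedding vector.
repository = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
```

For example, `retrieve([1.0, 0.0, 0.0], repository, top_k=2)` ranks `img_a` first and `img_b` second, since their embeddings point in nearly the same direction as the query.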
Reconstruction and Scalable Detection and Tracking of 3D Objects (Rekonstruktion und skalierbare Detektion und Verfolgung von 3D Objekten)
The task of detecting objects in images is essential for autonomous systems to categorize, comprehend and eventually navigate or manipulate their environment. Since many applications demand not only detection of objects but also the estimation of their exact poses, 3D CAD models can prove helpful since they provide means for feature extraction and hypothesis refinement. This work, therefore, explores two paths: firstly, we look into methods to create richly textured and geometrically accurate models of real-life objects. Using these reconstructions as a basis, we investigate how to improve 3D object detection and pose estimation, focusing especially on scalability, i.e. the problem of dealing with multiple objects simultaneously.