6 research outputs found
Multi-Modal Trip Hazard Affordance Detection On Construction Sites
Trip hazards are a significant contributor to accidents on construction and
manufacturing sites, where over a third of Australian workplace injuries occur
[1]. Current safety inspections are labour intensive and limited by human
fallibility, making automation of trip hazard detection appealing from both a
safety and economic perspective. Trip hazards present an interesting challenge
to modern learning techniques because they are defined as much by affordance as
by object type; for example, wires on a table are not a trip hazard, but can be
one if lying on the ground. To address these challenges, we conduct a comprehensive
investigation into the performance characteristics of 11 different colour and
depth fusion approaches, including 4 fusion and one non-fusion approach, using
colour and two types of depth images. Trained and tested on over 600 labelled
trip hazards over 4 floors and 2000m in an active construction
site, this approach was able to differentiate between identical objects in
different physical configurations (see Figure 1). Outperforming a colour-only
detector, our multi-modal trip detector fuses colour and depth information to
achieve a 4% absolute improvement in F1-score. These investigative results and
the extensive publicly available dataset move us one step closer to assistive
or fully automated safety inspection systems on construction sites.
Comment: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation
Letters (RA-L
DEFT: Dexterous Fine-Tuning for Real-World Hand Policies
Dexterity is often seen as a cornerstone of complex manipulation. Humans are
able to perform a host of skills with their hands, from making food to
operating tools. In this paper, we investigate these challenges, especially in
the case of soft, deformable objects as well as complex, relatively
long-horizon tasks. However, learning such behaviors from scratch can be data
inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous
Fine-Tuning for Hand Policies), that leverages human-driven priors, which are
executed directly in the real world. In order to improve upon these priors,
DEFT involves an efficient online optimization procedure. With the integration
of human-based learning and online fine-tuning, coupled with a soft robotic
hand, DEFT demonstrates success across various tasks, establishing a robust,
data-efficient pathway toward general dexterous manipulation. Please see our
website at https://dexterous-finetuning.github.io for video results.
Comment: In CoRL 2023. Website at https://dexterous-finetuning.github.io
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments
Affordance modeling plays an important role in visual understanding. In this
paper, we aim to predict affordances of 3D indoor scenes, specifically what
human poses are afforded by a given indoor environment, such as sitting on a
chair or standing on the floor. In order to predict valid affordances and learn
possible 3D human poses in indoor scenes, we need to understand the semantic
and geometric structure of a scene as well as its potential interactions with a
human. To learn such a model, a large-scale dataset of 3D indoor affordances is
required. In this work, we build a fully automatic 3D pose synthesizer that
fuses semantic knowledge from a large number of 2D poses extracted from TV
shows as well as 3D geometric knowledge from voxel representations of indoor
scenes. With the data created by the synthesizer, we introduce a 3D pose
generative model to predict semantically plausible and physically feasible
human poses within a given scene (provided as a single RGB, RGB-D, or depth
image). We demonstrate that our human affordance prediction method consistently
outperforms existing state-of-the-art methods.
Comment: https://sites.google.com/view/3d-affordance-cvpr1