Integration of Experts' and Beginners' Machine Operation Experiences to Obtain a Detailed Task Model
We propose a novel framework for integrating beginners' machine operation experiences with those of experts to obtain a detailed task model. Beginners can provide valuable information for operation guidance and task design; for example, which operations they find easy or difficult, the mistakes they make, and the strategies they tend to choose. However, beginners' experiences often vary widely and are difficult to integrate directly. We therefore represent an operational experience as a sequence of hand-machine interactions at hotspots. A few experts' experiences and a sufficient number of beginners' experiences are then unified using two aggregation steps that align and integrate the interaction sequences. We applied our method to more than 40 experiences of a sewing task. The results demonstrate good potential for modeling the task and obtaining its important properties.
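As a rough illustration of the alignment-and-aggregation idea described above, the following is a minimal sketch, not the authors' method: it aligns each beginner's hotspot sequence against an assumed expert reference using standard sequence matching and counts which hotspots beginners tend to skip. All hotspot labels and example sequences are hypothetical.

```python
# Hypothetical sketch: align beginner hotspot sequences to an expert reference,
# then aggregate how often each expert step is matched or skipped.
from collections import Counter
from difflib import SequenceMatcher

expert = ["open_cover", "thread_guide", "needle", "bobbin", "close_cover"]  # assumed hotspot labels
beginners = [
    ["open_cover", "needle", "bobbin", "close_cover"],                      # skipped the thread guide
    ["open_cover", "thread_guide", "thread_guide", "needle", "bobbin", "close_cover"],
]

matched, skipped = Counter(), Counter()
for seq in beginners:
    sm = SequenceMatcher(a=expert, b=seq, autojunk=False)
    aligned = set()
    for block in sm.get_matching_blocks():
        aligned.update(range(block.a, block.a + block.size))
    for i, hotspot in enumerate(expert):
        (matched if i in aligned else skipped)[hotspot] += 1

for hotspot in expert:
    print(hotspot, "matched:", matched[hotspot], "skipped:", skipped[hotspot])
```

Hotspots that beginners frequently skip or mismatch would then be natural candidates for extra guidance in the derived task model.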
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video
We address the challenging task of anticipating human-object interaction in first person videos. Most existing methods ignore how the camera wearer interacts with the objects, or simply consider body motion as a separate modality. In contrast, we observe that intentional hand movement reveals critical information about the future activity. Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action. Specifically, we consider the future hand motion as the motor attention, and model this attention using latent variables in our deep model. The predicted motor attention is further used to characterise the discriminative spatial-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefit of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both the EGTEA Gaze+ and EPIC-Kitchens datasets. Our project page is available at https://aptx4869lm.github.io/ForecastingHOI
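As a hedged sketch of how a motor-attention map could jointly drive action and hotspot prediction (an assumed toy architecture, not the paper's network): a latent spatial attention map is predicted from clip features, used to re-weight them, and the attended features feed an action head and a hotspot head.

```python
# Toy sketch (assumptions, not the paper's model): latent spatial "motor attention"
# modulates clip features before the action and hotspot heads.
import torch
import torch.nn as nn

class JointAnticipation(nn.Module):
    def __init__(self, in_channels=512, num_actions=100):   # num_actions is illustrative
        super().__init__()
        self.attn = nn.Conv2d(in_channels, 1, kernel_size=1)      # latent motor attention
        self.hotspot = nn.Conv2d(in_channels, 1, kernel_size=1)   # interaction hotspot logits
        self.action = nn.Linear(in_channels, num_actions)         # future action logits

    def forward(self, feats):                        # feats: (B, C, H, W) clip features
        attn = torch.sigmoid(self.attn(feats))       # (B, 1, H, W) attention in [0, 1]
        attended = feats * attn                      # attention-modulated features
        hotspot = self.hotspot(attended)             # (B, 1, H, W) hotspot map
        pooled = attended.mean(dim=(2, 3))           # global average pooling -> (B, C)
        return self.action(pooled), hotspot, attn

model = JointAnticipation()
logits, hotspot, attn = model(torch.randn(2, 512, 14, 14))
```

In the paper the motor attention is modeled with latent variables and learned jointly with the action and hotspot predictions; here the sigmoid map simply stands in for that idea.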
COPILOT: Human Collision Prediction and Localization from Multi-view Egocentric Videos
To produce safe human motions, assistive wearable exoskeletons must be equipped with a perception system that enables anticipating potential collisions from egocentric observations. However, previous approaches to exoskeleton perception greatly simplify the problem to specific types of environments, limiting their scalability. In this paper, we propose the challenging and novel problem of predicting human-scene collisions for diverse environments from multi-view egocentric RGB videos captured from an exoskeleton. By classifying which body joints will collide with the environment and predicting a collision region heatmap that localizes potential collisions in the environment, we aim to develop an exoskeleton perception system that generalizes to complex real-world scenes and provides actionable outputs for downstream control. We propose COPILOT, a video transformer-based model that performs both collision prediction and localization simultaneously, leveraging multi-view video inputs via a proposed joint space-time-viewpoint attention operation. To train and evaluate the model, we build a synthetic data generation framework to simulate virtual humans moving in photo-realistic 3D environments. This framework is then used to establish a dataset consisting of 8.6M egocentric RGBD frames to enable future work on the problem. Extensive experiments suggest that our model achieves promising performance and generalizes to unseen scenes as well as to the real world. We apply COPILOT to a downstream collision avoidance task, and successfully reduce collision cases by 29% on unseen scenes using a simple closed-loop control algorithm.
Comment: 8 pages, 6 figures
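Joint space-time-viewpoint attention can be pictured as self-attention over one flattened token sequence spanning views, frames, and spatial locations. The sketch below is an assumption-laden toy, not COPILOT itself; token shapes, head names, and output sizes are illustrative.

```python
# Toy sketch (not COPILOT): tokens from every view, frame, and spatial patch are
# flattened into a single sequence so one self-attention pass mixes information
# across space, time, and viewpoint; two heads then predict per-joint collision
# logits and a coarse collision heatmap.
import torch
import torch.nn as nn

class SpaceTimeViewpointBlock(nn.Module):
    def __init__(self, dim=256, heads=8, num_joints=12, heatmap_cells=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.joint_head = nn.Linear(dim, num_joints)        # which joints will collide
        self.heatmap_head = nn.Linear(dim, heatmap_cells)   # where collisions may occur

    def forward(self, tokens):
        # tokens: (B, V, T, N, D) = batch, views, frames, spatial patches, channels
        B, V, T, N, D = tokens.shape
        seq = tokens.reshape(B, V * T * N, D)                # one joint token sequence
        out, _ = self.attn(seq, seq, seq)                    # space-time-viewpoint attention
        out = self.norm(out + seq)
        pooled = out.mean(dim=1)                             # (B, D) global summary
        return self.joint_head(pooled), self.heatmap_head(pooled)

block = SpaceTimeViewpointBlock()
joints, heatmap = block(torch.randn(2, 4, 8, 49, 256))
```

Keeping the three axes in a single attention operation lets evidence from any view and frame contribute to both the joint-level collision classification and the collision region heatmap.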