152 research outputs found

    Hotspots detection for machine operation in egocentric vision


    Integration of Experts' and Beginners' Machine Operation Experiences to Obtain a Detailed Task Model

    We propose a novel framework for integrating beginners' machine operation experiences with those of experts to obtain a detailed task model. Beginners can provide valuable information for operation guidance and task design: for example, which operations are easy or difficult for them, the mistakes they make, and the strategies they tend to choose. However, beginners' experiences often vary widely and are difficult to integrate directly. Thus, we consider an operational experience as a sequence of hand-machine interactions at hotspots. A few experts' experiences and a sufficient number of beginners' experiences are then unified using two aggregation steps that align and integrate the interaction sequences. We applied our method to more than 40 experiences of a sewing task. The results demonstrate good potential for modeling the task and obtaining its important properties.
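
    To make the align-and-integrate idea concrete, the sketch below treats each experience as a sequence of discrete hotspot labels, aligns every beginner's sequence to an expert reference with standard edit-distance dynamic programming, and tallies what beginners did at each expert step. The labels, costs, and Needleman-Wunsch-style alignment are illustrative assumptions, not the paper's exact procedure.

    # Sketch: align beginners' hotspot-interaction sequences to an expert
    # reference, then aggregate per-step counts. GAP marks a skipped step.
    from collections import Counter

    GAP = "-"

    def align(ref, seq, gap_cost=1, sub_cost=1):
        """Globally align two label sequences by minimal edit cost."""
        n, m = len(ref), len(seq)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i * gap_cost
        for j in range(1, m + 1):
            dp[0][j] = j * gap_cost
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                diag = dp[i - 1][j - 1] + (0 if ref[i - 1] == seq[j - 1] else sub_cost)
                dp[i][j] = min(diag, dp[i - 1][j] + gap_cost, dp[i][j - 1] + gap_cost)
        # Trace back; keep one entry per expert step, gaps for skipped steps.
        aligned, i, j = [], n, m
        while i > 0 or j > 0:
            if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if ref[i - 1] == seq[j - 1] else sub_cost):
                aligned.append(seq[j - 1]); i -= 1; j -= 1
            elif i > 0 and dp[i][j] == dp[i - 1][j] + gap_cost:
                aligned.append(GAP); i -= 1
            else:
                j -= 1  # extra beginner step absent from the reference
        return aligned[::-1]

    def aggregate(expert_seq, beginner_seqs):
        """For each expert step, count what beginners actually did there."""
        per_step = [Counter() for _ in expert_seq]
        for seq in beginner_seqs:
            for step, label in enumerate(align(expert_seq, seq)):
                per_step[step][label] += 1
        return per_step

    # Hypothetical sewing-task hotspots: a skipped step shows up as "-",
    # a wrong operation as a substituted label.
    expert = ["thread", "bobbin", "presser", "pedal"]
    beginners = [["thread", "presser", "pedal"],
                 ["thread", "bobbin", "bobbin", "pedal"]]
    for step, counts in zip(expert, aggregate(expert, beginners)):
        print(step, dict(counts))

    Per-step counters like these expose exactly the properties the task model is after: which hotspots beginners skip, and which operations they confuse.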

    Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video

    We address the challenging task of anticipating human-object interaction in first-person videos. Most existing methods ignore how the camera wearer interacts with objects, or simply treat body motion as a separate modality. In contrast, we observe that intentional hand movement reveals critical information about the future activity. Motivated by this, we adopt intentional hand movement as a representation of the future and propose a novel deep network that jointly models and predicts egocentric hand motion, interaction hotspots, and the future action. Specifically, we treat future hand motion as motor attention, and model this attention using latent variables in our deep model. The predicted motor attention is further used to characterize discriminative spatio-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefit of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both the EGTEA Gaze+ and EPIC-Kitchens datasets. Our project page is available at https://aptx4869lm.github.io/ForecastingHOI
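
    A minimal PyTorch sketch of the joint design: a shared spatio-temporal backbone predicts a motor-attention heatmap, which then re-weights the features used by the interaction-hotspot and action heads, so all three outputs are trained together. The backbone, layer sizes, sigmoid gating, and the 106-way action output are illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class JointAnticipation(nn.Module):
        """Toy joint model: motor attention -> hotspot + action predictions."""

        def __init__(self, in_ch=3, feat_ch=64, num_actions=106):
            super().__init__()
            # Shared spatio-temporal backbone over a clip (B, C, T, H, W).
            self.backbone = nn.Sequential(
                nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv3d(feat_ch, feat_ch, kernel_size=3, padding=1), nn.ReLU(),
            )
            # Motor attention: a per-frame heatmap of anticipated hand motion.
            self.motor_head = nn.Conv3d(feat_ch, 1, kernel_size=1)
            self.hotspot_head = nn.Conv3d(feat_ch, 1, kernel_size=1)
            self.action_head = nn.Linear(feat_ch, num_actions)

        def forward(self, clip):
            feats = self.backbone(clip)                    # (B, F, T, H, W)
            motor = torch.sigmoid(self.motor_head(feats))  # (B, 1, T, H, W)
            gated = feats * (1.0 + motor)                  # emphasize attended regions
            hotspots = self.hotspot_head(gated)            # hotspot logits per pixel
            pooled = gated.mean(dim=(2, 3, 4))             # global average pooling
            return motor, hotspots, self.action_head(pooled)

    model = JointAnticipation()
    clip = torch.randn(2, 3, 8, 64, 64)  # two short RGB clips
    motor, hotspots, action_logits = model(clip)
    print(motor.shape, hotspots.shape, action_logits.shape)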

    Egocentric Perception of Hands and Its Applications


    COPILOT: Human Collision Prediction and Localization from Multi-view Egocentric Videos

    To produce safe human motions, assistive wearable exoskeletons must be equipped with a perception system that can anticipate potential collisions from egocentric observations. However, previous approaches to exoskeleton perception greatly simplify the problem to specific types of environments, limiting their scalability. In this paper, we propose the challenging and novel problem of predicting human-scene collisions in diverse environments from multi-view egocentric RGB videos captured from an exoskeleton. By classifying which body joints will collide with the environment and predicting a collision-region heatmap that localizes potential collisions in the environment, we aim to develop an exoskeleton perception system that generalizes to complex real-world scenes and provides actionable outputs for downstream control. We propose COPILOT, a video transformer-based model that performs collision prediction and localization simultaneously, leveraging multi-view video inputs via a proposed joint space-time-viewpoint attention operation. To train and evaluate the model, we build a synthetic data generation framework that simulates virtual humans moving through photo-realistic 3D environments. This framework is then used to establish a dataset of 8.6M egocentric RGBD frames to enable future work on the problem. Extensive experiments suggest that our model achieves promising performance and generalizes to unseen scenes as well as the real world. We apply COPILOT to a downstream collision avoidance task and successfully reduce collision cases by 29% on unseen scenes using a simple closed-loop control algorithm.
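
    The joint space-time-viewpoint attention can be pictured as one self-attention pass over a token sequence flattened across views, frames, and spatial patches, so every token attends across all three axes at once. The single block below assumes pre-extracted patch tokens and omits the rest of COPILOT's transformer; the dimensions are illustrative.

    import torch
    import torch.nn as nn

    class SpaceTimeViewAttention(nn.Module):
        """One joint attention pass over view, time, and space tokens."""

        def __init__(self, dim=128, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, tokens):
            # tokens: (B, V, T, N, D) = batch, views, frames, patches, channels
            b, v, t, n, d = tokens.shape
            flat = tokens.reshape(b, v * t * n, d)   # one joint token sequence
            attended, _ = self.attn(flat, flat, flat)
            out = self.norm(flat + attended)         # residual connection + norm
            return out.reshape(b, v, t, n, d)

    # Example: 4 egocentric views, 8 frames, 49 patches, 128-d tokens.
    block = SpaceTimeViewAttention()
    x = torch.randn(2, 4, 8, 49, 128)
    print(block(x).shape)  # -> torch.Size([2, 4, 8, 49, 128])

    Flattening before attention is what distinguishes this joint scheme from factorized alternatives that attend over space, time, and viewpoint in separate stages.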