
    IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting

    Although action recognition for procedural tasks has received notable attention, it has a fundamental flaw: it provides no measure of whether an action succeeded. This limits the applicability of such systems, especially in the industrial domain, where the outcome of a procedural action is often far more important than its mere execution. To address this limitation, we define the novel task of procedure step recognition (PSR), which focuses on recognizing the correct completion and order of procedural steps. Alongside the new task, we present the multi-modal IndustReal dataset. Unlike currently available datasets, IndustReal contains procedural errors (such as omissions) as well as execution errors. A significant portion of these errors appears exclusively in the validation and test sets, making IndustReal suitable for evaluating the robustness of algorithms to new, unseen mistakes. Additionally, to encourage reproducibility and enable scalable approaches trained on synthetic data, the 3D models of all parts are publicly available. Annotations and benchmark performance are provided for action recognition and assembly state detection, as well as for the new PSR task. IndustReal, along with the code and model weights, is available at https://github.com/TimSchoonbeek/IndustReal. Comment: Accepted for WACV 2024. 15 pages, 9 figures, including supplementary material.
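    To make the idea of checking "correct completion and order" concrete, here is a minimal sketch, not the IndustReal PSR metric. It assumes a hypothetical recognizer that emits (step, time) completion events and a hypothetical reference list of step names; it reports completed steps, omitted steps, and steps completed after a later reference step.

```python
from typing import Dict, List, Tuple


def summarize_procedure(expected_order: List[str],
                        completions: List[Tuple[str, float]]) -> Dict[str, List[str]]:
    """Compare recognized step completions against a reference step order.

    expected_order: hypothetical step names in the reference order.
    completions: hypothetical (step, time_in_seconds) events from a recognizer.
    """
    # Keep only the first completion time per step; repeated detections are ignored.
    first_seen: Dict[str, float] = {}
    for step, t in completions:
        if step not in first_seen:
            first_seen[step] = t

    # Steps in the order they were completed (stable for equal times).
    completed = sorted(first_seen, key=lambda s: first_seen[s])

    # Omissions: reference steps that were never recognized as completed.
    omitted = [s for s in expected_order if s not in first_seen]

    # A step is out of order if it was completed after a step that comes later
    # in the reference order. Steps not in the reference list are skipped.
    rank = {s: i for i, s in enumerate(expected_order)}
    out_of_order: List[str] = []
    max_rank = -1
    for s in completed:
        r = rank.get(s)
        if r is None:
            continue
        if r < max_rank:
            out_of_order.append(s)
        max_rank = max(max_rank, r)

    return {"completed": completed, "omitted": omitted, "out_of_order": out_of_order}
```

    For example, with reference steps a, b, c, d and completions a, c, b, the sketch reports d as omitted and b as out of order. A real PSR system would also need to handle execution errors, which this order-only check cannot see.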

    Learning to Predict Collision Risk from Simulated Video Data

    We propose an image-based collision risk prediction model and a training strategy that allows training on simulated video data and generalizes successfully to real data. In doing so, we address the data scarcity problem of collecting and labeling real (near) collisions, which are exceptionally rare events. Generalization from the simulated to the real domain is built into the design: the learning strategy is decoupled and relies on task-specific, domain-resilient intermediate representations. Specifically, we use optical flow and vehicle bounding boxes, because they are intrinsically related to the task of collision risk prediction and because their simulated-to-real domain gap is significantly smaller than that of raw camera video, i.e., they are more domain resilient. To demonstrate our approach, we present RiskNet, a novel neural network for image-based collision risk prediction, which classifies individual frames of a video sequence from a front-facing camera as safe or unsafe. Additionally, we present two novel datasets: the simulated Prescan dataset (which we intend to make publicly available) for training and the YouTube Driving Incidents Database (YDID) for real-world testing. Trained solely on simulated data and tested on the real-world YDID, RiskNet performs comparably to a human driver in both accuracy (91.8% vs. 93.6%) and F1-score (0.92 vs. 0.94).
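    Because RiskNet is evaluated as a per-frame safe/unsafe classifier, the two reported numbers can be illustrated with a small helper. This is a sketch only: treating "unsafe" (label 1) as the positive class for F1 is an assumption, and the abstract does not state how the paper averages F1.

```python
from typing import List, Tuple


def frame_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Accuracy and F1 for per-frame labels: 0 = safe, 1 = unsafe.

    Assumption for this sketch: 'unsafe' is the positive class for F1.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # unsafe, predicted unsafe
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, predicted unsafe
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # unsafe, predicted safe
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```

    For instance, ground truth [0, 0, 1, 1, 1] against predictions [0, 1, 1, 1, 0] gives 3 of 5 frames correct (accuracy 0.6), with precision and recall both 2/3, so F1 is also 2/3.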

    Beyond Action Recognition: Extracting Meaningful Information from Procedure Recordings

    Understanding procedural actions is important, as it can be used to automatically analyze the execution of a procedure and to assist users by warning them of potential mistakes or forgotten steps. However, current approaches require a rigid, step-by-step execution order and laborious, impractical datasets. Furthermore, they are not robust to variations in viewpoint, or they measure the performance of actions rather than the actual completion of actions. To address these limitations and stimulate research in this field, this work proposes the novel task of procedure state recognition (PSR) together with a set of evaluation metrics.

    Augmented Reality for Automatically Generating Robust Manufacturing and Maintenance Logs

    Logs describing the execution of procedural steps during manufacturing and maintenance tasks are important for quality control and configuration management. Such logs are currently handwritten or typed during a procedure, which requires engineers to step away from their work frequently and makes the logs difficult to search and optimize. In this paper, we propose to automatically generate standardized, searchable logs by visually perceiving and monitoring the progress of the procedure in real time and comparing it to the expected procedure. Unlike related work, our approach does not restrict engineers to a rigid, fixed sequence of steps; instead, it allows them to execute procedures in different orders wherever the procedure permits. The proposed framework is experimentally validated on the task of (dis)assembling a Duplo block model and operates properly when occlusions are absent.
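    One way to let engineers follow "different orders wherever the procedure permits" is to describe the expected procedure as prerequisites per step (a partial order) rather than a single fixed sequence. The sketch below assumes that representation; the step names, the `prereqs` mapping, and the `LogEntry` fields are hypothetical and not the paper's framework. Each observed step becomes a log entry that is marked valid only if all of its prerequisites were already logged.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class LogEntry:
    step: str
    time_s: float
    valid: bool                 # True if all prerequisites were already completed
    missing_prereqs: List[str]  # prerequisites not yet completed at this time


def build_log(prereqs: Dict[str, Set[str]],
              observed: List[Tuple[str, float]]) -> List[LogEntry]:
    """Turn observed (step, time) events into log entries checked against a partial order.

    prereqs maps each hypothetical step name to the steps that must precede it.
    Steps missing from prereqs are treated as having no prerequisites.
    """
    done: Set[str] = set()
    log: List[LogEntry] = []
    for step, t in observed:
        missing = sorted(prereqs.get(step, set()) - done)
        log.append(LogEntry(step, t, not missing, missing))
        # The step is recorded as performed even if it was out of order,
        # so later entries are checked against what actually happened.
        done.add(step)
    return log


# Hypothetical Duplo-style model: both walls need the base; the roof needs both walls.
prereqs = {
    "base": set(),
    "wall_left": {"base"},
    "wall_right": {"base"},
    "roof": {"wall_left", "wall_right"},
}
```

    With this mapping, building the right wall before the left one is accepted, whereas placing the roof before the right wall produces an entry flagged with `missing_prereqs == ["wall_right"]`.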