    Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

    We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths.
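The abstract describes folding eye gaze into the structured loss of a latent SVM, so that candidate spatio-temporal paths are penalized both for margin violations and for missing fixated regions. A minimal sketch of such a combined loss, with hypothetical names and a made-up weighting parameter `lam` (none of these are taken from the paper):

```python
def gaze_weighted_loss(score_correct, score_wrong, path_gaze_overlap, lam=0.5):
    """Illustrative structured hinge loss with a gaze term.

    `score_correct`/`score_wrong`: model scores for the correct and a
    competing label with their best latent paths.
    `path_gaze_overlap`: fraction in [0, 1] of the candidate path covered
    by fixated (eye-gaze) regions -- a hypothetical supervision signal.
    """
    classification_term = max(0.0, 1.0 + score_wrong - score_correct)  # margin violation
    gaze_term = 1.0 - path_gaze_overlap  # penalize paths that ignore gaze
    return classification_term + lam * gaze_term
```

A path that scores well and covers the gazed regions incurs zero loss; a path that ignores gaze is penalized even when classification is correct.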

    Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis

We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis. HTS can recognize text in an image and identify its 4-level hierarchical structure: characters, words, lines, and paragraphs. The proposed HTS is characterized by two novel components: (1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve polygons of text lines and an affinity matrix for paragraph grouping between detected lines; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and further merges them back into words. HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as geometric layout analysis tasks. Comment: Accepted to WACV 202
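The UDP component emits a pairwise affinity matrix over detected lines, from which paragraphs are formed by grouping strongly linked lines. One simple way to realize such grouping is connected components over edges whose affinity exceeds a threshold; the sketch below uses union-find, and the threshold value is an assumption, not taken from the paper:

```python
def group_lines(affinity, threshold=0.5):
    """Group text lines into paragraphs from a pairwise affinity matrix
    (nested list, affinity[i][j] in [0, 1]) via thresholded connected
    components using union-find."""
    n = len(affinity)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] > threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For example, with three lines where only lines 0 and 1 have high mutual affinity, the function returns two groups: `[0, 1]` and `[2]`.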

    Towards End-to-End Unified Scene Text Detection and Layout Analysis

Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. The first hierarchical scene text dataset is introduced to enable this novel research task. We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way. Comprehensive experiments show that our unified model achieves better performance than multiple well-designed baseline methods. Additionally, this model achieves state-of-the-art results on multiple scene text detection datasets without the need for complex post-processing. Dataset and code: https://github.com/google-research-datasets/hiertext. Comment: To appear at CVPR 202

    Patient-specific modelling for the assessment of the hemodynamics risk of failure in endovascular aneurysm repair

Endovascular aneurysm repair (EVAR), despite its advantages over open surgery for abdominal aortic aneurysm (AAA), still presents risks of failure linked to endograft (EG) migration. Here we explore the link between intravascular blood flow features and the displacement forces (DFs) acting on the EG. DFs are inversely associated with the amount of helical flow within the EG.