Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm that allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we use human gaze data as a weak supervisory signal. This is achieved by incorporating eye gaze, along with classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate in terms of classification and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths.
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
We propose Hierarchical Text Spotter (HTS), a novel method for the joint task
of word-level text spotting and geometric layout analysis. HTS can recognize
text in an image and identify its 4-level hierarchical structure: characters,
words, lines, and paragraphs. The proposed HTS is characterized by two novel
components: (1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve
polygons of text lines and an affinity matrix for paragraph grouping between
detected lines; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits
lines into characters and further merges them back into words. HTS achieves
state-of-the-art results on multiple word-level text spotting benchmark
datasets as well as geometric layout analysis tasks.
Comment: Accepted to WACV 202
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Scene text detection and document layout analysis have long been treated as
two separate tasks in different image domains. In this paper, we bring them
together and introduce the task of unified scene text detection and layout
analysis. The first hierarchical scene text dataset is introduced to enable
this novel research task. We also propose a novel method that is able to
simultaneously detect scene text and form text clusters in a unified way.
Comprehensive experiments show that our unified model achieves better
performance than multiple well-designed baseline methods. Additionally, this
model achieves state-of-the-art results on multiple scene text detection
datasets without the need for complex post-processing. Dataset and code:
https://github.com/google-research-datasets/hiertext.
Comment: To appear at CVPR 202
Patient-specific modelling for the assessment of the hemodynamics risk of failure in endovascular aneurysm repair
Endovascular aneurysm repair (EVAR), despite its advantages over open surgery for abdominal aortic aneurysm (AAA), still presents risks of failure linked to endograft (EG) migration. Here we explore the link between intravascular blood flow features and the displacement forces (DFs) acting on the EG. DFs are inversely associated with the amount of helical flow within the EG