Barycenters of Natural Images -- Constrained Wasserstein Barycenters for Image Morphing
Image interpolation, or image morphing, refers to a visual transition between
two (or more) input images. For such a transition to look visually appealing,
its desirable properties are (i) to be smooth; (ii) to apply the minimal
required change in the image; and (iii) to seem "real", avoiding unnatural
artifacts in each image in the transition. To obtain a smooth and
straightforward transition, one may adopt the well-known Wasserstein Barycenter
Problem (WBP). While this approach guarantees minimal changes under the
Wasserstein metric, the resulting images might seem unnatural. In this work, we
propose a novel approach for image morphing that possesses all three desired
properties. To this end, we define a constrained variant of the WBP that
requires the intermediate images to satisfy an image prior. We describe an
algorithm that solves this problem and demonstrate it using the sparse prior
and generative adversarial networks.
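
To make the objective concrete, here is a sketch of the formulation in notation of my choosing (not taken from the paper): the classical WBP seeks the measure minimizing a weighted sum of squared 2-Wasserstein distances to the inputs, and the constrained variant restricts the search to measures satisfying an image prior, e.g. the range of a pretrained generator G.

    % Classical Wasserstein Barycenter Problem for inputs \mu_1, ..., \mu_N
    % with weights \lambda_i >= 0, \sum_i \lambda_i = 1:
    \mu^\star \in \operatorname*{arg\,min}_{\mu} \sum_{i=1}^{N} \lambda_i \, W_2^2(\mu, \mu_i)

    % Constrained variant (illustrative): restrict to images obeying a prior,
    % e.g. the range \mathcal{M} of a generator G:
    \mu^\star \in \operatorname*{arg\,min}_{\mu \in \mathcal{M}} \sum_{i=1}^{N} \lambda_i \, W_2^2(\mu, \mu_i),
    \qquad \mathcal{M} = \{\, G(z) : z \in \mathcal{Z} \,\}

Sweeping the weights (e.g. (1 - t, t) for two inputs as t goes from 0 to 1) then traces the morphing path from one image to the other.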
Sequence-to-Sequence Contrastive Learning for Text Recognition
We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures, and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
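
As a rough illustration of the sub-word contrast described above, the PyTorch sketch below pools consecutive frames of each sequential feature map into a fixed number of instances and applies an InfoNCE-style loss between two augmented views; the function names, the fixed-chunk instance mapping, and the loss details are my assumptions, not the authors' code.

    import torch
    import torch.nn.functional as F

    def to_instances(feat, num_instances):
        # feat: (B, W, C) frame-level features from a recognizer encoder.
        # Split the W frames into consecutive chunks and average-pool each
        # chunk into one instance vector (assumes W % num_instances == 0).
        B, W, C = feat.shape
        return feat.view(B, num_instances, W // num_instances, C).mean(dim=2)

    def seq_contrastive_loss(feat_a, feat_b, num_instances=5, tau=0.1):
        # feat_a, feat_b: encoder outputs for two augmentations of the same
        # batch of text images; matching instances form the positive pairs,
        # every other instance in the batch serves as a negative.
        za = F.normalize(to_instances(feat_a, num_instances).flatten(0, 1), dim=1)
        zb = F.normalize(to_instances(feat_b, num_instances).flatten(0, 1), dim=1)
        logits = za @ zb.t() / tau
        targets = torch.arange(za.size(0), device=za.device)
        return F.cross_entropy(logits, targets)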
CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
Reading text in real-world scenarios often requires understanding the context
surrounding it, especially when dealing with poor-quality text. However,
current scene text recognizers are unaware of the bigger picture as they
operate on cropped text images. In this study, we harness the representative
capabilities of modern vision-language models, such as CLIP, to provide
scene-level information to the crop-based recognizer. We achieve this by fusing
a rich representation of the entire image, obtained from the vision-language
model, with the recognizer word-level features via a gated cross-attention
mechanism. This component gradually shifts to the context-enhanced
representation, allowing for stable fine-tuning of a pretrained recognizer. We
demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP
TExt Recognition), on leading text recognition architectures and achieve
state-of-the-art results across multiple benchmarks. Furthermore, our analysis
highlights improved robustness to out-of-vocabulary words and enhanced
generalization in low-data regimes.Comment: Accepted for publication by ICCV 202