Search CORE

19,694 research outputs found

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Author: Bisk Yonatan
Cheng Yong
Essa Irfan
Hauptmann Alexander G.
Huang Yanping
Jiang Lu
Kumar Vivek
Macherey Wolfgang
Murphy Kevin
Ross David A.
Wang Zhiruo
Yang Ming-Hsuan
Yu Lijun
Publication venue
Publication date: 28/10/2023
Field of study

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the semantic meaning and the fine-grained details needed for visual reconstruction, effectively translating the visual content into a language comprehensible to the LLM, and empowering it to perform a wide array of multimodal tasks. Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set of image understanding and generation tasks. Our method marks the first successful attempt to enable a frozen LLM to generate image content while surpassing state-of-the-art performance in image understanding tasks, under the same setting, by over 25%.Comment: NeurIPS 2023 spotligh

arXiv.org e-Print Archive

COCO_TS Dataset: Pixel-level Annotations Based on Weak Supervision for Scene Text Segmentation

Author: B. Gatos
LC Chen
Max Jaderberg
N Otsu
P Andreini
S Bonechi
T-Y Lin
Y Tang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

The absence of large scale datasets with pixel-level supervisions is a significant obstacle for the training of deep convolutional networks for scene text segmentation. For this reason, synthetic data generation is normally employed to enlarge the training dataset. Nonetheless, synthetic data cannot reproduce the complexity and variability of natural images. In this paper, a weakly supervised learning approach is used to reduce the shift between training on real and synthetic data. Pixel-level supervisions for a text detection dataset (i.e. where only bounding-box annotations are available) are generated. In particular, the COCO-Text-Segmentation (COCO_TS) dataset, which provides pixel-level supervisions for the COCO-Text dataset, is created and released. The generated annotations are used to train a deep convolutional neural network for semantic segmentation. Experiments show that the proposed dataset can be used instead of synthetic data, allowing us to use only a fraction of the training samples and significantly improving the performances

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università degli Studi di Siena

Improving Spatial Codification in Semantic Segmentation

Author: Giró-i-Nieto Xavier
Marqués Ferran
McGuinness Kevin
O'Connor Noel E.
Ventura Carles
Vilaplana Verónica
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/05/2015
Field of study

This paper explores novel approaches for improving the spatial codification for the pooling of local descriptors to solve the semantic segmentation problem. We propose to partition the image into three regions for each object to be described: Figure, Border and Ground. This partition aims at minimizing the influence of the image context on the object description and vice versa by introducing an intermediate zone around the object contour. Furthermore, we also propose a richer visual descriptor of the object by applying a Spatial Pyramid over the Figure region. Two novel Spatial Pyramid configurations are explored: Cartesian-based and crown-based Spatial Pyramids. We test these approaches with state-of-the-art techniques and show that they improve the Figure-Ground based pooling in the Pascal VOC 2011 and 2012 semantic segmentation challenges.Comment: Paper accepted at the IEEE International Conference on Image Processing, ICIP 2015. Quebec City, 27-30 September. Project page: https://imatge.upc.edu/web/publications/improving-spatial-codification-semantic-segmentatio

arXiv.org e-Print Archive

Crossref

Irish Universities

DCU Online Research Access Service

Proposal Flow

Author: Cho Minsu
Ham Bumsub
Ponce Jean
Schmid Cordelia
Publication venue
Publication date: 26/06/2016
Field of study

Finding image correspondences remains a challenging problem in the presence of intra-class variations and large changes in scene layout.~Semantic flow methods are designed to handle images depicting different instances of the same object or scene category. We introduce a novel approach to semantic flow, dubbed proposal flow, that establishes reliable correspondences using object proposals. Unlike prevailing semantic flow approaches that operate on pixels or regularly sampled local regions, proposal flow benefits from the characteristics of modern object proposals, that exhibit high repeatability at multiple scales, and can take advantage of both local and geometric consistency constraints among proposals. We also show that proposal flow can effectively be transformed into a conventional dense flow field. We introduce a new dataset that can be used to evaluate both general semantic flow techniques and region-based approaches such as proposal flow. We use this benchmark to compare different matching algorithms, object proposals, and region features within proposal flow, to the state of the art in semantic flow. This comparison, along with experiments on standard datasets, demonstrates that proposal flow significantly outperforms existing semantic flow methods in various settings

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server