SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling
frozen LLMs to perform both understanding and generation tasks involving
non-linguistic modalities such as images or videos. SPAE converts between raw
pixels and interpretable lexical tokens (or words) extracted from the LLM's
vocabulary. The resulting tokens capture both the semantic meaning and the
fine-grained details needed for visual reconstruction, effectively translating
the visual content into a language comprehensible to the LLM, and empowering it
to perform a wide array of multimodal tasks. Our approach is validated through
in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set
of image understanding and generation tasks. Our method marks the first
successful attempt to enable a frozen LLM to generate image content while
surpassing state-of-the-art performance in image understanding tasks, under the
same setting, by over 25%. Comment: NeurIPS 2023 spotlight
COCO_TS Dataset: Pixel-level Annotations Based on Weak Supervision for Scene Text Segmentation
The absence of large scale datasets with pixel-level supervisions is a
significant obstacle for the training of deep convolutional networks for scene
text segmentation. For this reason, synthetic data generation is normally
employed to enlarge the training dataset. Nonetheless, synthetic data cannot
reproduce the complexity and variability of natural images. In this paper, a
weakly supervised learning approach is used to reduce the shift between
training on real and synthetic data. Pixel-level supervisions for a text
detection dataset (i.e. where only bounding-box annotations are available) are
generated. In particular, the COCO-Text-Segmentation (COCO_TS) dataset, which
provides pixel-level supervisions for the COCO-Text dataset, is created and
released. The generated annotations are used to train a deep convolutional
neural network for semantic segmentation. Experiments show that the proposed
dataset can be used instead of synthetic data, allowing us to use only a
fraction of the training samples and significantly improving performance.
Improving Spatial Codification in Semantic Segmentation
This paper explores novel approaches for improving the spatial codification
for the pooling of local descriptors to solve the semantic segmentation
problem. We propose to partition the image into three regions for each object
to be described: Figure, Border and Ground. This partition aims at minimizing
the influence of the image context on the object description and vice versa by
introducing an intermediate zone around the object contour. Furthermore, we
also propose a richer visual descriptor of the object by applying a Spatial
Pyramid over the Figure region. Two novel Spatial Pyramid configurations are
explored: Cartesian-based and crown-based Spatial Pyramids. We test these
approaches with state-of-the-art techniques and show that they improve the
Figure-Ground based pooling in the Pascal VOC 2011 and 2012 semantic
segmentation challenges. Comment: Paper accepted at the IEEE International Conference on Image
Processing, ICIP 2015. Quebec City, 27-30 September. Project page:
https://imatge.upc.edu/web/publications/improving-spatial-codification-semantic-segmentatio
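The Figure-Border-Ground partition described in this abstract can be illustrated with a minimal sketch. It assumes the object is given as a binary mask and that the Border is a band of fixed width straddling the object contour; the abstract does not specify how the band is built, so the band construction, the 4-neighbour morphology, the average pooling and the names `partition_fbg` and `pool_regions` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def _dilate_once(mask):
    """One step of 4-neighbour binary dilation, without wrap-around."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]   # grow downwards
    out[:-1, :] |= mask[1:, :]   # grow upwards
    out[:, 1:] |= mask[:, :-1]   # grow to the right
    out[:, :-1] |= mask[:, 1:]   # grow to the left
    return out

def dilate(mask, steps):
    for _ in range(steps):
        mask = _dilate_once(mask)
    return mask

def erode(mask, steps):
    # Erosion as dilation of the complement. Pixels beyond the image
    # edge are treated as foreground, so an object touching the edge
    # is not eroded from that side.
    return ~dilate(~mask, steps)

def partition_fbg(mask, width):
    """Label map: 0 = Ground, 1 = Border, 2 = Figure (assumed scheme)."""
    mask = np.asarray(mask, dtype=bool)
    inner = erode(mask, width)    # object interior -> Figure
    outer = dilate(mask, width)   # object plus surrounding band
    labels = np.zeros(mask.shape, dtype=np.uint8)
    labels[outer & ~inner] = 1    # band around the contour -> Border
    labels[inner] = 2
    return labels

def pool_regions(features, labels):
    """Average-pool H x W x D descriptors per region and concatenate
    them in the order Figure, Border, Ground. Empty regions give zeros."""
    depth = features.shape[-1]
    pooled = []
    for region in (2, 1, 0):
        selected = features[labels == region]
        pooled.append(selected.mean(axis=0) if len(selected) else np.zeros(depth))
    return np.concatenate(pooled)
```

Pooling each region separately keeps context descriptors from being mixed into the object description, which is the motivation the abstract gives for the intermediate zone. A Spatial Pyramid over the Figure region would subdivide the pixels labelled 2 further before pooling; that step is omitted here.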
Proposal Flow
Finding image correspondences remains a challenging problem in the presence
of intra-class variations and large changes in scene layout. Semantic flow
methods are designed to handle images depicting different instances of the same
object or scene category. We introduce a novel approach to semantic flow,
dubbed proposal flow, that establishes reliable correspondences using object
proposals. Unlike prevailing semantic flow approaches that operate on pixels or
regularly sampled local regions, proposal flow benefits from the
characteristics of modern object proposals, which exhibit high repeatability at
multiple scales, and can take advantage of both local and geometric consistency
constraints among proposals. We also show that proposal flow can effectively be
transformed into a conventional dense flow field. We introduce a new dataset
that can be used to evaluate both general semantic flow techniques and
region-based approaches such as proposal flow. We use this benchmark to compare
different matching algorithms, object proposals, and region features within
proposal flow, to the state of the art in semantic flow. This comparison, along
with experiments on standard datasets, demonstrates that proposal flow
significantly outperforms existing semantic flow methods in various settings.