Image captioning for Brazilian Portuguese using GRIT model
This work presents the early development of an image captioning model for
Brazilian Portuguese. We use the GRIT (Grid- and Region-based Image
captioning Transformer) model for this purpose. GRIT is a
Transformer-only neural architecture that combines two kinds of visual
features, grid-based and region-based, to generate better captions. GRIT was
proposed as a more efficient approach to image captioning. In this work, we
adapt the GRIT model for training on a Brazilian Portuguese dataset, yielding
an image captioning method for the Brazilian Portuguese language.
Comment: arXiv admin note: text overlap with arXiv:2207.09666 by other author
From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage
Generative AI has become pervasive in society, with significant advances
across many domains. In particular, among Text-to-Image (TTI) models, Latent
Diffusion Models (LDMs) show remarkable capabilities in generating visual
content from textual prompts. This paper addresses the
potential of LDMs in representing local cultural concepts, historical figures,
and endangered species. In this study, we use the cultural heritage of Rio
Grande do Sul (RS), Brazil, as an illustrative case. Our objective is to
contribute to the broader understanding of how generative models can help to
capture and preserve the cultural and historical identity of regions. The paper
outlines the methodology, including subject selection, dataset creation, and
the fine-tuning process. The results showcase the images generated, alongside
the challenges and feasibility of each concept. In conclusion, this work shows
the power of these models to represent and preserve unique aspects of diverse
regions and communities.