433 research outputs found
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
We investigate the reasoning ability of pretrained vision and language (V&L) models in two tasks that require multimodal integration: (1) discriminating a correct image-sentence pair from an incorrect one, and (2) counting entities in an image. We evaluate three pretrained V&L models on these tasks: ViLBERT, ViLBERT 12-in-1 and LXMERT, in zero-shot and finetuned settings. Our results show that models solve task (1) very well, as expected, since all models are pretrained on task (1). However, none of the pretrained V&L models is able to adequately solve task (2), our counting probe, and they cannot generalise to out-of-distribution quantities. We propose a number of explanations for these findings: LXMERT (and to some extent ViLBERT 12-in-1) shows some evidence of catastrophic forgetting on task (1). Concerning our results on the counting probe, we find evidence that all models are affected by dataset bias and also fail to individuate entities in the visual input. While a selling point of pretrained V&L models is their ability to solve complex tasks, our findings suggest that understanding their reasoning and grounding capabilities requires more targeted investigations of specific phenomena.
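As a rough illustration of the two probe formats described above, the snippet below builds (1) a correct vs. mismatched image-caption pair and (2) a counting foil by swapping the numeral in a caption. The image names, captions, and foil strategy are invented for illustration and are not the authors' actual data pipeline.

```python
import random

# Toy caption pool; image names and captions are invented for illustration.
captions = {
    "img_001.jpg": "three dogs are playing in the park",
    "img_002.jpg": "a man rides a red bicycle",
}
numerals = ["one", "two", "three", "four", "five"]

def alignment_example(image_id, pool):
    """Task (1): the image's own caption vs. a randomly mismatched one."""
    foil = random.choice([c for k, c in pool.items() if k != image_id])
    return {"image": image_id, "correct": pool[image_id], "foil": foil}

def counting_foil(caption, true_count):
    """Task (2): swap the numeral so the stated count no longer matches the image."""
    wrong = random.choice([n for n in numerals if n != true_count])
    return caption.replace(true_count, wrong)

print(alignment_example("img_001.jpg", captions))
print(counting_foil(captions["img_001.jpg"], "three"))
```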
VisualSem: a high-quality knowledge graph for vision and language
An exciting frontier in natural language understanding (NLU) and generation
(NLG) calls for (vision-and-) language models that can efficiently access
external structured knowledge repositories. However, many existing knowledge
bases only cover limited domains, or suffer from noisy data, and most of all
are typically hard to integrate into neural language pipelines. To fill this
gap, we release VisualSem: a high-quality knowledge graph (KG) which includes
nodes with multilingual glosses, multiple illustrative images, and visually
relevant relations. We also release a neural multi-modal retrieval model that
can use images or sentences as inputs and retrieves entities in the KG. This
multi-modal retrieval model can be integrated into any (neural network) model
pipeline. We encourage the research community to use VisualSem for data
augmentation and/or as a source of grounding, among other possible uses.
VisualSem as well as the multi-modal retrieval models are publicly available
and can be downloaded from this URL: https://github.com/iacercalixto/visualsem
Comment: Accepted for publication at the 1st Multilingual Representation
Learning workshop (MRL 2021), co-located with EMNLP 2021. 15 pages, 8 figures,
6 tables.
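To make the description above concrete, here is a minimal sketch of what a KG node with multilingual glosses, illustrative images, and relations could look like, together with a toy sentence-to-entity retriever. The field names and the lexical-overlap scoring are assumptions for illustration only; they are not VisualSem's actual schema or its neural multi-modal retrieval model (see the repository for those).

```python
from dataclasses import dataclass, field

@dataclass
class KGNode:
    node_id: str
    glosses: dict                                    # language code -> gloss text
    images: list = field(default_factory=list)       # paths to illustrative images
    relations: list = field(default_factory=list)    # (relation name, target node id)

nodes = [
    KGNode("n1", {"en": "a domesticated canine kept as a pet", "de": "ein Haushund"}),
    KGNode("n2", {"en": "a celestial body orbiting a star"}),
]

def retrieve(query, nodes, top_k=1):
    """Toy lexical-overlap retrieval standing in for the neural multi-modal retriever."""
    def score(node):
        gloss = node.glosses.get("en", "").lower()
        return len(set(query.lower().split()) & set(gloss.split()))
    return sorted(nodes, key=score, reverse=True)[:top_k]

print(retrieve("a pet dog", nodes))  # returns the node whose gloss best overlaps the query
```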
Photographic documentation of the silvicultural performance of tree species in the recovery of areas degraded by the deposition of fine tailings from copper mining.
This work photographically documents the silvicultural performance of tree species planted in an area degraded by the deposition of fine tailings from Mineração Caraíba, located in the Bahian sertão in Jaguarari, BA. The first phase of selecting the species most tolerant of the tailings areas ran from 1991 to 1995; the tree/shrub species tested included algaroba (Prosopis juliflora), angico (Anadenanthera columbina) and aroeira (Myracrod), among others
English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too
Intermediate-task training---fine-tuning a pretrained model on an
intermediate task before fine-tuning again on the target task---often improves
model performance substantially on language understanding tasks in monolingual
English settings. We investigate whether English intermediate-task training is
still helpful on non-English target tasks. Using nine intermediate
language-understanding tasks, we evaluate intermediate-task transfer in a
zero-shot cross-lingual setting on the XTREME benchmark. We see large
improvements from intermediate training on the BUCC and Tatoeba sentence
retrieval tasks and moderate improvements on question-answering target tasks.
MNLI, SQuAD and HellaSwag achieve the best overall results as intermediate
tasks, while multi-task intermediate training offers small additional improvements.
Using our best intermediate-task models for each target task, we obtain a 5.4
point improvement over XLM-R Large on the XTREME benchmark, setting the state
of the art as of June 2020. We also investigate continuing multilingual MLM
during intermediate-task training and using machine-translated
intermediate-task data, but neither consistently outperforms simply performing
English intermediate-task training.
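For readers unfamiliar with the recipe, the sketch below shows the two-stage procedure the abstract describes: fine-tune a multilingual encoder on an English intermediate task (here MNLI), then fine-tune again on the English target-task data before evaluating zero-shot on the non-English test sets. The model size, dataset slice, and hyperparameters are placeholders using the HuggingFace Transformers/Datasets APIs, not the authors' exact setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # the paper uses XLM-R Large

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

# Stage 1: intermediate-task training on English MNLI (tiny slice for illustration).
mnli = load_dataset("glue", "mnli", split="train[:1%]").map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-mnli", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=mnli,
).train()

# Stage 2: fine-tune the same checkpoint on the English training data of the target
# task (e.g. an XTREME task), then evaluate zero-shot on its non-English test sets.
```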
- …