Programmable Agents
We build deep RL agents that execute declarative programs expressed in formal
language. The agents learn to ground the terms in this language in their
environment, and can generalize their behavior at test time to execute new
programs that refer to objects that were not referenced during training. The
agents develop disentangled interpretable representations that allow them to
generalize to a wide variety of zero-shot semantic tasks.
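To make the idea concrete, below is a minimal sketch of grounding declarative program terms in learned per-object detectors. The Predicate module, the soft-AND semantics, and the toy RED/SPHERE program are all hypothetical illustrations, not the paper's actual architecture.

```python
# Hypothetical sketch: grounding declarative program terms in learned detectors.
import torch
import torch.nn as nn

class Predicate(nn.Module):
    """Learned detector that scores whether an object satisfies a term (e.g. RED)."""
    def __init__(self, obj_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obj_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, objects: torch.Tensor) -> torch.Tensor:
        # objects: (num_objects, obj_dim) -> per-object truth scores in [0, 1]
        return torch.sigmoid(self.net(objects)).squeeze(-1)

def execute(program, objects, predicates):
    """Evaluate a tiny declarative program such as AND(RED, SPHERE) per object."""
    op, *args = program
    if op == "AND":  # soft conjunction over sub-programs
        scores = [execute(a, objects, predicates) for a in args]
        return torch.stack(scores).prod(dim=0)
    return predicates[op](objects)  # leaf term: apply its learned grounding

obj_dim = 16
objects = torch.randn(5, obj_dim)  # 5 detected objects (stand-in features)
predicates = {name: Predicate(obj_dim) for name in ["RED", "SPHERE"]}
scores = execute(("AND", ("RED",), ("SPHERE",)), objects, predicates)
print(scores)  # which object best matches the program's referent
```

Because new terms are just new leaf predicates over the same object representation, a program referring to an unseen term combination can be executed without retraining the executor, which is the flavor of zero-shot generalization the abstract describes.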
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Large-scale pre-trained Vision & Language (VL) models have shown remarkable
performance in many applications, enabling replacing a fixed set of supported
classes with zero-shot open vocabulary reasoning over (almost arbitrary)
natural language prompts. However, recent works have uncovered a fundamental
weakness of these models: for example, their difficulty in understanding Visual
Language Concepts (VLC) that go 'beyond nouns', such as the meaning of
non-object words (e.g., attributes, actions, relations, states), or their
difficulty in performing compositional reasoning, such as understanding the
significance of word order in a sentence. In this work, we investigate to what
extent purely synthetic data could be leveraged to teach these models to
overcome such shortcomings without compromising their zero-shot capabilities.
We contribute Synthetic Visual Concepts (SyViC) - a million-scale synthetic
dataset and data generation codebase that allows generating additional suitable
data to improve the VLC understanding and compositional reasoning of VL models.
Additionally, we propose a general VL finetuning strategy for
effectively leveraging SyViC towards achieving these improvements. Our
extensive experiments and ablations on VL-Checklist, Winoground, and ARO
benchmarks demonstrate that it is possible to adapt strong pre-trained VL
models with synthetic data, significantly enhancing their VLC understanding
(e.g., by 9.9% on ARO and 4.3% on VL-Checklist) with under a 1% drop in their
zero-shot accuracy.
Comment: Accepted to ICCV 2023. Project page: https://synthetic-vic.github.io
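As a rough illustration of finetuning a VL model on synthetic image-text pairs, the sketch below implements a standard CLIP-style symmetric contrastive step. The random embeddings stand in for encoder outputs; nothing here reproduces the paper's actual SyViC finetuning recipe.

```python
# Illustrative sketch: one contrastive finetuning step on synthetic pairs.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))        # matching pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Stand-ins for image/text encoder outputs on a synthetic batch.
img_emb = torch.randn(32, 512, requires_grad=True)
txt_emb = torch.randn(32, 512, requires_grad=True)
loss = clip_contrastive_loss(img_emb, txt_emb)
loss.backward()  # gradients flow to both towers
```

In practice, preserving zero-shot accuracy while finetuning (the under-1% drop reported above) requires a careful adaptation strategy, which is precisely what the paper's proposed finetuning recipe addresses.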
Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning
We propose a novel approach for unsupervised zero-shot learning (ZSL) of
classes based on their names. Most existing unsupervised ZSL methods aim to
learn a model for directly comparing image features and class names. However,
this proves to be a difficult task due to the dominance of non-visual semantics
in the underlying vector-space embeddings of class names. To address this
issue, we discriminatively learn a word representation such that the similarity
between a class name and a combination of attribute names falls in line with
visual similarity. Unlike traditional zero-shot learning approaches built upon
attribute presence, our approach bypasses the laborious attribute-class
relation annotations for unseen classes. In addition, our proposed approach
renders text-only training possible; hence, training can be augmented without
the need to collect additional image data. The
experimental results show that our method yields state-of-the-art results for
unsupervised ZSL on three benchmark datasets.
Comment: To appear at IEEE Int. Conference on Computer Vision (ICCV) 2017
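The following toy sketch illustrates the kind of text-only objective the abstract describes: trainable word embeddings pushed so that a class name scores higher with its own attribute combination than a confusable class does. The vocabulary, margin, and scoring are illustrative assumptions, not the paper's exact formulation.

```python
# Toy sketch: aligning class-name embeddings with attribute-name combinations.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab = ["zebra", "horse", "striped", "four_legged"]
word_idx = {w: i for i, w in enumerate(vocab)}
embed = nn.Embedding(len(vocab), 32)  # trainable word representations

def class_vs_attributes_score(class_name, attribute_names):
    """Cosine similarity between a class embedding and its attributes' mean."""
    c = embed(torch.tensor(word_idx[class_name]))
    a = embed(torch.tensor([word_idx[w] for w in attribute_names])).mean(dim=0)
    return F.cosine_similarity(c, a, dim=0)

# Margin ranking: "zebra" should match "striped" attributes better than "horse".
pos = class_vs_attributes_score("zebra", ["striped", "four_legged"])
neg = class_vs_attributes_score("horse", ["striped", "four_legged"])
loss = F.relu(0.2 - pos + neg)  # text-only objective: no images needed
loss.backward()
```

Since every quantity in the loss is a word embedding, training data can be expanded with more class/attribute name pairs alone, which is what makes the text-only augmentation described above possible.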
A Short Survey of Systematic Generalization
This survey covers systematic generalization and the history of how machine
learning has addressed it. We aim to summarize and organize information on both
conventional approaches and recent improvements. We first look at the
definition of systematic generalization, then introduce the Classicist and
Connectionist views. We then discuss different types of Connectionism and how
they approach generalization. Two crucial problems, variable binding and
causality, are discussed. We look into systematic generalization in the fields
of language, vision, and VQA, and discuss recent improvements from different
aspects. Systematic generalization has a long history in artificial
intelligence, and we can cover only a small portion of the many contributions.
We hope this paper provides background and is beneficial for discoveries in
future work.
Sherlock: Scalable Fact Learning in Images
We study scalable and uniform understanding of facts in images. Existing
visual recognition systems are typically modeled differently for each fact type
such as objects, actions, and interactions. We propose a setting where all
these facts can be modeled simultaneously with a capacity to understand
an unbounded number of facts in a structured way. The training data comes as
structured facts in images, including (1) objects, (2) attributes, (3) actions,
and (4) interactions. Each fact has a semantic language view and a visual view
(an image with this
fact). We show that learning visual facts in a structured way enables not only
a uniform but also generalizable visual understanding. We propose and
investigate recent and strong approaches from the multiview learning literature
and also introduce two learning representation models as potential baselines.
We apply the investigated methods to several datasets that we augmented with
structured facts, and to a large-scale dataset of more than 202,000 facts and
814,000 images. Our experiments show the advantage of relating facts through
their structure with the proposed models, compared to the designed baselines,
on bidirectional fact retrieval.
Comment: Jan 7 Update
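A small sketch of the setting: structured facts with optional action/object slots, and bidirectional retrieval by cosine similarity in a shared embedding space. The Fact fields, the random embeddings, and the retrieval function are toy assumptions, not the paper's model.

```python
# Toy sketch: structured facts and bidirectional fact/image retrieval.
import torch
import torch.nn.functional as F
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Fact:
    subject: str                  # e.g. an object category
    action: Optional[str] = None  # present for action/interaction facts
    obj: Optional[str] = None     # present for interaction facts

def retrieve(query_emb: torch.Tensor, gallery: torch.Tensor, k: int = 5):
    """Nearest-neighbour retrieval by cosine similarity; works in both
    directions (language view -> images, or image view -> facts)."""
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(gallery, dim=-1).t()
    return sims.topk(k, dim=-1).indices

facts = [Fact("boy"), Fact("boy", "playing"), Fact("boy", "riding", "horse")]
fact_embs = torch.randn(len(facts), 128)  # language-view embeddings (stand-ins)
image_embs = torch.randn(10, 128)         # visual-view embeddings (stand-ins)
print(retrieve(fact_embs, image_embs, k=3))  # images for each fact
print(retrieve(image_embs, fact_embs, k=1))  # best fact for each image
```

Because every fact type shares one representation and one retrieval mechanism, adding new facts only grows the gallery; this is the sense in which the setting scales to an unbounded number of facts.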