Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems

Author: Oltramari Alessandro
Publication date: 13/11/2023

High-level reasoning can be defined as the capability to generalize over knowledge acquired via experience, and to exhibit robust behavior in novel situations. This form of reasoning is a basic skill in humans, who seamlessly use it in a broad spectrum of tasks, from language communication to decision making in complex situations. When it manifests itself in understanding and manipulating the everyday world of objects and their interactions, we talk about common sense or commonsense reasoning. State-of-the-art AI systems do not possess this capability: for instance, Large Language Models have recently become popular by demonstrating remarkable fluency in conversing with humans, but they still make trivial mistakes when probed for commonsense competence; on a different level, performance degradation outside the training data prevents self-driving vehicles from safely adapting to unseen scenarios, a serious and unsolved problem that limits the adoption of such technology. In this paper we propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components. We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.

arXiv.org e-Print Archive

Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

Author: Bisk Yonatan
Francis Jonathan
Ilievski Filip
Ma Kaixin
Nyberg Eric
Oltramari Alessandro
Publication date: 14/12/2020

Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a model's general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results across five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the task and generating fair and informative questions help language models learn more effectively. Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

What's Left? Concept Grounding with Logic-Enhanced Foundation Models

Author: Hsu Joy
Mao Jiayuan
Tenenbaum Joshua B.
Wu Jiajun
Publication date: 24/10/2023

Recent works such as VisProg and ViperGPT have smartly composed foundation models for visual reasoning, using large language models (LLMs) to produce programs that can be executed by pre-trained vision-language models. However, they operate in limited domains, such as 2D images, and do not fully exploit the generalization of language: abstract concepts like "left" can also be grounded in 3D, temporal, and action data, as in moving to your left. This limited generalization stems from the inability of these inference-only methods to learn or adapt pre-trained models to a new domain. We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor. LEFT has an LLM interpreter that outputs a program represented in a general, logic-based reasoning language, which is shared across all domains and tasks. LEFT's executor then executes the program with trainable domain-specific grounding modules. We show that LEFT flexibly learns concepts in four domains: 2D images, 3D scenes, human motions, and robotic manipulation. It exhibits strong reasoning ability in a wide variety of tasks, including those that are complex and not seen during training, and can be easily applied to new domains. Comment: NeurIPS 2023. First two authors contributed equally. Project page: https://web.stanford.edu/~joycj/projects/left_neurips_202

arXiv.org e-Print Archive

Large Language Models are Visual Reasoning Coordinators

Author: Chen Liangyu
Darrell Trevor
Keutzer Kurt
Li Bo
Li Chunyuan
Liu Ziwei
Shen Sheng
Yang Jingkang
Publication date: 23/10/2023

Visual reasoning requires multimodal perception and commonsense cognition of the world. Recently, multiple vision-language models (VLMs) have been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the collective power of these complementary VLMs is rarely explored. Existing methods such as ensembling still struggle to aggregate these models with the desired higher-order communications. In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning. Our key insight is that a large language model (LLM) can efficiently coordinate multiple VLMs by facilitating natural language communication that leverages their distinct and complementary capabilities. Extensive experiments demonstrate that our instruction tuning variant, Cola-FT, achieves state-of-the-art performance on visual question answering (VQA), outside knowledge VQA, visual entailment, and visual spatial reasoning tasks. Moreover, we show that our in-context learning variant, Cola-Zero, exhibits competitive performance in zero- and few-shot settings, without finetuning. Through systematic ablation studies and visualizations, we validate that a coordinator LLM indeed comprehends the instruction prompts as well as the separate functionalities of VLMs; it then coordinates them to enable impressive visual reasoning capabilities. Comment: Accepted at NeurIPS 202

arXiv.org e-Print Archive

Computational Lexical Resources for Explainable Natural Language Understanding

Author: Kazeminejad Ghazaleh
Publication venue: University of Colorado Boulder
Publication date: 01/04/2023

Procedural texts describe dynamic state changes that occur during a step-by-step process (e.g. an instruction manual, photosynthesis, or a baking recipe). As a subtask of procedural text understanding, entity state tracking aims to automatically analyze such documents, identifying relevant information that allows entities’ states and locations to be tracked during a process. This NLP task suffers from the scarcity of annotated data, mainly because obtaining such annotations is difficult and time-consuming. For instance, annotators often rely on commonsense knowledge to annotate implicit information. Recent approaches have successfully incorporated external world knowledge. In particular, Zhang et al. (2021) [111] present a neuro-symbolic model, where commonsense knowledge about entities from ConceptNet is leveraged to guide the model. The model uses a BERT encoder fine-tuned on raw procedural texts to predict entity state changes. We re-implement this model as our baseline, and add linguistic knowledge to allow the model to have access to the lexical semantic information encoded in verbs, using VerbNet. We modify the multi-stage training method presented by [111], and compare the sources of knowledge in the LM fine-tuning step in different experimental settings. The evaluation results on the ProPara dataset [21] show improvements over the baseline, verifying the effectiveness of introducing event semantics over and above commonsense knowledge about entities. In addition, we develop a purely symbolic model for entity state tracking that uses a simple set of case statements, and is informed mostly by linguistic knowledge retrieved from various computational lexical resources. We show that our purely symbolic model is generalizable and explainable and achieves state-of-the-art results on the Recipes dataset [10].

CU Scholar Institutional Repository

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Author: Biswas Russa
Bonifati Angela
Chen Jiaoyan
de Melo Gerard
Dietze Stefan
Dragoni Mauro
Graux Damien
Jabeen Hajira
Kalo Jan-Christoph
Lissandrini Matteo
Omeliyanenko Janna
Pan Jeff Z.
Razniewski Simon
Singhania Sneha
Vakaj Edlira
Zhang Wen
Publication date: 11/08/2023

Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges. Comment: 30 page

arXiv.org e-Print Archive