Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems
High-level reasoning can be defined as the capability to generalize over
knowledge acquired via experience, and to exhibit robust behavior in novel
situations. This form of reasoning is a basic skill in humans, who seamlessly
use it in a broad spectrum of tasks, from language communication to decision
making in complex situations. When it manifests itself in understanding and
manipulating the everyday world of objects and their interactions, we talk
about common sense or commonsense reasoning. State-of-the-art AI systems do not
possess such a capability: for instance, Large Language Models have recently
become popular by demonstrating remarkable fluency in conversing with humans,
but they still make trivial mistakes when probed for commonsense competence; on
a different level, performance degradation outside the training data prevents
self-driving vehicles from safely adapting to unseen scenarios, a serious and
unsolved problem that limits the adoption of such technology. In this paper we
propose to enable high-level reasoning in AI systems by integrating cognitive
architectures with external neuro-symbolic components. We illustrate a hybrid
framework centered on ACT-R and discuss the role of generative models in
recent and future applications.
Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering
Recent developments in pre-trained neural language modeling have led to leaps
in accuracy on commonsense question-answering benchmarks. However, there is
increasing concern that models overfit to specific tasks, without learning to
utilize external knowledge or perform general semantic reasoning. In contrast,
zero-shot evaluations have shown promise as a more robust measure of a model's
general reasoning abilities. In this paper, we propose a novel neuro-symbolic
framework for zero-shot question answering across commonsense tasks. Guided by
a set of hypotheses, the framework studies how to transform various
pre-existing knowledge resources into a form that is most effective for
pre-training models. We vary the set of language models, training regimes,
knowledge sources, and data generation strategies, and measure their impact
across tasks. Building on prior work, we devise and compare four constrained
distractor-sampling strategies. We provide empirical results across five
commonsense question-answering tasks with data generated from five external
knowledge resources. We show that, while an individual knowledge graph is
better suited for specific tasks, a global knowledge graph brings consistent
gains across different tasks. In addition, both preserving the structure of the
task and generating fair and informative questions help language models
learn more effectively.
Comment: AAAI 202
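The idea of turning knowledge-graph triples into multiple-choice questions with constrained distractors can be illustrated with a minimal sketch. It assumes a toy set of (head, relation, tail) triples, a hypothetical question template, and a single constraint: distractors are tails of the same relation that the graph never links to the question's head. The paper compares four sampling strategies, which this sketch does not reproduce; `make_question`, `TRIPLES`, and `TEMPLATES` are illustrative names only.

```python
import random

# Hypothetical toy knowledge graph of (head, relation, tail) triples.
TRIPLES = [
    ("knife", "UsedFor", "cutting"),
    ("pen", "UsedFor", "writing"),
    ("pen", "UsedFor", "signing"),
    ("oven", "UsedFor", "baking"),
    ("bed", "UsedFor", "sleeping"),
]

# Hypothetical question template per relation.
TEMPLATES = {"UsedFor": "What is a {head} used for?"}


def make_question(triple, triples, n_distractors=2, seed=0):
    """Turn one triple into a multiple-choice question.

    Distractors are tails of the same relation that are never linked to the
    question's head, so no distractor is also a correct answer.
    """
    head, rel, answer = triple
    # Every tail the graph links to this head via this relation counts as gold.
    gold = {t for h, r, t in triples if h == head and r == rel}
    # Candidate distractors: same relation, but not a valid answer for head.
    pool = sorted({t for h, r, t in triples if r == rel and t not in gold})
    rng = random.Random(seed)
    distractors = rng.sample(pool, min(n_distractors, len(pool)))
    options = distractors + [answer]
    rng.shuffle(options)
    return {
        "question": TEMPLATES[rel].format(head=head),
        "options": options,
        "answer": answer,
    }


q = make_question(("pen", "UsedFor", "writing"), TRIPLES)
print(q["question"], q["options"])
```

Excluding every gold tail (not just the queried one) is what keeps "signing" from appearing as a wrong option for "pen", which is one simple way to make generated questions fair.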
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Recent works such as VisProg and ViperGPT have smartly composed foundation
models for visual reasoning, using large language models (LLMs) to produce
programs that can be executed by pre-trained vision-language models. However,
they operate in limited domains, such as 2D images, not fully exploiting the
generalization of language: abstract concepts like "left" can also be grounded
in 3D, temporal, and action data, as in moving to your left. This limited
generalization stems from these inference-only methods' inability to learn or
adapt pre-trained models to a new domain. We propose the Logic-Enhanced
Foundation Model (LEFT), a unified framework that learns to ground and reason
with concepts across domains with a differentiable, domain-independent,
first-order logic-based program executor. LEFT has an LLM interpreter that
outputs a program represented in a general, logic-based reasoning language,
which is shared across all domains and tasks. LEFT's executor then executes the
program with trainable domain-specific grounding modules. We show that LEFT
flexibly learns concepts in four domains: 2D images, 3D scenes, human motions,
and robotic manipulation. It exhibits strong reasoning ability in a wide
variety of tasks, including those that are complex and not seen during
training, and can be easily applied to new domains.
Comment: NeurIPS 2023. First two authors contributed equally. Project page:
https://web.stanford.edu/~joycj/projects/left_neurips_202
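To make the idea of a differentiable first-order logic executor concrete, here is a minimal sketch. It assumes soft-logic semantics that the abstract does not specify (a product t-norm for AND, a noisy-OR relaxation for EXISTS), hand-coded scores standing in for the trainable grounding modules, and a hand-written program in place of the LLM interpreter's output; all names are hypothetical and this is not LEFT's implementation. In a real system these operations would run on autodiff tensors, so a loss on the final truth value could train the grounding modules.

```python
from math import prod

# Hypothetical outputs of trainable grounding modules for a 3-object scene:
# red[i] = P(object i is red); left_of[i][j] = P(object i is left of object j).
red = [0.9, 0.1, 0.2]
left_of = [
    [0.0, 0.8, 0.9],
    [0.2, 0.0, 0.7],
    [0.1, 0.3, 0.0],
]


def soft_and(a, b):
    # Product t-norm: smooth in both arguments.
    return a * b


def soft_exists(scores):
    # Noisy-OR relaxation of "there exists": 1 - prod(1 - s).
    return 1.0 - prod(1.0 - s for s in scores)


def exists_red_left_of(target):
    """Soft truth value of: exists x. red(x) AND left_of(x, target)."""
    return soft_exists(
        soft_and(red[i], left_of[i][target]) for i in range(len(red))
    )


# Program an LLM interpreter might emit for
# "Is there a red object to the left of object 2?" (hand-written here).
print(round(exists_red_left_of(2), 4))  # 0.8233
```

Because every operator is smooth, the same program can be reused across domains while only the grounding scores (here `red` and `left_of`) change.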
Large Language Models are Visual Reasoning Coordinators
Visual reasoning requires multimodal perception and commonsense cognition of
the world. Recently, multiple vision-language models (VLMs) have been proposed
with excellent commonsense reasoning ability in various domains. However, how
to harness the collective power of these complementary VLMs is rarely explored.
Existing methods such as ensembling still struggle to aggregate these models with
the desired higher-order communications. In this work, we propose Cola, a novel
paradigm that coordinates multiple VLMs for visual reasoning. Our key insight
is that a large language model (LLM) can efficiently coordinate multiple VLMs
by facilitating natural language communication that leverages their distinct
and complementary capabilities. Extensive experiments demonstrate that our
instruction tuning variant, Cola-FT, achieves state-of-the-art performance on
visual question answering (VQA), outside knowledge VQA, visual entailment, and
visual spatial reasoning tasks. Moreover, we show that our in-context learning
variant, Cola-Zero, exhibits competitive performance in zero- and few-shot
settings, without finetuning. Through systematic ablation studies and
visualizations, we validate that a coordinator LLM indeed comprehends the
instruction prompts as well as the separate functionalities of VLMs; it then
coordinates them to enable impressive visual reasoning capabilities.
Comment: Accepted at NeurIPS 202
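The coordination step, in which an LLM reads what several VLMs say about an image and picks an answer, can be sketched as prompt assembly plus answer parsing. This is a minimal sketch under stated assumptions: the VLM names, the prompt template, and the substring-based parser are all hypothetical and are not Cola's actual prompts; the LLM call itself is omitted and replaced by a mock reply.

```python
def build_coordinator_prompt(question, choices, vlm_outputs):
    """Assemble a natural-language prompt for a coordinator LLM.

    vlm_outputs maps a (hypothetical) VLM name to a dict holding that
    model's image caption and its own answer to the question.
    """
    lines = []
    for name, out in vlm_outputs.items():
        lines.append(f"{name} caption: {out['caption']}")
        lines.append(f"{name} answer: {out['answer']}")
    lines.append(f"Question: {question}")
    lines.append("Choices: " + ", ".join(choices))
    lines.append("Based on the captions and answers above, the best choice is:")
    return "\n".join(lines)


def pick_choice(llm_reply, choices):
    """Map the coordinator LLM's free-text reply back onto one choice.

    Naive substring matching; returns None if no choice is mentioned.
    """
    reply = llm_reply.lower()
    for choice in choices:
        if choice.lower() in reply:
            return choice
    return None


outputs = {
    "vlm_a": {"caption": "a red bus on a city street", "answer": "bus"},
    "vlm_b": {"caption": "a large vehicle parked outside", "answer": "truck"},
}
choices = ["bus", "truck", "car"]
prompt = build_coordinator_prompt("What vehicle is shown?", choices, outputs)
print(prompt)
# The prompt would be sent to an LLM (not shown); here we parse a mock reply.
print(pick_choice("The answer is Bus.", choices))
```

In a zero-shot setting like Cola-Zero the prompt would typically be prefixed with a few in-context examples, while a fine-tuned variant like Cola-FT would train the LLM on prompts of this shape; both are described here only at the level of the abstract.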
Computational Lexical Resources for Explainable Natural Language Understanding
Procedural texts describe dynamic state changes that occur during a step-by-step process (e.g., an instruction manual, photosynthesis, or a baking recipe). As a subtask of procedural text understanding, entity state tracking aims to automatically analyze such documents, identifying relevant information that allows entities’ states and locations to be tracked during a process. This NLP task suffers from the scarcity of annotated data, mainly because obtaining such annotations is difficult and time-consuming. For instance, annotators often rely on commonsense knowledge to annotate implicit information. Recent approaches have successfully incorporated external world knowledge. In particular, Zhang et al. (2021) [111] present a neuro-symbolic model in which commonsense knowledge about entities from ConceptNet is leveraged to guide the model. The model uses a BERT encoder fine-tuned on raw procedural texts to predict entity state changes. We re-implement this model as our baseline and add linguistic knowledge from VerbNet, giving the model access to the lexical semantic information encoded in verbs. We modify the multi-stage training method presented by [111] and compare the sources of knowledge in the LM fine-tuning step in different experimental settings. The evaluation results on the ProPara dataset [21] show improvements over the baseline, verifying the effectiveness of introducing event semantics over and above commonsense knowledge about entities. In addition, we develop a purely symbolic model for entity state tracking that uses a simple set of case statements and is informed mostly by linguistic knowledge retrieved from various computational lexical resources. We show that our purely symbolic model is generalizable and explainable, and achieves state-of-the-art results on the Recipes dataset [10].
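A purely symbolic tracker driven by "a simple set of case statements" can be sketched as a lookup from verbs to state-change effects. This is a minimal sketch under loud assumptions: the verb lexicon below is hypothetical and merely stands in for knowledge a resource such as VerbNet could supply, the steps are assumed to be pre-parsed into (verb, entity, location) tuples by some upstream component, and the "-" (does not exist) and "?" (location unknown) labels follow the ProPara convention. It is not the thesis's model.

```python
# Hypothetical verb lexicon standing in for knowledge that a resource such as
# VerbNet could supply; these entries are illustrative, not VerbNet data.
VERB_EFFECTS = {
    "form": "create",
    "produce": "create",
    "consume": "destroy",
    "break": "destroy",
    "move": "move",
    "travel": "move",
}


def track(steps, initial):
    """Rule-based entity state tracking over pre-parsed steps.

    steps: list of (verb, entity, location) tuples, assumed to come from an
        upstream parser; location may be None if the text does not say.
    initial: dict mapping entity -> location, "-" (does not exist), or
        "?" (exists, location unknown).
    Returns a snapshot of all entity states after each step.
    """
    state = dict(initial)
    history = []
    for verb, entity, location in steps:
        effect = VERB_EFFECTS.get(verb)
        # One case per effect class, in the spirit of a set of case statements.
        if effect in ("create", "move"):
            state[entity] = location or "?"
        elif effect == "destroy":
            state[entity] = "-"
        # Verbs missing from the lexicon leave every state unchanged.
        history.append(dict(state))
    return history


steps = [
    ("travel", "water", "leaf"),
    ("consume", "water", None),
    ("form", "sugar", "leaf"),
]
for snapshot in track(steps, {"water": "soil", "sugar": "-"}):
    print(snapshot)
```

Every prediction traces back to one lexicon entry and one rule, which is what makes this style of model explainable; its coverage, however, is bounded by the lexicon.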
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Large Language Models (LLMs) have taken Knowledge Representation -- and the
world -- by storm. This inflection point marks a shift from explicit knowledge
representation to a renewed focus on the hybrid representation of both explicit
knowledge and parametric knowledge. In this position paper, we discuss
some of the common debate points within the community on LLMs (parametric
knowledge) and Knowledge Graphs (explicit knowledge) and speculate on
opportunities and visions that the renewed focus brings, as well as related
research topics and challenges.
Comment: 30 page