Estimating Treatment Effects using Neurosymbolic Program Synthesis
Estimating treatment effects from observational data is a central problem in
causal inference. Methods to solve this problem exploit inductive biases and
heuristics from causal inference to design multi-head neural network
architectures and regularizers. In this work, we propose to use neurosymbolic
program synthesis, a data-efficient and interpretable technique, to solve the
treatment effect estimation problem. We theoretically show that neurosymbolic
programming can solve the treatment effect estimation problem. By designing a
Domain Specific Language (DSL) for the treatment effect estimation problem based on
the inductive biases used in the literature, we argue that neurosymbolic
programming is a better alternative for treatment effect estimation than
traditional methods. Our empirical study reveals that our method, which
implicitly encodes inductive biases in a DSL, achieves better performance on
benchmark datasets than state-of-the-art methods.
Comment: Preprint
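The DSL-and-search recipe this abstract describes can be illustrated with a minimal sketch: enumerate small programs built from a handful of DSL operators and keep the one that best fits the factual outcomes. Everything below (the operators, the candidate effects, the synthetic data) is invented for illustration and is not the paper's actual DSL or synthesizer.

```python
# Toy sketch of DSL-guided program synthesis for treatment effect
# estimation. Operators, candidate effects, and data are illustrative
# stand-ins, not the paper's actual DSL or search procedure.
import itertools
import random

random.seed(0)

# Synthetic observational data: covariate x, treatment t, outcome y,
# with a true treatment effect of +2.0.
data = []
for _ in range(200):
    x = random.gauss(0, 1)
    t = 1 if random.random() < 0.5 else 0
    data.append((x, t, x + 2.0 * t + random.gauss(0, 0.1)))

# A tiny DSL: a program pairs a representation with an outcome head.
REPRS = {"identity": lambda x: x, "square": lambda x: x * x}
HEADS = {f"effect_{e}": (lambda phi, t, e=e: phi + e * t)
         for e in [0.0, 1.0, 2.0, 3.0]}

def score(repr_fn, head_fn):
    """Mean squared error of a candidate program on factual outcomes."""
    return sum((y - head_fn(repr_fn(x), t)) ** 2 for x, t, y in data) / len(data)

# "Synthesis" here is brute-force enumeration over the DSL's programs.
best = min(itertools.product(REPRS, HEADS),
           key=lambda p: score(REPRS[p[0]], HEADS[p[1]]))
print(best)  # → ('identity', 'effect_2.0'): the program recovering the true effect
```

The point of the sketch is that the inductive bias lives in the DSL's vocabulary rather than in a network architecture; a real system would use neural components inside the programs and a smarter search than enumeration.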
From Statistical Relational to Neurosymbolic Artificial Intelligence: a Survey
This survey explores the integration of learning and reasoning in two
different fields of artificial intelligence: neurosymbolic and statistical
relational artificial intelligence. Neurosymbolic artificial intelligence
(NeSy) studies the integration of symbolic reasoning and neural networks, while
statistical relational artificial intelligence (StarAI) focuses on integrating
logic with probabilistic graphical models. This survey identifies seven shared
dimensions between these two subfields of AI. These dimensions can be used to
characterize different NeSy and StarAI systems. They are concerned with (1) the
approach to logical inference, whether model or proof-based; (2) the syntax of
the used logical theories; (3) the logical semantics of the systems and their
extensions to facilitate learning; (4) the scope of learning, encompassing
either parameter or structure learning; (5) the presence of symbolic and
subsymbolic representations; (6) the degree to which systems capture the
original logic, probabilistic, and neural paradigms; and (7) the classes of
learning tasks the systems are applied to. By positioning various NeSy and
StarAI systems along these dimensions and pointing out similarities and
differences between them, this survey contributes fundamental concepts for
understanding the integration of learning and reasoning.
Comment: To appear in Artificial Intelligence. Shorter version at IJCAI 2020 survey track, https://www.ijcai.org/proceedings/2020/0688.pdf
Neurosymbolic Programming for Science
Neurosymbolic Programming (NP) techniques have the potential to accelerate
scientific discovery. These models combine neural and symbolic components to
learn complex patterns and representations from data, using high-level concepts
or known constraints. NP techniques can interface with symbolic domain
knowledge from scientists, such as prior knowledge and experimental context, to
produce interpretable outputs. We identify opportunities and challenges in
aligning current NP models with scientific workflows, drawing on real-world
examples from behavior analysis, with the aim of enabling the broad use of NP
in workflows across the natural and social sciences.
Comment: Neural Information Processing Systems 2022 - AI for science workshop
Neural Machine Translation for Code Generation
Neural machine translation (NMT) methods developed for natural language
processing have been shown to be highly successful in automating translation
from one natural language to another. Recently, these NMT methods have been
adapted to the generation of program code. In NMT for code generation, the task
is to generate output source code that satisfies constraints expressed in the
input. In the literature, a variety of different input scenarios have been
explored, including generating code based on natural language description,
lower-level representations such as binary or assembly (neural decompilation),
partial representations of source code (code completion and repair), and source
code in another language (code translation). In this paper we survey the NMT
for code generation literature, cataloging the variety of methods that have
been explored according to input and output representations, model
architectures, optimization techniques used, data sets, and evaluation methods.
We discuss the limitations of existing methods and future research directions.
Comment: 33 pages, 1 figure
Neurosymbolic Reinforcement Learning and Planning: A Survey
The area of Neurosymbolic Artificial Intelligence (Neurosymbolic AI) is
rapidly developing and has become a popular research topic, encompassing
sub-fields such as Neurosymbolic Deep Learning (Neurosymbolic DL) and
Neurosymbolic Reinforcement Learning (Neurosymbolic RL). Compared to
traditional learning methods, Neurosymbolic AI offers significant advantages by
simplifying complexity and providing transparency and explainability.
Reinforcement Learning (RL), a long-standing Artificial Intelligence (AI) concept
that mimics human behavior using rewards and punishment, is a fundamental
component of Neurosymbolic RL, a recent integration of the two fields that has
yielded promising results. The aim of this paper is to contribute to the
emerging field of Neurosymbolic RL by conducting a literature survey. Our
evaluation focuses on the three components that constitute Neurosymbolic RL:
neural, symbolic, and RL. We categorize works based on the role played by the
neural and symbolic parts in RL into three taxonomies: Learning for Reasoning,
Reasoning for Learning, and Learning-Reasoning. These categories are further
divided into sub-categories based on their applications. Furthermore, we
analyze the RL components of each research work, including the state space,
action space, policy module, and RL algorithm. Additionally, we identify
research opportunities and challenges in various applications within this
dynamic field.
Comment: 16 pages, 9 figures, IEEE Transactions on Artificial Intelligence
Guess & Sketch: Language Model Guided Transpilation
Maintaining legacy software requires many software and systems engineering
hours. Assembly code programs, which demand low-level control over the computer
machine state and have no variable names, are particularly difficult for humans
to analyze. Existing conventional program translators guarantee correctness,
but are hand-engineered for the source and target programming languages in
question. Learned transpilation, i.e. automatic translation of code, offers an
alternative to manual re-writing and engineering efforts. Automated symbolic
program translation approaches guarantee correctness but struggle to scale to
longer programs due to the exponentially large search space. Their rigid
rule-based systems also limit their expressivity, so they can only reason about
a reduced space of programs. Probabilistic neural language models (LMs) produce
plausible outputs for every input, but do so at the cost of guaranteed
correctness. In this work, we leverage the strengths of LMs and symbolic
solvers in a neurosymbolic approach to learned transpilation for assembly code.
Assembly code is an appropriate setting for a neurosymbolic approach, since
assembly code can be divided into shorter non-branching basic blocks amenable
to the use of symbolic methods. Guess & Sketch extracts alignment and
confidence information from features of the LM then passes it to a symbolic
solver to resolve semantic equivalence of the transpilation input and output.
We test Guess & Sketch on three different test sets of assembly transpilation
tasks, varying in difficulty, and show that it successfully transpiles 57.6%
more examples than GPT-4 and 39.6% more examples than an engineered transpiler.
We also share a training and evaluation dataset for this task.
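The guess-then-verify pattern described above can be sketched in miniature: a stand-in "LM" proposes a translation with per-token confidence, and a stand-in "solver" repairs low-confidence tokens by searching for a semantically equivalent block. The toy ISAs, their semantics, and the repair search are all invented for illustration; this is not Guess & Sketch's actual pipeline.

```python
# Illustrative guess-then-sketch loop over a toy pair of instruction sets.
# All components are hypothetical stand-ins for the paper's LM and solver.

def neural_guess(src):
    """Stand-in LM: translate op by op, with low confidence on unknown ops."""
    table = {"ADD": "add", "SUB": "sub", "MUL": "mul"}
    out, conf = [], []
    for op in src:
        out.append(table.get(op, "nop"))      # unknown ops become bad guesses
        conf.append(1.0 if op in table else 0.2)
    return out, conf

def semantics(prog, x):
    """Toy semantics: both ISAs act on a single accumulator starting at x."""
    fns = {"ADD": lambda a: a + 1, "SUB": lambda a: a - 1, "MUL": lambda a: a * 2,
           "NEG": lambda a: -a, "add": lambda a: a + 1, "sub": lambda a: a - 1,
           "mul": lambda a: a * 2, "neg": lambda a: -a, "nop": lambda a: a}
    for op in prog:
        x = fns[op](x)
    return x

def symbolic_repair(src, guess, conf, threshold=0.5):
    """Stand-in solver: replace low-confidence tokens with candidates that
    make the basic block semantically equivalent on sample inputs."""
    candidates = ["add", "sub", "mul", "neg", "nop"]
    out = list(guess)
    for i, c in enumerate(conf):
        if c < threshold:
            for cand in candidates:
                out[i] = cand
                if all(semantics(src, x) == semantics(out, x) for x in range(-3, 4)):
                    break
    return out

src = ["ADD", "NEG", "MUL"]           # source block with an op the "LM" doesn't know
guess, conf = neural_guess(src)
fixed = symbolic_repair(src, guess, conf)
print(fixed)  # → ['add', 'neg', 'mul']: the low-confidence 'nop' is repaired
```

Restricting verification to short non-branching blocks is what makes the symbolic search tractable here, mirroring the basic-block decomposition the abstract highlights.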
Scallop: A Language for Neurosymbolic Programming
We present Scallop, a language which combines the benefits of deep learning
and logical reasoning. Scallop enables users to write a wide range of
neurosymbolic applications and train them in a data- and compute-efficient
manner. It achieves these goals through three key features: 1) a flexible
symbolic representation that is based on the relational data model; 2) a
declarative logic programming language that is based on Datalog and supports
recursion, aggregation, and negation; and 3) a framework for automatic and
efficient differentiable reasoning that is based on the theory of provenance
semirings. We evaluate Scallop on a suite of eight neurosymbolic applications
from the literature. Our evaluation demonstrates that Scallop is capable of
expressing algorithmic reasoning in diverse and challenging AI tasks, provides
a succinct interface for machine learning programmers to integrate logical
domain knowledge, and yields solutions that are comparable or superior to
state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions
outperform these models in aspects such as runtime and data efficiency,
interpretability, and generalizability.
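The idea behind provenance-semiring reasoning can be sketched in a few lines: evaluate a recursive Datalog-style `path` relation over probabilistic `edge` facts in the max-times (Viterbi) semiring, so each derived fact carries the score of its best derivation. This is only an illustration of the underlying idea, not Scallop's actual syntax or semantics.

```python
# Minimal fixpoint evaluation of
#   path(a, c) :- edge(a, c).
#   path(a, c) :- path(a, b), edge(b, c).
# in the max-times semiring over probabilistic facts (illustrative only).

# Probabilistic edge facts: (src, dst) -> probability.
edge = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.5}

def viterbi_path(edge):
    """Iterate the rules to a fixpoint, keeping each fact's best score."""
    path = dict(edge)
    changed = True
    while changed:
        changed = False
        for (a, b), p1 in list(path.items()):
            for (b2, c), p2 in edge.items():
                if b == b2 and p1 * p2 > path.get((a, c), 0.0):
                    path[(a, c)] = p1 * p2
                    changed = True
    return path

result = viterbi_path(edge)
print(result[(0, 2)])  # best derivation is 0 -> 1 -> 2, score 0.9 * 0.8
```

Swapping the semiring changes the semantics without touching the rules; with a gradient-carrying semiring, the same fixpoint computation becomes differentiable, which is the property the abstract's third feature relies on.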
Neurosymbolic Grounding for Compositional World Models
We introduce Cosmos, a framework for object-centric world modeling that is
designed for compositional generalization (CG), i.e., high performance on
unseen input scenes obtained through the composition of known visual "atoms."
The central insight behind Cosmos is the use of a novel form of neurosymbolic
grounding. Specifically, the framework introduces two new tools: (i)
neurosymbolic scene encodings, which represent each entity in a scene using a
real vector computed using a neural encoder, as well as a vector of composable
symbols describing attributes of the entity, and (ii) a neurosymbolic attention
mechanism that binds these entities to learned rules of interaction. Cosmos is
end-to-end differentiable; also, unlike traditional neurosymbolic methods that
require representations to be manually mapped to symbols, it computes an
entity's symbolic attributes using vision-language foundation models. Through
an evaluation that considers two different forms of CG on an established
blocks-pushing domain, we show that the framework establishes a new
state-of-the-art for CG in world modeling.
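The "neurosymbolic scene encoding" described above can be sketched as a pairing of a real vector with composable symbolic attributes, where a rule binds only to entities whose symbols match its pattern. The attribute names, the rule, and the matching scheme are invented for illustration and do not reproduce Cosmos's actual mechanism.

```python
# Hedged sketch of entity encodings that pair neural vectors with symbolic
# attributes; attributes and rules here are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class Entity:
    vector: list                  # stand-in for a neural embedding
    attributes: dict = field(default_factory=dict)  # composable symbols

def bind(rule_pattern, entities):
    """Symbolic attention: select entities whose attributes match a rule's
    pattern; a learned module would then operate on their vectors."""
    return [e for e in entities
            if all(e.attributes.get(k) == v for k, v in rule_pattern.items())]

scene = [
    Entity([0.1, 0.2], {"color": "red", "shape": "cube"}),
    Entity([0.3, 0.1], {"color": "blue", "shape": "cube"}),
    Entity([0.5, 0.9], {"color": "red", "shape": "ball"}),
]

# A rule over symbols ("red" AND "cube") binds to novel compositions of
# known atoms, which is what compositional generalization asks for.
selected = bind({"color": "red", "shape": "cube"}, scene)
print(len(selected))  # → 1
```

Because binding is over symbols while dynamics act on vectors, an unseen scene composed of familiar atoms can still trigger the right rule, which is the generalization argument the abstract makes.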
Natural Language Commanding via Program Synthesis
We present Semantic Interpreter, a natural language-friendly AI system for
productivity software such as Microsoft Office that leverages large language
models (LLMs) to execute user intent across application features. While LLMs
are excellent at understanding user intent expressed as natural language, they
are not sufficient for fulfilling application-specific user intent that
requires more than text-to-text transformations. We therefore introduce the
Office Domain Specific Language (ODSL), a concise, high-level language
specialized for performing actions in and interacting with entities in Office
applications. Semantic Interpreter leverages an Analysis-Retrieval prompt
construction method with LLMs for program synthesis, translating natural
language user utterances to ODSL programs that can be transpiled to application
APIs and then executed. We focus our discussion primarily on a research
exploration for Microsoft PowerPoint.
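The natural-language-to-DSL-to-API pipeline described above can be sketched with toy stand-ins: a retrieval step that picks the closest few-shot example (standing in for the Analysis-Retrieval prompt and the LLM), a DSL program as the intermediate representation, and a transpilation step to a hypothetical application API. The two-command "DSL" and the `app.api` target below are invented and are not actual ODSL syntax.

```python
# Toy NL -> DSL -> API pipeline; the example store, DSL commands, and API
# names are hypothetical illustrations, not the paper's actual system.

# Few-shot store mapping utterance patterns to DSL programs.
EXAMPLES = {
    "add a slide": 'insert_slide()',
    "make the title bold": 'set_style(target="title", bold=true)',
}

def synthesize(utterance):
    """Retrieve the example with the largest word overlap (a crude stand-in
    for LLM-based program synthesis) and return its DSL program."""
    words = set(utterance.lower().split())
    return max(EXAMPLES.items(),
               key=lambda kv: len(words & set(kv[0].split())))[1]

def transpile(program):
    """'Transpile' the DSL call into a hypothetical application API call."""
    return "app.api." + program

request = "please make the title bold"
print(transpile(synthesize(request)))  # → app.api.set_style(target="title", bold=true)
```

Keeping the DSL concise and high-level is what makes the LLM's synthesis task tractable and the resulting programs checkable before execution, which is the design rationale the abstract gives for introducing ODSL rather than generating API calls directly.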