Recycling Computed Answers in Rewrite Systems for Abduction
In rule-based systems, goal-oriented computations correspond naturally to the
possible ways that an observation may be explained. In some applications, we
need to compute explanations for a series of observations over the same
domain, which raises the question of whether previously computed answers can
be recycled. A positive answer could yield substantial savings by avoiding
repeated computation. For systems based on classical logic, the answer is
yes. For nonmonotonic systems, however, one tends to believe that the answer
should be no, since recycling is
a form of adding information. In this paper, we show that computed answers can
always be recycled, in a nontrivial way, for the class of rewrite procedures
that we proposed earlier for logic programs with negation. We present some
experimental results on an encoding of the logistics domain.
Comment: 20 pages. Full version of our IJCAI-03 paper.
PocketNN: Integer-only Training and Inference of Neural Networks without Quantization via Direct Feedback Alignment and Pocket Activations in Pure C++
Standard deep learning algorithms are implemented using floating-point real
numbers. This presents an obstacle for implementing them on low-end devices
which may not have dedicated floating-point units (FPUs). As a result,
researchers in tinyML have considered machine learning algorithms that can
train and run a deep neural network (DNN) on a low-end device using integer
operations only. In this paper we propose PocketNN, a light and self-contained
proof-of-concept framework in pure C++ for the training and inference of DNNs
using only integers. Unlike other approaches, PocketNN directly operates on
integers without requiring any explicit quantization algorithms or customized
fixed-point formats. This was made possible by pocket activations, which are a
family of activation functions devised for integer-only DNNs, and an emerging
DNN training algorithm called direct feedback alignment (DFA). Unlike the
standard backpropagation (BP), DFA trains each layer independently, thus
avoiding integer overflow which is a key problem when using BP with
integer-only operations. We used PocketNN to train some DNNs on two well-known
datasets, MNIST and Fashion-MNIST. Our experiments show that the DNNs trained
with PocketNN achieved accuracies of 96.98% and 87.7% on the MNIST and
Fashion-MNIST datasets, respectively. These are very close to those of
equivalent DNNs trained using BP with floating-point operations: the
degradations were just 1.02%p and 2.09%p, respectively.
Finally, our PocketNN has high compatibility and portability for low-end
devices as it is open source and implemented in pure C++ without any
dependencies.
Comment: Accepted at the tinyML Research Symposium '22, March 2022, San Jose, CA (TinyML 2022). 7 pages, 4 figures, 2 tables. [v5] title is modified.
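The layer-independence that the abstract attributes to DFA can be sketched on a scalar two-layer model. This uses plain floats rather than PocketNN's integer tensors, and the saturating activation and constant feedback weight are illustrative stand-ins, not the paper's pocket activations; the point shown is only that the hidden layer's error arrives through a fixed feedback weight b1 rather than through w2, so no layer's update depends on the weights above it (the transposed-weight products that make backpropagation overflow-prone in integer arithmetic).

```python
# Minimal scalar sketch of direct feedback alignment (DFA).

def act(z):                     # simple saturating activation (illustrative)
    return max(-1.0, min(1.0, z))

w1, w2 = 0.3, 0.2               # trainable weights of a tiny 2-layer model
b1 = 0.7                        # feedback weight: drawn at random once in
                                # real DFA, then frozen; a constant here
lr = 0.1
x, target = 0.5, 0.8            # a single training pair

losses = []
for _ in range(2000):
    h = act(w1 * x)             # forward pass
    y = w2 * h
    e = y - target              # output error
    losses.append(e * e)
    w2 -= lr * e * h            # output layer: ordinary gradient step
    w1 -= lr * (b1 * e) * x    # DFA: error routed via b1, never via w2
```

Note that the update for `w1` never reads `w2`, which is what lets each layer be trained independently.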
Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts
Traditional model-based reinforcement learning (RL) methods generate forward
rollout traces using the learnt dynamics model to reduce interactions with the
real environment. Recent model-based RL methods additionally learn a
backward model, which specifies the conditional probability of the previous
state given the previous action and the current state, in order to generate
backward rollout trajectories as well. However, in this type of method, the
samples derived from backward rollouts and those from forward rollouts are
simply aggregated together to optimize the policy via the model-free RL
algorithm, which may decrease both the sample efficiency and the convergence
rate. Such an approach ignores the fact that backward rollout traces are
typically generated starting from high-value states and are therefore more
instructive for improving the agent's behavior. In this
paper, we propose the backward imitation and forward reinforcement learning
(BIFRL) framework where the agent treats backward rollout traces as expert
demonstrations for the imitation of excellent behaviors, and then collects
forward rollout transitions for policy reinforcement. Consequently, BIFRL
empowers the agent both to reach and to explore from high-value states more
efficiently, and further reduces real-environment interactions, making it
potentially more suitable for real-robot learning. Moreover, a
value-regularized generative adversarial network is introduced to augment the
high-value states that the agent rarely encounters. Theoretically, we
provide the condition under which BIFRL is superior to the baseline methods.
Experimentally, we demonstrate that BIFRL achieves better sample efficiency
and competitive asymptotic performance on various MuJoCo locomotion tasks
compared with state-of-the-art model-based methods.
Comment: Accepted by IROS202
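The contrast between the two rollout directions can be sketched on a toy 1-D chain where action a in {-1, +1} shifts the state by a. The "learned" forward and backward models are replaced here by the exact dynamics (in BIFRL both would be fitted from data), and the chain, goal state, and function names are invented for illustration: backward traces start at a high-value state and so always end there, which is what makes them usable as expert demonstrations, while forward traces are ordinary model rollouts from arbitrary states.

```python
import random

random.seed(0)

GOAL = 5  # a high-value state (illustrative)

def backward_rollout(steps):
    """Roll backwards from the goal: sample a previous action, then the
    state that would have led here, i.e. s_prev = s - a_prev."""
    s, trace = GOAL, []
    for _ in range(steps):
        a_prev = random.choice([-1, 1])
        s_prev = s - a_prev            # backward model p(s_prev | a_prev, s)
        trace.append((s_prev, a_prev, s))
        s = s_prev
    return list(reversed(trace))       # now reads start -> goal

def forward_rollout(s, steps):
    """Ordinary forward-model rollout from an arbitrary state."""
    trace = []
    for _ in range(steps):
        a = random.choice([-1, 1])
        s_next = s + a                 # forward model p(s_next | s, a)
        trace.append((s, a, s_next))
        s = s_next
    return trace

demo = backward_rollout(4)             # imitation data ending at the goal
assert demo[-1][2] == GOAL             # every backward trace reaches GOAL
```

Every transition in `demo` is dynamics-consistent, yet the trace is guaranteed to terminate in the high-value state, which forward rollouts from random states cannot promise.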
Computer-aided proofs of Arrow’s and other impossibility theorems
Arrow’s Impossibility Theorem is one of the landmark results in social choice theory. In the years since the theorem was proved in 1950, quite a few alternative proofs have been put forward. In this paper, we propose yet another alternative proof of the theorem. The basic idea is to use induction to reduce the theorem to the base case with 3 alternatives and 2 agents, and then use computers to verify the base case. This turns out to be an effective approach for proving other impossibility theorems, such as Sen’s and the Muller-Satterthwaite theorems, as well. Furthermore, we believe this new proof opens an exciting prospect of using computers to discover similar impossibility, or even possibility, results.
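The reason the base case is machine-checkable is that with 3 alternatives and 2 agents there are only 6^2 = 36 preference profiles, so Arrow's conditions become finite checks. The sketch below does not reproduce the impossibility proof itself (which must refute every candidate rule); as a sanity check of the same finite machinery, it verifies exhaustively that a dictatorship of agent 0 satisfies Pareto and IIA on all 36 profiles.

```python
from itertools import permutations, product

ALTS = "abc"
ORDERS = list(permutations(ALTS))           # the 6 strict rankings
PROFILES = list(product(ORDERS, repeat=2))  # 36 two-agent profiles

def dictator(profile):                      # social ranking = agent 0's
    return profile[0]

def above(order, x, y):                     # x ranked above y in `order`?
    return order.index(x) < order.index(y)

# Pareto: if both agents rank x above y, so does the social ranking.
pareto = all(
    above(dictator(p), x, y)
    for p in PROFILES
    for x, y in product(ALTS, ALTS)
    if x != y and all(above(o, x, y) for o in p)
)

# IIA: the social x-vs-y order depends only on each agent's x-vs-y order.
iia = all(
    above(dictator(p), x, y) == above(dictator(q), x, y)
    for p, q in product(PROFILES, repeat=2)
    for x, y in product(ALTS, ALTS)
    if x != y and all(above(p[i], x, y) == above(q[i], x, y) for i in range(2))
)

assert pareto and iia
```

The same enumeration pattern, run over candidate rules instead of a single fixed rule, is the kind of exhaustive base-case verification a computer can discharge.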
On Computing Universal Plans for Partially Observable Multi-Agent Path Finding
Multi-agent routing problems have drawn significant attention due to
their broad industrial applications in, e.g., warehouse robots, logistics
automation, and traffic control. Conventionally, they are modelled as classical
planning problems. In this paper, we argue that it is beneficial to formulate
them as universal planning problems. We therefore propose universal plans, also
known as policies, as the solution concepts, and implement a system called
ASP-MAUPF (Answer Set Programming for Multi-Agent Universal Plan Finding) for
computing them. Given an arbitrary two-dimensional map and a profile of goals
for the agents, the system finds a feasible universal plan for each agent that
ensures no collision with others. We use the system to conduct experiments
and make observations about which types of goal profiles and environments
admit feasible policies, and how these may depend on the agents' sensors. We
also demonstrate how users can customize action preferences to compute more
efficient policies, even (near-)optimal ones.
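The distinction between a classical plan and a universal plan can be illustrated on a toy single-agent grid (the grid, goal, and policy below are invented; the multi-agent collision constraints and partial observability that ASP-MAUPF actually handles are omitted). A classical plan is one action sequence from a fixed start; a policy prescribes an action for every state, so it works from any position.

```python
# Toy universal plan (policy) on a 3x3 grid with goal (2, 2).

GOAL = (2, 2)
MOVES = {"right": (1, 0), "up": (0, 1), "stay": (0, 0)}

# The universal plan: an action for EVERY cell, not just those on one path.
policy = {
    (x, y): "right" if x < 2 else ("up" if y < 2 else "stay")
    for x in range(3) for y in range(3)
}

def execute(start, steps=10):
    """Follow the policy from any start cell."""
    pos = start
    for _ in range(steps):
        dx, dy = MOVES[policy[pos]]
        pos = (pos[0] + dx, pos[1] + dy)
    return pos

# The plan is universal: every start cell reaches the goal.
assert all(execute((x, y)) == GOAL for x in range(3) for y in range(3))
```

Checking that property for every agent, under each agent's limited sensing and with no two agents colliding, is the search problem the abstract delegates to answer set programming.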