282 research outputs found
Computing Loops With at Most One External Support Rule
If a loop has no external support rules, then its loop formula is equivalent to a set of unit clauses; and if it has exactly one external support rule, then its loop formula is equivalent to a set of binary clauses. In this paper, we consider how to compute these loops and their loop formulas in a normal logic program, and use them to derive consequences of a logic program. We show that an iterative procedure based on unit propagation, the program completion, and the loop formulas of loops with no external support rules can compute the same consequences as the “Expand” operator in smodels, which is known to compute the well-founded model when the given normal logic program has no constraints. We also show that, using the loop formulas of loops with at most one external support rule, the same procedure can compute more consequences, and these extra consequences can help ASP solvers such as cmodels find answer sets of certain logic programs.
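For illustration, the following is a minimal sketch, in Python rather than an ASP system, of the first ingredient above: identifying loops of a normal logic program that have no external support rule, whose loop formulas reduce to unit clauses falsifying their atoms. For simplicity every strongly connected component of the positive dependency graph is treated as a loop here; the rule representation and helper names are illustrative assumptions, not part of the paper.

```python
# A minimal sketch (not the authors' implementation) of finding loops with no
# external support rules and deriving the unit clauses their loop formulas
# reduce to.  A rule is a triple (head, positive_body, negative_body).

def sccs(atoms, edges):
    """Tarjan-style SCCs of the positive dependency graph."""
    index, low, stack, on_stack, out = {}, {}, [], set(), []
    counter = [0]
    def strongconnect(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            out.append(comp)
    for a in atoms:
        if a not in index:
            strongconnect(a)
    return out

def unsupported_loop_units(rules):
    """Atoms forced false because they lie in a loop with no external support."""
    atoms, edges = set(), {}
    for h, pb, nb in rules:
        atoms.add(h); atoms.update(pb); atoms.update(nb)
        edges.setdefault(h, set()).update(pb)   # head -> each positive body atom
    falsified = set()
    for loop in sccs(atoms, edges):
        # external support rule: head in the loop, positive body disjoint from it
        if not any(h in loop and not (set(pb) & loop) for h, pb, _ in rules):
            falsified |= loop                   # loop formula collapses to unit clauses
    return falsified

# Tiny example: p :- q.  q :- p.  The loop {p, q} has no external support.
rules = [("p", ["q"], []), ("q", ["p"], [])]
print(unsupported_loop_units(rules))            # {'p', 'q'} derived false
```

Loops with exactly one external support rule would analogously contribute binary clauses, linking each loop atom to the body of that single supporting rule.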
Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts
Traditional model-based reinforcement learning (RL) methods generate forward rollout traces with a learnt dynamics model to reduce interactions with the real environment. A recent model-based RL method additionally learns a backward model, which specifies the conditional probability of the previous state given the previous action and the current state, in order to also generate backward rollout trajectories. However, in this type of model-based method, the samples derived from backward rollouts and those from forward rollouts are simply aggregated together to optimize the policy via a model-free RL algorithm, which may decrease both the sample efficiency and the convergence rate. Such an approach ignores the fact that backward rollout traces are often generated starting from high-value states and are therefore particularly instructive for improving the agent's behavior. In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework, in which the agent treats backward rollout traces as expert demonstrations from which to imitate good behavior, and then collects forward rollout transitions for policy reinforcement. Consequently, BIFRL enables the agent both to reach and to explore from high-value states more efficiently, and it further reduces real-environment interactions, making it potentially more suitable for real-robot learning. Moreover, a value-regularized generative adversarial network is introduced to augment the high-value states that the agent encounters only infrequently. Theoretically, we provide conditions under which BIFRL is superior to the baseline methods. Experimentally, we demonstrate that BIFRL achieves better sample efficiency and competitive asymptotic performance on various MuJoCo locomotion tasks compared against state-of-the-art model-based methods.
Comment: Accepted by IROS202
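As a rough illustration of the backward-model idea described above, the toy Python sketch below fits a model that predicts the previous state from the current state and the previous action on a made-up linear system, then rolls out backwards from an arbitrarily chosen "high-value" state. The dynamics, dimensions, and value judgement are all illustrative assumptions; this is not the authors' code.

```python
# Toy, self-contained sketch of a backward dynamics model p(s_prev | a_prev, s)
# and a backward rollout from a chosen state, as BIFRL-style methods would use
# before handing the reversed traces to an imitation-learning component.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.95]])      # toy forward dynamics s' = A s + B a
B = np.array([[0.0], [0.5]])

# Collect random transitions (s, a, s_next) from the toy system.
S, Acts, S_next = [], [], []
s = rng.normal(size=2)
for _ in range(2000):
    a = rng.normal(size=1)
    s_next = A @ s + B @ a + 0.01 * rng.normal(size=2)
    S.append(s); Acts.append(a); S_next.append(s_next)
    s = s_next
S, Acts, S_next = map(np.array, (S, Acts, S_next))

# Backward model: least-squares fit of the previous state from [s_next, a_prev].
X = np.hstack([S_next, Acts])
W, *_ = np.linalg.lstsq(X, S, rcond=None)

def predict_prev(s_cur, a_prev):
    return np.concatenate([s_cur, a_prev]) @ W

# Backward rollout from a (pretend) high-value state; the reversed trace is
# what would be stored as demonstration data.
s_cur, trace = np.array([1.0, 1.0]), []
for _ in range(5):
    a_prev = rng.normal(size=1)               # stand-in for the rollout policy
    s_prev = predict_prev(s_cur, a_prev)
    trace.append((s_prev.round(3), a_prev.round(3), s_cur.round(3)))
    s_cur = s_prev
trace.reverse()                                # forward order, ready for imitation
print(trace[0])
```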
PocketNN: Integer-only Training and Inference of Neural Networks without Quantization via Direct Feedback Alignment and Pocket Activations in Pure C++
Standard deep learning algorithms are implemented using floating-point real numbers. This is an obstacle to implementing them on low-end devices that may not have dedicated floating-point units (FPUs). As a result, researchers in tinyML have considered machine learning algorithms that can train and run a deep neural network (DNN) on a low-end device using integer operations only. In this paper, we propose PocketNN, a light and self-contained proof-of-concept framework in pure C++ for the training and inference of DNNs using only integers. Unlike other approaches, PocketNN operates directly on integers without requiring any explicit quantization algorithms or customized fixed-point formats. This is made possible by pocket activations, a family of activation functions devised for integer-only DNNs, and an emerging DNN training algorithm called direct feedback alignment (DFA). Unlike standard backpropagation (BP), DFA trains each layer independently, thus avoiding the integer overflow that is a key problem when using BP with integer-only operations. We used PocketNN to train DNNs on two well-known datasets, MNIST and Fashion-MNIST. Our experiments show that the DNNs trained with PocketNN achieved 96.98% and 87.7% accuracy on MNIST and Fashion-MNIST, respectively. These accuracies are very close to those of equivalent DNNs trained using BP with floating-point operations, with degradations of only 1.02 and 2.09 percentage points, respectively. Finally, PocketNN offers high compatibility and portability for low-end devices, as it is open source and implemented in pure C++ without any dependencies.
Comment: Accepted in tinyML Research Symposium '22, March 2022, San Jose, CA (TinyML 2022). 7 pages, 4 figures, 2 tables. [v5] title is modified
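PocketNN itself is written in pure C++; the sketch below instead uses Python/NumPy with integer arrays only, simply to convey the shape of a DFA update in which a fixed random feedback matrix replaces the transposed forward weights, with a clipped-integer function standing in for a pocket activation. All sizes, shift amounts, and names are illustrative assumptions, and the activation derivative is omitted for brevity.

```python
# Rough, integer-only sketch (not PocketNN itself) of one direct-feedback-
# alignment (DFA) update: the output error is projected to the hidden layer
# through a fixed random integer matrix, so no layer needs the forward
# weights of the layer above.  Shifts play the role of rescaling/learning rate.

import numpy as np

rng = np.random.default_rng(0)
IN, HID, OUT = 8, 16, 4

W1 = rng.integers(-8, 8, size=(IN, HID), dtype=np.int32)
W2 = rng.integers(-8, 8, size=(HID, OUT), dtype=np.int32)
F1 = rng.integers(-8, 8, size=(OUT, HID), dtype=np.int32)   # fixed DFA feedback

def act(x):
    # stand-in for a pocket activation: clipped integer "ReLU" with a shift
    return np.clip(x >> 4, 0, 127).astype(np.int32)

def dfa_step(x, target):
    # integer-only forward pass
    h = act(x @ W1)
    y = (h @ W2) >> 4
    e = y - target                                           # integer error
    # DFA: project the output error with the fixed random matrix F1
    # (activation derivative omitted in this simplified sketch)
    dh = (e @ F1) >> 4
    # integer "gradient" terms; shifts keep magnitudes small to avoid overflow
    dW2 = (h[:, None] * e[None, :]) >> 6
    dW1 = (x[:, None] * dh[None, :]) >> 6
    return dW1, dW2

x = rng.integers(0, 256, size=IN, dtype=np.int32)
t = rng.integers(0, 128, size=OUT, dtype=np.int32)
dW1, dW2 = dfa_step(x, t)
print(dW1.shape, dW2.shape)   # (8, 16) (16, 4)
```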
