On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
We consider infinite-horizon stationary $\gamma$-discounted Markov Decision
Processes, for which it is known that there exists a stationary optimal policy.
Using Value and Policy Iteration with some error $\epsilon$ at each iteration,
it is well-known that one can compute stationary policies that are
$\frac{2\gamma}{(1-\gamma)^2}\epsilon$-optimal. After arguing that this
guarantee is tight, we develop variations of Value and Policy Iteration for
computing non-stationary policies that can be up to
$\frac{2\gamma}{1-\gamma}\epsilon$-optimal, which constitutes a significant
improvement in the usual situation when $\gamma$ is close to 1. Surprisingly,
this shows that the problem of "computing near-optimal non-stationary policies"
is much simpler than that of "computing near-optimal stationary policies".
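
To give a feel for the size of the improvement described in this abstract, the sketch below evaluates the two guarantees numerically. It assumes the bounds are 2γ/(1−γ)² · ε for stationary policies and 2γ/(1−γ) · ε for non-stationary policies, as in the published version of the abstract; the function names are hypothetical and chosen only for illustration.

```python
def stationary_bound(gamma: float, eps: float) -> float:
    """Assumed guarantee for stationary policies: 2*gamma / (1 - gamma)^2 * eps."""
    return 2 * gamma / (1 - gamma) ** 2 * eps


def nonstationary_bound(gamma: float, eps: float) -> float:
    """Assumed guarantee for non-stationary policies: 2*gamma / (1 - gamma) * eps."""
    return 2 * gamma / (1 - gamma) * eps


if __name__ == "__main__":
    eps = 1.0  # per-iteration error, in arbitrary units
    for gamma in (0.9, 0.99, 0.999):
        s = stationary_bound(gamma, eps)
        n = nonstationary_bound(gamma, eps)
        # The ratio of the two bounds is 1 / (1 - gamma).
        print(f"gamma={gamma}: stationary={s:.1f}, non-stationary={n:.1f}, ratio={s / n:.1f}")
```

Under these assumptions the two bounds differ by a factor of 1/(1−γ), so the gap widens as γ approaches 1.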
KBGAN: Adversarial Learning for Knowledge Graph Embeddings
We introduce KBGAN, an adversarial learning framework to improve the
performance of a wide range of existing knowledge graph embedding models.
Because knowledge graphs typically only contain positive facts, sampling useful
negative training examples is a non-trivial task. Replacing the head or tail
entity of a fact with a uniformly randomly selected entity is a conventional
method for generating negative facts, but the majority of the generated
negative facts can be easily discriminated from positive facts, and will
contribute little towards the training. Inspired by generative adversarial
networks (GANs), we use one knowledge graph embedding model as a negative
sample generator to assist the training of our desired model, which acts as the
discriminator in GANs. This framework is independent of the concrete form of
generator and discriminator, and therefore can utilize a wide variety of
knowledge graph embedding models as its building blocks. In experiments, we
adversarially train two translation-based models, TransE and TransD, each with
assistance from one of the two probability-based models, DistMult and ComplEx.
We evaluate the performance of KBGAN on the link prediction task, using three
knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental
results show that adversarial training substantially improves the performance
of target embedding models under various settings.
Comment: To appear at NAACL HLT 201
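
The contrast between uniform corruption and generator-based negative sampling can be sketched as follows. The abstract does not spell out the sampling procedure, so this is only an illustration under stated assumptions: the generator is represented by a hypothetical `score_fn` that scores a triple, and it samples one negative from a softmax over a small set of uniformly corrupted candidates. Training the generator and the discriminator is not shown.

```python
import math
import random


def uniform_negative(triple, entities, rng):
    """Conventional method: replace the head or the tail with a uniformly random entity."""
    h, r, t = triple
    e = rng.choice(entities)
    return (e, r, t) if rng.random() < 0.5 else (h, r, e)


def generator_negative(triple, entities, score_fn, rng, n_candidates=8):
    """Assumed generator step: sample one negative from a softmax over candidate scores.

    score_fn is a hypothetical stand-in for a knowledge graph embedding model;
    higher scores mean the generator considers the triple more plausible.
    """
    candidates = [uniform_negative(triple, entities, rng) for _ in range(n_candidates)]
    scores = [score_fn(c) for c in candidates]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    entities = ["paris", "france", "berlin", "germany", "tokyo"]
    fact = ("paris", "capital_of", "france")

    # Toy scorer (an assumption): prefers negatives that keep the original head.
    def toy_score(triple):
        return 2.0 if triple[0] == "paris" else 0.0

    print(uniform_negative(fact, entities, rng))
    print(generator_negative(fact, entities, toy_score, rng))
```

In this sketch, harder negatives (those the scorer finds plausible) are drawn more often than under uniform sampling, which is the motivation given in the abstract.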
RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents
We introduce RLang, a domain-specific language (DSL) for communicating domain
knowledge to an RL agent. Unlike existing RL DSLs that ground to
single elements of a decision-making formalism (e.g., the reward
function or policy), RLang can specify information about every element of a
Markov decision process. We define precise syntax and grounding semantics for
RLang, and provide a parser that grounds RLang programs to an
algorithm-agnostic partial world model and policy that can be
exploited by an RL agent. We provide a series of example RLang programs
demonstrating how different RL methods can exploit the resulting knowledge,
encompassing model-free and model-based tabular algorithms, policy gradient and
value-based methods, hierarchical approaches, and deep methods.