
    Imitation learning based on entropy-regularized forward and inverse reinforcement learning

    This paper proposes Entropy-Regularized Imitation Learning (ERIL), which combines forward and inverse reinforcement learning under the framework of the entropy-regularized Markov decision process. ERIL minimizes the reverse Kullback-Leibler (KL) divergence between two probability distributions induced by a learner and an expert. The inverse reinforcement learning (RL) step of ERIL evaluates the log-ratio between the two distributions using the density ratio trick, which is widely used in generative adversarial networks. More specifically, the log-ratio is estimated by building two binary discriminators. The first discriminator is a state-only function that tries to distinguish states generated by the forward RL step from the expert's states. The second discriminator is a function of the current state, the action, and the transitioned state, and it distinguishes the generated experiences from those provided by the expert. Since the second discriminator shares the hyperparameters of the forward RL step, those hyperparameters can be used to control the discriminator's ability. The forward RL step minimizes the reverse KL divergence estimated by the inverse RL step. We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy under entropy regularization. Consequently, a new policy is derived from an algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our experimental results on MuJoCo-simulated environments show that ERIL is more sample-efficient than previous methods. We further apply the method to human behaviors in a pole-balancing task and show that the estimated reward functions reveal how each subject achieves the goal.
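    The density ratio trick mentioned above can be illustrated concretely: a binary discriminator trained to separate expert samples (label 1) from learner samples (label 0) has, at its optimum, a logit equal to the log-ratio of the two densities. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the linear discriminator, the toy Gaussian samples, and the training loop are assumptions made for illustration.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def fit_discriminator(expert_x, learner_x, lr=0.1, steps=2000):
            """Logistic regression separating expert samples (label 1) from learner samples (label 0)."""
            x = np.vstack([expert_x, learner_x])
            x = np.hstack([x, np.ones((len(x), 1))])                 # append a bias feature
            y = np.concatenate([np.ones(len(expert_x)), np.zeros(len(learner_x))])
            w = np.zeros(x.shape[1])
            for _ in range(steps):
                w -= lr * x.T @ (sigmoid(x @ w) - y) / len(y)        # cross-entropy gradient step
            return w

        def log_ratio(w, x):
            """The optimal discriminator's logit estimates log p_expert(x) / p_learner(x)."""
            x = np.hstack([x, np.ones((len(x), 1))])
            return x @ w

        # Toy 1-D example: expert states centered at 1, learner states centered at 0.
        rng = np.random.default_rng(0)
        expert_states = rng.normal(1.0, 1.0, size=(500, 1))
        learner_states = rng.normal(0.0, 1.0, size=(500, 1))
        w = fit_discriminator(expert_states, learner_states)
        print(log_ratio(w, np.array([[0.0], [1.0]])))                # larger where the expert density dominates

    In ERIL itself the forward RL step then minimizes the reverse KL divergence built from such log-ratio estimates; the sketch only covers the ratio-estimation step.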

    Constrained Reinforcement Learning from Intrinsic and Extrinsic Rewards


    Model-Free Deep Inverse Reinforcement Learning by Logistic Regression

    This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation and show that, under the framework of linearly solvable Markov decision processes, the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward and the difference of the value functions. The logarithm of the density ratio is efficiently estimated by binomial logistic regression, in which the classifier is constructed from the reward and the state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those drawn from the baseline one. The estimated state value function is then used to initialize part of the deep neural network for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied to two benchmark games: Atari 2600 and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning, as well as behavior cloning.
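    A hedged sketch of this density-ratio formulation may help: transitions from the optimal policy (label 1) and from the baseline policy (label 0) are fed to a logistic-regression classifier whose logit is structured as q(s') + V(s') - V(s), so fitting the classifier yields reward and value estimates at once. The linear parameterization, the feature dimension, and the toy data below are illustrative assumptions, not the paper's deep-network setup.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def fit_logreg_irl(opt_trans, base_trans, dim, lr=0.05, steps=3000):
            """Fit a linear reward q(s') and value V(s) so that the classifier logit
            q(s') + V(s') - V(s) separates optimal transitions (label 1) from baseline ones (label 0)."""
            s = np.vstack([opt_trans[0], base_trans[0]])             # current states
            s_next = np.vstack([opt_trans[1], base_trans[1]])        # next states
            y = np.concatenate([np.ones(len(opt_trans[0])), np.zeros(len(base_trans[0]))])
            w_q = np.zeros(dim)                                      # reward weights
            w_v = np.zeros(dim)                                      # state value weights
            for _ in range(steps):
                logit = s_next @ w_q + s_next @ w_v - s @ w_v
                err = sigmoid(logit) - y                             # cross-entropy gradient factor
                w_q -= lr * s_next.T @ err / len(y)
                w_v -= lr * (s_next - s).T @ err / len(y)
            return w_q, w_v

        # Toy usage with random 4-D state features (illustrative only).
        rng = np.random.default_rng(1)
        opt = (rng.normal(size=(400, 4)), rng.normal(loc=0.5, size=(400, 4)))
        base = (rng.normal(size=(400, 4)), rng.normal(size=(400, 4)))
        w_q, w_v = fit_logreg_irl(opt, base, dim=4)

    In the paper, the estimated value function is what initializes part of the forward-RL network; the sketch stops at the estimation step.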

    Cooperative and Competitive Reinforcement and Imitation Learning for a Mixture of Heterogeneous Learning Modules

    This paper proposes Cooperative and competitive Reinforcement And Imitation Learning (CRAIL), which selects an appropriate policy from a set of multiple heterogeneous learning modules and trains all of them in parallel. Each learning module has its own network architecture and improves its policy by an off-policy reinforcement learning algorithm and by behavior cloning from samples collected by a behavior policy, which is constructed as a mixture of all the modules' policies. Since the mixing weights are determined by each module's performance, a better policy is automatically selected as learning progresses. Experimental results on a benchmark control task show that CRAIL achieves fast learning by allowing modules with complicated network structures to exploit task-relevant samples for training.
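    The performance-based module selection can be pictured with a short sketch: each module keeps a running estimate of its return, the behavior policy mixes the modules with softmax weights over those estimates, and modules that learn faster are selected more often. The softmax temperature, the exponential averaging of returns, and the module interface (a single act method) are illustrative assumptions rather than CRAIL's exact specification.

        import numpy as np

        class MixtureBehaviorPolicy:
            """Behavior policy that mixes heterogeneous learning modules by their recent performance."""

            def __init__(self, modules, temperature=1.0, decay=0.99):
                self.modules = modules                       # each module is assumed to expose .act(state)
                self.scores = np.zeros(len(modules))         # running return estimate per module
                self.temperature = temperature
                self.decay = decay

            def mixing_weights(self):
                z = self.scores / self.temperature
                z -= z.max()                                 # shift for numerical stability
                w = np.exp(z)
                return w / w.sum()

            def act(self, state):
                """Sample a module according to the mixing weights, then act with it."""
                idx = np.random.choice(len(self.modules), p=self.mixing_weights())
                return idx, self.modules[idx].act(state)

            def update_score(self, idx, episode_return):
                """Exponentially weighted average of returns as the performance measure."""
                self.scores[idx] = self.decay * self.scores[idx] + (1 - self.decay) * episode_return

    Every module still trains on the samples gathered by this shared behavior policy, which is how modules with complicated architectures get to exploit task-relevant samples.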

    Sigmoid-weighted linear units for neural network function approximation in reinforcement learning

    In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, together with softmax action selection can be competitive with DQN, without the need for a separate target network. We validate the proposed approach, first, by achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10 x 10 board, using TD(lambda) learning and shallow dSiLU network agents, and then by outperforming DQN in the Atari 2600 domain with a deep Sarsa(lambda) agent with SiLU and dSiLU hidden units.
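    As stated above, the SiLU multiplies its input by the sigmoid of the input, and the dSiLU is its derivative; a minimal NumPy sketch of the two activation functions follows (the surrounding networks and the TD(lambda)/Sarsa(lambda) agents are left out).

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def silu(x):
            """SiLU: the input multiplied by its sigmoid, x * sigmoid(x)."""
            return x * sigmoid(x)

        def dsilu(x):
            """dSiLU: the derivative of the SiLU, sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
            s = sigmoid(x)
            return s * (1.0 + x * (1.0 - s))

        x = np.linspace(-6.0, 6.0, 5)
        print(silu(x))       # smooth, unbounded above, with a shallow minimum below zero
        print(dsilu(x))      # bounded, sigmoid-like curve used as a hidden-unit activation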

    Organoids with cancer stem cell-like properties secrete exosomes and HSP90 in a 3D nanoenvironment

    The ability to form cellular aggregations such as tumorspheres and spheroids has been used as a morphological marker of malignant cancer cells, and in particular of cancer stem cells (CSC). However, a common definition of the types of cellular aggregation formed by cancer cells has not been available. We examined the morphologies of 67 cell lines cultured on three-dimensional, morphology-enhancing NanoCulture Plates (NCPs) and classified the types of cellular aggregates they formed. Among the 67 cell lines, 49 formed spheres or spheroids, 8 formed grape-like aggregations (GLA), 8 formed other types of aggregation, and 3 formed monolayer sheets. Seven of the 8 GLA-forming cell lines were derived from adenocarcinoma. The neuroendocrine adenocarcinoma cell line PC-3 formed asymmetric GLA with ductal structures on the NCPs and, in immunocompromised mice, rapidly growing asymmetric tumors that metastasized to the lymph nodes. In contrast, another adenocarcinoma cell line, DU-145, formed spheroids in vitro and spheroid-like tumors in vivo that did not metastasize to the lymph nodes until day 50 after transplantation. Culture in the 3D nanoenvironment and in a defined stem cell medium enabled the neuroendocrine adenocarcinoma cells to form slowly growing large organoids that expressed multiple stem cell markers, neuroendocrine markers, intercellular adhesion molecules, and oncogenes in vitro. In contrast, the more commonly used 2D serum-containing environment reduced intercellular adhesion, induced mesenchymal transition, and promoted rapid growth of the cells. In addition, the 3D stemness nanoenvironment promoted secretion of HSP90 and of EpCAM exosomes, a marker of the CSC phenotype, from the neuroendocrine organoids. These findings indicate that the NCP-based 3D environment enables cells to form stem cell tumoroids with multipotency and to model more accurately the in vivo tumor status at the levels of morphology and gene expression.