63 research outputs found

    Neural Q-learning for solving PDEs

    Get PDF
    Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. To solve PDEs with Dirichlet boundary conditions, our “Q-PDE” algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units → ∞. For monotone PDEs (i.e. those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, strongly converges in L2 to the PDE solution as the training time → ∞. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.
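
    As a rough illustration of what a mesh-free neural solver looks like in code, the sketch below trains a small network on randomly sampled interior and boundary points for a toy Poisson problem with a zero Dirichlet condition. It uses a plain residual-minimisation loss rather than the paper's Q-PDE update, and the network width, sampling sizes, and boundary penalty weight are arbitrary choices for illustration.

```python
# Mesh-free neural solver sketch for -Δu = f on the unit square with u = 0 on the boundary.
# This is a generic residual-minimisation illustration, NOT the Q-PDE update from the paper;
# sampling sizes, penalty weight, and architecture are arbitrary choices.
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def f(x):  # right-hand side of -Δu = f (here f ≡ 1)
    return torch.ones(x.shape[0], 1)

for step in range(2000):
    # Interior collocation points, resampled every step (this is what "mesh-free" buys us).
    x = torch.rand(256, 2, requires_grad=True)
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = 0.0
    for i in range(2):  # Laplacian via a second autograd pass per coordinate
        lap = lap + torch.autograd.grad(grad_u[:, i].sum(), x, create_graph=True)[0][:, i:i + 1]
    residual = -lap - f(x)

    # Boundary points enforce the Dirichlet condition u = 0 via a penalty term.
    xb = torch.rand(128, 2)
    side = torch.randint(0, 4, (128,))
    xb[side == 0, 0] = 0.0
    xb[side == 1, 0] = 1.0
    xb[side == 2, 1] = 0.0
    xb[side == 3, 1] = 1.0
    loss = (residual ** 2).mean() + 10.0 * (net(xb) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```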

    An Improved Neural Q-learning Approach for Dynamic Path Planning of Mobile Robots

    Get PDF
    Dynamic path planning is an important task for mobile robots in complex and uncertain environments. This paper proposes an improved neural Q-learning (INQL) approach for dynamic path planning of mobile robots. In the proposed INQL approach, the reward function is designed based on a bio-inspired neural network so that the rate of convergence of INQL is improved compared with standard Q-learning. In addition, by combining the INQL algorithm with cubic B-spline curves, the robot can move to the target position successfully via a feasible planned path in different dynamic environments. Simulation and experimental results illustrate the superiority of the INQL-based path planning method over existing popular path planning methods.
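
    The pairing of a learned grid path with cubic B-spline smoothing can be sketched roughly as follows. The bio-inspired reward of the INQL paper is replaced here by a simple goal/collision reward, and the grid size, obstacle layout, and hyperparameters are made-up values for illustration.

```python
# Rough sketch: tabular Q-learning on a small grid, then cubic B-spline smoothing of the
# resulting waypoints. The paper's bio-inspired reward is replaced by a simple
# goal/collision/step reward; grid, obstacles, and hyperparameters are illustrative.
import numpy as np
from scipy.interpolate import splprep, splev

np.random.seed(0)
N = 10
obstacles = {(3, 3), (3, 4), (4, 3), (6, 6), (6, 7)}
start, goal = (0, 0), (9, 9)
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
Q = np.zeros((N, N, 4))

def step(s, a):
    nxt = (min(max(s[0] + a[0], 0), N - 1), min(max(s[1] + a[1], 0), N - 1))
    if nxt in obstacles:
        return s, -10.0           # collision: stay put, strong penalty
    if nxt == goal:
        return nxt, 100.0         # reached the target
    return nxt, -1.0              # small step cost encourages short paths

for episode in range(3000):
    s = start
    for _ in range(200):
        a = np.random.randint(4) if np.random.rand() < 0.1 else int(Q[s].argmax())
        nxt, r = step(s, actions[a])
        Q[s][a] += 0.1 * (r + 0.95 * Q[nxt].max() - Q[s][a])
        s = nxt
        if s == goal:
            break

# Greedy rollout gives discrete waypoints...
path, s = [start], start
while s != goal and len(path) < 200:
    nxt, _ = step(s, actions[int(Q[s].argmax())])
    if nxt == s:
        break                     # stuck against an obstacle; abort rollout
    s = nxt
    path.append(s)

# ...which a cubic B-spline turns into a smooth, drivable trajectory.
if len(path) >= 4:                # a cubic spline needs at least k + 1 = 4 points
    xs, ys = zip(*path)
    tck, _ = splprep([xs, ys], s=2.0, k=3)
    smooth_x, smooth_y = splev(np.linspace(0, 1, 100), tck)
```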

    EMBEDDED LEARNING ROBOT WITH FUZZY Q-LEARNING FOR OBSTACLE AVOIDANCE BEHAVIOR

    Get PDF
    Fuzzy Q-learning is an extension of the Q-learning algorithm that uses a fuzzy inference system to enable Q-learning to handle continuous actions and states. This kind of learning has been applied in various robot learning tasks such as obstacle avoidance and target searching. However, most of these applications have not been realized on embedded robots. This paper presents an implementation of fuzzy Q-learning for obstacle-avoidance navigation on an embedded mobile robot. The experimental results demonstrate that fuzzy Q-learning enables the robot to learn the right policy, i.e. to avoid obstacles.
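
    A minimal sketch of the fuzzy Q-learning mechanism described here, under assumed triangular membership functions over a toy one-dimensional distance-to-obstacle state: each fuzzy rule keeps its own q-values, and the rules' firing strengths weight both the blended control action and the temporal-difference update.

```python
# Minimal fuzzy Q-learning sketch: each fuzzy rule holds its own q-values over a discrete
# action set; the global action and the TD update are both weighted by the rules' firing
# strengths. Membership functions, actions, and the toy state are assumptions made for
# illustration, not the paper's controller.
import numpy as np

np.random.seed(0)
ACTIONS = [-1.0, 0.0, 1.0]           # e.g. turn left, go straight, turn right

def memberships(d):
    """Firing strengths of three triangular fuzzy sets over distance-to-obstacle d in [0, 1]."""
    near = max(0.0, 1.0 - d / 0.4)
    medium = max(0.0, 1.0 - abs(d - 0.5) / 0.3)
    far = max(0.0, (d - 0.6) / 0.4)
    phi = np.array([near, medium, far])
    return phi / (phi.sum() + 1e-9)

q = np.zeros((3, len(ACTIONS)))      # one row of q-values per fuzzy rule

def act(phi, eps=0.1):
    """Pick one discrete action per rule (epsilon-greedy), blend them by firing strength."""
    chosen = [np.random.randint(len(ACTIONS)) if np.random.rand() < eps else int(q[i].argmax())
              for i in range(3)]
    u = sum(phi[i] * ACTIONS[chosen[i]] for i in range(3))    # continuous control output
    Q_sa = sum(phi[i] * q[i, chosen[i]] for i in range(3))    # fuzzy-interpolated Q(s, a)
    return u, chosen, Q_sa

def update(phi, chosen, Q_sa, reward, phi_next, alpha=0.1, gamma=0.9):
    """Distribute the TD error over the rules in proportion to their firing strengths."""
    V_next = sum(phi_next[i] * q[i].max() for i in range(3))
    td = reward + gamma * V_next - Q_sa
    for i in range(3):
        q[i, chosen[i]] += alpha * phi[i] * td

# Toy usage: one interaction step with made-up sensor readings.
phi = memberships(0.3)
u, chosen, Q_sa = act(phi)
update(phi, chosen, Q_sa, reward=-1.0, phi_next=memberships(0.35))
```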

    BEHAVIOR BASED CONTROL AND FUZZY Q-LEARNING FOR AUTONOMOUS FIVE LEGS ROBOT NAVIGATION

    Get PDF
    This paper presents a collaboration of behavior-based control and fuzzy Q-learning for a five-legged robot navigation system. Many fuzzy Q-learning algorithms have been proposed to yield individual behaviors such as obstacle avoidance or target finding. For complicated tasks, however, all behaviors need to be combined in one control scheme using behavior-based control. Based on this, this paper proposes a control scheme that incorporates fuzzy Q-learning into a behavior-based scheme to handle complicated navigation tasks for an autonomous five-legged robot. In the proposed scheme, two behaviors are learned by fuzzy Q-learning; the other behaviors are constructed at the design stage. All behaviors are coordinated by a hierarchical hybrid coordination node. Simulation results demonstrate that the robot with the proposed scheme is able to learn the right policy, avoid obstacles, and find the target. However, fuzzy Q-learning failed to give the right policy for the robot to avoid collisions in corner locations. Keywords: behavior-based control, fuzzy Q-learning
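
    One way to picture the coordination layer described here is as a hierarchical node that lets higher-priority behaviors suppress lower ones and blends the behaviors that remain active. The behavior set, priorities, and blending rule below are assumptions for illustration, not the paper's controller.

```python
# Sketch of a hierarchical hybrid coordination node: higher-priority behaviors suppress
# lower ones (subsumption-style); behaviors active at the winning priority level are
# blended by weight. Behaviors, weights, and the sensor dictionary are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Command = Tuple[float, float]                 # (linear velocity, angular velocity)

@dataclass
class Behavior:
    name: str
    priority: int                             # higher value wins when suppressing
    weight: float                             # blending weight within a priority level
    policy: Callable[[Dict[str, float]], Optional[Command]]  # None = not active

def avoid_obstacle(sensors):
    if sensors["front_distance"] < 0.3:
        return (0.0, 1.0)                     # stop and turn away
    return None

def seek_target(sensors):
    return (0.5, 0.4 * sensors["target_bearing"])

behaviors = [
    Behavior("avoid_obstacle", priority=2, weight=1.0, policy=avoid_obstacle),
    Behavior("seek_target", priority=1, weight=1.0, policy=seek_target),
]

def coordinate(sensors: Dict[str, float]) -> Command:
    """Hybrid coordination: take the highest active priority level, blend within it."""
    active = [(b, b.policy(sensors)) for b in behaviors]
    active = [(b, cmd) for b, cmd in active if cmd is not None]
    if not active:
        return (0.0, 0.0)
    top = max(b.priority for b, _ in active)
    level = [(b, cmd) for b, cmd in active if b.priority == top]
    total = sum(b.weight for b, _ in level)
    v = sum(b.weight * cmd[0] for b, cmd in level) / total
    w = sum(b.weight * cmd[1] for b, cmd in level) / total
    return (v, w)

print(coordinate({"front_distance": 0.2, "target_bearing": 0.5}))  # avoidance suppresses seeking
```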

    How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

    Full text link
    Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor progressively increases up to its final value, we empirically show that it is possible to significantly reduce the number of learning steps. When used in conjunction with a varying learning rate, we empirically show that it outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate dynamic programming setting. We also describe the possibility of falling within a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma. Comment: NIPS 2015 Deep Reinforcement Learning Workshop
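
    The scheduling idea, growing the discount factor toward its final value while decaying the learning rate, can be sketched as below. The schedule shapes and constants are placeholders rather than the values used in the paper.

```python
# Sketch of the dynamic-discount idea: gamma is increased toward its final value over
# training while the learning rate is decayed; the step-dependent gamma then enters an
# otherwise standard one-step Q target. Schedule shapes and constants are placeholders.
import numpy as np

GAMMA_FINAL = 0.99

def gamma_schedule(k, rate=0.02):
    """Discount factor grows from ~0 toward GAMMA_FINAL as training step k increases."""
    return GAMMA_FINAL * (1.0 - np.exp(-rate * k))

def lr_schedule(k, lr0=1e-3, decay=1e-4):
    """Learning rate decays as training proceeds and the effective horizon grows."""
    return lr0 / (1.0 + decay * k)

def q_target(reward, q_next_max, done, k):
    """Standard one-step Q target, but with the step-dependent discount factor."""
    gamma = gamma_schedule(k)
    return reward + (0.0 if done else gamma * q_next_max)

# Example of how the schedules evolve over training.
for k in [0, 100, 1000, 10000]:
    print(k, round(gamma_schedule(k), 3), lr_schedule(k))
```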

    A Comparison of the Performance of Neural Q-learning and Soar-RL on a Derivative of the Block Design (BD)/Block Design Multiple Choice (BDMC) Subtests on the WISC-IV Intelligence Test

    Get PDF
    Teaching an autonomous agent to perform tasks that are simple for humans can be complex, especially when the task requires successive steps, has a low likelihood of successful completion with a brute-force approach, and when the solution space is too large or too complex to be explicitly encoded. Reinforcement learning algorithms are particularly suited to such situations, and are based on rewards that help the agent find the optimal action to execute in a given state. The task investigated in this thesis is a modified form of the Block Design (BD) and Block Design Multiple Choice (BDMC) subtests, used by the Fourth Edition of the Wechsler Intelligence Scale for Children (WISC-IV) to partially assess children's learning abilities. This thesis investigates the implementation, training, and performance of two reinforcement learning architectures for this problem: Soar-RL, a production system capable of reinforcement learning, and a Q-learning neural network. The objective is to help define the advantages and disadvantages of solving problems using these architectures. This thesis shows that Soar is intuitive to implement and is able to find an optimal policy, although it is limited by its execution of exploratory actions. The neural network is also able to find an optimal policy and outperforms Soar, but the convergence of the solution is highly dependent on the architecture of the neural network.
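
    For concreteness, a generic neural Q-learning step of the kind compared against Soar-RL might look like the sketch below: a small MLP maps a state encoding to one Q-value per discrete action, ε-greedy exploration picks the action, and the network is regressed toward a one-step TD target. The sizes and hyperparameters are illustrative and not the thesis's architecture.

```python
# Generic neural Q-learning step: an MLP maps a state encoding to Q-values over discrete
# actions, epsilon-greedy exploration selects the action, and the network is regressed
# toward a one-step TD target. Sizes and hyperparameters are illustrative assumptions.
import torch

STATE_DIM, N_ACTIONS = 16, 4
qnet = torch.nn.Sequential(
    torch.nn.Linear(STATE_DIM, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, N_ACTIONS),
)
opt = torch.optim.SGD(qnet.parameters(), lr=1e-2)

def select_action(state, eps=0.1):
    if torch.rand(1).item() < eps:                      # exploratory action
        return torch.randint(N_ACTIONS, (1,)).item()
    return int(qnet(state).argmax())                    # greedy action

def td_update(state, action, reward, next_state, done, gamma=0.95):
    with torch.no_grad():
        target = reward + (0.0 if done else gamma * qnet(next_state).max().item())
    pred = qnet(state)[action]
    loss = (pred - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

# Toy usage with random placeholder transitions.
s, s2 = torch.rand(STATE_DIM), torch.rand(STATE_DIM)
a = select_action(s)
td_update(s, a, reward=1.0, next_state=s2, done=False)
```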
    • …