
    Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum

    Dexterous manipulation tasks usually have multiple objectives, and the priorities of these objectives may vary across the phases of a manipulation task. Such varying priorities make it difficult, or even impossible, for a robot to learn an optimal policy with a deep reinforcement learning (DRL) method. To solve this problem, we develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives. The AHRM determines the objective priorities during the learning process and updates the reward hierarchy to adapt to the changing priorities at different phases. The proposed method is validated in a multi-objective manipulation task with a JACO robot arm in which the robot must manipulate a target surrounded by obstacles. The simulation and physical experiment results show that the proposed method improves both task performance and learning efficiency.
    Comment: Accepted by the Journal of Intelligent & Robotic Systems
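
    The abstract does not detail how the reward hierarchy is computed, so the following is only a minimal sketch of the general idea of phase-dependent objective priorities. The phase names, the distance-based phase detector, and the weight values are all hypothetical illustrations, not the paper's AHRM.

```python
import numpy as np

# Hypothetical priority weights over three objectives:
# (reach target, avoid obstacles, grasp). Values are illustrative only.
PHASE_WEIGHTS = {
    "approach": np.array([0.7, 0.3, 0.0]),  # prioritise reaching the target
    "avoid":    np.array([0.2, 0.8, 0.0]),  # prioritise obstacle avoidance
    "grasp":    np.array([0.1, 0.2, 0.7]),  # prioritise the grasp objective
}

def detect_phase(dist_to_target, dist_to_obstacle):
    """Toy phase detector based on distances (an assumption, not the paper's)."""
    if dist_to_obstacle < 0.05:
        return "avoid"
    if dist_to_target < 0.10:
        return "grasp"
    return "approach"

def hierarchical_reward(objective_rewards, dist_to_target, dist_to_obstacle):
    """Weight the per-objective rewards by the current phase's priorities."""
    weights = PHASE_WEIGHTS[detect_phase(dist_to_target, dist_to_obstacle)]
    return float(weights @ np.asarray(objective_rewards))
```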

    Reinforcement Learning with Potential Functions Trained to Discriminate Good and Bad States

    Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. Nevertheless, it is impractical and inconvenient to require prior knowledge for designing shaping rewards. Therefore, having the agent learn the shaping reward function during training could be more effective. In this paper, based on the potential-based reward shaping framework, which guarantees policy invariance, we propose to learn a potential function concurrently with training an agent using a reinforcement learning algorithm. In the proposed method, the potential function is trained by examining states that occur in good and in bad episodes. We apply the proposed adaptive potential function (APF) while training an agent with Q-learning and develop two novel algorithms. One is APF-QMLP, which applies the good/bad-state potential function combined with Q-learning and multi-layer perceptrons (MLPs) to estimate the Q-function. The other is APF-Dueling-DQN, which combines the novel potential function with Dueling DQN. In particular, an autoencoder is adopted in APF-Dueling-DQN to map image states from Atari games to hash codes. We evaluated the two algorithms empirically in four environments involving low- or high-dimensional state spaces: a six-room maze, CartPole, Acrobot, and Ms-Pacman. The experimental results showed that the proposed adaptive potential function improved the performance of the selected reinforcement learning algorithms.
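
    As a rough illustration of the idea, here is a tabular sketch (the paper itself uses MLP and Dueling-DQN variants, and the toy environment size below is an assumption): the potential of states seen in good episodes is nudged toward +1 and of states in bad episodes toward -1, and the resulting potential-based shaping term is added to the Q-learning update.

```python
import numpy as np

n_states, n_actions = 64, 4          # toy discrete environment (assumption)
gamma, alpha, beta = 0.99, 0.1, 0.05 # discount, Q-learning rate, phi rate
Q = np.zeros((n_states, n_actions))  # action-value estimates
phi = np.zeros(n_states)             # learned potential function

def update_potential(episode_states, good):
    """Push phi toward +1 for states seen in good episodes, -1 in bad ones."""
    target = 1.0 if good else -1.0
    for s in episode_states:
        phi[s] += beta * (target - phi[s])

def q_update(s, a, r, s_next):
    """Q-learning step with the shaping term F = gamma*phi(s') - phi(s),
    which is potential-based and therefore preserves the optimal policy."""
    shaped_r = r + gamma * phi[s_next] - phi[s]
    Q[s, a] += alpha * (shaped_r + gamma * Q[s_next].max() - Q[s, a])
```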

    THE EFFECT OF ENTREPRENEURIAL ORIENTATION AND LEARNING ORIENTATION ON MARKETING PERFORMANCE THROUGH CUSTOMER ORIENTATION AND INNOVATION ORIENTATION OF GARMENT SMEs IN SRAGEN REGENCY

    SMEs make a very large contribution to the economy and keep growing in quantity, but this growth has not been matched by an improvement in quality. The aim of this study is therefore to determine the effect of entrepreneurial orientation and learning orientation on marketing performance through customer orientation and innovation orientation among garment SMEs in Sragen Regency. The study population consists of owners or managers of garment SMEs in Sragen Regency, restricted to businesses published by BPS as SMEs in accordance with Law No. 20 of 2008. The sample size was 970 × 10% = 97 respondents, rounded up to 100, so the sample comprises 100 garment-SME respondents in Sragen Regency, selected by convenience sampling. The data were analysed with path analysis. The results show that entrepreneurial orientation and learning orientation have a positive and significant effect on customer orientation; entrepreneurial orientation and learning orientation have a positive and significant effect on innovation orientation; and entrepreneurial orientation, customer orientation, and innovation orientation have a positive and significant effect on marketing performance, while learning orientation has a negative and non-significant direct effect on marketing performance. The most dominant path is learning orientation through innovation orientation to marketing performance.

    INTELLIGENT SYSTEMS FOR INDUSTRY USING REINFORCEMENT LEARNING TECHNIQUE

    The rise of intelligent systems has happened gradually, then suddenly. Gradually, because we are aware that this field of computing has come a long way along with the history of computers; suddenly, because the astonishing changes affecting mankind seem to take everyone by surprise. Their occurrence is reshaping the real world, and our interaction with our digital lives is changing in profound ways. Can computers think? Whatever the answer to that question is, we have no evidence for it. What we do know is that computers learn: the whole process of computer evolution revolves around machines that are able to follow instructions, practice, and eventually get better at what they were initially built to accomplish. Consequently, the questions we try to answer concern the types of learning that intelligent programs use, with special regard to one of the most researched methods of machine learning: reinforcement learning. At the same time, it is crucial to apply intelligent self-learning machines in industry, the environment, enterprise, medicine, and all the other sectors where we need to see the substantial changes that correspond to the era of machines that can learn. The intersection point of this research is the application of intelligent programs in industry using a very specific learning technique: reinforcement learning.

    Deep multiagent reinforcement learning: challenges and directions

    This paper surveys the field of deep multiagent reinforcement learning (RL). The combination of deep neural networks with RL has gained increased traction in recent years and is slowly shifting the focus from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex, as (a) future rewards depend on the joint actions of multiple players and (b) the computational complexity increases. We present the most common multiagent problem representations and their main challenges, and we identify five research areas that address one or more of these challenges: centralised training and decentralised execution, opponent modelling, communication, efficient coordination, and reward shaping. We find that many computational studies rely on unrealistic assumptions or are not generalisable to other settings, and that they struggle to overcome the curse of dimensionality or nonstationarity. Approaches from psychology and sociology capture promising relevant behaviours, such as communication and coordination, that help agents achieve better performance in multiagent settings. We suggest that, for multiagent RL to be successful, future research should address these challenges with an interdisciplinary approach to open up new possibilities in multiagent RL.

    AN EMPIRICAL STUDY OF POTENTIAL-BASED REWARD SHAPING AND ADVICE IN COMPLEX, MULTI-AGENT SYSTEMS

    This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash equilibria of a stochastic game; it changes only the exploration of the shaped agent. We empirically demonstrate the performance of reward shaping in two problem domains within the context of RoboCup KeepAway by designing three reward shaping schemes that encourage specific behaviour, such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
    Keywords: reinforcement learning, multi-agent, reward shaping
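
    To make the "minimum distance from teammates" scheme concrete, here is a hypothetical sketch of a potential-based shaping term for one agent; the positions, the distance threshold, and the scaling are assumptions for illustration, not the paper's actual scheme.

```python
import math

GAMMA, MIN_DIST = 0.99, 5.0  # discount and distance threshold (assumptions)

def potential(my_pos, teammate_positions):
    """Higher potential when no teammate is closer than MIN_DIST."""
    nearest = min(math.dist(my_pos, p) for p in teammate_positions)
    return min(nearest, MIN_DIST) / MIN_DIST

def shaping_reward(prev_my_pos, prev_teammates, my_pos, teammates):
    """Potential-based shaping F(s, s') = gamma * Phi(s') - Phi(s); adding it
    to an agent's reward leaves the game's Nash equilibria unchanged."""
    return (GAMMA * potential(my_pos, teammates)
            - potential(prev_my_pos, prev_teammates))
```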