789 research outputs found

    Residual Reactive Navigation: Combining Classical and Learned Navigation Strategies For Deployment in Unknown Environments

    Full text link
    In this work we focus on improving the efficiency and generalisation of learned navigation strategies when transferred from its training environment to previously unseen ones. We present an extension of the residual reinforcement learning framework from the robotic manipulation literature and adapt it to the vast and unstructured environments that mobile robots can operate in. The concept is based on learning a residual control effect to add to a typical sub-optimal classical controller in order to close the performance gap, whilst guiding the exploration process during training for improved data efficiency. We exploit this tight coupling and propose a novel deployment strategy, switching Residual Reactive Navigation (sRRN), which yields efficient trajectories whilst probabilistically switching to a classical controller in cases of high policy uncertainty. Our approach achieves improved performance over end-to-end alternatives and can be incorporated as part of a complete navigation stack for cluttered indoor navigation tasks in the real world. The code and training environment for this project is made publicly available at https://sites.google.com/view/srrn/home.Comment: Accepted as a conference paper at ICRA2020. Project site available at https://sites.google.com/view/srrn/hom

    Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

    Full text link
    Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements. Many practical tasks require manipulation of multiple objects, and the complexity of such tasks increases with the number of objects. Learning from a curriculum of increasingly complex tasks appears to be a natural solution, but unfortunately, does not work for many scenarios. We hypothesize that the inability of the state-of-the-art algorithms to effectively utilize a task curriculum stems from the absence of inductive biases for transferring knowledge from simpler to complex tasks. We show that graph-based relational architectures overcome this limitation and enable learning of complex tasks when provided with a simple curriculum of tasks with increasing numbers of objects. We demonstrate the utility of our framework on a simulated block stacking task. Starting from scratch, our agent learns to stack six blocks into a tower. Despite using step-wise sparse rewards, our method is orders of magnitude more data-efficient and outperforms the existing state-of-the-art method that utilizes human demonstrations. Furthermore, the learned policy exhibits zero-shot generalization, successfully stacking blocks into taller towers and previously unseen configurations such as pyramids, without any further training.Comment: 10 pages, 4 figures and 1 table in main article, 3 figures and 3 tables in appendix. Supplementary website and videos at https://richardrl.github.io/relational-rl

    The State of Lifelong Learning in Service Robots: Current Bottlenecks in Object Perception and Manipulation

    Get PDF
    Service robots are appearing more and more in our daily life. The development of service robots combines multiple fields of research, from object perception to object manipulation. The state-of-the-art continues to improve to make a proper coupling between object perception and manipulation. This coupling is necessary for service robots not only to perform various tasks in a reasonable amount of time but also to continually adapt to new environments and safely interact with non-expert human users. Nowadays, robots are able to recognize various objects, and quickly plan a collision-free trajectory to grasp a target object in predefined settings. Besides, in most of the cases, there is a reliance on large amounts of training data. Therefore, the knowledge of such robots is fixed after the training phase, and any changes in the environment require complicated, time-consuming, and expensive robot re-programming by human experts. Therefore, these approaches are still too rigid for real-life applications in unstructured environments, where a significant portion of the environment is unknown and cannot be directly sensed or controlled. In such environments, no matter how extensive the training data used for batch learning, a robot will always face new objects. Therefore, apart from batch learning, the robot should be able to continually learn about new object categories and grasp affordances from very few training examples on-site. Moreover, apart from robot self-learning, non-expert users could interactively guide the process of experience acquisition by teaching new concepts, or by correcting insufficient or erroneous concepts. In this way, the robot will constantly learn how to help humans in everyday tasks by gaining more and more experiences without the need for re-programming

    ์˜๋ฏธ๋ก ์  ํ™˜๊ฒฝ ์ดํ•ด ๊ธฐ๋ฐ˜ ์ธ๊ฐ„ ๋กœ๋ด‡ ํ˜‘์—…

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€,2020. 2. ์ด๋ฒ”ํฌ.Human-robot cooperation is unavoidable in various applications ranging from manufacturing to field robotics owing to the advantages of adaptability and high flexibility. Especially, complex task planning in large, unconstructed, and uncertain environments can employ the complementary capabilities of human and diverse robots. For a team to be effectives, knowledge regarding team goals and current situation needs to be effectively shared as they affect decision making. In this respect, semantic scene understanding in natural language is one of the most fundamental components for information sharing between humans and heterogeneous robots, as robots can perceive the surrounding environment in a form that both humans and other robots can understand. Moreover, natural-language-based scene understanding can reduce network congestion and improve the reliability of acquired data. Especially, in field robotics, transmission of raw sensor data increases network bandwidth and decreases quality of service. We can resolve this problem by transmitting information in the form of natural language that has encoded semantic representations of environments. In this dissertation, I introduce a human and heterogeneous robot cooperation scheme based on semantic scene understanding. I generate sentences and scene graphs, which is a natural language grounded graph over the detected objects and their relationships, with the graph map generated using a robot mapping algorithm. Subsequently, a framework that can utilize the results for cooperative mission planning of humans and robots is proposed. Experiments were performed to verify the effectiveness of the proposed methods. This dissertation comprises two parts: graph-based scene understanding and scene understanding based on the cooperation between human and heterogeneous robots. For the former, I introduce a novel natural language processing method using a semantic graph map. Although semantic graph maps have been widely applied to study the perceptual aspects of the environment, such maps do not find extensive application in natural language processing tasks. Several studies have been conducted on the understanding of workspace images in the field of computer vision; in these studies, the sentences were automatically generated, and therefore, multiple scenes have not yet been utilized for sentence generation. A graph-based convolutional neural network, which comprises spectral graph convolution and graph coarsening, and a recurrent neural network are employed to generate sentences attention over graphs. The proposed method outperforms the conventional methods on a publicly available dataset for single scenes and can be utilized for sequential scenes. Recently, deep learning has demonstrated impressive developments in scene understanding using natural language. However, it has not been extensively applied to high-level processes such as causal reasoning, analogical reasoning, or planning. The symbolic approach that calculates the sequence of appropriate actions by combining the available skills of agents outperforms in reasoning and planning; however, it does not entirely consider semantic knowledge acquisition for human-robot information sharing. An architecture that combines deep learning techniques and symbolic planner for human and heterogeneous robots to achieve a shared goal based on semantic scene understanding is proposed for scene understanding based on human-robot cooperation. In this study, graph-based perception is used for scene understanding. A planning domain definition language (PDDL) planner and JENA-TDB are utilized for mission planning and data acquisition storage, respectively. The effectiveness of the proposed method is verified in two situations: a mission failure, in which the dynamic environment changes, and object detection in a large and unseen environment.์ธ๊ฐ„๊ณผ ์ด์ข… ๋กœ๋ด‡ ๊ฐ„์˜ ํ˜‘์—…์€ ๋†’์€ ์œ ์—ฐ์„ฑ๊ณผ ์ ์‘๋ ฅ์„ ๋ณด์ผ ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์—์„œ ์ œ์กฐ์—…์—์„œ ํ•„๋“œ ๋กœ๋ณดํ‹ฑ์Šค๊นŒ์ง€ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ํ•„์—ฐ์ ์ด๋‹ค. ํŠนํžˆ, ์„œ๋กœ ๋‹ค๋ฅธ ๋Šฅ๋ ฅ์„ ์ง€๋‹Œ ๋กœ๋ด‡๋“ค๊ณผ ์ธ๊ฐ„์œผ๋กœ ๊ตฌ์„ฑ๋œ ํ•˜๋‚˜์˜ ํŒ€์€ ๋„“๊ณ  ์ •ํ˜•ํ™”๋˜์ง€ ์•Š์€ ๊ณต๊ฐ„์—์„œ ์„œ๋กœ์˜ ๋Šฅ๋ ฅ์„ ๋ณด์™„ํ•˜๋ฉฐ ๋ณต์žกํ•œ ์ž„๋ฌด ์ˆ˜ํ–‰์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•œ๋‹ค๋Š” ์ ์—์„œ ํฐ ์žฅ์ ์„ ๊ฐ–๋Š”๋‹ค. ํšจ์œจ์ ์ธ ํ•œ ํŒ€์ด ๋˜๊ธฐ ์œ„ํ•ด์„œ๋Š”, ํŒ€์˜ ๊ณตํ†ต๋œ ๋ชฉํ‘œ ๋ฐ ๊ฐ ํŒ€์›์˜ ํ˜„์žฌ ์ƒํ™ฉ์— ๊ด€ํ•œ ์ •๋ณด๋ฅผ ์‹ค์‹œ๊ฐ„์œผ๋กœ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•˜๋ฉฐ ํ•จ๊ป˜ ์˜์‚ฌ ๊ฒฐ์ •์„ ํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ๊ด€์ ์—์„œ, ์ž์—ฐ์–ด๋ฅผ ํ†ตํ•œ ์˜๋ฏธ๋ก ์  ํ™˜๊ฒฝ ์ดํ•ด๋Š” ์ธ๊ฐ„๊ณผ ์„œ๋กœ ๋‹ค๋ฅธ ๋กœ๋ด‡๋“ค์ด ๋ชจ๋‘ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋Š” ํ˜•ํƒœ๋กœ ํ™˜๊ฒฝ์„ ์ธ์ง€ํ•œ๋‹ค๋Š” ์ ์—์„œ ๊ฐ€์žฅ ํ•„์ˆ˜์ ์ธ ์š”์†Œ์ด๋‹ค. ๋˜ํ•œ, ์šฐ๋ฆฌ๋Š” ์ž์—ฐ์–ด ๊ธฐ๋ฐ˜ ํ™˜๊ฒฝ ์ดํ•ด๋ฅผ ํ†ตํ•ด ๋„คํŠธ์›Œํฌ ํ˜ผ์žก์„ ํ”ผํ•จ์œผ๋กœ์จ ํš๋“ํ•œ ์ •๋ณด์˜ ์‹ ๋ขฐ์„ฑ์„ ๋†’์ผ ์ˆ˜ ์žˆ๋‹ค. ํŠนํžˆ, ๋Œ€๋Ÿ‰์˜ ์„ผ์„œ ๋ฐ์ดํ„ฐ ์ „์†ก์— ์˜ํ•ด ๋„คํŠธ์›Œํฌ ๋Œ€์—ญํญ์ด ์ฆ๊ฐ€ํ•˜๊ณ  ํ†ต์‹  QoS (Quality of Service) ์‹ ๋ขฐ๋„๊ฐ€ ๊ฐ์†Œํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋นˆ๋ฒˆํžˆ ๋ฐœ์ƒํ•˜๋Š” ํ•„๋“œ ๋กœ๋ณดํ‹ฑ์Šค ์˜์—ญ์—์„œ๋Š” ์˜๋ฏธ๋ก ์  ํ™˜๊ฒฝ ์ •๋ณด์ธ ์ž์—ฐ์–ด๋ฅผ ์ „์†กํ•จ์œผ๋กœ์จ ํ†ต์‹  ๋Œ€์—ญํญ์„ ๊ฐ์†Œ์‹œํ‚ค๊ณ  ํ†ต์‹  QoS ์‹ ๋ขฐ๋„๋ฅผ ์ฆ๊ฐ€์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์—์„œ๋Š” ํ™˜๊ฒฝ์˜ ์˜๋ฏธ๋ก ์  ์ดํ•ด ๊ธฐ๋ฐ˜ ์ธ๊ฐ„ ๋กœ๋ด‡ ํ˜‘๋™ ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•œ๋‹ค. ๋จผ์ €, ๋กœ๋ด‡์˜ ์ง€๋„ ์ž‘์„ฑ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ํš๋“ํ•œ ๊ทธ๋ž˜ํ”„ ์ง€๋„๋ฅผ ์ด์šฉํ•˜์—ฌ ์ž์—ฐ์–ด ๋ฌธ์žฅ๊ณผ ๊ฒ€์ถœํ•œ ๊ฐ์ฒด ๋ฐ ๊ฐ ๊ฐ์ฒด ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ์ž์—ฐ์–ด ๋‹จ์–ด๋กœ ํ‘œํ˜„ํ•˜๋Š” ๊ทธ๋ž˜ํ”„๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ๊ฒฐ๊ณผ๋ฅผ ์ด์šฉํ•˜์—ฌ ์ธ๊ฐ„๊ณผ ๋‹ค์–‘ํ•œ ๋กœ๋ด‡๋“ค์ด ํ•จ๊ป˜ ํ˜‘์—…ํ•˜์—ฌ ์ž„๋ฌด๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์€ ํฌ๊ฒŒ ๊ทธ๋ž˜ํ”„๋ฅผ ์ด์šฉํ•œ ์˜๋ฏธ๋ก ์  ํ™˜๊ฒฝ ์ดํ•ด์™€ ์˜๋ฏธ๋ก ์  ํ™˜๊ฒฝ ์ดํ•ด๋ฅผ ํ†ตํ•œ ์ธ๊ฐ„๊ณผ ์ด์ข… ๋กœ๋ด‡ ๊ฐ„์˜ ํ˜‘์—… ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค. ๋จผ์ €, ๊ทธ๋ž˜ํ”„๋ฅผ ์ด์šฉํ•œ ์˜๋ฏธ๋ก ์  ํ™˜๊ฒฝ ์ดํ•ด ๋ถ€๋ถ„์—์„œ๋Š” ์˜๋ฏธ๋ก ์  ๊ทธ๋ž˜ํ”„ ์ง€๋„๋ฅผ ์ด์šฉํ•œ ์ƒˆ๋กœ์šด ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•œ๋‹ค. ์˜๋ฏธ๋ก ์  ๊ทธ๋ž˜ํ”„ ์ง€๋„ ์ž‘์„ฑ ๋ฐฉ๋ฒ•์€ ๋กœ๋ด‡์˜ ํ™˜๊ฒฝ ์ธ์ง€ ์ธก๋ฉด์—์„œ ๋งŽ์ด ์—ฐ๊ตฌ๋˜์—ˆ์ง€๋งŒ ์ด๋ฅผ ์ด์šฉํ•œ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ๋ฐฉ๋ฒ•์€ ๊ฑฐ์˜ ์—ฐ๊ตฌ๋˜์ง€ ์•Š์•˜๋‹ค. ๋ฐ˜๋ฉด ์ปดํ“จํ„ฐ ๋น„์ „ ๋ถ„์•ผ์—์„œ๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ด์šฉํ•œ ํ™˜๊ฒฝ ์ดํ•ด ์—ฐ๊ตฌ๊ฐ€ ๋งŽ์ด ์ด๋ฃจ์–ด์กŒ์ง€๋งŒ, ์—ฐ์†์ ์ธ ์žฅ๋ฉด๋“ค์€ ๋‹ค๋ฃจ๋Š”๋ฐ๋Š” ํ•œ๊ณ„์ ์ด ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ์šฐ๋ฆฌ๋Š” ๊ทธ๋ž˜ํ”„ ์ŠคํŽ™ํŠธ๋Ÿผ ์ด๋ก ์— ๊ธฐ๋ฐ˜ํ•œ ๊ทธ๋ž˜ํ”„ ์ปจ๋ณผ๋ฃจ์…˜๊ณผ ๊ทธ๋ž˜ํ”„ ์ถ•์†Œ ๋ ˆ์ด์–ด๋กœ ๊ตฌ์„ฑ๋œ ๊ทธ๋ž˜ํ”„ ์ปจ๋ณผ๋ฃจ์…˜ ์‹ ๊ฒฝ๋ง ๋ฐ ์ˆœํ™˜ ์‹ ๊ฒฝ๋ง์„ ์ด์šฉํ•˜์—ฌ ๊ทธ๋ž˜ํ”„๋ฅผ ์„ค๋ช…ํ•˜๋Š” ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•œ๋‹ค. ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์€ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋“ค๋ณด๋‹ค ํ•œ ์žฅ๋ฉด์— ๋Œ€ํ•ด ํ–ฅ์ƒ๋œ ์„ฑ๋Šฅ์„ ๋ณด์˜€์œผ๋ฉฐ ์—ฐ์†๋œ ์žฅ๋ฉด๋“ค์— ๋Œ€ํ•ด์„œ๋„ ์„ฑ๊ณต์ ์œผ๋กœ ์ž์—ฐ์–ด ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•œ๋‹ค. ์ตœ๊ทผ ๋”ฅ๋Ÿฌ๋‹์€ ์ž์—ฐ์–ด ๊ธฐ๋ฐ˜ ํ™˜๊ฒฝ ์ธ์ง€์— ์žˆ์–ด ๊ธ‰์†๋„๋กœ ํฐ ๋ฐœ์ „์„ ์ด๋ฃจ์—ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ธ๊ณผ ์ถ”๋ก , ์œ ์ถ”์  ์ถ”๋ก , ์ž„๋ฌด ๊ณ„ํš๊ณผ ๊ฐ™์€ ๋†’์€ ์ˆ˜์ค€์˜ ํ”„๋กœ์„ธ์Šค์—๋Š” ์ ์šฉ์ด ํž˜๋“ค๋‹ค. ๋ฐ˜๋ฉด ์ž„๋ฌด๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐ ์žˆ์–ด ๊ฐ ์—์ด์ „ํŠธ์˜ ๋Šฅ๋ ฅ์— ๋งž๊ฒŒ ํ–‰์œ„๋“ค์˜ ์ˆœ์„œ๋ฅผ ๊ณ„์‚ฐํ•ด์ฃผ๋Š” ์ƒ์ง•์  ์ ‘๊ทผ๋ฒ•(symbolic approach)์€ ์ถ”๋ก ๊ณผ ์ž„๋ฌด ๊ณ„ํš์— ์žˆ์–ด ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์ด์ง€๋งŒ ์ธ๊ฐ„๊ณผ ๋กœ๋ด‡๋“ค ์‚ฌ์ด์˜ ์˜๋ฏธ๋ก ์  ์ •๋ณด ๊ณต์œ  ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด์„œ๋Š” ๊ฑฐ์˜ ๋‹ค๋ฃจ์ง€ ์•Š๋Š”๋‹ค. ๋”ฐ๋ผ์„œ, ์ธ๊ฐ„๊ณผ ์ด์ข… ๋กœ๋ด‡ ๊ฐ„์˜ ํ˜‘์—… ๋ฐฉ๋ฒ• ๋ถ€๋ถ„์—์„œ๋Š” ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฒ•๋“ค๊ณผ ์ƒ์ง•์  ํ”Œ๋ž˜๋„ˆ(symbolic planner)๋ฅผ ์—ฐ๊ฒฐํ•˜๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•˜์—ฌ ์˜๋ฏธ๋ก ์  ์ดํ•ด๋ฅผ ํ†ตํ•œ ์ธ๊ฐ„ ๋ฐ ์ด์ข… ๋กœ๋ด‡ ๊ฐ„์˜ ํ˜‘์—…์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•œ๋‹ค. ์šฐ๋ฆฌ๋Š” ์˜๋ฏธ๋ก ์  ์ฃผ๋ณ€ ํ™˜๊ฒฝ ์ดํ•ด๋ฅผ ์œ„ํ•ด ์ด์ „ ๋ถ€๋ถ„์—์„œ ์ œ์•ˆํ•œ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ์ž์—ฐ์–ด ๋ฌธ์žฅ ์ƒ์„ฑ์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. PDDL ํ”Œ๋ž˜๋„ˆ์™€ JENA-TDB๋Š” ๊ฐ๊ฐ ์ž„๋ฌด ๊ณ„ํš ๋ฐ ์ •๋ณด ํš๋“ ์ €์žฅ์†Œ๋กœ ์‚ฌ์šฉํ•œ๋‹ค. ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์˜ ํšจ์šฉ์„ฑ์€ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์„ ํ†ตํ•ด ๋‘ ๊ฐ€์ง€ ์ƒํ™ฉ์— ๋Œ€ํ•ด์„œ ๊ฒ€์ฆํ•œ๋‹ค. ํ•˜๋‚˜๋Š” ๋™์  ํ™˜๊ฒฝ์—์„œ ์ž„๋ฌด ์‹คํŒจ ์ƒํ™ฉ์ด๋ฉฐ ๋‹ค๋ฅธ ํ•˜๋‚˜๋Š” ๋„“์€ ๊ณต๊ฐ„์—์„œ ๊ฐ์ฒด๋ฅผ ์ฐพ๋Š” ์ƒํ™ฉ์ด๋‹ค.1 Introduction 1 1.1 Background and Motivation 1 1.2 Literature Review 5 1.2.1 Natural Language-Based Human-Robot Cooperation 5 1.2.2 Artificial Intelligence Planning 5 1.3 The Problem Statement 10 1.4 Contributions 11 1.5 Dissertation Outline 12 2 Natural Language-Based Scene Graph Generation 14 2.1 Introduction 14 2.2 Related Work 16 2.3 Scene Graph Generation 18 2.3.1 Graph Construction 19 2.3.2 Graph Inference 19 2.4 Experiments 22 2.5 Summary 25 3 Language Description with 3D Semantic Graph 26 3.1 Introduction 26 3.2 Related Work 26 3.3 Natural Language Description 29 3.3.1 Preprocess 29 3.3.2 Graph Feature Extraction 33 3.3.3 Natural Language Description with Graph Features 34 3.4 Experiments 35 3.5 Summary 42 4 Natural Question with Semantic Graph 43 4.1 Introduction 43 4.2 Related Work 45 4.3 Natural Question Generation 47 4.3.1 Preprocess 49 4.3.2 Graph Feature Extraction 50 4.3.3 Natural Question with Graph Features 51 4.4 Experiments 52 4.5 Summary 58 5 PDDL Planning with Natural Language 59 5.1 Introduction 59 5.2 Related Work 60 5.3 PDDL Planning with Incomplete World Knowledge 61 5.3.1 Natural Language Process for PDDL Planning 63 5.3.2 PDDL Planning System 64 5.4 Experiments 65 5.5 Summary 69 6 PDDL Planning with Natural Language-Based Scene Understanding 70 6.1 Introduction 70 6.2 Related Work 74 6.3 A Framework for Heterogeneous Multi-Agent Cooperation 77 6.3.1 Natural Language-Based Cognition 78 6.3.2 Knowledge Engine 80 6.3.3 PDDL Planning Agent 81 6.4 Experiments 82 6.4.1 Experiment Setting 82 6.4.2 Scenario 84 6.4.3 Results 87 6.5 Summary 91 7 Conclusion 92Docto

    Deep learning based approaches for imitation learning.

    Get PDF
    Imitation learning refers to an agent's ability to mimic a desired behaviour by learning from observations. The field is rapidly gaining attention due to recent advances in computational and communication capabilities as well as rising demand for intelligent applications. The goal of imitation learning is to describe the desired behaviour by providing demonstrations rather than instructions. This enables agents to learn complex behaviours with general learning methods that require minimal task specific information. However, imitation learning faces many challenges. The objective of this thesis is to advance the state of the art in imitation learning by adopting deep learning methods to address two major challenges of learning from demonstrations. Firstly, representing the demonstrations in a manner that is adequate for learning. We propose novel Convolutional Neural Networks (CNN) based methods to automatically extract feature representations from raw visual demonstrations and learn to replicate the demonstrated behaviour. This alleviates the need for task specific feature extraction and provides a general learning process that is adequate for multiple problems. The second challenge is generalizing a policy over unseen situations in the training demonstrations. This is a common problem because demonstrations typically show the best way to perform a task and don't offer any information about recovering from suboptimal actions. Several methods are investigated to improve the agent's generalization ability based on its initial performance. Our contributions in this area are three fold. Firstly, we propose an active data aggregation method that queries the demonstrator in situations of low confidence. Secondly, we investigate combining learning from demonstrations and reinforcement learning. A deep reward shaping method is proposed that learns a potential reward function from demonstrations. Finally, memory architectures in deep neural networks are investigated to provide context to the agent when taking actions. Using recurrent neural networks addresses the dependency between the state-action sequences taken by the agent. The experiments are conducted in simulated environments on 2D and 3D navigation tasks that are learned from raw visual data, as well as a 2D soccer simulator. The proposed methods are compared to state of the art deep reinforcement learning methods. The results show that deep learning architectures can learn suitable representations from raw visual data and effectively map them to atomic actions. The proposed methods for addressing generalization show improvements over using supervised learning and reinforcement learning alone. The results are thoroughly analysed to identify the benefits of each approach and situations in which it is most suitable

    Learning Dynamic Priority Scheduling Policies with Graph Attention Networks

    Get PDF
    The aim of this thesis is to develop novel graph attention network-based models to automatically learn scheduling policies for effectively solving resource optimization problems, covering both deterministic and stochastic environments. The policy learning methods utilize both imitation learning, when expert demonstrations are accessible at low cost, and reinforcement learning, when otherwise reward engineering is feasible. By parameterizing the learner with graph attention networks, the framework is computationally efficient and results in scalable resource optimization schedulers that adapt to various problem structures. This thesis addresses the problem of multi-robot task allocation (MRTA) under temporospatial constraints. Initially, robots with deterministic and homogeneous task performance are considered with the development of the RoboGNN scheduler. Then, I develop ScheduleNet, a novel heterogeneous graph attention network model, to efficiently reason about coordinating teams of heterogeneous robots. Next, I address problems under the more challenging stochastic setting in two parts. Part 1) Scheduling with stochastic and dynamic task completion times. The MRTA problem is extended by introducing human coworkers with dynamic learning curves and stochastic task execution. HybridNet, a hybrid network structure, has been developed that utilizes a heterogeneous graph-based encoder and a recurrent schedule propagator, to carry out fast schedule generation in multi-round settings. Part 2) Scheduling with stochastic and dynamic task arrival and completion times. With an application in failure-predictive plane maintenance, I develop a heterogeneous graph-based policy optimization (HetGPO) approach to enable learning robust scheduling policies in highly stochastic environments. Through extensive experiments, the proposed framework has been shown to outperform prior state-of-the-art algorithms in different applications. My research contributes several key innovations regarding designing graph-based learning algorithms in operations research.Ph.D
    • โ€ฆ
    corecore