8 research outputs found

    Deep reinforcement learning-based pitch attitude control of a beaver-like underwater robot

    Get PDF
    The foot paddling of an underwater robot continuously changes the surrounding flow field, producing unbalanced hydrodynamic forces that keep altering the robot's posture. Because the water environment and the swimming robot form a nonlinear, strongly coupled system, an accurate analytical model is difficult to establish. This paper presents an underwater robot that adopts the synchronous and alternating swimming trajectory of a beaver. Its pitch-stability control model is built with a deep reinforcement learning algorithm, and a self-learning control system is constructed for stable control of the pitch attitude. Experiments show that the pitch attitude of the beaver-like underwater robot can be stabilized while maintaining a certain swimming speed. The control method does not require a complex, high-order model of webbed-paddling hydrodynamics, offering a new approach to stable swimming control of underwater robots. This work aims to find an effective control method for underwater bionic robots. The ocean holds the richest natural resources and the most diverse species on Earth, and its complex, variable underwater environment imposes high demands on the performance of underwater robots. New-concept marine equipment is increasingly being researched for scientific exploration, and underwater robots designed on bionic principles are a growing trend. Currently, most underwater robots still use propellers as their propulsion system. Propellers offer simple control, high mechanical efficiency, and powerful thrust, but they also disturb the water flow severely during operation, generate high noise, provide poor concealment, and adapt badly to complex water environments. Finding a propulsion system with better overall performance is a crucial way to enhance the motion capabilities of underwater robots.
Underwater robots often have complex structures, and many factors influence their movement in the underwater environment, making fluid-dynamics modeling and optimization challenging. Reinforcement learning, as an optimization technique, can circumvent these difficulties.
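The model-free idea the abstract describes can be illustrated with a deliberately simplified sketch. The discretized pitch states, the toy transition dynamics, and the tabular Q-learning agent below are all hypothetical stand-ins, not the paper's deep network or its hydrodynamic environment; the sketch only shows how a learner can stabilize pitch without an explicit flow model.

```python
import random

random.seed(0)  # reproducible toy run

PITCH_BINS = 11          # discretized pitch-angle states, centred on level trim
ACTIONS = (-1, 0, 1)     # shift paddle phase to pitch down / hold / pitch up

def step(state, action):
    """Hypothetical transition: the action nudges pitch, plus unmodelled flow noise."""
    drift = random.choice((-1, 0, 1))            # disturbance from the flow field
    nxt = min(PITCH_BINS - 1, max(0, state + action + drift))
    reward = -abs(nxt - PITCH_BINS // 2)         # penalise deviation from level trim
    return nxt, reward

def train(episodes=2000, alpha=0.2, gamma=0.95, eps=0.1):
    """Tabular Q-learning: no dynamics model is ever required."""
    q = {(s, a): 0.0 for s in range(PITCH_BINS) for a in ACTIONS}
    for _ in range(episodes):
        s = random.randrange(PITCH_BINS)
        for _ in range(50):
            if random.random() < eps:                      # explore
                a = random.choice(ACTIONS)
            else:                                          # exploit
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            nxt, r = step(s, a)
            q[(s, a)] += alpha * (r + gamma * max(q[(nxt, x)] for x in ACTIONS)
                                  - q[(s, a)])
            s = nxt
    return q

q = train()
# The learned policy should push a strongly nose-up state back toward level trim.
policy_at_high_pitch = max(ACTIONS, key=lambda a: q[(PITCH_BINS - 1, a)])
```

The same structure carries over to the deep setting: the table is replaced by a neural network and the scalar state by raw sensor readings, but the reward-driven update loop is unchanged.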

    Self-Labeling Online Learning for Mobile Robot Grasping

    Get PDF
    Master's thesis (M.S.) -- Seoul National University Graduate School: Interdisciplinary Program in Brain Science, College of Natural Sciences, February 2022. Advisor: 장병탁 (Byoung-Tak Zhang). Accurate and robust object grasping in dynamic environments is essential for a mobile manipulation robot to perform tasks successfully. Prior research on arm-robot manipulation addressed grasp perception with tactile or visual sensors, but a mobile robot must also account for the noise its own motion introduces in a changing environment. Recent grasp-perception research relies on learning-based algorithms, which are limited by the considerable time and effort required to collect data and enter labels. This thesis therefore presents an end-to-end method that automates grasp-perception learning by having the robot label its own data while learning online. Self-labeling is performed by using camera-based object detection to check whether an object has disappeared after a grasp. Grasp perception is learned through a multimodal grasp perception network whose inputs combine data from the camera and from multiple gripper sensors. To validate the proposed method, a tidying-up task was designed in an indoor living-room environment. Two comparative experiments tidying 11 objects were conducted with an HSR robot; with 3 and 5 grasp failures, the experiment using the grasp perception network reduced task-completion time by 10.7% and 14.7%, respectively, demonstrating the efficiency of the proposed method. In this paper, we proposed a new grasp perception method for mobile manipulation robots that utilizes both self-labeling and online learning. Self-labeling is implemented by using object detection as supervision, and online learning is achieved by training the model with randomly sampled batches from a queue-based memory. For robust perception, the GPN model is trained by processing four types of sensory data, and shows high accuracy with various objects. To demonstrate our self-labeling online learning framework, we designed a pick-and-place experiment in a real-world environment with everyday objects. We verified the effectiveness of the GPN by a comparative experiment that measured task performance by comparing time across two demos: one using the GPN-trained model, and the other using a simple logical method.
As a result, it was confirmed that using the GPN contributes to saving time in picking and placing objects, especially as more failures occur or the time spent delivering objects increases.
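The self-labeling loop the thesis describes can be sketched compactly. The detector stub, the sensor-data dictionary, and the queue size below are hypothetical illustrations, not the thesis's actual GPN pipeline: after a grasp attempt, an object detector re-checks the scene, the attempt is labelled by whether the target vanished, and the labelled sample feeds a queue-based memory from which random batches are drawn for online training.

```python
import random
from collections import deque

MEMORY_SIZE = 100
memory = deque(maxlen=MEMORY_SIZE)   # oldest samples are evicted automatically

def detect(scene, target):
    """Stand-in for the camera-based object detector."""
    return target in scene

def self_label(scene_after_grasp, target, sensor_data):
    """Label = 1 (success) iff the target disappeared from the scene after grasping."""
    label = 0 if detect(scene_after_grasp, target) else 1
    memory.append((sensor_data, label))   # no human labelling step needed
    return label

def sample_batch(batch_size=4):
    """Random mini-batch from the replay queue for one online-learning step."""
    return random.sample(list(memory), min(batch_size, len(memory)))

# Example: the cup vanished after the grasp, so the attempt self-labels as success.
label = self_label(scene_after_grasp={"book", "ball"}, target="cup",
                   sensor_data={"gripper_width": 0.04})
```

The bounded `deque` gives the queue-based behaviour the abstract mentions: once full, each new self-labelled sample displaces the oldest one, keeping the online training set current.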

    Reinforcement learning based path planning of multiple agents of SwarmItFIX robot for fixturing operation in sheetmetal milling process

    Get PDF
    SwarmItFIX (self-reconfigurable intelligent swarm fixtures) is a multi-agent setup used mainly as a robotic fixture for large sheet-metal machining operations. A Constraint Satisfaction Problem (CSP) based planning model is currently used to compute the locomotion sequence of the multiple SwarmItFIX agents. The current planner, however, faces several challenges and fails on several occasions; moreover, it computes only the goal positions of the base agent, not the path. To overcome these issues, a novel hierarchical planner is proposed that employs Monte Carlo and SARSA TD model-free Reinforcement Learning (RL) algorithms to compute the locomotion sequences of the head and base agents, respectively. These methods offer two distinct features compared with the existing ones: (i) no transition model is required to obtain the locomotion sequence of the computational agent, and (ii) the state space of the computational agent becomes scalable. The obtained results show that the proposed planner delivers an optimal makespan for effective fixturing during the sheet-metal milling process.
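The core of the SARSA TD method named above is a single on-policy update rule. The toy grid, reward, and hyperparameters below are illustrative, not the SwarmItFIX state space; the sketch only shows the model-free update the planner would apply to a base-agent move.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: Q(s,a) <- Q(s,a) + alpha * [r + gamma*Q(s',a') - Q(s,a)].

    Unlike Q-learning, the bootstrap target uses the action a_next the policy
    actually plans to take, which is what makes SARSA on-policy.
    """
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])
    return q[(s, a)]

# Zero-initialised value table over a hypothetical 3x3 grid of fixture positions.
q = {((x, y), a): 0.0 for x in range(3) for y in range(3)
     for a in ("up", "down", "left", "right")}

# The agent moves right from (0, 0), earns reward 1, and plans to move right again.
updated = sarsa_update(q, (0, 0), "right", 1.0, (1, 0), "right")
# On a zero table this is 0 + 0.1 * (1 + 0.9*0 - 0) = 0.1
```

Because no transition probabilities appear anywhere in the update, the planner needs no explicit model of agent dynamics, which is feature (i) claimed in the abstract.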

    Vahvistusoppimisen metodit ja sovellutukset robotiikassa

    Get PDF
    Reinforcement learning is a rapidly developing machine-learning technique whose popularity has grown in recent years. Its appeal rests mainly on the fact that it enables an agent to learn and solve problems independently, through a process resembling human trial-and-error learning. A major strength of reinforcement learning is that it can solve problems, particularly in robotics, that other machine-learning methods may not be able to handle. The reinforcement learning methods best suited to robotics are policy-search methods, value-function methods, and actor-critic methods. The main goal of this bachelor's thesis is to survey reinforcement learning methods and some of their applications through a literature review. The thesis covers the main features of reinforcement learning and briefly contextualizes its origins within machine learning. The central question in reinforcement learning is how the learning mechanisms and problem-solving skills mastered by humans and animals can be transferred to a robot. Traditionally these skills have been regarded as human capacities, so bringing them into technical systems opens up entirely new possibilities and challenges. Of the numerous applications of reinforcement learning, the thesis presents two: mobile robotics and heavy work machines. These two were chosen because the most research and examples are available for them, making them, at least so far, the most widely used applications in the field. In mobile robotics, reinforcement learning is used especially to solve problems related to mobile-robot navigation; in work-machine applications, it is used to improve machine safety and productivity, for example in robot-gripper grasping and grip refinement. The thesis necessarily limits which subareas it considers:
reinforcement learning is such a broad technique, reaching into so many fields, that a complete treatment is not possible within a bachelor's thesis. The choices made are nevertheless justified with reference to other research in the field, and reinforcement learning is covered as comprehensively as possible.

    Learning Team-Based Navigation: A Review of Deep Reinforcement Learning Techniques for Multi-Agent Pathfinding

    Full text link
    Multi-agent pathfinding (MAPF) is a critical field in many large-scale robotic applications, often being the fundamental step in multi-agent systems. The increasing complexity of MAPF in complex and crowded environments, however, critically diminishes the effectiveness of existing solutions. In contrast to other studies that have either presented a general overview of the recent advancements in MAPF or extensively reviewed Deep Reinforcement Learning (DRL) within multi-agent system settings independently, our work presented in this review paper focuses on highlighting the integration of DRL-based approaches in MAPF. Moreover, we aim to bridge the current gap in evaluating MAPF solutions by addressing the lack of unified evaluation metrics and providing comprehensive clarification on these metrics. Finally, our paper discusses the potential of model-based DRL as a promising future direction and provides its required foundational understanding to address current challenges in MAPF. Our objective is to assist readers in gaining insight into the current research direction, providing unified metrics for comparing different MAPF algorithms and expanding their knowledge of model-based DRL to address the existing challenges in MAPF. Comment: 36 pages, 10 figures, published in Artif Intell Rev 57, 41 (2024).
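The unified metrics the review calls for typically include success rate, makespan, and sum-of-costs. The episode format and the numbers below are hypothetical, not any benchmark's schema; the sketch only shows how the three metrics are derived from per-agent path lengths.

```python
def mapf_metrics(episodes):
    """episodes: list of dicts with a 'solved' flag and per-agent 'path_lengths'."""
    solved = [e for e in episodes if e["solved"]]
    success_rate = len(solved) / len(episodes)
    # Makespan: longest single-agent path in an episode (time until the last
    # agent arrives). Sum-of-costs: total path length over all agents.
    makespans = [max(e["path_lengths"]) for e in solved]
    costs = [sum(e["path_lengths"]) for e in solved]
    return {
        "success_rate": success_rate,
        "avg_makespan": sum(makespans) / len(makespans),
        "avg_sum_of_costs": sum(costs) / len(costs),
    }

episodes = [
    {"solved": True,  "path_lengths": [4, 6, 5]},
    {"solved": True,  "path_lengths": [3, 3, 8]},
    {"solved": False, "path_lengths": [9, 9, 9]},  # timeout: excluded from cost stats
]
metrics = mapf_metrics(episodes)
# success_rate 2/3; avg makespan (6+8)/2 = 7.0; avg sum-of-costs (15+14)/2 = 14.5
```

Excluding unsolved episodes from the cost averages, as done here, is one of the reporting conventions that varies between papers and motivates the review's push for unified metrics.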

    Mobile Robot Path Planning in Dynamic Environments Through Globally Guided Reinforcement Learning

    No full text