1,403 research outputs found

    ์˜๋ฏธ๋ก ์  ํ™˜๊ฒฝ ์ดํ•ด ๊ธฐ๋ฐ˜ ์ธ๊ฐ„ ๋กœ๋ด‡ ํ˜‘์—…

    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: ์ด๋ฒ”ํฌ.

    Human-robot cooperation is unavoidable in applications ranging from manufacturing to field robotics owing to the advantages of adaptability and high flexibility. In particular, complex task planning in large, unstructured, and uncertain environments can employ the complementary capabilities of humans and diverse robots. For a team to be effective, knowledge regarding the team's goals and the current situation needs to be shared effectively, as it affects decision making. In this respect, semantic scene understanding in natural language is one of the most fundamental components of information sharing between humans and heterogeneous robots, as it lets robots describe the surrounding environment in a form that both humans and other robots can understand. Moreover, natural-language-based scene understanding can reduce network congestion and improve the reliability of acquired data. In field robotics especially, transmission of raw sensor data consumes network bandwidth and degrades quality of service; this problem can be resolved by instead transmitting natural language that encodes semantic representations of the environment. In this dissertation, I introduce a cooperation scheme for humans and heterogeneous robots based on semantic scene understanding. From a graph map produced by a robot mapping algorithm, I generate sentences and scene graphs, where a scene graph is a natural-language-grounded graph over the detected objects and their relationships. Subsequently, a framework that uses these results for cooperative mission planning by humans and robots is proposed. Experiments were performed to verify the effectiveness of the proposed methods.

    This dissertation comprises two parts: graph-based scene understanding, and cooperation between humans and heterogeneous robots based on that scene understanding. For the former, I introduce a novel natural language processing method that uses a semantic graph map. Although semantic graph maps have been widely applied to the perceptual aspects of the environment, such maps have rarely been applied to natural language processing tasks. In computer vision, several studies have generated sentences from workspace images, but these operate on single images, so sequential scenes have not yet been utilized for sentence generation. A graph convolutional neural network, comprising spectral graph convolution and graph coarsening layers, together with a recurrent neural network, is employed to generate sentences with attention over graphs. The proposed method outperforms conventional methods on a publicly available single-scene dataset and can also be utilized for sequential scenes. Recently, deep learning has driven impressive progress in scene understanding using natural language; however, it has not been extensively applied to high-level processes such as causal reasoning, analogical reasoning, or planning. The symbolic approach, which computes a sequence of appropriate actions by combining the available skills of the agents, excels at reasoning and planning; however, it does not fully consider the acquisition of semantic knowledge for human-robot information sharing.
    For cooperation between humans and heterogeneous robots based on scene understanding, an architecture is proposed that combines deep learning techniques with a symbolic planner so that humans and heterogeneous robots can achieve a shared goal through semantic scene understanding. In this study, graph-based perception is used for scene understanding. A Planning Domain Definition Language (PDDL) planner and JENA-TDB are utilized for mission planning and for storing acquired knowledge, respectively. The effectiveness of the proposed method is verified in two situations: a mission failure caused by a change in a dynamic environment, and finding an object in a large, unseen environment.

    Cooperation between humans and heterogeneous robots is inevitable in fields ranging from manufacturing to field robotics because it offers high flexibility and adaptability. In particular, a team composed of a human and robots with different capabilities has the great advantage that the members can complement one another and carry out complex missions in large, unstructured spaces. To be an effective team, the members must be able to share information about the team's common goal and each member's current situation in real time and to make decisions together. From this perspective, semantic scene understanding through natural language is the most essential component, because it perceives the environment in a form that both humans and different robots can understand. In addition, natural-language-based scene understanding lets us avoid network congestion and thereby increase the reliability of the acquired information. In field robotics in particular, where transmitting large amounts of sensor data frequently increases bandwidth usage and degrades communication quality of service (QoS), transmitting natural language that carries semantic information about the environment reduces the required bandwidth and improves QoS. This dissertation introduces a method for human-robot cooperation based on semantic understanding of the environment. First, from a graph map obtained with a robot mapping algorithm, natural language sentences and a graph that expresses the detected objects and the relationships between them in natural language words are generated. A framework is then proposed that uses these natural language processing results so that a human and various robots can cooperate to carry out a mission. The dissertation consists of two parts: semantic scene understanding using graphs, and a method for cooperation between a human and heterogeneous robots through semantic scene understanding. In the first part, a new natural language processing method using a semantic graph map is introduced. Semantic graph mapping has been studied extensively for robot perception, but natural language processing based on such maps has hardly been studied. In computer vision, by contrast, scene understanding from images has been studied extensively, but handling sequential scenes remains limited. We therefore generate sentences describing the graph using a graph convolutional neural network, composed of spectral graph convolution and graph coarsening layers, together with a recurrent neural network.
์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์€ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋“ค๋ณด๋‹ค ํ•œ ์žฅ๋ฉด์— ๋Œ€ํ•ด ํ–ฅ์ƒ๋œ ์„ฑ๋Šฅ์„ ๋ณด์˜€์œผ๋ฉฐ ์—ฐ์†๋œ ์žฅ๋ฉด๋“ค์— ๋Œ€ํ•ด์„œ๋„ ์„ฑ๊ณต์ ์œผ๋กœ ์ž์—ฐ์–ด ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•œ๋‹ค. ์ตœ๊ทผ ๋”ฅ๋Ÿฌ๋‹์€ ์ž์—ฐ์–ด ๊ธฐ๋ฐ˜ ํ™˜๊ฒฝ ์ธ์ง€์— ์žˆ์–ด ๊ธ‰์†๋„๋กœ ํฐ ๋ฐœ์ „์„ ์ด๋ฃจ์—ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ธ๊ณผ ์ถ”๋ก , ์œ ์ถ”์  ์ถ”๋ก , ์ž„๋ฌด ๊ณ„ํš๊ณผ ๊ฐ™์€ ๋†’์€ ์ˆ˜์ค€์˜ ํ”„๋กœ์„ธ์Šค์—๋Š” ์ ์šฉ์ด ํž˜๋“ค๋‹ค. ๋ฐ˜๋ฉด ์ž„๋ฌด๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐ ์žˆ์–ด ๊ฐ ์—์ด์ „ํŠธ์˜ ๋Šฅ๋ ฅ์— ๋งž๊ฒŒ ํ–‰์œ„๋“ค์˜ ์ˆœ์„œ๋ฅผ ๊ณ„์‚ฐํ•ด์ฃผ๋Š” ์ƒ์ง•์  ์ ‘๊ทผ๋ฒ•(symbolic approach)์€ ์ถ”๋ก ๊ณผ ์ž„๋ฌด ๊ณ„ํš์— ์žˆ์–ด ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์ด์ง€๋งŒ ์ธ๊ฐ„๊ณผ ๋กœ๋ด‡๋“ค ์‚ฌ์ด์˜ ์˜๋ฏธ๋ก ์  ์ •๋ณด ๊ณต์œ  ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด์„œ๋Š” ๊ฑฐ์˜ ๋‹ค๋ฃจ์ง€ ์•Š๋Š”๋‹ค. ๋”ฐ๋ผ์„œ, ์ธ๊ฐ„๊ณผ ์ด์ข… ๋กœ๋ด‡ ๊ฐ„์˜ ํ˜‘์—… ๋ฐฉ๋ฒ• ๋ถ€๋ถ„์—์„œ๋Š” ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฒ•๋“ค๊ณผ ์ƒ์ง•์  ํ”Œ๋ž˜๋„ˆ(symbolic planner)๋ฅผ ์—ฐ๊ฒฐํ•˜๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•˜์—ฌ ์˜๋ฏธ๋ก ์  ์ดํ•ด๋ฅผ ํ†ตํ•œ ์ธ๊ฐ„ ๋ฐ ์ด์ข… ๋กœ๋ด‡ ๊ฐ„์˜ ํ˜‘์—…์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•œ๋‹ค. ์šฐ๋ฆฌ๋Š” ์˜๋ฏธ๋ก ์  ์ฃผ๋ณ€ ํ™˜๊ฒฝ ์ดํ•ด๋ฅผ ์œ„ํ•ด ์ด์ „ ๋ถ€๋ถ„์—์„œ ์ œ์•ˆํ•œ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ์ž์—ฐ์–ด ๋ฌธ์žฅ ์ƒ์„ฑ์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. PDDL ํ”Œ๋ž˜๋„ˆ์™€ JENA-TDB๋Š” ๊ฐ๊ฐ ์ž„๋ฌด ๊ณ„ํš ๋ฐ ์ •๋ณด ํš๋“ ์ €์žฅ์†Œ๋กœ ์‚ฌ์šฉํ•œ๋‹ค. ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์˜ ํšจ์šฉ์„ฑ์€ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์„ ํ†ตํ•ด ๋‘ ๊ฐ€์ง€ ์ƒํ™ฉ์— ๋Œ€ํ•ด์„œ ๊ฒ€์ฆํ•œ๋‹ค. ํ•˜๋‚˜๋Š” ๋™์  ํ™˜๊ฒฝ์—์„œ ์ž„๋ฌด ์‹คํŒจ ์ƒํ™ฉ์ด๋ฉฐ ๋‹ค๋ฅธ ํ•˜๋‚˜๋Š” ๋„“์€ ๊ณต๊ฐ„์—์„œ ๊ฐ์ฒด๋ฅผ ์ฐพ๋Š” ์ƒํ™ฉ์ด๋‹ค.1 Introduction 1 1.1 Background and Motivation 1 1.2 Literature Review 5 1.2.1 Natural Language-Based Human-Robot Cooperation 5 1.2.2 Artificial Intelligence Planning 5 1.3 The Problem Statement 10 1.4 Contributions 11 1.5 Dissertation Outline 12 2 Natural Language-Based Scene Graph Generation 14 2.1 Introduction 14 2.2 Related Work 16 2.3 Scene Graph Generation 18 2.3.1 Graph Construction 19 2.3.2 Graph Inference 19 2.4 Experiments 22 2.5 Summary 25 3 Language Description with 3D Semantic Graph 26 3.1 Introduction 26 3.2 Related Work 26 3.3 Natural Language Description 29 3.3.1 Preprocess 29 3.3.2 Graph Feature Extraction 33 3.3.3 Natural Language Description with Graph Features 34 3.4 Experiments 35 3.5 Summary 42 4 Natural Question with Semantic Graph 43 4.1 Introduction 43 4.2 Related Work 45 4.3 Natural Question Generation 47 4.3.1 Preprocess 49 4.3.2 Graph Feature Extraction 50 4.3.3 Natural Question with Graph Features 51 4.4 Experiments 52 4.5 Summary 58 5 PDDL Planning with Natural Language 59 5.1 Introduction 59 5.2 Related Work 60 5.3 PDDL Planning with Incomplete World Knowledge 61 5.3.1 Natural Language Process for PDDL Planning 63 5.3.2 PDDL Planning System 64 5.4 Experiments 65 5.5 Summary 69 6 PDDL Planning with Natural Language-Based Scene Understanding 70 6.1 Introduction 70 6.2 Related Work 74 6.3 A Framework for Heterogeneous Multi-Agent Cooperation 77 6.3.1 Natural Language-Based Cognition 78 6.3.2 Knowledge Engine 80 6.3.3 PDDL Planning Agent 81 6.4 Experiments 82 6.4.1 Experiment Setting 82 6.4.2 Scenario 84 6.4.3 Results 87 6.5 Summary 91 7 Conclusion 92Docto

    Dialogue Act Recognition via CRF-Attentive Structured Network

    Dialogue Act Recognition (DAR) is a challenging problem in dialogue interpretation that aims to attach semantic labels to utterances and characterize the speaker's intention. Many existing approaches formulate DAR as anything from multi-class classification to structured prediction, and they either rely on handcrafted feature extensions or fail to fully capture the contextual structural dependencies among utterances. In this paper, we consider DAR from the viewpoint of extending richer Conditional Random Field (CRF) structural dependencies without abandoning end-to-end training. We incorporate hierarchical semantic inference with a memory mechanism into the utterance modeling. We then extend the structured attention network to a linear-chain conditional random field layer that takes into account both contextual utterances and the corresponding dialogue acts. Extensive experiments on two major benchmark datasets, Switchboard Dialogue Act (SWDA) and Meeting Recorder Dialogue Act (MRDA), show that our method achieves better performance than other state-of-the-art solutions. Remarkably, our method comes within a 2% gap of human annotator performance on SWDA.

    Comment: 10 pages, 4 figures
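    To make the linear-chain CRF layer concrete, the sketch below implements only the Viterbi decoding such a layer performs over per-utterance label scores: given emission scores from an utterance encoder and label-transition scores, it returns the highest-scoring dialogue-act sequence. The encoder and training are omitted, and the scores and label names are random placeholders, so this illustrates the decoding step rather than the paper's implementation.

```python
# Linear-chain CRF Viterbi decoding over per-utterance dialogue-act scores (illustrative only).
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (T, L) utterance-label scores; transitions: (L, L) label-to-label scores."""
    T, L = emissions.shape
    score = emissions[0].copy()                # best score of each label at step 0
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]   # (prev, next) scores
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
labels = ["statement", "question", "backchannel"]     # placeholder dialogue acts
emissions = rng.normal(size=(5, len(labels)))         # e.g. from an attentive utterance encoder
transitions = rng.normal(size=(len(labels), len(labels)))
print([labels[i] for i in viterbi(emissions, transitions)])
```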