
    A study on Image Caption using Double Embedding Technique and Bi-RNN

    ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฌธ์žฅ ํ‘œํ˜„๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ณ  ์ด๋ฏธ์ง€ ํŠน์ง• ๋ฒกํ„ฐ์˜ ์†Œ๋ฉธ์„ ๋ฐฉ์ง€ํ•  ์ˆ˜ ์žˆ๋Š” ์ด์ค‘ Embedding ๊ธฐ๋ฒ•๊ณผ ๋ฌธ๋งฅ์— ๋งž๋Š” ๋ฌธ์žฅ ์ˆœ์„œ๋ฅผ ์ƒ์„ฑํ•˜๋Š” Bidirectional Recurrent Neural Network(Bi-RNN)์„ ์ ์šฉํ•œ ๋””ํ…Œ์ผํ•œ ์ด๋ฏธ์ง€ ์บก์…˜ ๋ชจ๋ธ์„ ์ œ์•ˆํ•œ๋‹ค. ์ด์ค‘ Embedding ๊ธฐ๋ฒ•์—์„œ, Word Embedding ๊ณผ์ •์ธ Embeddingโ… ์€ ์บก์…˜์˜ ํ‘œํ˜„๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ์„ธํŠธ์˜ ์บก์…˜ ๋‹จ์–ด๋ฅผ One-hot encoding ๋ฐฉ์‹์„ ํ†ตํ•ด ๋ฒกํ„ฐํ™”ํ•˜๊ณ  Embeddingโ…ก๋Š” ์บก์…˜ ์ƒ์„ฑ ๊ณผ์ •์—์„œ ๋ฐœ์ƒํ•˜๋Š” ์ด๋ฏธ์ง€ ํŠน์ง•์˜ ์†Œ๋ฉธ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ์ด๋ฏธ์ง€ ํŠน์ง• ๋ฒกํ„ฐ์™€ ๋‹จ์–ด ๋ฒกํ„ฐ๋ฅผ ์œตํ•ฉํ•จ์œผ๋กœ์จ ๋ฌธ์žฅ ๊ตฌ์„ฑ ์š”์†Œ์˜ ๋ˆ„๋ฝ์„ ๋ฐฉ์ง€ํ•œ๋‹ค. ๋˜ํ•œ ๋””์ฝ”๋” ์˜์—ญ์€ ์–ดํœ˜ ๋ฐ ์ด๋ฏธ์ง€ ํŠน์ง•์„ ์–‘๋ฐฉํ–ฅ์œผ๋กœ ํš๋“ํ•˜๋Š” Bi-RNN์œผ๋กœ ๊ตฌ์„ฑํ•˜์—ฌ ๋ฌธ๋งฅ์— ๋งž๋Š” ๋ฌธ์žฅ์˜ ์ˆœ์„œ๋ฅผ ํ•™์Šตํ•œ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ์ธ์ฝ”๋”์™€ ๋””์ฝ”๋”๋ฅผ ํ†ตํ•˜์—ฌ ํš๋“๋œ ์ „์ฒด ์ด๋ฏธ์ง€, ๋ฌธ์žฅ ํ‘œํ˜„, ๋ฌธ์žฅ ์ˆœ์„œ ํŠน์ง•๋“ค์„ ํ•˜๋‚˜์˜ ๋ฒกํ„ฐ๊ณต๊ฐ„์ธ Multimodal ๋ ˆ์ด์–ด์— ์œตํ•ฉํ•จ์œผ๋กœ์จ ๋ฌธ์žฅ์˜ ์ˆœ์„œ์™€ ํ‘œํ˜„๋ ฅ์„ ๋ชจ๋‘ ๊ณ ๋ คํ•œ ๋””ํ…Œ์ผํ•œ ์บก์…˜์„ ์ƒ์„ฑํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ชจ๋ธ์€ Flickr 8K ๋ฐ Flickr 30K, MSCOCO์™€ ๊ฐ™์€ ์ด๋ฏธ์ง€ ์บก์…˜ ๋ฐ์ดํ„ฐ์„ธํŠธ๋ฅผ ์ด์šฉํ•˜์—ฌ ํ•™์Šต ๋ฐ ํ‰๊ฐ€๋ฅผ ์ง„ํ–‰ํ•˜์˜€์œผ๋ฉฐ ๊ฐ๊ด€์ ์ธ BLEU์™€ METEOR ์ ์ˆ˜๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ ์„ฑ๋Šฅ์˜ ์šฐ์ˆ˜์„ฑ์„ ์ž…์ฆํ•˜์˜€๋‹ค. ๊ทธ ๊ฒฐ๊ณผ, ์ œ์•ˆํ•œ ๋ชจ๋ธ์€ 3๊ฐœ์˜ ๋‹ค๋ฅธ ์บก์…˜ ๋ชจ๋ธ๋“ค์— ๋น„ํ•ด BLEU ์ ์ˆ˜๋Š” ์ตœ๋Œ€ 20.2์ , METEOR ์ ์ˆ˜๋Š” ์ตœ๋Œ€ 3.65์ ์ด ํ–ฅ์ƒ๋˜์—ˆ๋‹ค.|This thesis proposes a detailed image caption model that applies the double embedding technique to improve sentence expressiveness and to prevent vanishing of image feature vectors. It uses the bidirectional recurrent neural network (Bi-RNN) to generate a sequence of sentences and fit their contexts. 
In the double embedding technique, Embedding I is a word-embedding step that vectorizes the dataset's caption words through one-hot encoding to improve the expressiveness of the captions, while Embedding II prevents missing sentence components by fusing the image feature vector with the word vectors, so that image features do not vanish during caption generation. The decoder, composed of a Bi-RNN that acquires vocabulary and image features in both directions, learns a sentence order that fits the context. Finally, the whole-image, sentence-expression, and sentence-order features obtained through the encoder and decoder are fused in a multimodal layer, a single vector space, to generate a detailed caption that accounts for both sentence order and expressiveness. The proposed model was trained and evaluated on the Flickr 8K, Flickr 30K, and MSCOCO image-caption datasets, and its performance was verified with the objective BLEU and METEOR metrics. Compared with three other caption models, the proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points.

Contents

- Table of Contents
- List of Figures and Tables
- Abstract
- Chapter 1: Introduction
- Chapter 2: Neural Networks and Evaluation Metrics
  - 2.1 Convolutional Neural Network
  - 2.2 Recurrent Neural Network
  - 2.3 Long Short-Term Memory
  - 2.4 Gated Recurrent Unit
  - 2.5 Bidirectional Recurrent Neural Network
  - 2.6 Bi-Lingual Evaluation Understudy
  - 2.7 Metric for Evaluation of Translation with Explicit ORdering
- Chapter 3: Proposed Image Caption Model
  - 3.1 Caption composition using the double embedding technique and Bi-RNN
  - 3.2 Caption generation using the multimodal layer
- Chapter 4: Experiments and Results
  - 4.1 Datasets and preprocessing
  - 4.2 Analysis of experimental results
- Chapter 5: Conclusion
- References
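The double embedding step described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the thesis implementation: the toy vocabulary, the dimensions, and the projection matrices `W_img` and `W_word` are all assumptions; in the actual model the image feature would come from a CNN encoder and the projections would be learned.

```python
import numpy as np

# Hypothetical toy vocabulary; the thesis draws caption words from
# Flickr 8K / Flickr 30K / MSCOCO.
vocab = ["<start>", "a", "dog", "runs", "<end>"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Embedding I: vectorize a caption word via one-hot encoding."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

def fuse(image_feat, word_vec, W_img, W_word):
    """Embedding II: fuse the image feature vector with the word vector,
    re-injecting image information at each step so it does not vanish
    during caption generation."""
    return W_img @ image_feat + W_word @ word_vec

rng = np.random.default_rng(0)
d_img, d_emb = 8, 4                      # toy dimensions (assumed)
image_feat = rng.normal(size=d_img)      # stand-in for a CNN encoder output
W_img = rng.normal(size=(d_emb, d_img))  # learned in the real model
W_word = rng.normal(size=(d_emb, len(vocab)))

fused = fuse(image_feat, one_hot("dog"), W_img, W_word)
print(fused.shape)  # (4,)
```

The fused vector would then be fed to the Bi-RNN decoder at each time step, which is what distinguishes this scheme from feeding the image feature only once at the start of decoding.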
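The BLEU metric used for evaluation is built on modified n-gram precision. A minimal sketch of the unigram (BLEU-1) case, with a made-up candidate/reference pair for illustration:

```python
from collections import Counter

def bleu1_precision(candidate, reference):
    """Modified unigram precision, the core of BLEU-1: each candidate
    word's count is clipped by its count in the reference caption."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    clipped = sum(min(c, ref[w]) for w, c in cand.items())
    return clipped / max(sum(cand.values()), 1)

p = bleu1_precision("a dog runs on the grass", "a dog runs in the grass")
print(round(p, 2))  # 0.83 — 5 of 6 candidate words match the reference
```

Full BLEU additionally combines precisions over higher-order n-grams with a brevity penalty, and METEOR further accounts for stemming, synonymy, and word order.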