3 research outputs found
A study on Image Caption using Double Embedding Technique and Bi-RNN
๋ณธ ๋
ผ๋ฌธ์์๋ ๋ฌธ์ฅ ํํ๋ ฅ์ ํฅ์์ํค๊ณ ์ด๋ฏธ์ง ํน์ง ๋ฒกํฐ์ ์๋ฉธ์ ๋ฐฉ์งํ ์ ์๋ ์ด์ค Embedding ๊ธฐ๋ฒ๊ณผ ๋ฌธ๋งฅ์ ๋ง๋ ๋ฌธ์ฅ ์์๋ฅผ ์์ฑํ๋ Bidirectional Recurrent Neural Network(Bi-RNN)์ ์ ์ฉํ ๋ํ
์ผํ ์ด๋ฏธ์ง ์บก์
๋ชจ๋ธ์ ์ ์ํ๋ค. ์ด์ค Embedding ๊ธฐ๋ฒ์์, Word Embedding ๊ณผ์ ์ธ Embeddingโ
์ ์บก์
์ ํํ๋ ฅ์ ํฅ์์ํค๊ธฐ ์ํด ๋ฐ์ดํฐ์ธํธ์ ์บก์
๋จ์ด๋ฅผ One-hot encoding ๋ฐฉ์์ ํตํด ๋ฒกํฐํํ๊ณ Embeddingโ
ก๋ ์บก์
์์ฑ ๊ณผ์ ์์ ๋ฐ์ํ๋ ์ด๋ฏธ์ง ํน์ง์ ์๋ฉธ์ ๋ฐฉ์งํ๊ธฐ ์ํด ์ด๋ฏธ์ง ํน์ง ๋ฒกํฐ์ ๋จ์ด ๋ฒกํฐ๋ฅผ ์ตํฉํจ์ผ๋ก์จ ๋ฌธ์ฅ ๊ตฌ์ฑ ์์์ ๋๋ฝ์ ๋ฐฉ์งํ๋ค. ๋ํ ๋์ฝ๋ ์์ญ์ ์ดํ ๋ฐ ์ด๋ฏธ์ง ํน์ง์ ์๋ฐฉํฅ์ผ๋ก ํ๋ํ๋ Bi-RNN์ผ๋ก ๊ตฌ์ฑํ์ฌ ๋ฌธ๋งฅ์ ๋ง๋ ๋ฌธ์ฅ์ ์์๋ฅผ ํ์ตํ๋ค. ๋ง์ง๋ง์ผ๋ก ์ธ์ฝ๋์ ๋์ฝ๋๋ฅผ ํตํ์ฌ ํ๋๋ ์ ์ฒด ์ด๋ฏธ์ง, ๋ฌธ์ฅ ํํ, ๋ฌธ์ฅ ์์ ํน์ง๋ค์ ํ๋์ ๋ฒกํฐ๊ณต๊ฐ์ธ Multimodal ๋ ์ด์ด์ ์ตํฉํจ์ผ๋ก์จ ๋ฌธ์ฅ์ ์์์ ํํ๋ ฅ์ ๋ชจ๋ ๊ณ ๋ คํ ๋ํ
์ผํ ์บก์
์ ์์ฑํ๋ค. ์ ์ํ๋ ๋ชจ๋ธ์ Flickr 8K ๋ฐ Flickr 30K, MSCOCO์ ๊ฐ์ ์ด๋ฏธ์ง ์บก์
๋ฐ์ดํฐ์ธํธ๋ฅผ ์ด์ฉํ์ฌ ํ์ต ๋ฐ ํ๊ฐ๋ฅผ ์งํํ์์ผ๋ฉฐ ๊ฐ๊ด์ ์ธ BLEU์ METEOR ์ ์๋ฅผ ํตํด ๋ชจ๋ธ ์ฑ๋ฅ์ ์ฐ์์ฑ์ ์
์ฆํ์๋ค. ๊ทธ ๊ฒฐ๊ณผ, ์ ์ํ ๋ชจ๋ธ์ 3๊ฐ์ ๋ค๋ฅธ ์บก์
๋ชจ๋ธ๋ค์ ๋นํด BLEU ์ ์๋ ์ต๋ 20.2์ , METEOR ์ ์๋ ์ต๋ 3.65์ ์ด ํฅ์๋์๋ค.|This thesis proposes a detailed image caption model that applies the double embedding technique to improve sentence expressiveness and to prevent vanishing of image feature vectors. It uses the bidirectional recurrent neural network (Bi-RNN) to generate a sequence of sentences and fit their contexts. In the double-embedding technique, embedding โ
is a word-embedding process used to vectorize dataset captions through one-hot encoding to improve the expressiveness of the captions. Embedding โ
ก prevents missed sentence components by fusing image features and word vectors to prevent image features from vanishing during caption generation. The decoder area, composed of a Bi-RNN that acquires vocabulary and image features in both directions, learns the sequence of sentences that fits their contexts. Finally, through the encoder and decoder, the detailed image caption is generated by considering both sequence and sentence expressiveness by fusing the acquired image features, sentence presentation features, and sentence sequence features into a multimodal layer as a vector space. The proposed model was learned and evaluated using image caption datasets (e.g., Flickr 8K, Flickr 30K, and MSCOCO). The proven BLEU and METEOR scores demonstrate the superiority of the model. The proposed model achieved a BLEU score maximum of 20.2 points and a METEOR score maximum of 3.65 points, which is higher than the scores of other three caption models.๋ชฉ ์ฐจ
๋ชฉ ์ฐจ โ
ฐ
๊ทธ๋ฆผ ๋ฐ ํ ๋ชฉ์ฐจ โ
ฑ
Abstract โ
ณ
์ 1 ์ฅ ์ ๋ก 01
์ 2 ์ฅ ๋ด๋ด ๋คํธ์ํฌ ๋ฐ ํ๊ฐ์งํ 04
2.1 Convolutional Neural Network 04
2.2 Recurrent Neural Network 08
2.3 Long Short-Term Memory 10
2.4 Gated Recurrent Unit 13
2.5 Bidirectional Recurrent Neural Network 15
2.6 Bi-Lingual Evaluation Understudy 17
2.7 Metric for Evaluation of Translation with Explicit ORdering 20
์ 3 ์ฅ ์ ์ํ ์ด๋ฏธ์ง ์บก์
๋ชจ๋ธ 23
3.1 ์ด์ค Embedding ๊ธฐ๋ฒ๊ณผ Bi-RNN์ ์ด์ฉํ ์บก์
๊ตฌ์ฑ ๊ณผ์ 25
3.2 Multimodal ๋ ์ด์ด๋ฅผ ์ด์ฉํ ์บก์
์์ฑ ๊ณผ์ 27
์ 4 ์ฅ ์คํ ๋ฐ ๊ฒฐ๊ณผ 29
4.1 ๋ฐ์ดํฐ์ธํธ ๋ฐ ์ ์ฒ๋ฆฌ ๊ณผ์ 29
4.2 ์คํ ๊ฒฐ๊ณผ ๋ถ์ 31
์ 5 ์ฅ ๊ฒฐ ๋ก 41
์ฐธ ๊ณ ๋ฌธ ํ 42Maste
Recommended from our members
Spatio-temporal object persistence modeling and semantics for long-term robot navigation
Mobile robots increasingly operate in real-world environments that are subject to change over time. Robots that maintain up-to-date, accurate representations of their environment can more robustly perform long-term autonomous navigation and planning tasks. Uninterrupted robot autonomy is important for many tasks where human intervention is undesirable or impossible including space operations, hazardous environment surveillance, military reconnaissance, and more. This dissertation explores how a robot can adapt maps over long periods of time and predict changes in their environments to maintain long-term navigational autonomy.
Spatio-temporal Object Persistence (STOP) models enable a robot to assign temporal characteristics to recognized objects based on their observed persistence in the world. A robot develops these identifying characteristics by iteratively estimating parameters for a Weibull distribution survival model using recursive Bayesian Survival Analysis with Markov Chain Monte Carlo methods. The parameters of the model provide temporally identifying features for semantic classification of objects. These characteristics then allow a robot to estimate the true lifespan of objects and predict when they will leave the environment. A robot updates its belief map based on the modelled temporal behavior of objects in its environment and then perform more intelligent planning and navigation for future operations. Furthermore, once the robot develops temporal class characteristics, it can transfer and apply these characteristics to objects in dynamically similar environments, thus allowing the robot to adapt more quickly to new operational spaces.
In this dissertation, we establish the efficacy of STOP models for temporal semantic object classification and show how a robot uses them to adapt a map over time in a semi-static environment. We provide a series of experiments to demonstrate various aspects of our implementation. Our results confirm that the learned models not only predict the temporal behavior of objects in the world but also transfer to unknown, but temporally similar operation spaces where they improve the prediction process.Mechanical Engineerin