4 research outputs found

    Development of fitted bodice basic pattern for Korean men using 3D body shape

    No full text
    ํ•™์œ„๋…ผ๋ฌธ(์„์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :์˜๋ฅ˜ํ•™๊ณผ,2005.Maste

    A Study on POS Guidance Module and Multimodal-based Image Captioning Model

    No full text
    Image captioning aims to describe the information in an image in detail and to structure the description with a correct grammatical form, so that users can easily understand the content. In particular, because it deals with two independent types of data, images and natural language, deep learning-based image captioning research that can create sentences with accurate content and varied expression, using neural networks suited to each data type, is being actively conducted. Recent deep learning-based work has mainly focused on image information through attention-based methods, which generate sentences by concentrating on the core parts of an image in the same way as the human visual system. However, because these methods do not take sentence structure into account, the grammatical structure may be poorly formed, resulting in sentences that are difficult to understand. Therefore, to generate sentences with an accurate grammatical structure and rich expression, we propose a Part-Of-Speech (POS) Guidance Module and a multimodal-based image captioning model that directly use sentence-structure information. To generate richly expressive sentences, the proposed POS Guidance Module uses a POS guide variable that applies an additional weight to the image features and the sentence information according to their POS tags. In addition, the proposed POS multimodal-based captioning method generates sentences with an accurate grammatical structure by correcting the word predicted at the next timestep according to the POS order information. In the POS multimodal layer, the generated-sentence information obtained from the decoder's Bi-LSTM is corrected according to the POS sequence information to predict a word matching the part of speech expected at the next timestep. As a result, the model generates sentences that follow POS rules, have an accurate grammatical structure, and are more expressive than those of existing studies. To verify the validity of the proposed model, we trained and evaluated it on the Flickr30K and MS COCO datasets and compared the objective evaluation metrics BLEU, METEOR, CIDEr, SPICE, and ROUGE. The proposed model improved on all evaluation metrics compared with recent baseline models; in particular, the CIDEr score increased by 8.85% and 3.03%, respectively, over the baselines trained on each dataset. The SPICE score was also higher by 0.06 and 0.002 points on the respective datasets, showing that POS-focused information enables the proposed model to produce accurate descriptions that match the image content. Moreover, comparing the sentences generated for given images with those of the baseline models confirmed that the proposed model not only produced sentences with a correct grammatical structure but also generated richly expressive sentences. We expect that this work, by enabling more expressive and accurate sentence generation, can be applied and widely commercialized in fields that require analysis of given images, such as medical care, video summarization, and surveillance.
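    A minimal sketch of the POS guide idea described in this abstract is shown below. This is not the authors' code: the module name, the embedding size, and the sigmoid-gating form are all illustrative assumptions about how a POS tag could re-weight the image and word streams.

```python
# Illustrative sketch (assumed form, not the thesis implementation): a learned
# embedding of the current POS tag gates how strongly the image features and
# the word embedding contribute at this timestep.
import torch
import torch.nn as nn

class POSGuidanceModule(nn.Module):
    def __init__(self, num_pos_tags: int, img_dim: int, word_dim: int,
                 guide_dim: int = 64):
        super().__init__()
        # "POS guide variable": one learned vector per POS tag (assumption).
        self.pos_embed = nn.Embedding(num_pos_tags, guide_dim)
        # Separate sigmoid gates yield per-feature weights for each stream.
        self.img_gate = nn.Sequential(nn.Linear(guide_dim, img_dim), nn.Sigmoid())
        self.word_gate = nn.Sequential(nn.Linear(guide_dim, word_dim), nn.Sigmoid())

    def forward(self, img_feat, word_emb, pos_tag):
        # img_feat: (B, img_dim), word_emb: (B, word_dim), pos_tag: (B,)
        guide = self.pos_embed(pos_tag)
        return img_feat * self.img_gate(guide), word_emb * self.word_gate(guide)

# Toy usage: weight a CNN image feature and a word embedding by a POS tag.
module = POSGuidanceModule(num_pos_tags=12, img_dim=2048, word_dim=512)
img_w, word_w = module(torch.randn(4, 2048), torch.randn(4, 512),
                       torch.randint(0, 12, (4,)))
print(img_w.shape, word_w.shape)  # torch.Size([4, 2048]) torch.Size([4, 512])
```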
ํŠนํžˆ ์ด๋ฏธ์ง€์™€ ์ž์—ฐ์–ด๋ผ๋Š” ๋…๋ฆฝ์ ์ธ ๋‘ ํ˜•ํƒœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ ๋ฐ์ดํ„ฐ์— ๋งž๋Š” ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•ด ์ •ํ™•ํ•œ ๋‚ด์šฉ๊ณผ ๋‹ค์–‘ํ•œ ํ‘œํ˜„์˜ ๋ฌธ์žฅ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋Š” ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜์˜ ์ด๋ฏธ์ง€ ์บก์…”๋‹ ์—ฐ๊ตฌ๊ฐ€ ํ™œ๋ฐœํžˆ ์ด๋ฃจ์–ด์ง€๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜์˜ ์ด๋ฏธ์ง€ ์บก์…”๋‹์€ ์ตœ๊ทผ ์ธ๊ฐ„์˜ ์‹œ๊ฐ ์ฒด๊ณ„์™€ ๋™์ผํ•˜๊ฒŒ ์ด๋ฏธ์ง€์˜ ํ•ต์‹ฌ ๋ถ€๋ถ„์— ์ง‘์ค‘ํ•˜์—ฌ ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•˜๋Š” Attention ๊ธฐ๋ฐ˜์˜ ์—ฐ๊ตฌ๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€ ์ •๋ณด์— ์ง‘์ค‘ํ•˜๋Š” ์ ‘๊ทผ๋ฒ•์ด ์ฃผ๋กœ ์—ฐ๊ตฌ๋˜๊ณ  ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฐ ๋ฐฉ์‹๋“ค์€ ๋ฌธ์žฅ ๊ตฌ์กฐ๋ฅผ ๊ณ ๋ คํ•˜์ง€ ์•Š์œผ๋ฏ€๋กœ ๋ฌธ๋ฒ• ๊ตฌ์กฐ๊ฐ€ ์˜ฌ๋ฐ”๋ฅด์ง€ ์•Š๊ฒŒ ๊ตฌ์„ฑ๋˜์–ด ์ดํ•ดํ•˜๊ธฐ ์–ด๋ ค์šด ๋ฌธ์žฅ์ด ์ƒ์„ฑ๋  ์ˆ˜ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ •ํ™•ํ•œ ๋ฌธ๋ฒ•๊ตฌ์กฐ์™€ ํ’๋ถ€ํ•œ ํ‘œํ˜„์„ ๊ฐ€์ง„ ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ๋ฌธ์žฅ์˜ ๊ตฌ์กฐ ์ •๋ณด์ธ ํ’ˆ์‚ฌ๋ฅผ ์ง์ ‘์ ์œผ๋กœ ํ™œ์šฉํ•˜๋Š” ํ’ˆ์‚ฌ Guidance Module๊ณผ Multimodal ๊ธฐ๋ฐ˜์˜ ์ด๋ฏธ์ง€ ์บก์…”๋‹ ๋ชจ๋ธ์„ ์ œ์•ˆํ•œ๋‹ค. ํ’๋ถ€ํ•œ ํ‘œํ˜„์˜ ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ์ œ์•ˆํ•˜๋Š” ํ’ˆ์‚ฌ Guidance Module์€ ์ด๋ฏธ์ง€ ํŠน์ง• ์ •๋ณด์™€ ๋ฌธ์žฅ ์ •๋ณด๋ฅผ ํ’ˆ์‚ฌ์— ๋”ฐ๋ผ ๊ฐ ๋ฐ์ดํ„ฐ์— ์ถ”๊ฐ€์ ์ธ ๊ฐ€์ค‘์น˜๋ฅผ ์ ์šฉํ•˜๋Š” ํ’ˆ์‚ฌ ๊ฐ€์ด๋“œ ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํ’ˆ์‚ฌ ์ˆœ์„œ ์ •๋ณด์— ๋”ฐ๋ผ ๋‹ค์Œ ์‹œ์  ์˜ˆ์ธก ๋‹จ์–ด๋ฅผ ๊ต์ •ํ•˜์—ฌ ์ •ํ™•ํ•œ ๋ฌธ๋ฒ• ๊ตฌ์กฐ์˜ ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ์ œ์•ˆํ•˜๋Š” ํ’ˆ์‚ฌ Multimoal ๊ธฐ๋ฐ˜์˜ ์ด๋ฏธ์ง€ ์บก์…”๋‹ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•œ๋‹ค. ํ’ˆ์‚ฌ Multimodal ๋ ˆ์ด์–ด์—์„œ๋Š” Decoder์˜ Bi-LSTM์—์„œ ์–ป์–ด์ง€๋Š” ์ƒ์„ฑ ๋ฌธ์žฅ ์ •๋ณด๋ฅผ ํ’ˆ์‚ฌ ์ˆœ์„œ ์ •๋ณด์— ๋”ฐ๋ผ ๊ต์ •ํ•˜์—ฌ ๋‹ค์Œ ์‹œ์  ์˜ˆ์ธก ํ’ˆ์‚ฌ์— ํ•ด๋‹นํ•˜๋Š” ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ํ’ˆ์‚ฌ ๊ทœ์น™์„ ์ค€์ˆ˜ํ•˜๋Š” ์ •ํ™•ํ•œ ๋ฌธ๋ฒ• ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง€๋ฉด์„œ ๋™์‹œ์— ๊ธฐ์กด์˜ ์—ฐ๊ตฌ๋ณด๋‹ค ํ’๋ถ€ํ•œ ํ‘œํ˜„์„ ๊ฐ€์ง„ ๋‚ด์šฉ์˜ ๋ฌธ์žฅ์„ ๋งŒ๋“ ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ชจ๋ธ์˜ ํƒ€๋‹น์„ฑ์„ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” Flicker 30K์™€ MS COCO ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ํ•™์Šต๊ณผ ํ‰๊ฐ€๋ฅผ ์ง„ํ–‰ํ–ˆ์œผ๋ฉฐ, ๊ฐ๊ด€์ ์ธ ํ‰๊ฐ€ ์ง€ํ‘œ์ธ BLEU, METEOR, CIDEr, SPICE, ROUGE๋ฅผ ๋น„๊ตํ•˜์˜€๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์€ ์ตœ๊ทผ์˜ ๋น„๊ต ๋ชจ๋ธ๋“ค์— ๋น„ํ•ด ๋ชจ๋“  ํ‰๊ฐ€ ์ง€ํ‘œ ์ ์ˆ˜๊ฐ€ ์ „์ฒด์ ์œผ๋กœ ํ–ฅ์ƒ๋˜์—ˆ์œผ๋ฉฐ ํŠนํžˆ, CIDEr ์ ์ˆ˜์—์„œ ๊ฐ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ํ•™์Šตํ•œ ๋น„๊ต ๋ชจ๋ธ๋“ค์— ๋น„ํ•ด ๊ฐ๊ฐ 8.85%, 3.03% ์ƒ์Šนํ•˜์˜€๋‹ค. ๊ทธ๋ฆฌ๊ณ  SPICE ์ ์ˆ˜์—์„œ ๊ฐ ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด 0.06์ , 0.002์  ๋†’์•˜์œผ๋ฉฐ, ์ด๋ฅผ ํ†ตํ•ด ์ œ์•ˆํ•˜๋Š” ๋ชจ๋ธ์ด ํ’ˆ์‚ฌ๋ฅผ ํ†ตํ•ด ์ง‘์ค‘๋œ ์ •๋ณด๋“ค๋กœ ์ด๋ฏธ์ง€์˜ ๋‚ด์šฉ์— ๋งž๋Š” ์ •ํ™•ํ•œ ์„ค๋ช…์„ ์ƒ์„ฑํ•จ์„ ํ™•์ธํ•˜์˜€๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€๋“ค์— ๋Œ€ํ•œ ๋น„๊ต ๋ชจ๋ธ๋“ค์˜ ์ƒ์„ฑ ๋ฌธ์žฅ๋“ค๊ณผ ๋น„๊ตํ•ด ์ œ์•ˆํ•˜๋Š” ๋ชจ๋ธ์ด ์ •ํ™•ํ•œ ๋ฌธ๋ฒ• ๊ตฌ์กฐ๋กœ ๋ฌธ์žฅ์„ ์„œ์ˆ ํ–ˆ์„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ํ’๋ถ€ํ•œ ํ‘œํ˜„์˜ ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋ฅผ ํ†ตํ•ด ๋” ํ’๋ถ€ํ•œ ํ‘œํ˜„๊ณผ ์ •ํ™•ํ•œ ๋ฌธ์žฅ์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋˜์–ด ์˜๋ฃŒ, ์˜์ƒ ์š”์•ฝ, ๊ฐ์‹œ ๋“ฑ๊ณผ ๊ฐ™์€ ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ ๋ถ„์„์ด ํ•„์š”ํ•œ ๋ถ„์•ผ์— ํ™œ์šฉ๋˜์–ด ๋„๋ฆฌ ์ƒ์šฉํ™”๋  ์ˆ˜ ์žˆ์„ ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€๋œ๋‹ค.1. ์„œ ๋ก  1 2. ๊ด€ ๋ จ ์ด ๋ก  5 2.1 Encoder-Decoder ํ”„๋ ˆ์ž„์›Œํฌ 5 2.1.1 Encoder 7 2.1.2 Decoder 9 2.2 Inject&Merge based Image Captioning Architecture 12 2.3 Bidirectional Recurrent Neural Network 14 2.4 Evaluation Metrics of Image Captioning 16 2.4.1 BLEU 17 2.4.2 ROUGE 18 2.4.3 METEOR 19 2.4.4 CIDEr 19 2.4.5 SPICE 20 3. 
์ œ์•ˆํ•œ ์ด๋ฏธ์ง€ ์บก์…”๋‹ ๋ชจ๋ธ 21 3.1 ํ’ˆ์‚ฌ Guidance Module 23 3.2 ํ’ˆ์‚ฌ Multimodal ๋ ˆ์ด์–ด๋ฅผ ํ™œ์šฉํ•œ ๋ฌธ์žฅ ์ƒ์„ฑ 25 4. ์‹คํ—˜ ๋ฐ ๊ฒฐ๊ณผ 28 4.1 ๋ฐ์ดํ„ฐ์…‹๊ณผ ๊ธฐ์ดˆ ์„ค์ • 28 4.2 ์‹คํ—˜ ๊ฒฐ๊ณผ ๋ถ„์„ 30 5. ๊ฒฐ ๋ก  41 6. ์ฐธ ๊ณ  ๋ฌธ ํ—Œ 42Maste

    Means-End Chain Model: A Review of Research in Korea, 2001-2015

    No full text