1,418 research outputs found

    Learning Narrative Text Generators from Visual Stories via Latent Embedding

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Byoung-Tak Zhang.

    The ability to understand stories is part of what distinguishes humans from other primates and from other animals. Story understanding is crucial for AI agents that live alongside people in everyday life and need to understand their context. However, most research on story AI focuses on automated story generation in manually designed closed worlds, which are widely used for computational authoring. Machine learning techniques applied to story corpora face the same problems as natural language processing in general, such as omitted details and missing commonsense knowledge. Because the remarkable success of deep learning in computer vision has increased interest in research that bridges vision and language, vision-grounded story data can potentially improve the performance of story understanding and narrative text generation. Let us assume that an AI agent is placed in an environment that it senses through a camera. Such an agent observes its surroundings, translates them into a story in natural language, and predicts the following event, or several events in sequence. This dissertation studies the related problems of learning stories and generating narrative text from image streams or videos. The first problem is to generate a narrative text from a sequence of ordered images. As a solution, we introduce GLAC Net (Global-local Attention Cascading Network). It translates image sequences into narrative paragraphs as an encoder-decoder framework in a sequence-to-sequence setting. It uses convolutional neural networks to extract information from images and recurrent neural networks to generate text. We introduce visual cue encoders with stacked bidirectional LSTMs, and the outputs of all layers are aggregated into contextualized image vectors to extract visual clues.
The coherency of the generated text is further improved by cascading the information of the previous sentence serially to the next sentence in the decoders. We evaluate its performance on the Visual Storytelling (VIST) dataset. It outperforms other state-of-the-art results and achieves the best total score and the best score in all six aspects of the Visual Storytelling Challenge, as evaluated by human judges. The second problem is to predict the following events or narrative texts from the earlier parts of a story. Prediction should be possible at any step and for a story of arbitrary length. We propose recurrent event retrieval models as a solution. They train a context accumulation function and two embedding functions that bring the cumulative context at the current time close to the next probable events in a latent space. They update the cumulative context with each new input event using bilinear operations, and the updated cumulative context is then used to find the next event candidates. Evaluated on the Story Cloze Test, they show competitive performance, and the best performance in the open-ended generation setting. We also demonstrate working examples in an interactive setting. The third problem concerns composite representation learning of semantics and order for video stories. We embed each episode as a trajectory-like sequence of events in the latent space and propose ViStoryNet to regenerate video stories from these embeddings (a story completion task). We convert event sentences to thought vectors and train functions that embed successive events close to each other, so that episodes form trajectories. Bidirectional LSTMs are trained as sequence models, and GRU-based decoders generate the event sentences. We test them experimentally on the PororoQA dataset and observe that most episodes take the form of trajectories. We use them to complete the blocked parts of stories, and the results are not perfect but similar overall.
These results can be applied to AI agents that sense their living environment through cameras, describe the situation as stories, infer unobserved parts, and predict how the story continues.

    Korean abstract (translated): The ability to understand stories is an important ability that distinguishes humans not only from animals but also from other primates. For artificial intelligence to live alongside people in everyday life and understand the context of their lives, the ability to understand stories is very important. However, because of the difficulty of language processing, existing research on stories has mainly studied techniques for generating high-quality works under predefined world models. Attempts to handle stories with machine learning generally have to rely on data expressed in natural language, and so they face the same problems as natural language processing. Data linked with visual information can help overcome this. Thanks to the remarkable recent progress of deep learning, research on the relationship between vision and language is increasing. As a research vision, consider an AI agent placed in an environment where it receives information about its surroundings through a camera. There, the agent observes its surroundings, generates a story about them in natural language, and, based on the generated story, predicts what will happen next, from one step to several steps ahead. This dissertation addresses methods for learning the stories that appear in photos and videos (visual stories), their conversion into narrative text, and the inference of hidden events and next events.

    First, we address the problem of generating story text from several given photos (visual storytelling). To solve it, we propose GLAC Net. It uses a convolutional neural network to extract information from the photos and a recurrent neural network to generate sentences. As the encoder of a sequence-to-sequence structure, a multi-layer bidirectional recurrent neural network represents the overall story structure, and a global-local attention model is proposed so that per-photo information is used as well. In addition, we propose a mechanism that passes on information from the preceding sentence so that context and local information are not lost while multiple sentences are generated. The method was trained on the VIST dataset and, based on human evaluation, received the highest total score and the highest score in each of the six criteria in the first Visual Storytelling Challenge.

    Second, we address the problem of predicting the next sentence when part of a story is given as sentences. Prediction must be possible at any position in a story of arbitrary length, and must work regardless of the number of steps to predict. For this we propose Recurrent Event Retrieval Models. The method learns a context accumulation function and two embedding functions so that, in a latent space, the distance between the context accumulated so far and the likely next event becomes small. When a new event is added to a story that has already been input, the existing context is updated through a bilinear operation to find the likely next events. The method was trained on the ROCStories dataset and, evaluated with the Story Cloze Test, showed competitive performance, and in particular the best performance among methods that can infer over arbitrary lengths.

    Third, we address the problem of recovering part of an event sequence in a video story when it is hidden. In particular, we aimed to reflect the semantic information and the order of each event in the model's representation learning. To this end we propose ViStoryNet, a model that embeds each episode as a trajectory in a latent space and regenerates the story from it in order to complete the story. To give each episode the form of a trajectory, event sentences are converted into thought vectors, and successive event order embedding places preceding and following events close to each other so that an episode takes the shape of a trajectory. The results were verified experimentally on the PororoQA dataset. The embedded episodes appeared clearly as trajectories, and the regenerated episodes were similar overall. These results correspond to methods for understanding stories from surrounding information captured by a camera, inferring unobserved parts, and predicting the future story.

    Contents:
    Abstract
    Chapter 1 Introduction
        1.1 Story of Everyday lives in Videos and Story Understanding
        1.2 Problems to be addressed
        1.3 Approach and Contribution
        1.4 Organization of Dissertation
    Chapter 2 Background and Related Work
        2.1 Why We Study Stories
        2.2 Latent Embedding
        2.3 Order Embedding and Ordinal Embedding
        2.4 Comparison to Story Understanding
        2.5 Story Generation
            2.5.1 Abstract Event Representations
            2.5.2 Seq-to-seq Attentional Models
            2.5.3 Story Generation from Images
    Chapter 3 Visual Storytelling via Global-local Attention Cascading Networks
        3.1 Introduction
        3.2 Evaluation for Visual Storytelling
        3.3 Global-local Attention Cascading Networks (GLAC Net)
            3.3.1 Encoder: Contextualized Image Vector Extractor
            3.3.2 Decoder: Story Generator with Attention and Cascading Mechanism
        3.4 Experimental Results
            3.4.1 VIST Dataset
            3.4.2 Experiment Settings
            3.4.3 Network Training Details
            3.4.4 Qualitative Analysis
            3.4.5 Quantitative Analysis
        3.5 Summary
    Chapter 4 Common Space Learning on Cumulative Contexts and the Next Events: Recurrent Event Retrieval Models
        4.1 Overview
        4.2 Problems of Context Accumulation
        4.3 Recurrent Event Retrieval Models for Next Event Prediction
        4.4 Experimental Results
            4.4.1 Preliminaries
            4.4.2 Story Cloze Test
            4.4.3 Open-ended Story Generation
        4.5 Summary
    Chapter 5 ViStoryNet: Order Embedding of Successive Events and the Networks for Story Regeneration
        5.1 Introduction
        5.2 Order Embedding with Triple Learning
            5.2.1 Embedding Ordered Objects in Sequences
        5.3 Problems and Contextual Events
            5.3.1 Problem Definition
            5.3.2 Contextual Event Vectors from Kids Videos
        5.4 Architectures for the Story Regeneration Task
            5.4.1 Two Sentence Generators as Decoders
            5.4.2 Successive Event Order Embedding (SEOE)
            5.4.3 Sequence Models of the Event Space
        5.5 Experimental Results
            5.5.1 Experimental setup
            5.5.2 Quantitative Analysis
            5.5.3 Qualitative Analysis
        5.6 Summary
    Chapter 6 Concluding Remarks
        6.1 Summary of Methods and Contributions
        6.2 Limitation and Outlook
        6.3 Suggestions for Future Research
    Abstract in Korean
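The recurrent event retrieval idea in the abstract above (accumulate a context with a bilinear update, then look up the nearest next event in a latent space) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the dissertation: the names `bilinear_update` and `retrieve_next`, the `tanh` squashing, the Euclidean distance, the hand-set weight tensor, and the two-dimensional toy vectors. A trained model would learn the weights and the embedding functions instead.

```python
import math


def bilinear_update(context, event, weights):
    """Hypothetical context update.

    Computes out[k] = tanh(sum_ij context[i] * weights[k][i][j] * event[j]),
    i.e. one bilinear form per output dimension, followed by tanh.
    """
    new_context = []
    for w_k in weights:
        total = 0.0
        for i, c_i in enumerate(context):
            for j, e_j in enumerate(event):
                total += c_i * w_k[i][j] * e_j
        new_context.append(math.tanh(total))
    return new_context


def retrieve_next(context, candidates):
    """Rank candidate event vectors by Euclidean distance to the context (nearest first)."""
    def distance(vector):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(context, vector)))

    return sorted(range(len(candidates)), key=lambda idx: distance(candidates[idx]))


# Toy, hand-set weights (assumption): output k only looks at context[k] * event[k].
weights = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]

context = [1.0, 1.0]   # cumulative context so far (already embedded)
event = [2.0, -2.0]    # newly observed event (already embedded)
context = bilinear_update(context, event, weights)

# Candidate next events, assumed to live in the same latent space.
candidates = [[-1.0, 1.0], [1.0, -1.0], [0.0, 0.0]]
ranking = retrieve_next(context, candidates)
print(ranking)  # [1, 2, 0]
```

Because the update takes one event at a time and returns a context of the same size, it can be applied after any number of events, which matches the abstract's requirement that prediction work at any step.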

    HMC-Based Accelerator Design For Compressed Deep Neural Networks

    Deep Neural Networks (DNNs) offer remarkable classification and regression performance in many high-dimensional problems and have been widely used in real-world cognitive applications. However, the high computational cost of DNNs greatly hinders their deployment in resource-constrained applications, real-time systems, and edge computing platforms. Moreover, the energy and performance costs of moving data between the memory hierarchy and the computational units are higher than those of the computation itself. To overcome this memory bottleneck, accelerator designs improve data locality and temporal data reuse. In an attempt to further improve data locality, memory manufacturers have invented 3D-stacked memory, in which multiple layers of memory arrays are stacked on top of each other. Inheriting the concept of Processing-In-Memory (PIM), some 3D-stacked memory architectures also include a logic layer that can integrate general-purpose computational logic directly within main memory to take advantage of the high internal bandwidth during computation. In this dissertation, we investigate hardware/software co-design for neural network accelerators. Specifically, we introduce a two-phase filter pruning framework for model compression and an accelerator tailored for efficient DNN execution on the Hybrid Memory Cube (HMC), which can dynamically offload primitives and functions to the PIM logic layer through a latency-aware scheduling controller. In our compression framework, we formulate the filter pruning process as an optimization problem and propose a filter selection criterion measured by conditional entropy. The key idea of our approach is to establish a quantitative connection between filters and model accuracy. We define the connection as the conditional entropy over the filters in a convolutional layer, i.e., the distribution of entropy conditioned on network loss.
Based on this definition, the pruning efficiencies of global and layer-wise pruning strategies are compared, and a two-phase pruning method is proposed. The proposed method removes 88% of the filters and reduces inference time by 46% on VGG16 within 2% accuracy degradation.
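The selection criterion above rests on conditional entropy. As a worked illustration of that quantity only, the Python sketch below computes H(Y | X) in bits from paired discrete samples. The abstract does not say how filter responses or the network loss are discretized, or how the resulting scores are turned into a pruning decision, so the binning, the variable names, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def conditional_entropy(xs, ys):
    """Return H(Y | X) in bits, estimated from paired discrete samples.

    Uses H(Y | X) = -sum_{x,y} p(x, y) * log2 p(y | x) with empirical frequencies.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    marginal_x = Counter(xs)
    entropy = 0.0
    for (x, _y), count in joint.items():
        p_xy = count / n
        p_y_given_x = count / marginal_x[x]
        entropy -= p_xy * math.log2(p_y_given_x)
    return entropy


# Hypothetical, pre-binned observations over four mini-batches.
loss_bins = ["low", "low", "high", "high"]
filter_a = [0, 0, 1, 1]  # response bin is fully determined by the loss bin
filter_b = [0, 1, 0, 1]  # response bin is unrelated to the loss bin

h_a = conditional_entropy(loss_bins, filter_a)
h_b = conditional_entropy(loss_bins, filter_b)
print(h_a, h_b)
```

In this toy data, the response of `filter_a` carries no remaining uncertainty once the loss bin is known, while `filter_b` keeps a full bit of uncertainty. Which of the two a pruning rule should keep depends on the criterion defined in the dissertation, which the abstract does not spell out.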

    Nitric Oxide and the Biological Cascades Underlying Increased Neurogenesis, Enhanced Learning Ability, and Academic Ability as an Effect of Increased Bouts of Physical Activity

    International Journal of Exercise Science 5(3): 245-275, 2012. The consummate principle underlying all physiological research is corporeal adaptation at every level of the organism observed. With respect to humans, the body learns to function based on external stimuli from the environment, beginning in the womb and continuing throughout the developmental stages of life. Nitric oxide (NO) appears to be the governor of the plasticity of several systems in mammals that are implicit in their proper development. The purpose of this review is to describe the physiological pathways that lead to plasticity not only of the vasculature but also of the brain, and how physical activity plays a key role in those alterations by initiating the mechanism that triggers NO production. Further, this review hopes to show a connection between these changes and learning, comprising both motor learning and cognitive learning. This review will show how NO plays a significant role in vascularization and neurogenesis, both necessary to enhance the mind-body connection and comprehensive physical performance and adaptation. It is our belief that this review effectively demonstrates, using a multidisciplinary approach, the causal mechanisms underlying the increases in neurogenesis as related to improved learning and academic performance as a result of adequate bouts of vigorous physical activity.

    Speech Enhancement Exploiting the Source-Filter Model

    Imagining everyday life without mobile telephony is hardly possible nowadays. Calls are made in every conceivable situation and environment. Hence, the microphone picks up not only the user's speech but also sound from the surroundings, which is likely to impede the understanding of the conversational partner. Modern speech enhancement systems are able to mitigate such effects, and most users are not even aware of their existence. This thesis presents the development of a modern single-channel speech enhancement approach that uses the divide-and-conquer principle to combat environmental noise in microphone signals. Though initially motivated by mobile telephony applications, this approach can be applied whenever speech is to be retrieved from a corrupted signal. The approach uses the so-called source-filter model to divide the problem into two subproblems, which are then conquered by enhancing the source (the excitation signal) and the filter (the spectral envelope) separately. Both enhanced signals are then used to denoise the corrupted signal. The estimation of spectral envelopes has a long history, and some approaches already exist for speech enhancement. However, they typically neglect the excitation signal, which makes it impossible to enhance the fine structure properly. Both individual enhancement approaches exploit benefits of the cepstral domain, which offers, e.g., advantageous mathematical properties and straightforward synthesis of excitation-like signals. In this thesis we investigate traditional model-based schemes such as Gaussian mixture models (GMMs), classical signal-processing-based approaches, and modern deep neural network (DNN)-based approaches. The enhanced signals are not used directly to enhance the corrupted signal (e.g., to synthesize a clean speech signal) but as a so-called a priori signal-to-noise ratio (SNR) estimate in a traditional statistical speech enhancement system.
Such a traditional system consists of a noise power estimator, an a priori SNR estimator, and a spectral weighting rule that is usually driven by the results of these estimators and then employed to retrieve the clean speech estimate from the noisy observation. As a result, the new approach obtains significantly higher noise attenuation than current state-of-the-art systems while maintaining comparable speech component quality and speech intelligibility. In consequence, the overall quality of the enhanced speech signal turns out to be superior to that of state-of-the-art speech enhancement approaches.

    German abstract (translated): Mobile telephony has become an indispensable part of modern life. Calls are made in any situation and in any place, and the microphone picks up not only the user's speech but also the ambient noise, which can strongly affect how well the conversational partner understands. Modern systems can counteract such effects with speech enhancement algorithms, and many users are not even aware that these algorithms exist. This thesis presents the development of a single-channel speech enhancement system. The approach relies on the divide-and-conquer method to filter disturbing ambient noise out of microphone signals. The method can be applied in any case where speech is to be extracted from noisy signals. The approach uses the source-filter model to split the original problem into two subproblems, which are then solved by enhancing the source (the excitation signal) and the filter (the spectral envelope) separately. The enhanced signals are used jointly to denoise the corrupted microphone signal. The estimation of spectral envelopes has been researched in the past and has in part been applied to speech enhancement. Typically, however, the excitation signal is neglected, so that the spectral fine structure of the microphone signal cannot be improved. Both approaches use the properties of the cepstral domain, which offers, among other things, advantageous mathematical properties and the possibility of generating prototypes of an excitation signal. In this thesis we investigate model-based approaches such as Gaussian mixture models, classical signal-processing-based solutions, and modern deep neural networks. The enhanced signals are not used directly for speech signal enhancement (e.g., speech synthesis) but as a so-called a priori signal-to-noise ratio estimate in a traditional statistical speech enhancement system. This system consists of a noise power estimator, an a priori SNR estimator, and a spectral weighting rule, which is usually computed from the results of the two estimators. Finally, an estimate of the clean speech signal is obtained from the microphone recording. The new approach offers significantly higher attenuation of the noise than the previous state of the art, while ensuring comparable quality of the speech component and comparable speech intelligibility. The overall quality of the enhanced speech signal was thus increased compared with the state of the art.
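The last stage of the statistical system described above, a spectral weighting rule driven by an a priori SNR estimate, can be sketched in a few lines of Python. The abstract does not say which weighting rule the thesis uses; this sketch assumes the classical Wiener gain G = xi / (1 + xi), where xi is the a priori SNR on a linear (not dB) scale. The function names and the three-bin toy frame are hypothetical.

```python
def wiener_gain(a_priori_snr):
    """Wiener spectral gain for one frequency bin; the SNR is linear, not in dB."""
    if a_priori_snr < 0.0:
        raise ValueError("a priori SNR must be non-negative")
    return a_priori_snr / (1.0 + a_priori_snr)


def apply_weighting(noisy_magnitudes, a_priori_snrs):
    """Scale each noisy spectral magnitude by the gain computed for its bin."""
    if len(noisy_magnitudes) != len(a_priori_snrs):
        raise ValueError("one SNR estimate is needed per frequency bin")
    return [wiener_gain(xi) * y for y, xi in zip(noisy_magnitudes, a_priori_snrs)]


# Toy frame with three frequency bins (values are made up for illustration).
noisy_magnitudes = [10.0, 4.0, 2.0]
a_priori_snrs = [9.0, 1.0, 0.0]  # speech-dominated, balanced, noise-only

enhanced = apply_weighting(noisy_magnitudes, a_priori_snrs)
print(enhanced)
```

A bin with a high a priori SNR is passed almost unchanged, a bin with equal speech and noise power is halved, and a bin estimated to contain only noise is suppressed completely. This is why a better a priori SNR estimate translates directly into better noise attenuation.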

    Novel linear and nonlinear optical signal processing for ultra-high bandwidth communications

    The thesis is articulated around the theme of ultra-wide-bandwidth single-channel signals. It focuses on two main topics: the transmission and the processing of information by techniques compatible with high baud rates. The processing schemes introduced combine new linear and nonlinear optical platforms, such as Fourier-domain programmable optical processors and chalcogenide chip waveguides, with the concept of neural networks. Transmission of data is considered in the context of medium-distance links carrying Optical Time Division Multiplexed (OTDM) data subject to environmental fluctuations. We experimentally demonstrate simultaneous compensation of differential group delay and multiple orders of dispersion at symbol rates of 640 Gbaud and 1.28 Tbaud. Signal processing at high bandwidth is envisaged both for elementary post-transmission analog error mitigation and in the broader field of optical computing for high-level operations (an "optical processor"). A key innovation is the introduction of a novel four-wave mixing scheme implementing a dot-product operation between wavelength-multiplexed channels. In particular, it is demonstrated for low-latency, hash-key-based all-optical error detection in links encoded with advanced modulation formats. Finally, the work presents groundbreaking concepts for the compact implementation of an optical neural network as a programmable multi-purpose processor. The experimental architecture can implement neural networks with several nodes on a single optical nonlinear transfer function, implementing functions such as analog-to-digital conversion. The distinguishing feature of the thesis is its new approaches to optical signal processing, which potentially enable high-level operations using simple optical hardware and limited cascading of components.

    An Open Logic Approach to EPM

    EPM is a highly operative and didactically versatile tool, and new application areas are envisaged continuously. In turn, this new awareness has allowed us to enlarge our panorama for understanding neurocognitive system behavior, and to develop information conservation and regeneration systems in a numeric self-reflexive/reflective evolutive reference framework. Unfortunately, a logically closed model cannot cope with ontological uncertainty by itself; it needs a complementary logical aperture operational support extension. To achieve this goal, it is possible to use two coupled irreducible information management subsystems, based on the following ideal coupled irreducible asymptotic dichotomy: "Information Reliable Predictability" and "Information Reliable Unpredictability" subsystems. To behave realistically, the overall system must guarantee both Logical Closure and Logical Aperture, both fed by environmental "noise" (or better, by what human beings call "noise"). So, a natural operating point can emerge as a new Trans-disciplinary Reality Level, out of the Interaction of Two Complementary Irreducible Information Management Subsystems within their environment. In this way, it is possible to extend the traditional EPM approach in order to profit from both the classic EPM intrinsic Self-Reflexive Functional Logical Closure and the new numeric CICT Self-Reflective Functional Logical Aperture. EPM can be thought of as a reliable starting subsystem to initialize a process of continuous self-organizing and self-logic learning refinement.

    Fiorini, Rodolfo; Degiacomo, Piero

    Pulse stream VLSI circuits and techniques for the implementation of neural networks


    FrAmework for Multi-Agency Environments (FAME) : Final Report of the Learning & Evaluation Strand

    Framework for Multi-agency Environments (FAME) was one of the Local Government On-Line funded National Projects sponsored by the Office of the Deputy Prime Minister (ODPM). Within FAME there were six local projects (known as strands) led by English local authorities in partnership with service providers. Each strand aimed to improve a particular set of services (for example, to vulnerable older people or disabled children) through effective and appropriate exchange of information. These local projects worked with IT suppliers (known as technology partners) to produce a technical system to facilitate the exchange and management of client / patient information across agency boundaries. Not all the outputs of FAME were in the form of IT systems. Improvements to business processes and information sharing practices were also expected. Newcastle University led two further strands, the Generic Framework and Learning & Evaluation. The Generic Framework identifies and describes nine building blocks that are essential to effective multi-agency working. The FAME website http://www.fame-uk.org contains details of these building blocks, together with a 'how to' guide and a toolkit to support local authorities and their partners in assessing their 'readiness' for multi-agency working. This is the report of the Learning & Evaluation strand. The Learning & Evaluation team worked closely with the local FAME project teams, who were supportive of our work and generous with their time. Throughout the project we reported back to the local teams both individually and collectively. Evaluation was thoroughgoing and critical, not an exercise in public relations or advocacy. It is important to stress that learning is likely to be gained from what did not work as well as from what did. Problems and setbacks, as well as successes, are therefore documented and analysed in the report.