2 research outputs found

    A Study on Frame Prediction Method based on Operation Probability Map

    λ™μ˜μƒλ‚΄μ—μ„œ 손상에 μ˜ν•΄ μ†Œμ‹€λœ ν”„λ ˆμž„μ„ λ³΅μ›ν•˜κ±°λ‚˜ 연속적인 μƒˆλ‘œμš΄ ν”„λ ˆμž„μ„ μƒμ„±ν•˜λŠ” 기법인 ν”„λ ˆμž„ μ˜ˆμΈ‘μ€ κ°μ²΄λ“€μ˜ λ™μž‘ 예츑이 ν•„μš”ν•œ μžμœ¨μ£Όν–‰, λ³΄μ•ˆ λ“±μ˜ 미래 μ£Όμš” κΈ°μˆ λ‘œμ„œ μ£Όλͺ©λ°›κ³  μžˆλ‹€. 졜근 이 κΈ°μˆ μ€ λ”₯λŸ¬λ‹ 기술과 κ²°ν•©ν•˜μ—¬ 예츑 정확도가 많이 ν–₯μƒλ˜κ³  μžˆμœΌλ‚˜ λ§Žμ€ ν•™μŠ΅λ°μ΄ν„°μ™€ μ—°μ‚°λŸ‰μ΄ 수반되기 λ•Œλ¬Έμ— μ‹€μ§ˆμ μΈ μ μš©μ—λŠ” 어렀움이 μ‘΄μž¬ν•œλ‹€. 기쑴의 λ”₯λŸ¬λ‹ 기반 예츑 λͺ¨λΈμ€ μƒˆλ‘œμš΄ ν”„λ ˆμž„ 생성 κ³Όμ •μ—μ„œ μ˜ˆμΈ‘μ— μ˜ν•΄ μƒμ„±λœ ν”„λ ˆμž„μ„ ν”Όλ“œλ°±ν•˜κΈ° λ•Œλ¬Έμ— λˆ„μ μ˜€μ°¨κ°€ 많이 λ°œμƒν•˜μ—¬ μ‹œκ°„μ΄ 지남에 따라 예츑 정확도가 κ°μ†Œν•œλ‹€. λ”°λΌμ„œ λ³Έ λ…Όλ¬Έμ—μ„œλŠ” convolution neural network (CNN)와 long short-term memory (LSTM)으둜 κ΅¬μ„±λœ λ„€νŠΈμ›Œν¬λ₯Ό 톡해 ν”„λ ˆμž„λ“€μ˜ λ™μž‘ νŠΉμ§•λ“€μ„ μΆ”μΆœν•˜κ³  νŒ¨ν„΄μ„ ν•™μŠ΅ν•˜μ—¬ λ™μž‘ ν™•λ₯  지도λ₯Ό μƒμ„±ν•˜μ—¬ μ›€μ§μž„μ΄ λ°œμƒν•œ μ˜μ—­μ— λŒ€ν•˜μ—¬ deconvolution neural network(DNN)λ₯Ό 톡해 이후 ν”„λ ˆμž„μ„ μƒμ„±ν•˜λŠ” μƒˆλ‘œμš΄ ν”„λ ˆμž„ 예츑 λͺ¨λΈμ„ μ œμ•ˆν•œλ‹€. μ œμ•ˆν•œ λͺ¨λΈμ€ CNNκ³Ό LSTM을 톡해 ν”„λ ˆμž„λ“€μ˜ λ™μž‘ νŠΉμ§•λ“€μ„ μΆ”μΆœν•˜κ³  νŒ¨ν„΄μ„ ν•™μŠ΅ν•˜μ—¬ λ™μž‘ ν™•λ₯  지도λ₯Ό μƒμ„±ν•œλ‹€. 이λ₯Ό 톡해 μž„μ˜μ˜ ν•œ ν”„λ ˆμž„μ—μ„œ λ™μž‘μ΄ λ°œμƒν•˜λŠ” μ˜μ—­λ₯Ό νŒλ³„ν•˜κ³  이 μ˜μ—­λ§Œ DNN을 톡해 μƒˆλ‘œμš΄ ν”„λ ˆμž„μ„ νšλ“ν•œλ‹€. μ΄λ•Œ ν•™μŠ΅ λ‚œμ΄λ„κ°€ 높은 DNN의 효율적인 ν•™μŠ΅μ„ μœ„ν•΄ generative adversarial nets(GAN) 기법을 μ μš©ν•œλ‹€. μ œμ•ˆλœ μƒˆλ‘œμš΄ λͺ¨λΈμ˜ ν•™μŠ΅κ³Ό 검증을 μœ„ν•˜μ—¬ λ¬΄μž‘μœ„λ‘œ 일뢀 ν”„λ ˆμž„μ΄ 제거된 λ‘œλ΄‡ μ›€μ§μž„ μ˜μƒμ„ 기반으둜 μƒμ„±λœ μ˜μƒκ³Ό 원본 μ˜μƒμ„ PSNR둜 비ꡐ λΆ„μ„ν•˜μ˜€λ‹€. κ·Έ κ²°κ³Ό, μ œμ•ˆν•œ ν”„λ ˆμž„ 예츑 λͺ¨λΈμ˜ PSNR은 35.16으둜 λΉ„κ΅ν•œ 3개의 λ‹€λ₯Έ λͺ¨λΈμ— λΉ„ν•΄ μ΅œλŒ€ 14.06이 ν–₯μƒλ˜μ—ˆλ‹€. λ˜ν•œ μƒμ„±λœ ν”„λ ˆμž„μ— λ”°λ₯Έ PSNR의 κ°μ†Œλ„ 4번째 ν”„λ ˆμž„ μ΄μ „μ—λŠ” 2, μ΄ν›„μ—λŠ” 7둜 평균 5κ°€ κ°œμ„ λ˜μ—ˆλ‹€.|Frame prediction, which is a technique to reconstruct frames lost due to damage or to generate new consecutive frames in the video, is attracting attention as a main technology which is indispensable for the autonomous vehicle and the artificial intelligence based security system that require motion prediction of objects. Recently, this technology has improved prediction accuracy in combination with deep learning technology, but it is difficulties in practical application because it involves a lot of learning data and computation amount. The existing deep learning based prediction model, since the frame generated by the prediction is feedback in the new frame generation process, is decreased the prediction accuracy over time. Therefore, in this paper, we propose an operation probability map based new frame prediction model using convolution neural network (CNN), long short-term, memory (LSTM) and deconvolution neural network(DNN) to minimize unnecessary computation regions in the frame and prediction error. The proposed model extracts the operating characteristics of the frames through CNN and LSTM and learns the patterns to generate the operation probability map. Through this process, a region in which an operation occurs is determined in one frame, and a new frame is obtained through DNN only in this region. At this time, the generative adversarial nets(GAN) technique is applied for efficient learning of DNN with the high learning complexity. For the learning and verification of the proposed new model, we compared and analyzed the generated frame and the original frame based on robotic motion images with some frames removed randomly using PSNR. 
    As a result, the PSNR of the proposed frame prediction model was 35.16, up to 14.06 higher than that of the three compared models. In addition, the decrease in PSNR over successively generated frames was reduced by 2 before the fourth frame and by 7 thereafter, an improvement of 5 on average.

    Table of contents:
    Chapter 1 Introduction
    Chapter 2 Related Works
        2.1 Convolution Neural Network
        2.2 Long Short-Term Memory
        2.3 Generative Adversarial Nets
    Chapter 3 The Proposed Prediction Model
        3.1 Structure of the proposed model
        3.2 Model for feature extraction and operation probability estimation
        3.3 Model for generating and combining images
        3.4 Model for learning of generative model
    Chapter 4 Experiment and Result
        4.1 Dataset for learning and testing
        4.2 Analysis of experimental results
    Chapter 5 Conclusion
    Reference
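    The model described above chains a CNN feature extractor, an LSTM over the frame sequence, and a deconvolution (transposed-convolution) generator that is gated by the operation probability map so that only regions judged to contain motion are synthesized. The following is a minimal PyTorch sketch of that pipeline; the layer sizes, the 64Γ—64 frame resolution, the 0.5 masking threshold, and all class and function names are illustrative assumptions rather than the thesis's actual architecture.

```python
# Minimal sketch of the CNN + LSTM operation-probability-map pipeline (assumed shapes).
import torch
import torch.nn as nn

class OperationProbabilityMap(nn.Module):
    """CNN per-frame features + LSTM over time -> per-pixel motion probability map."""
    def __init__(self, hidden=256, h=64, w=64):
        super().__init__()
        self.h, self.w = h, w
        self.encoder = nn.Sequential(                      # CNN: per-frame motion features
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32 * 8 * 8, hidden, batch_first=True)  # temporal patterns
        self.to_map = nn.Linear(hidden, h * w)             # project to a probability map

    def forward(self, clip):                               # clip: (B, T, 1, H, W)
        B, T = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(feats)
        prob = torch.sigmoid(self.to_map(out[:, -1]))      # motion probability per pixel
        return prob.view(B, 1, self.h, self.w)

class FrameGenerator(nn.Module):
    """Transposed-convolution (DNN) generator applied only where motion is likely."""
    def __init__(self):
        super().__init__()
        self.down = nn.AvgPool2d(4)                        # match the 4x upsampling below
        self.deconv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, last_frame, prob_map):               # last_frame: (B, 1, 64, 64)
        mask = (prob_map > 0.5).float()                    # assumed motion threshold
        generated = self.deconv(self.down(last_frame))
        # keep static regions from the last observed frame, synthesize only moving ones
        return mask * generated + (1.0 - mask) * last_frame

# Usage sketch: predict the next frame from a clip of eight observed 64x64 frames.
clip = torch.rand(2, 8, 1, 64, 64)
prob = OperationProbabilityMap()(clip)
nxt = FrameGenerator()(clip[:, -1], prob)                  # (2, 1, 64, 64)
```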
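    The abstract states only that the GAN technique is used to train the high-complexity DNN generator efficiently. A minimal adversarial training step under that reading might look as follows; the patch-style discriminator, the optimizers, and the added L1 reconstruction term are illustrative assumptions, not details given in the thesis.

```python
# Minimal GAN training step for the frame generator (assumed discriminator and losses).
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(                                  # real vs. generated 64x64 frame
    nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(), nn.Linear(32 * 16 * 16, 1),
)
bce = nn.BCEWithLogitsLoss()

def gan_step(generator, frame_real, last_frame, prob_map, opt_g, opt_d):
    """One adversarial update; `generator` maps (last_frame, prob_map) -> frame."""
    frame_fake = generator(last_frame, prob_map)

    # 1) discriminator: push real frames toward 1 and generated frames toward 0
    opt_d.zero_grad()
    d_real, d_fake = disc(frame_real), disc(frame_fake.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # 2) generator: fool the discriminator; the L1 term (an assumption) keeps the
    #    generated frame close to the ground-truth frame
    opt_g.zero_grad()
    d_fake = disc(frame_fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + F.l1_loss(frame_fake, frame_real)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```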
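    The evaluation compares generated frames against the originals with PSNR. For reference, the standard PSNR computation, assuming 8-bit frames so that the peak value is 255, is:

```python
# PSNR between an original and a generated frame, in dB (8-bit pixel range assumed).
import numpy as np

def psnr(original: np.ndarray, generated: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((original.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                    # identical frames
    return 10.0 * np.log10((max_val ** 2) / mse)

# e.g. averaging over a reconstructed clip:
# scores = [psnr(o, g) for o, g in zip(original_frames, generated_frames)]
```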

    Novel Motion Anchoring Strategies for Wavelet-based Highly Scalable Video Compression

    This thesis investigates new motion anchoring strategies that are targeted at wavelet-based highly scalable video compression (WSVC). We depart from two practices that are deeply ingrained in existing video compression systems. Instead of the commonly used block motion, which has poor scalability attributes, we employ piecewise-smooth motion together with a highly scalable motion boundary description. Combining this more β€œphysical” motion description with motion discontinuity information allows us to change the conventional strategy of anchoring motion at target frames to anchoring motion at reference frames, which improves motion inference across time.

    In the proposed reference-based motion anchoring strategies, motion fields are mapped from reference to target frames, where they serve as prediction references; during this mapping process, disoccluded regions are readily discovered. Observing that motion discontinuities displace with foreground objects, we propose motion-discontinuity-driven motion mapping operations that handle traditionally challenging regions around moving objects.

    The reference-based motion anchoring exposes an intricate connection between temporal frame interpolation (TFI) and video compression. When employed in a compression system, all anchoring strategies explored in this thesis perform TFI once all residual information is quantized to zero at a given temporal level. The interpolation performance is evaluated on both natural and synthetic sequences, where we show favourable comparisons with state-of-the-art TFI schemes.

    We explore three reference-based motion anchoring strategies. In the first one, the motion anchoring is β€œflipped” with respect to a hierarchical B-frame structure. We develop an analytical model to determine the weights of the different spatio-temporal subbands, and assess the suitability and benefits of this reference-based WSVC for (highly scalable) video compression. Reduced motion coding cost and improved frame prediction, especially around moving objects, result in improved rate-distortion performance compared to a target-based WSVC. As the thesis evolves, the motion anchoring is progressively simplified to one where all motion is anchored at one base frame; this central motion organization facilitates the incorporation of higher-order motion models, which improve the prediction performance in regions following motion with non-constant velocity.
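    To illustrate the reference-anchored mapping described above, the toy sketch below forward-maps a motion field stored at the reference frame onto the target frame and flags target pixels that receive no vector as disoccluded. The dense per-pixel field, the nearest-pixel rounding, and the function name are simplifying assumptions; the thesis uses piecewise-smooth motion with a scalable motion boundary description rather than a dense field.

```python
# Toy reference-anchored motion mapping with disocclusion detection (NumPy).
import numpy as np

def map_motion_to_target(motion_ref, shape):
    """motion_ref: (H, W, 2) displacements (dy, dx) anchored at the reference frame.
    Returns the field re-anchored at the target frame and a boolean mask of target
    pixels that no reference pixel maps to (i.e. disoccluded regions)."""
    H, W = shape
    motion_tgt = np.zeros((H, W, 2))
    hit = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            dy, dx = motion_ref[y, x]
            ty, tx = int(round(y + dy)), int(round(x + dx))   # landing spot in the target
            if 0 <= ty < H and 0 <= tx < W:
                motion_tgt[ty, tx] = (-dy, -dx)   # target->reference vector for prediction
                hit[ty, tx] = True
    return motion_tgt, ~hit                       # unmapped pixels = disoccluded regions

# Usage: a uniform shift of +2 pixels in x leaves a 2-pixel disoccluded band at the left.
field = np.zeros((8, 8, 2)); field[..., 1] = 2.0
mapped, disoccluded = map_motion_to_target(field, (8, 8))
```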