272,408 research outputs found
The voice activity detection (VAD) recorder and VAD network recorder : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University
The project is to provide a feasibility study for the AudioGraph tool, focusing on two application areas: the VAD (voice activity detector) recorder and the VAD network recorder. The first one achieves a low bit-rate speech recording on the fly, using a GSM compression coder with a simple VAD algorithm; and the second one provides two-way speech over IP, fulfilling echo cancellation with a simplex channel. The latter is required for implementing a synchronous AudioGraph. In the first chapter we introduce the background of this project, specifically, the VoIP technology, the AudioGraph tool, and the VAD algorithms. We also discuss the problems set for this project. The second chapter presents all the relevant techniques in detail, including sound representation, speech-coding schemes, sound file formats, PowerPlant and Macintosh programming issues, and the simple VAD algorithm we have developed. The third chapter discusses the implementation issues, including the systems' objective, architecture, the problems encountered and solutions used. The fourth chapter illustrates the results of the two applications. The user documentations for the applications are given, and after that, we analyse the parameters based on the results. We also present the default settings of the parameters, which could be used in the AudioGraph system. The last chapter provides conclusions and future work
Context-aware Synthesis for Video Frame Interpolation
Video frame interpolation algorithms typically estimate optical flow or its
variations and then use it to guide the synthesis of an intermediate frame
between two consecutive original frames. To handle challenges like occlusion,
bidirectional flow between the two input frames is often estimated and used to
warp and blend the input frames. However, how to effectively blend the two
warped frames still remains a challenging problem. This paper presents a
context-aware synthesis approach that warps not only the input frames but also
their pixel-wise contextual information and uses them to interpolate a
high-quality intermediate frame. Specifically, we first use a pre-trained
neural network to extract per-pixel contextual information for input frames. We
then employ a state-of-the-art optical flow algorithm to estimate bidirectional
flow between them and pre-warp both input frames and their context maps.
Finally, unlike common approaches that blend the pre-warped frames, our method
feeds them and their context maps to a video frame synthesis neural network to
produce the interpolated frame in a context-aware fashion. Our neural network
is fully convolutional and is trained end to end. Our experiments show that our
method can handle challenging scenarios such as occlusion and large motion and
outperforms representative state-of-the-art approaches.Comment: CVPR 2018, http://graphics.cs.pdx.edu/project/ctxsy
- …