7,629 research outputs found

    Video Frame Interpolation via Adaptive Separable Convolution

    Get PDF
    Standard video frame interpolation methods first estimate optical flow between input frames and then synthesize an intermediate frame guided by motion. Recent approaches merge these two steps into a single convolution process by convolving input frames with spatially adaptive kernels that account for motion and re-sampling simultaneously. These methods require large kernels to handle large motion, which limits the number of pixels whose kernels can be estimated at once due to the large memory demand. To address this problem, this paper formulates frame interpolation as local separable convolution over input frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D kernels require significantly fewer parameters to be estimated. Our method develops a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously. Since our method is able to estimate kernels and synthesizes the whole video frame at once, it allows for the incorporation of perceptual loss to train the neural network to produce visually pleasing frames. This deep neural network is trained end-to-end using widely available video data without any human annotation. Both qualitative and quantitative experiments show that our method provides a practical solution to high-quality video frame interpolation.Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv

    Free vibration analysis of laminated composite plates based on FSDT using one-dimensional IRBFN method

    Get PDF
    This paper presents a new effective radial basis function (RBF) collocation technique for the free vibration analysis of laminated composite plates using the first order shear deformation theory (FSDT). The plates, which can be rectangular or non-rectangular, are simply discretised by means of Cartesian grids. Instead of using conventional differentiated RBF networks, one-dimensional integrated RBF networks (1D-IRBFN) are employed on grid lines to approximate the field variables. A number of examples concerning various thickness-to-span ratios, material properties and boundary conditions are considered. Results obtained are compared with the exact solutions and numerical results by other techniques in the literature to investigate the performance of the proposed method

    POIReviewQA: A Semantically Enriched POI Retrieval and Question Answering Dataset

    Full text link
    Many services that perform information retrieval for Points of Interest (POI) utilize a Lucene-based setup with spatial filtering. While this type of system is easy to implement it does not make use of semantics but relies on direct word matches between a query and reviews leading to a loss in both precision and recall. To study the challenging task of semantically enriching POIs from unstructured data in order to support open-domain search and question answering (QA), we introduce a new dataset POIReviewQA. It consists of 20k questions (e.g."is this restaurant dog friendly?") for 1022 Yelp business types. For each question we sampled 10 reviews, and annotated each sentence in the reviews whether it answers the question and what the corresponding answer is. To test a system's ability to understand the text we adopt an information retrieval evaluation by ranking all the review sentences for a question based on the likelihood that they answer this question. We build a Lucene-based baseline model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a challenging problem for future research by the GIR community. The result technology can help exploit the thematic content of web documents and social media for characterisation of locations

    Motion-Adjustable Neural Implicit Video Representation

    Get PDF
    Implicit neural representation (INR) has been successful in representing static images. Contemporary image-based INR, with the use of Fourier-based positional encoding, can be viewed as a mapping from sinusoidal patterns with different frequencies to image content. Inspired by that view, we hypothesize that it is possible to generate temporally varying content with a single image-based INR model by displacing its input sinusoidal patterns over time. By exploiting the relation between the phase information in sinusoidal functions and their displacements, we incorporate into the conventional image-based INR model a phase-varying positional encoding module, and couple it with a phase-shift generation module that determines the phase-shift values at each frame. The model is trained end-to-end on a video to jointly determine the phase-shift values at each time with the mapping from the phase-shifted sinusoidal functions to the corresponding frame, enabling an implicit video representation. Experiments on a wide range of videos suggest that such a model is capable of learning to interpret phase-varying positional embeddings into the corresponding time-varying content. More importantly, we found that the learned phase-shift vectors tend to capture meaningful temporal and motion information from the video. In particular, manipulating the phase-shift vectors induces meaningful changes in the temporal dynamics of the resulting video, enabling non-trivial temporal and motion editing effects such as temporal interpolation, motion magnification, motion smoothing, and video loop detection
    • ā€¦
    corecore