2,598 research outputs found

    Understanding the Perceived Quality of Video Predictions

    The study of video prediction models is believed to be a fundamental approach to representation learning for videos. While a plethora of generative models exist for predicting future frame pixel values given the past few frames, the quantitative evaluation of the predicted frames has been found to be extremely challenging. In this context, we study the problem of quality assessment of predicted videos. We create the Indian Institute of Science Predicted Videos Quality Assessment (IISc PVQA) Database consisting of 300 videos, obtained by applying different prediction models on different datasets, and accompanying human opinion scores. We collected subjective ratings of quality from 50 human participants for these videos. Our subjective study reveals that human observers were highly consistent in their judgments of quality of predicted videos. We benchmark several widely used measures for evaluating video prediction and show that they do not adequately correlate with these subjective scores. We introduce two new features to effectively capture the quality of predicted videos: motion-compensated cosine similarities of deep features of predicted frames with past frames, and deep features extracted from rescaled frame differences. We show that our feature design leads to state-of-the-art quality prediction in accordance with human judgments on our IISc PVQA Database. The database and code are publicly available on our project website: https://nagabhushansn95.github.io/publications/2020/pvqa
    Comment: Project website: https://nagabhushansn95.github.io/publications/2020/pvqa.htm
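As a rough illustration of the first feature named in the abstract above, the sketch below computes cosine similarities between the feature vector of a predicted frame and the feature vectors of past frames. It is not the authors' implementation: the function names and the toy vectors are hypothetical, the features are assumed to be already extracted by some deep network, and the motion-compensation step is omitted entirely.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero vector has similarity 0.
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_to_past(predicted_feat, past_feats):
    """Similarity of one predicted frame's features to each past frame's features."""
    return [cosine_similarity(predicted_feat, f) for f in past_feats]


# Toy feature vectors (hypothetical stand-ins for deep features).
past = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
pred = [3.0, 0.0, 0.0]
sims = similarities_to_past(pred, past)
```

Here `pred` points in the same direction as the first past vector and is orthogonal to the second, so the two similarities sit at the extremes of the range for non-negative features.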

    A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

    We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFace-dataset. We investigate how neural re-parameterization helps to reconstruct image-aligned facial details on 3D meshes via gradient analysis. By exploiting the naturalness and diversity of 3D faces in our dataset, we demonstrate the usefulness of our dataset for 3D face-related tasks: improving the reconstruction accuracy of an existing 3D face reconstruction model and learning 3D facial motion prior. Code and datasets will be available at https://neuface-dataset.github.io.
    Comment: 9 pages, 7 figures, and 3 tables for the main paper. 8 pages, 6 figures and 3 tables for the appendix.

    Somatic ABC's: A Theoretical Framework for Designing, Developing and Evaluating the Building Blocks of Touch-Based Information Delivery

    Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality, particularly with the advent of socio-communicative smartphone applications and pervasive, high-speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities (those that today's computerized devices and displays largely engage) have become overloaded, creating possibilities for distractions, delays and high cognitive load, which in turn can lead to a loss of situational awareness, increasing chances for life-threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate given that the skin is our largest sensory organ, with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations, including high learning curves, limited applicability and/or limited expression. This is largely due to the lack of a versatile, comprehensive design theory: specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified theoretical framework inspired by natural spoken language, called Somatic ABC's, is proposed for Articulating (designing), Building (developing) and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation and evaluation theories were applied to create communication languages for two distinct application areas: audio described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development and evaluation of rich somatic languages with distinct and natural communication units.
    Dissertation/Thesis, Ph.D. Computer Science 201

    Neural theory for the perception of causal actions

    The efficient prediction of the behavior of others requires the recognition of their actions and an understanding of their action goals. In humans, this process is fast and extremely robust, as demonstrated by classical experiments showing that human observers reliably judge causal relationships and attribute interactive social behavior to strongly simplified stimuli consisting of simple moving geometrical shapes. While psychophysical experiments have identified critical visual features that determine the perception of causality and agency from such stimuli, the underlying detailed neural mechanisms remain largely unclear, and it is an open question why humans developed this advanced visual capability at all. We created pairs of naturalistic and abstract stimuli of hand actions that were exactly matched in terms of their motion parameters. We show that varying critical stimulus parameters for both stimulus types leads to very similar modulations of the perception of causality. However, the additional form information about the hand shape and its relationship with the object supports more fine-grained distinctions for the naturalistic stimuli. Moreover, we show that a physiologically plausible model for the recognition of goal-directed hand actions reproduces the observed dependencies of causality perception on critical stimulus parameters. These results support the hypothesis that selectivity for abstract action stimuli might emerge from the same neural mechanisms that underlie the visual processing of natural goal-directed action stimuli. Furthermore, the model proposes specific detailed neural circuits underlying this visual function, which can be evaluated in future experiments.
    Funding: Seventh Framework Programme (European Commission) (Tango Grant FP7-249858-TP3 and AMARSi Grant FP7-ICT-248311); Deutsche Forschungsgemeinschaft (Grant GI 305/4-1); Hermann and Lilly Schilling Foundation for Medical Research

    The Nomic Role Account of Carving Reality at the Joints

    Natural properties are those that carve reality at the joints. The notion of carving reality at the joints, however, is somewhat obscure. It is sometimes understood in terms of making for similarity, sometimes in terms of conferring causal powers, and sometimes in terms of figuring in the laws of nature. I develop and assess an account of the third sort according to which carving reality at the joints is understood as having the right level of determinacy relative to nomic roles. The account has the attraction of involving only very weak metaphysical presuppositions, but it fails to capture several features that natural properties are presumed to have.
    http://klinechair.missouri.edu/Vita_Revised.htm (#51)

    Analyzing liquids


    On human motion prediction using recurrent neural networks

    Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.
    Comment: Accepted at CVPR 1
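The abstract above does not spell out its "simple baseline". One natural reading, assumed here for illustration only, is a constant-pose predictor that repeats the last observed pose for every future step. The sketch below shows that idea with hypothetical function names and toy pose vectors, plus a mean absolute error for scoring against ground truth.

```python
def constant_pose_baseline(observed_poses, horizon):
    """Predict `horizon` future poses by repeating the last observed pose.

    No motion is modelled: every predicted frame is a copy of the final input frame.
    """
    if not observed_poses:
        raise ValueError("need at least one observed pose")
    last = observed_poses[-1]
    return [list(last) for _ in range(horizon)]


def mean_abs_error(predicted, target):
    """Mean absolute difference over all frames and pose dimensions."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    return total / count


# Toy sequence: two observed poses with two joint angles each (hypothetical data).
observed = [[0.0, 1.0], [0.5, 1.5]]
future = [[0.5, 1.5], [1.5, 1.5]]
preds = constant_pose_baseline(observed, horizon=2)
err = mean_abs_error(preds, future)
```

A baseline like this is useful mainly as a sanity check: a learned model that cannot beat it on short horizons is not yet capturing motion.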