1,461 research outputs found

    Perceptually-Aligned Frame Rate Selection Using Spatio-Temporal Features

    Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

    We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths.

    Study of Compression Statistics and Prediction of Rate-Distortion Curves for Video Texture

    Encoding textural content remains a challenge for current standardised video codecs. It is therefore beneficial to understand video textures in terms of both their spatio-temporal characteristics and their encoding statistics in order to optimize encoding performance. In this paper, we analyse the spatio-temporal features and statistics of video textures, explore the rate-quality performance of different texture types and investigate models to mathematically describe them. For all considered theoretical models, we employ machine-learning regression to predict the rate-quality curves based solely on selected spatio-temporal features extracted from uncompressed content. All experiments were performed on homogeneous video textures to ensure validity of the observations. The results of the regression indicate that using an exponential model we can more accurately predict the expected rate-quality curve (with a mean Bjøntegaard Delta rate of 0.46% over the considered dataset) while maintaining a low relative complexity. This approach is expected to be adopted by in-loop processes for faster encoding decisions such as rate-distortion optimisation, adaptive quantization, and partitioning.

    Velocity-Based LOD Reduction in Virtual Reality: A Psychometric Approach

    Virtual Reality headsets enable users to explore the environment by performing self-induced movements. The retinal velocity produced by such motion reduces the visual system's ability to resolve fine detail. We measured the impact of self-induced head rotations on the ability to detect quality changes of a realistic 3D model in an immersive virtual reality environment. We varied the Level-of-Detail (LOD) as a function of rotational head velocity with different degrees of severity. Using a psychophysical method, we asked 17 participants to identify which of the two presented intervals contained the higher quality model under two different maximum velocity conditions. After fitting psychometric functions to data relating the percentage of correct responses to the aggressiveness of LOD manipulations, we identified the threshold severity for which participants could reliably (75%) detect the lower LOD model. Participants accepted an approximately four-fold LOD reduction even in the low maximum velocity condition without a significant impact on perceived quality, which suggests that there is considerable potential for optimisation when users are moving (increased range of perceptual uncertainty). Moreover, LOD could be degraded significantly more in the maximum head velocity condition, suggesting these effects are indeed speed dependent.

    Content-Adaptive Variable Framerate Encoding Scheme for Green Live Streaming

    Adaptive live video streaming applications use a fixed predefined configuration for the bitrate ladder with constant framerate and encoding presets in a session. However, selecting optimized framerates and presets for every bitrate ladder representation can enhance perceptual quality, improve computational resource allocation, and thus, the streaming energy efficiency. In particular, low framerates for low-bitrate representations reduce compression artifacts and decrease encoding energy consumption. In addition, an optimized preset may lead to improved compression efficiency. In this light, this paper proposes a Content-adaptive Variable Framerate (CVFR) encoding scheme, which offers two modes of operation: ecological (ECO) and high-quality (HQ). CVFR-ECO optimizes for the highest encoding energy savings by predicting the optimized framerate for each representation in the bitrate ladder. CVFR-HQ takes it further by predicting each representation's optimized framerate-encoding preset pair using low-complexity discrete cosine transform energy-based spatial and temporal features for compression efficiency and sustainable storage. We demonstrate the advantage of CVFR using the x264 open-source video encoder. The results show that CVFR-ECO yields an average PSNR and VMAF increase of 0.02 dB and 2.50 points, respectively, for the same bitrate, compared to the fastest preset highest framerate encoding. CVFR-ECO also yields an average encoding and storage energy consumption reduction of 34.54% and 76.24%, respectively, considering a just noticeable difference (JND) of six VMAF points. In comparison, CVFR-HQ yields an average increase in PSNR and VMAF of 2.43 dB and 10.14 points, respectively, for the same bitrate. Finally, CVFR-HQ resulted in an average reduction in storage energy consumption of 83.18%, considering a JND of six VMAF points.

    Dance-the-music : an educational platform for the modeling, recognition and audiovisual monitoring of dance steps using spatiotemporal motion templates

    In this article, a computational platform is presented, entitled “Dance-the-Music”, that can be used in a dance educational context to explore and learn the basics of dance steps. By introducing a method based on spatiotemporal motion templates, the platform facilitates training basic step models from sequentially repeated dance figures performed by a dance teacher. Movements are captured with an optical motion capture system. The teachers’ models can be visualized from a first-person perspective to instruct students how to perform the specific dance steps in the correct manner. Moreover, recognition algorithms based on a template matching method can determine the quality of a student’s performance in real time by means of multimodal monitoring techniques. The results of an evaluation study suggest that Dance-the-Music is effective in helping dance students to master the basics of dance figures.