
    Complexity measurement and characterization of 360-degree content

    The appropriate characterization of the test material used for subjective evaluation tests and for benchmarking image and video processing algorithms and quality metrics can be crucial in order to perform comparative studies that provide useful insights. This paper focuses on the characterization of 360-degree images. We discuss why it is important to take into account the geometry of the signal and the interactive nature of 360-degree content navigation for a perceptual characterization of these signals. In particular, we show that the computation of classical indicators of spatial complexity, commonly used for 2D images, might lead to different conclusions depending on the geometrical domain used.
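
    A concrete illustration of the point above is the spatial information (SI) indicator: computed directly on the equirectangular plane it over-weights the heavily stretched polar regions, whereas a sphere-aware variant weights each row by the cosine of its latitude. The sketch below is a minimal example of this comparison; it assumes the standard Sobel-based SI definition and a WS-PSNR-style cosine weighting, and is not the paper's exact procedure.

```python
# Minimal sketch: spatial information (SI) of an equirectangular 360-degree image,
# computed in the raw planar domain and with a cosine-latitude (solid-angle)
# weighting that compensates for oversampling near the poles. Illustrative only.
import numpy as np
from scipy.ndimage import sobel

def si_planar(luma: np.ndarray) -> float:
    """Classical SI (ITU-T P.910 style): std. dev. of the Sobel gradient magnitude."""
    gx = sobel(luma.astype(np.float64), axis=1)
    gy = sobel(luma.astype(np.float64), axis=0)
    return float(np.std(np.hypot(gx, gy)))

def si_sphere_weighted(luma: np.ndarray) -> float:
    """SI with per-row cos(latitude) weights, in the spirit of WS-PSNR weighting."""
    h, w = luma.shape
    gx = sobel(luma.astype(np.float64), axis=1)
    gy = sobel(luma.astype(np.float64), axis=0)
    grad = np.hypot(gx, gy)
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2   # latitude of each image row
    wgt = np.cos(lat)[:, None] * np.ones((1, w))          # solid-angle weights
    mean = np.average(grad, weights=wgt)
    var = np.average((grad - mean) ** 2, weights=wgt)
    return float(np.sqrt(var))

# luma = ...  # H x W luminance plane of an equirectangular image (NumPy array)
# print(si_planar(luma), si_sphere_weighted(luma))  # the two values can diverge
```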

    A saliency dispersion measure for improving saliency-based image quality metrics

    Objective image quality metrics (IQMs) potentially benefit from the addition of visual saliency. However, challenges to optimising the performance of saliency-based IQMs remain. A previous eye-tracking study has shown that gaze is concentrated in fewer places in images with highly salient features than in images lacking salient features. From this, it can be inferred that the former are more likely to benefit from adding a saliency term to an IQM. To understand whether these ideas still hold when using computational saliency instead of eye-tracking data, we first conducted a statistical evaluation using 15 state-of-the-art saliency models and 10 well-known IQMs. We then used the results to devise an algorithm which adaptively incorporates saliency in IQMs for natural scenes, based on saliency dispersion. Experimental results demonstrate this can give significant improvement.
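
    To make the idea of dispersion-adaptive weighting concrete, the sketch below pools a local quality map with saliency weights only when the saliency map is sufficiently concentrated. The use of Shannon entropy as the dispersion measure and the fixed threshold value are illustrative assumptions, not the algorithm proposed in the paper.

```python
# Illustrative sketch of dispersion-adaptive saliency weighting for an IQM whose
# output is a local quality map (e.g. an SSIM map). The entropy-based dispersion
# measure and the threshold are assumptions made for illustration.
import numpy as np

def saliency_dispersion(sal: np.ndarray) -> float:
    """Shannon entropy of the normalized saliency map (higher = more dispersed)."""
    p = sal.ravel().astype(np.float64)
    p = p / (p.sum() + 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def pool_quality(quality_map: np.ndarray, sal: np.ndarray,
                 dispersion_threshold: float = 18.0) -> float:
    """Saliency-weighted pooling only when saliency is concentrated enough."""
    if saliency_dispersion(sal) < dispersion_threshold:   # few, strong salient spots
        w = sal / (sal.sum() + 1e-12)
        return float((quality_map * w).sum())
    return float(quality_map.mean())                       # dispersed: plain average
```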

    Understanding User Navigation in Immersive Experience: an Information-Theoretic analysis

    To cope with the large bandwidth and low-latency requirements, Virtual Reality (VR) systems are steering toward user-centric designs in which coding, streaming, and possibly rendering are personalized to the final user. The success of these user-centric VR systems mainly relies on the ability to anticipate viewers' navigation. This has motivated substantial interest in predicting users' movements during a VR experience. However, most of these works lack a proper and exhaustive behavioural analysis in a VR scenario, leaving many key behavioural questions unanswered and unexplored: Can some users be more predictable than others? Do users have their own way of navigating, and how much is this affected by the video content? Can we quantify the similarity of users' navigation? Answering these questions is a crucial step toward understanding user behaviour in VR, and it is the overall goal of this paper. By studying VR trajectories across different contents and through information-theoretic tools, we aim at characterizing navigation patterns both for each single viewer (profiling viewers individually - intra-user analysis) and for a multitude of viewers (identifying common patterns among viewers - inter-user analysis). For each of the proposed behavioural analyses, we describe the applied metrics and the key observations that can be extrapolated.
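
    As a rough illustration of the kind of information-theoretic descriptors mentioned above, the sketch below discretizes a viewer's head orientation into tiles and computes the entropy of a single trajectory (an intra-user predictability proxy) and the mutual information between two viewers watching the same content (an inter-user similarity proxy). The 8x4 yaw/pitch tiling is an arbitrary choice made for illustration, not the paper's exact setup.

```python
# Toy information-theoretic descriptors for VR navigation trajectories.
import numpy as np

def tile_ids(yaw, pitch, n_yaw=8, n_pitch=4):
    """Map yaw in [-pi, pi) and pitch in [-pi/2, pi/2) to one tile index per sample."""
    iy = np.clip(((np.asarray(yaw) + np.pi) / (2 * np.pi) * n_yaw).astype(int), 0, n_yaw - 1)
    ip = np.clip(((np.asarray(pitch) + np.pi / 2) / np.pi * n_pitch).astype(int), 0, n_pitch - 1)
    return iy * n_pitch + ip

def entropy(ids, n_states):
    """Entropy (bits) of the tile occupancy of one viewer's trajectory."""
    p = np.bincount(ids, minlength=n_states) / len(ids)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(ids_a, ids_b, n_states):
    """Mutual information (bits) between two time-aligned viewers' tile sequences."""
    joint = np.zeros((n_states, n_states))
    for a, b in zip(ids_a, ids_b):
        joint[a, b] += 1
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / np.outer(pa, pb)[nz])).sum())
```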

    Movie Editing and Cognitive Event Segmentation in Virtual Reality Video

    Traditional cinematography has relied for over a century on a well-established set of editing rules, called continuity editing, to create a sense of situational continuity. Despite massive changes in visual content across cuts, viewers in general experience no trouble perceiving the discontinuous flow of information as a coherent set of events. However, Virtual Reality (VR) movies are intrinsically different from traditional movies in that the viewer controls the camera orientation at all times. As a consequence, common editing techniques that rely on camera orientations, zooms, etc., cannot be used. In this paper we investigate key relevant questions to understand how well traditional movie editing carries over to VR. To do so, we rely on recent cognition studies and the event segmentation theory, which states that our brains segment continuous actions into a series of discrete, meaningful events. We first replicate one of these studies to assess whether the predictions of such theory can be applied to VR. We next gather gaze data from viewers watching VR videos containing different edits with varying parameters, and provide the first systematic analysis of viewers' behavior and the perception of continuity in VR. From this analysis we make a series of relevant findings; for instance, our data suggests that predictions from the cognitive event segmentation theory are useful guides for VR editing; that different types of edits are equally well understood in terms of continuity; and that spatial misalignments between regions of interest at the edit boundaries favor a more exploratory behavior even after viewers have fixated on a new region of interest. In addition, we propose a number of metrics to describe viewers' attentional behavior in VR. We believe the insights derived from our work can be useful as guidelines for VR content creation.
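
    One of the simplest attentional descriptors that can be computed from such gaze data is the share of post-cut gaze samples that land inside the new region of interest within a short time window. The sketch below implements this hypothetical metric; the window length and the angular ROI test are illustrative assumptions, not the specific metrics proposed in the paper.

```python
# Hypothetical attentional descriptor: fraction of gaze samples inside the post-cut
# region of interest (ROI) during a short window after an edit. Illustrative only.
import numpy as np

def angular_distance_deg(yaw1, pitch1, yaw2, pitch2):
    """Great-circle distance (degrees) between viewing directions given in degrees."""
    y1, p1, y2, p2 = map(np.radians, (yaw1, pitch1, yaw2, pitch2))
    c = np.sin(p1) * np.sin(p2) + np.cos(p1) * np.cos(p2) * np.cos(y1 - y2)
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def fraction_in_roi(t, gaze_yaw, gaze_pitch, cut_time, roi_yaw, roi_pitch,
                    roi_radius_deg=20.0, window_s=2.0):
    """Share of post-cut gaze samples within `roi_radius_deg` of the ROI centre."""
    t = np.asarray(t)
    sel = (t >= cut_time) & (t < cut_time + window_s)
    if not sel.any():
        return float("nan")
    d = angular_distance_deg(np.asarray(gaze_yaw)[sel], np.asarray(gaze_pitch)[sel],
                             roi_yaw, roi_pitch)
    return float(np.mean(d <= roi_radius_deg))
```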

    Predicting the valence of a scene from observers’ eye movements

    Multimedia analysis benefits from understanding the emotional content of a scene in a variety of tasks such as video genre classification and content-based image retrieval. Recently, there has been an increasing interest in applying human bio-signals, particularly eye movements, to recognize the emotional gist of a scene, such as its valence. In order to determine the emotional category of images using eye movements, existing methods often learn a classifier using several features that are extracted from eye movements. Although it has been shown that eye movement is potentially useful for recognition of scene valence, the contribution of each feature is not well studied. To address this issue, we study the contribution of features extracted from eye movements in the classification of images into pleasant, neutral, and unpleasant categories. We assess ten features and their fusion. The features include the histogram of saccade orientation, histogram of saccade slope, histogram of saccade length, histogram of saccade duration, histogram of saccade velocity, histogram of fixation duration, fixation histogram, top-ten salient coordinates, and saliency map. We utilize a machine learning approach to analyze the performance of the features by learning a support vector machine and exploiting various feature fusion schemes. The experiments reveal that ‘saliency map’, ‘fixation histogram’, ‘histogram of fixation duration’, and ‘histogram of saccade slope’ are the most contributing features. The selected features signify the influence of fixation information and the angular behavior of eye movements in the recognition of the valence of images.
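
    The sketch below outlines the general feature-extraction, early-fusion and SVM pipeline described above, using two of the listed features (histogram of saccade orientation and histogram of fixation duration) as examples. Bin counts, the fusion scheme and the kernel choice are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of an eye-movement feature fusion + SVM valence classifier. Illustrative only.
import numpy as np
from sklearn.svm import SVC

def saccade_orientation_hist(dx, dy, n_bins=8):
    """Histogram of saccade directions (angles of the displacement vectors)."""
    ang = np.arctan2(dy, dx)                                  # radians in [-pi, pi]
    h, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi))
    return h / max(h.sum(), 1)

def fixation_duration_hist(durations_ms, n_bins=8, max_ms=1000.0):
    """Histogram of fixation durations, clipped to `max_ms`."""
    h, _ = np.histogram(np.clip(durations_ms, 0, max_ms), bins=n_bins, range=(0, max_ms))
    return h / max(h.sum(), 1)

def feature_vector(sacc_dx, sacc_dy, fix_dur_ms):
    """Early fusion: concatenate the per-viewing feature histograms."""
    return np.concatenate([saccade_orientation_hist(sacc_dx, sacc_dy),
                           fixation_duration_hist(fix_dur_ms)])

# X = np.stack([feature_vector(*trial) for trial in trials])   # one row per viewing
# y = np.array(labels)        # e.g. 0 = unpleasant, 1 = neutral, 2 = pleasant
# clf = SVC(kernel="rbf", C=1.0).fit(X, y)
# predictions = clf.predict(X_test)
```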

    A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos

    Although research on the detection of saliency and visual attention has been active in recent years, most of the existing work focuses on still images rather than video-based saliency. In this paper, a deep-learning-based hybrid spatiotemporal saliency feature extraction framework is proposed for saliency detection from video footage. The deep learning model is used for the extraction of high-level features from raw video data, and these are then integrated with other high-level features. The deep learning network has been found to be considerably more effective at extracting hidden features than conventional handcrafted methods. The effectiveness of using hybrid high-level features for saliency detection in video is demonstrated in this work. Rather than using only one static image, the proposed deep learning model takes several consecutive frames as input, and both the spatial and temporal characteristics are considered when computing saliency maps. The efficacy of the proposed hybrid feature framework is evaluated on five databases of complex scenes with human gaze data. Experimental results show that the proposed model outperforms five other state-of-the-art video saliency detection approaches. In addition, the proposed framework is found useful for other video-content-based applications such as video highlights. As a result, a large movie clip dataset together with labeled video highlights is generated.
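
    As a toy illustration of a model that, as described above, takes several consecutive frames rather than a single image, the PyTorch sketch below stacks a few 3D convolutions, averages over the temporal axis and predicts a single-channel saliency map. The architecture is an assumption made for illustration and does not reproduce the hybrid feature framework of the paper.

```python
# Illustrative spatiotemporal saliency predictor over a clip of consecutive frames.
import torch
import torch.nn as nn

class TinySpatioTemporalSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
        )
        # collapse the temporal axis, then predict a single-channel saliency map
        self.head = nn.Conv2d(32, 1, kernel_size=1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, n_frames, H, W)
        feat = self.features(clip).mean(dim=2)        # temporal average pooling
        return torch.sigmoid(self.head(feat))         # (batch, 1, H, W) saliency map

# model = TinySpatioTemporalSaliency()
# clip = torch.rand(2, 3, 5, 128, 256)                # two clips of 5 RGB frames
# sal = model(clip)                                   # per-clip saliency maps
```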