2,426 research outputs found

    Foveation scalable video coding with automatic fixation selection

    Full text link

    Content-based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events

    Get PDF
    As amounts of publicly available video data grow the need to query this data efficiently becomes significant. Consequently content-based retrieval of video data turns out to be a challenging and important problem. We address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. Firstly, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well us advantages of their integrated us

    Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment

    Full text link
    We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., relative importance of local quality to the global quality estimate, in an unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CISQ, and TID2013 databases as well as the LIVE In the wild image quality challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features

    Perceptual Visibility Model for Temporal Contrast Changes in Periphery

    Get PDF
    Modeling perception is critical for many applications and developments in computer graphics to optimize and evaluate content generation techniques. Most of the work to date has focused on central (foveal) vision. However, this is insufficient for novel wide-field-of-view display devices, such as virtual and augmented reality headsets. Furthermore, the perceptual models proposed for the fovea do not readily extend to the off-center, peripheral visual field, where human perception is drastically different. In this paper, we focus on modeling the temporal aspect of visual perception in the periphery. We present new psychophysical experiments that measure the sensitivity of human observers to different spatio-temporal stimuli across a wide field of view. We use the collected data to build a perceptual model for the visibility of temporal changes at different eccentricities in complex video content. Finally, we discuss, demonstrate, and evaluate several problems that can be addressed using our technique. First, we show how our model enables injecting new content into the periphery without distracting the viewer, and we discuss the link between the model and human attention. Second, we demonstrate how foveated rendering methods can be evaluated and optimized to limit the visibility of temporal aliasing

    VIDEO PREPROCESSING BASED ON HUMAN PERCEPTION FOR TELESURGERY

    Get PDF
    Video transmission plays a critical role in robotic telesurgery because of the high bandwidth and high quality requirement. The goal of this dissertation is to find a preprocessing method based on human visual perception for telesurgical video, so that when preprocessed image sequences are passed to the video encoder, the bandwidth can be reallocated from non-essential surrounding regions to the region of interest, ensuring excellent image quality of critical regions (e.g. surgical region). It can also be considered as a quality control scheme that will gracefully degrade the video quality in the presence of network congestion. The proposed preprocessing method can be separated into two major parts. First, we propose a time-varying attention map whose value is highest at the gazing point and falls off progressively towards the periphery. Second, we propose adaptive spatial filtering and the parameters of which are adjusted according to the attention map. By adding visual adaptation to the spatial filtering, telesurgical video data can be compressed efficiently because of the high degree of visual redundancy removal by our algorithm. Our experimental results have shown that with the proposed preprocessing method, over half of the bandwidth can be reduced while there is no significant visual effect for the observer. We have also developed an optimal parameter selecting algorithm, so that when the network bandwidth is limited, the overall visual distortion after preprocessing is minimized

    Space-variant picture coding

    Get PDF
    PhDSpace-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system in order to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first of these directions is in foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. However, for research into the multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model, and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable- rst order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy smoothly space-variant Gaussian blurring, using a specialised filter bank which approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to the brute force approach of employing a separate low-pass filter at each image location. The second direction is that of artifi cially increasing the depth-of- field of an image, an idea borrowed from photography with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth of field algorithms are presented herein, with the desirable properties of aiming to mimic occlusion eff ects as occur in natural blurring, and of handling any number of blurring and occlusion levels with the same level of computational complexity. The merits of this coding approach have been investigated by subjective experiments to compare it with single-viewer foveated image coding. The results found the depth-based preblurring to generally be significantly preferable to the same level of foveation blurring

    Object detection and activity recognition in digital image and video libraries

    Get PDF
    This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition purposes. The thesis focuses on the problem of connecting low level features to high level semantics by developing relational object and activity presentations. With the rapid growth of multimedia information in forms of digital image and video libraries, there is an increasing need for intelligent database management tools. The traditional text based query systems based on manual annotation process are impractical for today\u27s large libraries requiring an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed where shape, color and motion characteristics of objects of interest are captured in compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels from low complexity to low false rates. The thesis first examines extraction of low level features from images and videos using intensity, color and motion of pixels and blocks. Local consistency based on these features and geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object based knowledge in order to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses a feedback from relational representation of the object. The selected unary and binary attributes are further extended for application specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm can be summarized as improving the object extraction by reducing the dependence on the low level segmentation process and combining the boundary and region properties. The thesis then addresses the problem of object detection and activity recognition in compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high level semantics. Since our aim is to retrieve information from images and videos compressed using standard algorithms such as JPEG and MPEG, our approach differentiates from previous compressed domain object detection techniques where the compression algorithms are governed by characteristics of object of interest to be retrieved. An algorithm is developed using the principal component analysis of MPEG motion vectors to detect the human activities; namely, walking, running, and kicking. Object detection in JPEG compressed still images and MPEG I frames is achieved by using DC-DCT coefficients of the luminance and chrominance values in the graph based object detection algorithm. The thesis finally addresses the problem of object detection in lower resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients

    Content-prioritised video coding for British Sign Language communication.

    Get PDF
    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and better understanding of the communication needs of deaf people

    Digital Image Access & Retrieval

    Get PDF
    The 33th Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation with the bulk of the conference focusing on indexing and retrieval.published or submitted for publicatio
    corecore