25 research outputs found

    Multilayer Bit Allocation for Video Encoding


    Far touch: integrating visual and haptic perceptual processing on wearables

    The evolution of electronic computers seems to have now reached the ubiquitous realm of wearable computing. Although a vast gamut of systems has been proposed so far, we believe most systems lack proper feedback for the user. In this dissertation, we not only contribute to solving the feedback problem, but also consider the design of a system to acquire and reproduce the sense of touch. For such a system to be feasible, a few important problems need to be considered, and here we address two of them. First, wireless streaming of high-resolution video to a head-mounted display requires a high compression ratio. Second, the choice of a proper feedback for the user depends on his/her ability to perceive it confidently across different scenarios. To solve the first problem, we propose a new limit that promises theoretically achievable data-reduction ratios of up to approximately 9:1 with no perceptual loss in typical scenarios. We also introduce a novel Gaussian foveation scheme that provides experimentally achievable gains of up to approximately twice the compression ratio of typical compression schemes, with less perceptual loss than in typical transmissions. The background material for both the limit and the foveation scheme includes a proposed pointwise retina-based constraint called pixel efficiency, which can be processed globally to reveal the perceptual efficiency of a display and used together with a lossy parameter to locally control the spatial resolution of a foveated image. To solve the second problem, we provide an estimate of the difference threshold suggesting that humans are typically able to discriminate between at least 6 different frequencies of an electrotactile stimulation. We also propose a novel sequence of experiments suggesting that a change from active touch to passive touch, or from a visual-haptic environment to a haptic-only environment, typically yields a reduction of the sensitivity index d' and an increase of the response bias c.
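
    The reported change in the sensitivity index d' and the response bias c follows standard signal detection theory. As a minimal sketch (the dissertation's own experimental pipeline is not shown here), both quantities can be computed from hit and false-alarm rates; the rates below are illustrative placeholders, not values from the study:

```python
from scipy.stats import norm

def sdt_measures(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Signal detection theory: sensitivity d' and response bias c.

    d' = Z(H) - Z(FA) and c = -(Z(H) + Z(FA)) / 2, where Z is the
    inverse normal CDF. Rates of exactly 0 or 1 should be clipped
    (e.g. to 1/(2N)) before calling this.
    """
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return z_hit - z_fa, -(z_hit + z_fa) / 2

# Illustrative numbers only: a switch from active to passive touch
# might show up as a lower hit rate at a similar false-alarm rate.
print(sdt_measures(0.90, 0.10))  # higher d', c near 0
print(sdt_measures(0.75, 0.10))  # lower d', larger positive c
```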

    High Dynamic Range Visual Content Compression

    This thesis addresses the research question of High Dynamic Range (HDR) visual content compression. HDR representations are intended to capture the actual physical value of light rather than its exposed value. Current HDR compression schemes are extensions of legacy Low Dynamic Range (LDR) compression that use Tone-Mapping Operators (TMOs) to reduce the dynamic range of the HDR content. However, introducing a TMO increases the overall computational complexity and causes temporal artifacts. Furthermore, these compression schemes fail to compress non-salient regions differently from salient regions, even though the Human Visual System (HVS) perceives them differently. The main contribution of this thesis is a novel mapping-free, visual saliency-guided HDR content compression scheme. Firstly, the relationship between Discrete Wavelet Transform (DWT) lifting steps and TMOs is explored, and a novel approach is proposed to compress HDR images with the Joint Photographic Experts Group (JPEG) 2000 codec while remaining backward compatible with LDR. This approach exploits the reversibility of tone mapping and the scalability of the DWT. Secondly, the importance of the TMO in HDR compression is evaluated, and a mapping-free HDR image compression scheme based on the JPEG and JPEG 2000 standard codecs for current HDR image formats is proposed. This approach exploits the structure of HDR formats; it achieves equivalent compression performance with the lowest computational complexity among existing lossy HDR compression schemes (50% lower than the state of the art). Finally, the shortcomings of current HDR visual saliency models and of HDR visual saliency-guided compression are explored. A spatial saliency model for HDR visual content is proposed that outperforms others by 10% on the spatial saliency prediction task with 70% lower computational complexity. Furthermore, experiments suggest that more than 90% of temporal saliency is predicted by the proposed spatial model. Moreover, the proposed saliency model can be used to guide HDR compression by applying different quantization factors according to the intensity of the predicted saliency map.
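
    The final contribution, quantization guided by a predicted saliency map, can be illustrated with a short sketch. The following is a hypothetical linear mapping assuming a saliency map normalized to [0, 1]; the thesis's actual saliency-to-quantizer mapping is not reproduced here:

```python
import numpy as np

def saliency_guided_q(saliency: np.ndarray, q_min: int = 16,
                      q_max: int = 48, block: int = 16) -> np.ndarray:
    """Map a per-pixel saliency map in [0, 1] to per-block quantizers.

    Hypothetical sketch: salient blocks get a finer quantizer (toward
    q_min), non-salient blocks a coarser one (toward q_max). q_min,
    q_max, and the block size are assumed values for illustration.
    """
    h, w = saliency.shape
    q_map = np.empty((h // block, w // block), dtype=np.int32)
    for by in range(h // block):
        for bx in range(w // block):
            s = saliency[by*block:(by+1)*block, bx*block:(bx+1)*block].mean()
            # Linear mapping: s = 1 (most salient) -> q_min, s = 0 -> q_max.
            q_map[by, bx] = round(q_max - s * (q_max - q_min))
    return q_map

# Example: a synthetic saliency map yields a 16x16 grid of quantizers.
sal = np.random.rand(256, 256)
print(saliency_guided_q(sal / sal.max()).shape)  # (16, 16)
```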

    Visual Saliency in Video Compression and Transmission

    This dissertation explores the concept of visual saliency—a measure of the propensity for drawing visual attention—and presents various novel methods for the utilization of visual saliency in video compression and transmission. Specifically, a computationally efficient method for visual saliency estimation in digital images and videos is developed, which approximates one of the most well-known visual saliency models. In the context of video compression, a saliency-aware video coding method is proposed within a region-of-interest (ROI) video coding paradigm. The proposed video coding method attempts to reduce attention-grabbing coding artifacts and keep viewers’ attention in areas where the quality is highest. The method allows visual saliency to increase in high-quality parts of the frame, and allows saliency to decrease in non-ROI parts. Using this approach, the proposed method is able to achieve the same subjective quality as competing state-of-the-art methods at a lower bit rate. In the context of video transmission, a novel saliency-cognizant error concealment method is presented for ROI-based video streaming, in which regions with higher visual saliency are protected more heavily than low-saliency regions. In the proposed error concealment method, a low-saliency prior is added to the error concealment process as a regularization term, which serves two purposes. First, it provides additional side information for the decoder to identify the correct replacement blocks for concealment. Second, in the event that a perfectly matched block cannot be unambiguously identified, the low-saliency prior reduces viewers’ visual attention on the loss-stricken regions, resulting in higher overall subjective quality. During the course of this research, an eye-tracking dataset for several standard video sequences was created and made publicly available. This dataset can be utilized to test saliency models for video and to evaluate various perceptually motivated algorithms for video processing and video quality assessment.
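
    The low-saliency prior can be read as a regularized candidate selection. The sketch below is a hypothetical rendering of that idea, with an assumed trade-off weight lam; the dissertation's actual cost terms are not reproduced:

```python
import numpy as np

def conceal_block(candidates: list[np.ndarray],
                  match_costs: list[float],
                  saliency_costs: list[float],
                  lam: float = 0.5) -> np.ndarray:
    """Select a replacement block using a low-saliency prior.

    Hypothetical sketch: a conventional matching cost (e.g. boundary
    SAD against correctly received neighbours) is augmented by
    lam * saliency, so that when several candidates match equally
    well, the least attention-grabbing one is chosen.
    """
    total = np.asarray(match_costs) + lam * np.asarray(saliency_costs)
    return candidates[int(np.argmin(total))]

# Toy usage: three 16x16 candidates with equal match cost; the
# low-saliency prior breaks the tie in favour of candidate 2.
cands = [np.full((16, 16), i, dtype=np.float32) for i in range(3)]
print(conceal_block(cands, [10.0, 10.0, 10.0], [0.9, 0.5, 0.1])[0, 0])  # 2.0
```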

    Image Processing Using FPGAs

    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. The papers are reprints from a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state of the art in image processing using FPGAs.

    Motion segmentation in video sequences and its application to perceptual video coding

    This final degree project proposes a motion segmentation system for video sequences and its application to video coding based on perceptual considerations, one of the most interesting fields of work within video coding. Since most video sequences contain regions on which the human eye fixes its attention (for example, an object moving against a static background), these sequences can be coded intelligently by allocating more resources (target bits) to those regions of interest, so that the decoded sequence is subjectively more pleasant. Based on the main characteristics of the human visual system, which are analyzed in this project, the video content is analyzed frame by frame to determine which regions can tolerate greater coding distortion, so that the bits saved can be assigned to the regions of genuine interest. This includes, for example, the ability to segment a frame according to its textures and its amount of motion. Specifically, this proposal focuses on the amount of motion characteristic of areas of the frame in order to delimit the regions of interest; a sketch of this pipeline follows. Once the segmentation is available, the quantization performed in the video encoder is increased only in those parts of the frame whose quality can be degraded without significant subjective consequences. Finally, this project includes a set of subjective tests carried out to evaluate two versions of the proposed motion segmentation system, a basic one and an improved one, both implemented in the H.264/AVC video encoder. A detailed analysis of these tests, together with a series of proposals for future work, is presented at the end of the document.
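
    The segmentation-then-quantization pipeline mentioned above can be sketched compactly. This is a hypothetical stand-in using raw frame differences; the actual system operates on H.264/AVC motion information, and the threshold and offset values are assumptions:

```python
import numpy as np

def motion_roi_qp_offsets(prev: np.ndarray, curr: np.ndarray,
                          block: int = 16, thresh: float = 8.0,
                          dq: int = 4) -> np.ndarray:
    """Crude per-macroblock motion segmentation with a QP offset map.

    Hypothetical sketch: blocks whose mean absolute frame difference
    falls below `thresh` are treated as static (non-ROI) and receive a
    coarser quantizer (+dq); moving (ROI) blocks keep the base QP.
    """
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    h, w = diff.shape
    offsets = np.zeros((h // block, w // block), dtype=np.int32)
    for by in range(h // block):
        for bx in range(w // block):
            activity = diff[by*block:(by+1)*block,
                            bx*block:(bx+1)*block].mean()
            if activity < thresh:   # static background -> degrade quality
                offsets[by, bx] = dq
    return offsets
```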

    Bidirectional Texture Functions: Acquisition, Rendering and Quality Evaluation

    As one of its primary objectives, computer graphics aims at simulating the complex reflection behaviour of fabrics. Characteristic surface reflectance effects of fabrics, such as highlights, anisotropy, or retro-reflection, make such synthesis difficult. This problem can be solved by using Bidirectional Texture Functions (BTFs): 2D textures captured under varying light and view directions. However, the acquisition of BTFs requires an expensive setup, and the measurement process is very time-consuming. Moreover, the size of BTF data can range from hundreds of megabytes to several gigabytes, since a large number of high-resolution pictures is required in the ideal case. Furthermore, three-dimensional textured models rendered with BTFs are subject to various types of distortion during acquisition, synthesis, compression, and processing. An appropriate image quality assessment scheme is a useful tool for evaluating image processing algorithms, especially algorithms designed to leave the image visually unchanged. In this contribution, we present an investigation aimed at locating a robust threshold for downsampling BTF images without losing perceptual quality. To this end, an experimental study on how decreasing the texture resolution influences the perceived quality of the rendered images is presented and discussed. Next, two basic improvements to the use of BTFs for rendering are presented: firstly, the cost of BTF acquisition is addressed by introducing a flexible, low-cost step-motor setup that allows a high-quality BTF database to be generated at user-defined arbitrary angles. Secondly, the number of acquired textures is adapted to the perceptual quality of the renderings, so that the database does not become oversized and fits better in memory during rendering. Although visual attention is one of the essential attributes of the HVS, it is neglected in most existing quality metrics. This thesis therefore proposes an objective quality metric, the Visual Attention Based Image Quality Metric (VABIQM), based on extracting visual-attention regions from images and investigating the influence of visual attention on perceived image quality. The novel metric indicates that considering visual saliency offers significant benefits for constructing objective quality metrics that predict visible quality differences in images rendered from compressed and non-compressed BTFs, and it outperforms straightforward existing image quality metrics at detecting perceivable differences.
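
    The core idea behind VABIQM, weighting distortion by visual attention, can be sketched in a few lines. This is a hypothetical illustration using plain squared error; the thesis's metric combines attention regions with a full image quality model rather than MSE:

```python
import numpy as np

def attention_weighted_mse(ref: np.ndarray, test: np.ndarray,
                           saliency: np.ndarray) -> float:
    """Saliency-weighted distortion, in the spirit of VABIQM.

    Hypothetical sketch only: per-pixel squared error is weighted by a
    normalized visual-attention map, so differences inside attended
    regions dominate the score.
    """
    w = saliency / saliency.sum()  # normalize weights to sum to 1
    err = ref.astype(np.float64) - test.astype(np.float64)
    return float((w * err ** 2).sum())

# Toy usage: the same distortion scores worse inside a salient region.
ref = np.zeros((64, 64))
sal = np.zeros((64, 64)); sal[16:48, 16:48] = 1.0
dist_in = ref.copy(); dist_in[20:28, 20:28] = 10.0   # inside attended area
dist_out = ref.copy(); dist_out[0:8, 0:8] = 10.0     # outside attended area
print(attention_weighted_mse(ref, dist_in, sal + 1e-6) >
      attention_weighted_mse(ref, dist_out, sal + 1e-6))  # True
```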