298 research outputs found

    Recent Advances in Region-of-interest Video Coding

    Deliverable D4.2 of the PERSEE project: 3D representation and coding - Interim report - Software definitions and architecture

    Deliverable D4.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D4.2 of the project. Its title: 3D representation and coding - Interim report - Software definitions and architecture

    Digital Hologram Coding

    Prioritizing Content of Interest in Multimedia Data Compression

    Image and video compression techniques make data transmission and storage in digital multimedia systems more efficient and feasible for the system's limited storage and bandwidth. Many generic image and video compression techniques such as JPEG and H.264/AVC have been standardized and are now widely adopted. Despite their great success, we observe that these standard compression techniques are not the best solution for data compression in special types of multimedia systems such as microscopy videos and low-power wireless broadcast systems. In these application-specific systems, where the content of interest in the multimedia data is known and well-defined, we should rethink the design of the data compression pipeline. We hypothesize that by identifying and prioritizing the content of interest in multimedia data, new compression methods can be invented that are far more effective than standard techniques. In this dissertation, a set of new data compression methods based on the idea of prioritizing the content of interest is proposed for three different kinds of multimedia systems. I will show that the key to designing efficient compression techniques in these three cases is to prioritize the content of interest in the data. The definition of the content of interest of multimedia data depends on the application. First, I show that for microscopy videos, the content of interest is defined as the spatial regions in the video frame whose pixels do not contain only noise. Keeping the data in those regions at high quality and discarding other information yields a novel microscopy video compression technique. Second, I show that for a Bluetooth Low Energy beacon based system, practical multimedia data storage and transmission is possible by prioritizing the content of interest. I designed custom image compression techniques that preserve edges in a binary image, or foreground regions of a color image of indoor or outdoor objects. Last, I present a new indoor Bluetooth Low Energy beacon based augmented reality system that integrates a 3D moving object compression method that prioritizes the content of interest. (Doctor of Philosophy dissertation)

    A content based method for perceptually driven joint color/depth compression

    Multi-view Video plus Depth (MVD) data refer to a set of conventional color video sequences and an associated set of depth video sequences, all acquired at slightly different viewpoints. This huge amount of data necessitates a reliable compression method. However, there is no standardized compression method for MVD sequences. The H.264/MVC compression method, which was standardized for the Multi-View Video (MVV) representation, has been the subject of many adaptations to MVD. However, it has been shown that MVC is not well adapted to encoding multi-view depth data. We propose a novel approach to the compression of MVD data. Its main purpose is to preserve joint color/depth consistency. The originality of the proposed method lies in the use of the decoded color data as a prior for the associated depth compression. This is meant to ensure consistency in both types of data after decoding. Our strategy is motivated by previous studies of artifacts occurring in synthesized views: the most annoying distortions are located around strong depth discontinuities, and these distortions are due to misalignment of depth and color edges in decoded images. Thus the method is meant to preserve edges and to ensure consistent localization of color edges and depth edges. To ensure compatibility, color sequences are encoded with H.264. Depth map compression is based on a 2D still image codec, namely LAR (Locally adapted Resolution), which relies on a quad-tree representation of the images. The quad-tree representation contributes to the preservation of edges in both color and depth data. The adopted strategy is meant to be more perceptually driven than state-of-the-art methods. The proposed approach is compared to H.264 encoding of depth images. Objective metric scores are similar for H.264 and the proposed method, and the visual quality of synthesized views is improved with the proposed approach.
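    As a rough illustration of the quad-tree idea mentioned in this abstract (and not of the LAR codec itself), the sketch below splits a square block into four quadrants until the values inside each block are nearly uniform, so small blocks cluster around depth edges. The function name `quadtree_leaves`, the `max_range` threshold and the toy depth map are all assumptions made for the example; the image side is assumed to be a power of two.

```python
def quadtree_leaves(img, x, y, size, max_range, min_size=1):
    """Return (x, y, size) leaf blocks covering a square region of a 2D list.

    A block is split into four quadrants while the spread of its values
    (max - min) exceeds max_range and the block is larger than min_size.
    """
    values = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size <= min_size or max(values) - min(values) <= max_range:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(img, x + dx, y + dy, half, max_range, min_size)
    return leaves


# Toy 4x4 "depth map" with a vertical depth edge before the last column.
depth = [
    [10, 10, 10, 80],
    [10, 10, 10, 80],
    [10, 10, 10, 80],
    [10, 10, 10, 80],
]
leaves = quadtree_leaves(depth, 0, 0, 4, max_range=5)
for leaf in leaves:
    print(leaf)
```

    In this toy case the flat left half stays as two 2x2 blocks, while the right half, which contains the edge, is split down to single pixels.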

    Multi-camera analysis of soccer sequences

    The automatic detection of meaningful phases in a soccer game depends on the accurate localization of the players and the ball at each moment. However, the automatic analysis of soccer sequences is a challenging task due to the presence of multiple fast-moving objects. For this purpose, we present a multi-camera analysis system that yields the position of the ball and players on a common ground plane. The detection in each camera is based on a code-book algorithm, and different features are used to classify the detected blobs. The detection results of each camera are transformed using a homography to a virtual top-view of the playing field. Within this virtual top-view we merge the trajectory information of the different cameras, which allows us to refine the positions found. In this paper, we evaluate the system on a public SOCCER dataset and end with a discussion of possible improvements of the dataset.
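    The projection step described in this abstract can be sketched as follows: each camera's detection is mapped to the ground plane through a 3x3 homography, and the per-camera estimates are then combined. The matrices `H_cam1` and `H_cam2`, the pixel coordinates, and the use of plain averaging as the merge rule are assumptions for illustration only; the paper merges trajectory information and may do so differently.

```python
def apply_homography(H, point):
    """Map an image point (u, v) to ground-plane coordinates with a 3x3 homography."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to get Cartesian coordinates.
    return (x / w, y / w)


def merge_positions(estimates):
    """Fuse per-camera ground-plane estimates by simple averaging (an assumption)."""
    n = len(estimates)
    return (sum(p[0] for p in estimates) / n, sum(p[1] for p in estimates) / n)


# Hypothetical homographies for two cameras (made up for this example).
H_cam1 = [[0.5, 0.0, 10.0], [0.0, 0.5, 5.0], [0.0, 0.0, 1.0]]
H_cam2 = [[0.25, 0.0, 20.0], [0.0, 0.25, 10.0], [0.0, 0.0, 1.0]]

p1 = apply_homography(H_cam1, (100, 60))   # detection in camera 1
p2 = apply_homography(H_cam2, (168, 104))  # same player seen by camera 2
fused = merge_positions([p1, p2])
print(p1, p2, fused)
```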

    Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity

    When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever the aforementioned limitations exist. The contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and we evaluate the proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum set of elements necessary for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred in optical flow derived from the resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate degrees of variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the required temporal extent for accurate classification prior to processing via learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured over others. Finally, we note that our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data.
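    One simple way to picture a gate that favours cheaper experts is to subtract a cost penalty from each expert's gate score before the softmax. This is only a sketch under that assumption: the thesis trains its biased gating functions, whereas the fixed `penalty` weight, the function name `biased_gate` and the example costs below are hypothetical.

```python
import math


def biased_gate(scores, data_costs, penalty):
    """Softmax over gate scores, each reduced by penalty * data cost of that expert."""
    logits = [s - penalty * c for s, c in zip(scores, data_costs)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two experts with equal raw scores; the first needs far less input data.
scores = [1.0, 1.0]
data_costs = [0.2, 1.0]  # hypothetical normalized data cost per expert

weights = biased_gate(scores, data_costs, penalty=2.0)
unbiased = biased_gate(scores, data_costs, penalty=0.0)
print(weights, unbiased)
```

    With a zero penalty the two experts are weighted equally; with a positive penalty the expert with the lower data cost receives the larger weight.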

    Real-time moving object segmentation in H.264 compressed domain based on approximate reasoning

    This paper presents a real-time segmentation algorithm to obtain moving objects from the H.264 compressed domain. The proposed segmentation works with very little information and is based on two features of the H.264 compressed video: the motion vectors associated with the macroblocks and the decision modes. The algorithm uses fuzzy logic to describe the position, velocity and size of the detected regions in a comprehensible way, so the proposed approach works with low-level information yet handles high-level linguistic concepts. The performance of the algorithm is improved using a dynamic design of fuzzy sets that avoids merge and split problems. Experimental results for several traffic scenes demonstrate real-time performance and encouraging results in diverse situations.

    Surveillance centric coding

    The research work presented in this thesis focuses on the development of techniques specific to surveillance videos for efficient video compression with higher processing speed. Scalable Video Coding (SVC) techniques are explored to achieve higher compression efficiency. The framework of SVC is modified to support Surveillance Centric Coding (SCC). Motion estimation techniques specific to surveillance videos are proposed in order to speed up the compression process of the SCC. The main contributions of the research work presented in this thesis are divided into two groups: (i) Efficient Compression and (ii) Efficient Motion Estimation. The paradigm of Surveillance Centric Coding (SCC) is introduced, in which coding aims to achieve bit-rate optimisation and adaptation of surveillance videos for storage and transmission purposes. In the proposed approach the SCC encoder communicates with the Video Content Analysis (VCA) module that detects events of interest in video captured by the CCTV. Bit-rate optimisation and adaptation are achieved by exploiting the scalability properties of the employed codec. Time segments containing events relevant to the surveillance application are encoded at high spatio-temporal resolution and quality, while the portions that are irrelevant from the surveillance standpoint are encoded at low spatio-temporal resolution and/or quality. Thanks to the scalability of the resulting compressed bit-stream, additional bit-rate adaptation is possible, for instance for transmission purposes. Experimental evaluation showed that a significant reduction in bit-rate can be achieved by the proposed approach without loss of information relevant to surveillance applications. In addition to a more efficient compression strategy, novel approaches to performing efficient motion estimation specific to surveillance videos are proposed and implemented with experimental results. A real-time background subtractor is used to detect the presence of any motion activity in the sequence. Different approaches to selective motion estimation (GOP based, Frame based and Block based) are implemented. In the GOP based approach, motion estimation is performed for the whole group of pictures (GOP) only when a moving object is detected in any frame of the GOP. In the Frame based approach, each frame is tested for motion activity and consequently for selective motion estimation. The selective motion estimation approach is further explored at a lower level as Block based selective motion estimation. Experimental evaluation showed that a significant reduction in computational complexity can be achieved by applying the proposed strategy. In addition to selective motion estimation, tracker based motion estimation and a fast full search using multiple reference frames have been proposed for surveillance videos. Extensive testing on different surveillance videos shows the benefits of applying the proposed approaches to achieve the goals of the SCC. (PhD thesis)
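    The difference between the GOP based and Frame based selection rules described above can be sketched as follows. The per-frame motion flags stand in for the output of the background subtractor; the function name `select_frames_for_me`, the flag values and the GOP size of 4 are assumptions for the example, and the Block based variant is not shown.

```python
def select_frames_for_me(motion_flags, gop_size, mode):
    """Return the indices of frames on which motion estimation would be run.

    motion_flags[i] is True when motion was detected in frame i.
    mode "gop":   run motion estimation on every frame of a GOP if any
                  frame of that GOP contains motion.
    mode "frame": run motion estimation only on frames that contain motion.
    """
    if mode == "frame":
        return [i for i, moving in enumerate(motion_flags) if moving]
    if mode == "gop":
        selected = []
        for start in range(0, len(motion_flags), gop_size):
            gop = range(start, min(start + gop_size, len(motion_flags)))
            if any(motion_flags[i] for i in gop):
                selected.extend(gop)
        return selected
    raise ValueError("mode must be 'gop' or 'frame'")


# Eight frames, two GOPs of four; motion is detected only in frame 2.
flags = [False, False, True, False, False, False, False, False]
gop_selected = select_frames_for_me(flags, gop_size=4, mode="gop")
frame_selected = select_frames_for_me(flags, gop_size=4, mode="frame")
print(gop_selected, frame_selected)
```

    Here the GOP based rule estimates motion for the whole first GOP and skips the second, while the Frame based rule touches only the single frame with activity.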

    Beyond the pixels: learning and utilising video compression features for localisation of digital tampering.

    Video compression is pervasive in digital society. With the rising usage of deep convolutional neural networks (CNNs) in the fields of computer vision, video analysis and video tampering detection, it is important to investigate how patterns invisible to human eyes may be influencing modern computer vision techniques and how they can be used advantageously. This work thoroughly explores how video compression influences the accuracy of CNNs and shows that optimal performance is achieved when compression levels in the training set closely match those of the test set. A novel method is then developed, using CNNs, to derive compression features directly from the pixels of video frames. It is then shown that these features can be readily used to detect inauthentic video content with good accuracy across multiple different video tampering techniques. Moreover, the ability to explain these features allows predictions to be made about their effectiveness against future tampering methods. The problem is motivated with a novel investigation into recent video manipulation methods, which shows that there is a consistent drive to produce convincing, photorealistic, manipulated or synthetic video. Humans, blind to the presence of video tampering, are also blind to the type of tampering. New detection techniques are required and, in order to compensate for human limitations, they should be broadly applicable to multiple tampering types. This thesis details the steps necessary to develop and evaluate such techniques.