Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. Thanks to intensive training, impressive efficiency/accuracy improvements have been achieved all along the transmission pipeline. For example, the high model capacity of learning-based architectures makes it possible to model image and video behavior accurately, so that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even when only a subpart of it is optimized. In this paper, we review the recent major advances proposed across the transmission chain, and we discuss their potential impact and the research challenges they raise.
Image embedding and user multi-preference modeling for data collection sampling
This work proposes an end-to-end user-centric sampling method aimed at selecting, from an image collection, the images that maximize the information perceived by a given user. As main contributions, we first introduce novel metrics that assess the amount of perceived information retained by the user when experiencing a set of images. Defining the actual information present in a set of images as the volume spanned by the set in the corresponding latent space, we show how to take the user's preferences into account in this volume calculation to build a user-centric metric for the perceived information. Finally, we propose a sampling strategy that seeks the minimum set of images maximizing the information perceived by a given user. Experiments on the COCO dataset show the ability of the proposed approach to accurately integrate user preferences while keeping reasonable diversity in the sampled image set.
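The volume-based view of perceived information described above can be sketched as follows. This is an illustrative toy version, not the paper's actual pipeline: the embeddings, the per-dimension preference weights, and the greedy selection loop are all assumptions, and the "information" of a set is taken as the Gram determinant (squared volume) of the preference-weighted embeddings.

```python
def gram_det(vectors):
    """Determinant of the Gram matrix of the vectors, i.e. the squared
    volume they span, computed by Gaussian elimination with pivoting."""
    n = len(vectors)
    G = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]
    det = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(G[r][i]))
        if abs(G[p][i]) < 1e-12:
            return 0.0  # degenerate set: no spanned volume
        if p != i:
            G[i], G[p] = G[p], G[i]
            det = -det
        det *= G[i][i]
        for r in range(i + 1, n):
            f = G[r][i] / G[i][i]
            for c in range(i, n):
                G[r][c] -= f * G[i][c]
    return det

def perceived_info(embeddings, user_weights):
    """User-centric information: volume of the set after each latent
    dimension is scaled by the user's preference for it (hypothetical model)."""
    weighted = [[w * x for w, x in zip(user_weights, e)] for e in embeddings]
    return gram_det(weighted)

def greedy_sample(embeddings, user_weights, k):
    """Greedily pick k images whose weighted embeddings span the most volume."""
    chosen, remaining = [], list(range(len(embeddings)))
    for _ in range(k):
        best = max(remaining, key=lambda i: perceived_info(
            [embeddings[j] for j in chosen] + [embeddings[i]], user_weights))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With this metric, a near-duplicate image adds almost no volume, so the greedy sampler naturally favors diversity in the directions the user cares about.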
Correlation-aware packet scheduling in multi-camera networks
In multiview applications, multiple cameras acquire the same scene from different viewpoints and generally produce correlated video streams. This results in large amounts of highly redundant data. In order to save resources, it is critical to properly handle this correlation during encoding and transmission of the multiview data. In this work, we propose a correlation-aware packet scheduling algorithm for multi-camera networks, where information from all cameras is transmitted over a bottleneck channel to clients that reconstruct the multiview images. The scheduling algorithm relies on a new rate-distortion model that captures the importance of each view in the scene reconstruction. We propose a problem formulation for the optimization of the packet scheduling policies, which adapt to variations in the scene content. Then, we design a low-complexity scheduling algorithm based on a trellis search that selects the subset of candidate packets to be transmitted towards effective multiview reconstruction at clients. Extensive simulation results confirm the gain of our scheduling algorithm when inter-source correlation information is used in the scheduler, compared to scheduling policies with no information about the correlation or non-adaptive scheduling policies. We finally show that increasing the optimization horizon in the packet scheduling algorithm improves the transmission performance, especially in scenarios where the level of correlation rapidly varies with time. © 2013 IEEE
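The core intuition of correlation-aware scheduling can be illustrated with a small sketch. This is a greedy stand-in for the paper's trellis search, under assumed inputs: each packet has a `gain` (distortion reduction) and a `size`, and a pairwise `corr` table discounts a packet's gain once a correlated view has already been scheduled.

```python
def schedule_packets(packets, corr, budget):
    """Greedy correlation-aware scheduling over a bottleneck budget: at each
    step, pick the feasible packet with the best marginal distortion reduction
    per bit, where correlation with already-scheduled views shrinks the gain.
    (Illustrative approximation of a trellis-based search.)"""
    scheduled, used = [], 0
    remaining = list(range(len(packets)))
    while True:
        feasible = [i for i in remaining if used + packets[i]["size"] <= budget]
        if not feasible:
            break
        def marginal(i):
            gain = packets[i]["gain"]
            for j in scheduled:
                # redundant information with an already-sent view is discounted
                gain *= 1.0 - corr.get((min(i, j), max(i, j)), 0.0)
            return gain / packets[i]["size"]
        best = max(feasible, key=marginal)
        scheduled.append(best)
        used += packets[best]["size"]
        remaining.remove(best)
    return scheduled
```

The effect matches the abstract's claim: with correlation information, the scheduler skips a second, highly redundant view in favor of a less correlated one, even if the redundant view's standalone gain is larger.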
Multiview video representations for quality-scalable navigation
Interactive multiview video (IMV) applications offer users the freedom of selecting their preferred viewpoint. Usually, in these systems texture and depth maps of captured views are available at the user side, as they permit the rendering of intermediate virtual views. However, the virtual views' quality depends on the distance to the available views used as references and on their quality, which is generally constrained by the heterogeneous capabilities of the users. In this context, this work proposes a scalable IMV system, where views are optimally organized in layers, each one offering an incremental improvement in the interactive navigation quality. We propose a distortion model for the rendered virtual views and an algorithm that selects the optimal subset of views per layer. Simulation results show the efficiency of the proposed distortion model, and that the careful choice of reference cameras permits graceful quality degradation for clients with limited capabilities.
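The layered view-selection idea can be sketched with a deliberately simplified distortion model. Everything here is an assumption for illustration: views are points on a 1-D camera line, and a virtual view's distortion is just its distance to the nearest reference, rather than the paper's actual rendering model.

```python
def rendering_distortion(views, refs):
    """Toy distortion model: each virtual view costs the distance to its
    nearest available reference view (hypothetical stand-in for the paper's
    rendered-view distortion model)."""
    return sum(min(abs(v - r) for r in refs) for v in views)

def build_layers(views, num_layers, per_layer):
    """Greedily assign views to quality layers: each layer adds the views
    that most reduce total navigation distortion given all lower layers."""
    refs, layers = [], []
    for _ in range(num_layers):
        layer = []
        for _ in range(per_layer):
            candidates = [v for v in views if v not in refs]
            if not candidates:
                break
            best = min(candidates,
                       key=lambda v: rendering_distortion(views, refs + [v]))
            refs.append(best)
            layer.append(best)
        layers.append(layer)
    return layers
```

On a symmetric camera line the first layer picks the central view, and further layers progressively fill in the remaining viewpoints, which is exactly the graceful-degradation behavior the abstract describes for capability-limited clients.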
Rate distortion optimized graph partitioning for omnidirectional image coding
Omnidirectional images are spherical signals captured by cameras with a 360-degree field of view. In order to be compressed with existing encoders, these signals are mapped to the planar domain. A commonly used planar representation is the equirectangular one, which corresponds to a non-uniform sampling pattern on the spherical surface. This particularity is not exploited by traditional image compression schemes, which treat the input signal as a classical perspective image. In this work, we build a graph-based coder adapted to the spherical surface. We build a graph directly on the sphere. Then, to obtain computationally feasible graph transforms, we propose a rate-distortion optimized graph partitioning algorithm that achieves an effective trade-off between the distortion of the reconstructed signals, the smoothness of the signal on each subgraph, and the cost of coding the graph partitioning description. Experimental results demonstrate that our method outperforms JPEG coding of planar equirectangular images.
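The rate-distortion trade-off behind such a partitioning can be sketched in one dimension. This is not the paper's spherical-graph algorithm; it is an illustrative recursive splitter where distortion is approximated by the squared deviation from each segment's mean (a proxy for how well a smooth subgraph compresses) and `lam` is the assumed Lagrangian cost of signalling one extra partition boundary.

```python
def rd_partition(signal, lam, lo=0, hi=None):
    """Recursively split signal[lo:hi]: accept a split only when the
    distortion reduction outweighs the lambda-weighted cost of coding
    one more boundary in the partition description."""
    if hi is None:
        hi = len(signal)
    if hi - lo < 2:
        return [(lo, hi)]
    seg = signal[lo:hi]
    mean = sum(seg) / len(seg)
    d_whole = sum((x - mean) ** 2 for x in seg)
    best = None
    for m in range(lo + 1, hi):  # try every split point, keep the best
        a, b = signal[lo:m], signal[m:hi]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        d = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
        if best is None or d < best[0]:
            best = (d, m)
    d_split, m = best
    if d_split + lam <= d_whole:  # Lagrangian test: D_split + lam*R <= D_whole
        return (rd_partition(signal, lam, lo, m)
                + rd_partition(signal, lam, m, hi))
    return [(lo, hi)]
```

With a small `lam` a piecewise-constant signal is split exactly at its discontinuity; with a large `lam` the partition description is deemed too expensive and the signal stays whole, mirroring the trade-off the abstract describes.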
A differential motion estimation method for image interpolation in distributed video coding
Motion estimation methods based on differential techniques have proved very useful in the context of video analysis, but see limited use in classical video compression because, though accurate, the dense motion vector field they produce requires too many coding resources and too much computational effort. This kind of algorithm could instead be useful in the framework of distributed video coding (DVC). In this paper we propose a differential motion estimation algorithm which can run at the decoder in a DVC scheme, without requiring any increase in coding rate. This algorithm improves image interpolation with respect to state-of-the-art algorithms. ©2009 IEEE
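The differential principle at work here is the brightness-constancy equation, I_x · v + I_t = 0. The sketch below is a deliberately minimal 1-D illustration (not the paper's algorithm): it solves for a dense per-sample motion field from two decoded frames and then compensates half the motion to interpolate the frame in between, all of which happens at the decoder at no extra rate.

```python
def flow_1d(f0, f1, eps=1e-6):
    """Dense differential motion between two 1-D frames:
    brightness constancy I_x * v + I_t = 0  =>  v = -I_t / I_x,
    with central differences for the spatial gradient."""
    n = len(f0)
    v = []
    for i in range(n):
        ix = (f0[min(i + 1, n - 1)] - f0[max(i - 1, 0)]) / 2.0
        it = f1[i] - f0[i]
        v.append(-it / ix if abs(ix) > eps else 0.0)
    return v

def interpolate_midframe(f0, f1):
    """Motion-compensated interpolation of the frame halfway between f0 and
    f1: follow each sample half-way along its estimated motion and average."""
    v = flow_1d(f0, f1)
    n = len(f0)
    mid = []
    for i in range(n):
        src = min(max(int(round(i - 0.5 * v[i])), 0), n - 1)
        mid.append(0.5 * (f0[src] + f1[src]))
    return mid
```

On a linear ramp shifted by one sample, the estimated motion is exactly 1 away from the borders, which is the dense field a block-based estimator could only approximate.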
Small angle light scattering investigation of polymerisation induced phase separation mechanisms
Small angle light scattering (SALS) is one of the tools that can be used to study phase separation. It is shown that SALS can discriminate between nucleation and growth (NG) and spinodal decomposition (SD) even when both give a pattern composed of a ring. To support this, a complete calculation of the light scattering of an NG process is performed, taking into account the correct Mie form factor and adding polydispersity, multiple scattering and non-independent scattering. All these factors are shown to play a role, giving patterns that cannot be confused with those originating from SD.
Side information refinement for long duration GOPs in DVC
Side information generation is a critical step in distributed video coding systems. It is performed by motion-compensated temporal interpolation between two or more key frames (KFs). However, when the temporal distance between key frames increases (i.e. when the GOP size becomes large), linear interpolation becomes less effective. In a previous work we showed that this problem can be mitigated by higher-order interpolation. For long-duration GOPs, state-of-the-art methods use a hierarchical algorithm for side information generation, with which the quality of the central interpolated image in a GOP is consistently worse than that of images closer to the KFs. In this paper we propose a refinement of the central WZFs by higher-order interpolation of the already decoded WZFs that are closer to the WZF to be estimated. We thus reduce the fluctuation of side information quality, with a beneficial impact on the final rate-distortion characteristics of the system. The experimental results show an improvement in side information quality of up to 2.71 dB with respect to the state of the art, a global improvement in PSNR on the decoded frames of up to 0.71 dB, and a bit rate reduction of up to 15%. ©2010 IEEE
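One way to picture "higher-order interpolation from already decoded neighbours" is a per-pixel cubic fit through four decoded frames instead of a linear blend of two. The sketch below uses a Catmull-Rom cubic purely as an assumed example of a higher-order interpolator; the paper's refinement also involves motion compensation, which is omitted here.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom cubic interpolation at parameter t in [0, 1],
    between samples p1 and p2, using p0 and p3 as outer support."""
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

def refine_central_frame(f_prev2, f_prev1, f_next1, f_next2):
    """Re-estimate the central WZF per pixel from four decoded neighbouring
    frames (two on each side), replacing two-frame linear interpolation."""
    return [catmull_rom(a, b, c, d, 0.5)
            for a, b, c, d in zip(f_prev2, f_prev1, f_next1, f_next2)]
```

A cubic reproduces accelerating intensity changes exactly (e.g. a pixel following t² is interpolated to 2.25 at t = 1.5, where a linear blend of its two neighbours would give 2.5), which is why the central frame of a long GOP benefits most.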
A distributed video coding system for multi view video plus depth
Multi-view video plus depth (MVD) is attracting considerable attention, as witnessed by recent standardization activity, since its rich information about the geometry of the scene allows high-quality synthesis of virtual viewpoints. Distributed video coding of this kind of content is a challenging problem whose solution could enable new services such as interactive multi-view streaming. In this work we propose to exploit the geometrical information of the MVD format to estimate inter-view occlusions without communication among cameras. Experimental results show a bit rate reduction of up to 77% at low bit rates with respect to state-of-the-art architectures. © 2013 IEEE
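The geometric idea of estimating occlusions from depth alone can be sketched for a 1-D scanline. This is an assumed toy model, not the paper's method: disparity is taken as inversely proportional to depth, each pixel is forward-warped towards the neighbouring viewpoint, and target positions that no pixel lands on are declared occluded, all using only the local camera's depth map.

```python
def occlusion_mask(depth, disparity_scale):
    """Estimate which pixels of a neighbouring view are occluded/disoccluded,
    using only this camera's depth map (no inter-camera communication):
    forward-warp every pixel by its depth-dependent disparity and mark
    target positions that stay uncovered."""
    width = len(depth)
    covered = [False] * width
    for x, d in enumerate(depth):
        # nearer surfaces (small depth) move further between viewpoints
        tx = x + int(round(disparity_scale / d))
        if 0 <= tx < width:
            covered[tx] = True
    return [not c for c in covered]  # True = occluded in the other view
```

A single foreground pixel in front of a flat background leaves exactly one uncovered gap behind it (plus the border disocclusion), which is the inter-view occlusion information the decoder can recover without any message exchange between cameras.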