478 research outputs found

    Deliverable D4.2 of the PERSEE project: 3D representation and coding - Interim report - Software definitions and architecture

    Deliverable D4.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (No. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D4.2 of the project. Its title: 3D representation and coding - Interim report - Software definitions and architecture

    Gemino: Practical and Robust Neural Compression for Video Conferencing

    Video conferencing systems suffer from poor user experience when network conditions deteriorate because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that reconstruct talking head videos at very low bitrates using sparse representations of each frame such as facial landmark information. However, these approaches produce poor reconstructions in scenarios with major movement or occlusions over the course of a call, and do not scale to higher resolutions. We design Gemino, a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline. Gemino upsamples a very low-resolution version of each target frame while enhancing high-frequency details (e.g., skin texture and hair) based on information extracted from a single high-resolution reference image. We use a multi-scale architecture that runs different components of the model at different resolutions, allowing it to scale to resolutions comparable to 720p, and we personalize the model to learn specific details of each person, achieving much better fidelity at low bitrates. We implement Gemino atop aiortc, an open-source Python implementation of WebRTC, and show that it operates on 1024x1024 videos in real-time on an A100 GPU, and achieves 2.9x lower bitrate than traditional video codecs for the same perceptual quality. Comment: 12 pages, 6 appendi

    Generalized Rate-Distortion Functions of Videos

    Customers consume an enormous amount of digital video every day via various kinds of video services through terrestrial, cable, and satellite communication systems or over-the-top Internet connections. To offer the best possible services using the limited capacity of video distribution systems, these video services need a precise understanding of the relationship between the perceptual quality of a video and its media attributes, which we term the generalized rate-distortion (GRD) function. In this thesis, we focus on accurately estimating the GRD function with a minimal number of measurement queries. We first explore the GRD behavior of compressed digital videos in a two-dimensional space of bitrate and resolution. Our analysis of real-world GRD data reveals that all GRD functions share similar regularities, but also exhibit considerable variations across different combinations of content and encoder types. Based on the analysis, we define the theoretical space of the GRD function, which not only lays the groundwork for the form a GRD model should take, but also determines the constraints these functions must satisfy. We propose two computational GRD models. In the first model, we assume that the quality scores are precise, and develop a robust axial-monotonic Clough-Tocher (RAMCT) interpolation method to approximate the GRD function from a moderate number of measurements. In the second model, we show that the GRD function space is a convex set residing in a Hilbert space, and that a GRD function can be estimated by solving a projection problem onto the convex set. By analyzing GRD functions that arise in practice, we approximate the infinite-dimensional theoretical space by a low-dimensional one, based on which an empirical GRD model with few parameters is proposed. To further reduce the number of queries, we present a novel sampling scheme based on a probabilistic model and an information measure.
The proposed sampling method generates a sequence of queries by minimizing the overall informativeness of the remaining samples. To evaluate the performance of the GRD estimation methods, we collect a large-scale database consisting of more than 4,000 real-world GRD functions, namely the Waterloo generalized rate-distortion (Waterloo GRD) database. Extensive comparison experiments are carried out on the database. The superiority of the two proposed GRD models over state-of-the-art approaches is attested both quantitatively and visually. It is also validated that the proposed sampling algorithm consistently reduces the number of queries needed by various GRD estimation algorithms. Finally, we show the broad application scope of the proposed GRD models by exemplifying three applications: rate-distortion curve prediction, per-title encoding profile generation, and video encoder comparison

    Archiving and Delivery of 3DTI Rehabilitation Sessions

    In this paper we present CyPhy: a cyber-physiotherapy system that brings daily rehabilitation to the patient’s home with supervision from a trained therapist. With its archiving and delivery features, CyPhy is able to 1) capture and record RGB-D and physiotherapy-related medical sensing data streams in a home environment; 2) provide efficient storage for rehabilitation session recordings; 3) provide fast metadata analysis over stored sessions for review recommendation; 4) adaptively deliver rehabilitation sessions under different networking capabilities; 5) support smooth viewpoint changing during 3D video streaming with scene rendering schemes tailored for devices with different bandwidth and power limitations; and 6) provide a platform-independent streaming client for various mobile and PC environments

    Representation and coding of 3D video data

    Deliverable D4.1 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (No. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D4.1 of the project

    Cubic-panorama image dataset analysis for storage and transmission

    Efficient Encoding of Wireless Capsule Endoscopy Images Using Direct Compression of Colour Filter Array Images

    Since its invention in 2001, wireless capsule endoscopy (WCE) has played an important role in the endoscopic examination of the gastrointestinal tract. During this period, WCE has undergone tremendous advances in technology, making it the first-line modality for diseases of the small bowel ranging from bleeding to cancer. Current research efforts are focused on evolving WCE to include functionality such as drug delivery, biopsy, and active locomotion. For the integration of these functionalities into WCE, two critical prerequisites are image quality enhancement and power consumption reduction. An efficient image compression solution is required to retain the highest image quality while reducing the transmission power. The issue is more challenging because image sensors in WCE capture images in Bayer colour filter array (CFA) format, for which standard compression engines provide inferior compression performance. The focus of this thesis is to design an optimized image compression pipeline to encode the capsule endoscopic (CE) image efficiently in CFA format. To this end, this thesis proposes two image compression schemes. First, a lossless image compression algorithm is proposed consisting of an optimum reversible colour transformation, a low-complexity prediction model, a corner clipping mechanism and a single-context adaptive Golomb-Rice entropy encoder. The derivation of the colour transformation that provides the best performance for a given prediction model is treated as an optimization problem. The low-complexity prediction model works in raster order and requires no buffer memory. The application of the colour transformation yields lower inter-colour correlation and allows efficient independent encoding of the colour components. The second compression scheme in this thesis is a lossy compression algorithm with an integer discrete cosine transform at its core.
Using the statistics obtained from a large dataset of CE images, an optimum colour transformation is derived using principal component analysis (PCA). The transformed coefficients are quantized using an optimized quantization table, which was designed to discard medically irrelevant information. A fast demosaicking algorithm is developed to reconstruct the colour image from the lossy CFA image in the decoder. Extensive experiments and comparisons with state-of-the-art lossless image compression methods establish the superiority of the proposed compression methods as simple and efficient image compression algorithms. The lossless algorithm can transmit the image in a lossless manner within the available bandwidth. On the other hand, performance evaluation of the lossy compression algorithm indicates that it can deliver high quality images at low transmission power and low computation costs
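
As a rough illustration of the Golomb-Rice entropy coding step mentioned in the abstract above, the following is a minimal sketch and not the thesis's implementation. It assumes a fixed Rice parameter `k` (the abstract describes an adaptive, single-context scheme, whose adaptation rule is not given here) and a conventional zigzag mapping of signed prediction residuals to non-negative integers. All function names are hypothetical.

```python
from typing import List, Tuple


def zigzag(residual: int) -> int:
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2 * residual if residual >= 0 else -2 * residual - 1


def unzigzag(value: int) -> int:
    """Inverse of zigzag."""
    return value // 2 if value % 2 == 0 else -((value + 1) // 2)


def rice_encode(value: int, k: int) -> str:
    """Golomb-Rice code: quotient in unary (1s ended by a 0), then a k-bit remainder."""
    quotient = value >> k
    bits = "1" * quotient + "0"
    if k > 0:
        bits += format(value & ((1 << k) - 1), f"0{k}b")
    return bits


def rice_decode(bits: str, k: int, pos: int = 0) -> Tuple[int, int]:
    """Decode one value starting at pos; return (value, next position)."""
    quotient = 0
    while bits[pos] == "1":
        quotient += 1
        pos += 1
    pos += 1  # skip the 0 that terminates the unary part
    remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (quotient << k) | remainder, pos + k


def encode_residuals(residuals: List[int], k: int) -> str:
    """Concatenate the codes of all residuals (fixed k, an assumption of this sketch)."""
    return "".join(rice_encode(zigzag(r), k) for r in residuals)


def decode_residuals(bits: str, k: int) -> List[int]:
    """Decode a bit string produced by encode_residuals."""
    out, pos = [], 0
    while pos < len(bits):
        value, pos = rice_decode(bits, k, pos)
        out.append(unzigzag(value))
    return out
```

Small residuals get short codes, which is why this family of codes pairs well with a predictor: for example, with `k = 2` the residual `3` maps to `6` and is coded as `1010`.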

    Point cloud data compression

    The rapid growth in the popularity of Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR) experiences has resulted in an exponential surge of three-dimensional data. Point clouds have emerged as a commonly employed representation for capturing and visualizing three-dimensional data in these environments. Consequently, there has been a substantial research effort dedicated to developing efficient compression algorithms for point cloud data. This Master's thesis aims to investigate the current state-of-the-art lossless point cloud geometry compression techniques, explore some of these techniques in more detail, and then propose improvements and/or extensions to enhance them and provide directions for future work on this topic