370 research outputs found

    Deep Learning-based Compressed Domain Multimedia for Man and Machine: A Taxonomy and Application to Point Cloud Classification

    Full text link
    In the current golden age of multimedia, human visualization is no longer the single main target, with the final consumer often being a machine which performs some processing or computer vision tasks. In both cases, deep learning plays a undamental role in extracting features from the multimedia representation data, usually producing a compressed representation referred to as latent representation. The increasing development and adoption of deep learning-based solutions in a wide area of multimedia applications have opened an exciting new vision where a common compressed multimedia representation is used for both man and machine. The main benefits of this vision are two-fold: i) improved performance for the computer vision tasks, since the effects of coding artifacts are mitigated; and ii) reduced computational complexity, since prior decoding is not required. This paper proposes the first taxonomy for designing compressed domain computer vision solutions driven by the architecture and weights compatibility with an available spatio-temporal computer vision processor. The potential of the proposed taxonomy is demonstrated for the specific case of point cloud classification by designing novel compressed domain processors using the JPEG Pleno Point Cloud Coding standard under development and adaptations of the PointGrid classifier. Experimental results show that the designed compressed domain point cloud classification solutions can significantly outperform the spatial-temporal domain classification benchmarks when applied to the decompressed data, containing coding artifacts, and even surpass their performance when applied to the original uncompressed data

    Dense light field coding: a survey

    Get PDF
    Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.info:eu-repo/semantics/publishedVersio

    Transformées basées graphes pour la compression de nouvelles modalités d’image

    Get PDF
    Due to the large availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omni-directional images, a huge amount of high dimensional data has to be stored and delivered. The ever growing streaming and storage requirements of these new image modalities require novel image coding tools that exploit the complex structure of those data. This thesis aims at exploring novel graph based approaches for adapting traditional image transform coding techniques to the emerging data types where the sampled information are lying on irregular structures. In a first contribution, novel local graph based transforms are designed for light field compact representations. By leveraging a careful design of local transform supports and a local basis functions optimization procedure, significant improvements in terms of energy compaction can be obtained. Nevertheless, the locality of the supports did not permit to exploit long term dependencies of the signal. This led to a second contribution where different sampling strategies are investigated. Coupled with novel prediction methods, they led to very prominent results for quasi-lossless compression of light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omni-directional content. If we move further and give more degree of freedom to the graphs we wish to use, we can learn or define a model (set of weights on the edges) that might not be entirely reliable for transform design. The last part of the thesis is dedicated to theoretically analyze the effect of the uncertainty on the efficiency of the graph transforms.En raison de la grande disponibilité de nouveaux types de caméras capturant des informations géométriques supplémentaires, ainsi que de l'émergence de nouvelles modalités d'image telles que les champs de lumière et les images omnidirectionnelles, il est nécessaire de stocker et de diffuser une quantité énorme de hautes dimensions. Les exigences croissantes en matière de streaming et de stockage de ces nouvelles modalités d’image nécessitent de nouveaux outils de codage d’images exploitant la structure complexe de ces données. Cette thèse a pour but d'explorer de nouvelles approches basées sur les graphes pour adapter les techniques de codage de transformées d'image aux types de données émergents où les informations échantillonnées reposent sur des structures irrégulières. Dans une première contribution, de nouvelles transformées basées sur des graphes locaux sont conçues pour des représentations compactes des champs de lumière. En tirant parti d’une conception minutieuse des supports de transformées locaux et d’une procédure d’optimisation locale des fonctions de base , il est possible d’améliorer considérablement le compaction d'énergie. Néanmoins, la localisation des supports ne permettait pas d'exploiter les dépendances à long terme du signal. Cela a conduit à une deuxième contribution où différentes stratégies d'échantillonnage sont étudiées. Couplés à de nouvelles méthodes de prédiction, ils ont conduit à des résultats très importants en ce qui concerne la compression quasi sans perte de champs de lumière statiques. La troisième partie de la thèse porte sur la définition de sous-graphes optimisés en distorsion de débit pour le codage de contenu omnidirectionnel. Si nous allons plus loin et donnons plus de liberté aux graphes que nous souhaitons utiliser, nous pouvons apprendre ou définir un modèle (ensemble de poids sur les arêtes) qui pourrait ne pas être entièrement fiable pour la conception de transformées. La dernière partie de la thèse est consacrée à l'analyse théorique de l'effet de l'incertitude sur l'efficacité des transformées basées graphes

    Single-pixel imaging 12 years on: a review

    Get PDF
    Modern cameras typically use an array of millions of detector pixels to capture images. By contrast, single-pixel cameras use a sequence of mask patterns to filter the scene along with the corresponding measurements of the transmitted intensity which is recorded using a single-pixel detector. This review considers the development of single-pixel cameras from the seminal work of Duarte et al. up to the present state of the art. We cover the variety of hardware configurations, design of mask patterns and the associated reconstruction algorithms, many of which relate to the field of compressed sensing and, more recently, machine learning. Overall, single-pixel cameras lend themselves to imaging at non-visible wavelengths and with precise timing or depth resolution. We discuss the suitability of single-pixel cameras for different application areas, including infrared imaging and 3D situation awareness for autonomous vehicles

    深層学習に基づく画像圧縮と品質評価

    Get PDF
    早大学位記番号:新8427早稲田大

    Privacy-Friendly Photo Sharing and Relevant Applications Beyond

    Get PDF
    Popularization of online photo sharing brings people great convenience, but has also raised concerns for privacy. Researchers proposed various approaches to enable image privacy, most of which focus on encrypting or distorting image visual content. In this thesis, we investigate novel solutions to protect image privacy with a particular emphasis on online photo sharing. To this end, we propose not only algorithms to protect visual privacy in image content but also design of architectures for privacy-preserving photo sharing. Beyond privacy, we also explore additional impacts and potentials of employing daily images in other three relevant applications. First, we propose and study two image encoding algorithms to protect visual content in image, within a Secure JPEG framework. The first method scrambles a JPEG image by randomly changing the signs of its DCT coefficients based on a secret key. The second method, named JPEG Transmorphing, allows one to protect arbitrary image regions with any obfuscation, while secretly preserving the original image regions in application segments of the obfuscated JPEG image. Performance evaluations reveal a good degree of storage overhead and privacy protection capability for both methods, and particularly a good level of pleasantness for JPEG Transmorphing, if proper manipulations are applied. Second, we investigate the design of two architectures for privacy-preserving photo sharing. The first architecture, named ProShare, is built on a public key infrastructure (PKI) integrated with a ciphertext-policy attribute-based encryption (CP-ABE), to enable the secure and efficient access to user-posted photos protected by Secure JPEG. The second architecture is named ProShare S, in which a photo sharing service provider helps users make photo sharing decisions automatically based on their past decisions using machine learning. The photo sharing service analyzes not only the content of a user's photo, but also context information about the image capture and a prospective requester, and finally makes decision whether or not to share a particular photo to the requester, and if yes, at which granularity. A user study along with extensive evaluations were performed to validate the proposed architecture. In the end, we research into three relevant topics in regard to daily photos captured or shared by people, but beyond their privacy implications. In the first study, inspired by JPEG Transmorphing, we propose an animated JPEG file format, named aJPEG. aJPEG preserves its animation frames as application markers in a JPEG image and provides smaller file size and better image quality than conventional GIF. In the second study, we attempt to understand the impact of popular image manipulations applied in online photo sharing on evoked emotions of observers. The study reveals that image manipulations indeed influence people's emotion, but such impact also depends on the image content. In the last study, we employ a deep convolutional neural network (CNN), the GoogLeNet model, to perform automatic food image detection and categorization. The promising results obtained provide meaningful insights in design of automatic dietary assessment system based on multimedia techniques, e.g. image analysis
    corecore