621 research outputs found

    Machine Learning for Multimedia Communications

    Get PDF
    Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been made all over the transmission pipeline. For example, the high model capacity of the learning-based architectures enables us to accurately model the image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy or even user perception modeling have widely benefited from the recent learningoriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise

    From Capture to Display: A Survey on Volumetric Video

    Full text link
    Volumetric video, which offers immersive viewing experiences, is gaining increasing prominence. With its six degrees of freedom, it provides viewers with greater immersion and interactivity compared to traditional videos. Despite their potential, volumetric video services poses significant challenges. This survey conducts a comprehensive review of the existing literature on volumetric video. We firstly provide a general framework of volumetric video services, followed by a discussion on prerequisites for volumetric video, encompassing representations, open datasets, and quality assessment metrics. Then we delve into the current methodologies for each stage of the volumetric video service pipeline, detailing capturing, compression, transmission, rendering, and display techniques. Lastly, we explore various applications enabled by this pioneering technology and we present an array of research challenges and opportunities in the domain of volumetric video services. This survey aspires to provide a holistic understanding of this burgeoning field and shed light on potential future research trajectories, aiming to bring the vision of volumetric video to fruition.Comment: Submitte

    Codage réseau pour des applications multimédias avancées

    Get PDF
    Network coding is a paradigm that allows an efficient use of the capacity of communication networks. It maximizes the throughput in a multi-hop multicast communication and reduces the delay. In this thesis, we focus our attention to the integration of the network coding framework to multimedia applications, and in particular to advanced systems that provide enhanced video services to the users. Our contributions concern several instances of advanced multimedia communications: an efficient framework for transmission of a live stream making joint use of network coding and multiple description coding; a novel transmission strategy for lossy wireless networks that guarantees a trade-off between loss resilience and short delay based on a rate-distortion optimized scheduling of the video frames, that we also extended to the case of interactive multi-view streaming; a distributed social caching system that, using network coding in conjunction with the knowledge of the users' preferences in terms of views, is able to select a replication scheme such that to provide a high video quality by accessing only other members of the social group without incurring the access cost associated with a connection to a central server and without exchanging large tables of metadata to keep track of the replicated parts; and, finally, a study on using blind source separation techniques to reduce the overhead incurred by network coding schemes based on error-detecting techniques such as parity coding and message digest generation. All our contributions are aimed at using network coding to enhance the quality of video transmission in terms of distortion and delay perceivedLe codage rĂ©seau est un paradigme qui permet une utilisation efficace du rĂ©seau. Il maximise le dĂ©bit dans un rĂ©seau multi-saut en multicast et rĂ©duit le retard. Dans cette thĂšse, nous concentrons notre attention sur l’intĂ©gration du codage rĂ©seau aux applications multimĂ©dias, et en particulier aux systĂšmes avancĂšs qui fournissent un service vidĂ©o amĂ©liorĂ© pour les utilisateurs. Nos contributions concernent plusieurs scĂ©narios : un cadre de fonctions efficace pour la transmission de flux en directe qui utilise Ă  la fois le codage rĂ©seau et le codage par description multiple, une nouvelle stratĂ©gie de transmission pour les rĂ©seaux sans fil avec perte qui garantit un compromis entre la rĂ©silience vis-Ă -vis des perte et la reduction du retard sur la base d’une optimisation dĂ©bit-distorsion de l'ordonnancement des images vidĂ©o, que nous avons Ă©galement Ă©tendu au cas du streaming multi-vue interactive, un systĂšme replication sociale distribuĂ©e qui, en utilisant le rĂ©seau codage en relation et la connaissance des prĂ©fĂ©rences des utilisateurs en termes de vue, est en mesure de sĂ©lectionner un schĂ©ma de rĂ©plication capable de fournir une vidĂ©o de haute qualitĂ© en accĂ©dant seulement aux autres membres du groupe social, sans encourir le coĂ»t d’accĂšs associĂ© Ă  une connexion Ă  un serveur central et sans Ă©changer des larges tables de mĂ©tadonnĂ©es pour tenir trace des Ă©lĂ©ments rĂ©pliquĂ©s, et, finalement, une Ă©tude sur l’utilisation de techniques de sĂ©paration aveugle de source -pour rĂ©duire l’overhead encouru par les schĂ©mas de codage rĂ©seau- basĂ© sur des techniques de dĂ©tection d’erreur telles que le codage de paritĂ© et la gĂ©nĂ©ration de message digest

    Machine Learning for Multimedia Communications

    Get PDF
    Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been made all over the transmission pipeline. For example, the high model capacity of the learning-based architectures enables us to accurately model the image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy or even user perception modeling have widely benefited from the recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Full text link
    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed

    Reducing Internet Latency : A Survey of Techniques and their Merit

    Get PDF
    Bob Briscoe, Anna Brunstrom, Andreas Petlund, David Hayes, David Ros, Ing-Jyh Tsang, Stein Gjessing, Gorry Fairhurst, Carsten Griwodz, Michael WelzlPeer reviewedPreprin

    DĂŒnaamiline kiiruse jaotamine interaktiivses mitmevaatelises video vaatevahetuse ennustamineses

    Get PDF
    In Interactive Multi-View Video (IMVV), the video has been captured by numbers of cameras positioned in array and transmitted those camera views to users. The user can interact with the transmitted video content by choosing viewpoints (views from different cameras in the array) with the expectation of minimum transmission delay while changing among various views. View switching delay is one of the primary concern that is dealt in this thesis work, where the contribution is to minimize the transmission delay of new view switch frame through a novel process of selection of the predicted view and compression considering the transmission efficiency. Mainly considered a realtime IMVV streaming, and the view switch is mapped as discrete Markov chain, where the transition probability is derived using Zipf distribution, which provides information regarding view switch prediction. To eliminate Round-Trip Time (RTT) transmission delay, Quantization Parameters (QP) are adaptively allocated to the remaining redundant transmitted frames to maintain view switching time minimum, trading off with the quality of the video till RTT time-span. The experimental results of the proposed method show superior performance on PSNR and view switching delay for better viewing quality over the existing methods
    • 

    corecore