9 research outputs found

    Etude et mise en place d’une plateforme d’adaptation multiservice embarquée pour la gestion de flux multimédia à différents niveaux logiciels et matériels

    Advances in technology have expanded the market for handheld devices. As a result, people are increasingly connected and ever more data is exchanged over the Internet. This huge amount of data, however, imposes drastic constraints on achieving sufficient quality, and the Internet is now showing its limits in guaranteeing that quality. To address today's limitations, a next-generation Internet is envisioned. This new network takes into account the nature of the content (video, audio, ...) and the context (network state, terminal capabilities, ...) to better manage its own resources. To this end, video manipulation is one of the key concepts highlighted in this emerging context. Video content is increasingly consumed and at the same time requires more and more resources. Adapting a video to the network state (reducing its bitrate to match the available bandwidth) or to the terminal capabilities (screen size, supported codecs, ...) appears mandatory and is expected to take place in real time in networking devices such as home gateways. However, video adaptation is a resource-intensive task and must be implemented using hardware accelerators to meet the desired low-cost and real-time constraints. In this thesis, content- and context-awareness is first analyzed with a view to being handled on the network side. Second, a generic low-cost video adaptation system is proposed and compared to existing solutions as a trade-off between system complexity and quality. Then, hardware design is tackled, as this system is implemented in an FPGA-based architecture. Finally, this system is used to evaluate the indirect effects of video adaptation: energy consumption is reduced on the terminal side by reducing video characteristics, thus improving the experience of end users.
Technological advances have enabled the large-scale commercialization of mobile terminals. As a result, people are increasingly connected, everywhere. The growing number of network users and the strong growth of available content, in both quantity and quality, saturate the networks, and increasing hardware capacity (the move to optical fiber) is not enough. To overcome this, networks must take into account the type of content (text, video, ...) and the context of use (network state, terminal capabilities, ...) to ensure an optimal quality of experience. In this respect, video is among the most critical types of content. It is not only increasingly consumed by users but also one of the most demanding in terms of the resources needed for its distribution (server size, bandwidth, ...). Adapting video content to the network state (adjusting its bitrate to the bandwidth) or to the terminal capabilities (ensuring that the codec is natively supported) is essential. Nevertheless, video adaptation is a process that requires a lot of resources, which is at odds with its large-scale use in the low-cost devices that today make up a large part of the backbone of the Internet. This thesis focuses on the design of a low-cost, real-time video adaptation system that would take its place in these future networks. After an analysis of the context, a generic adaptation system is proposed and evaluated against the state of the art. This system is implemented on an FPGA to ensure performance (real time) and to meet the need for a low-cost solution. Finally, a study of the indirect effects of video adaptation is carried out.

    Temporal Video Transcoding in Mobile Systems

    The thesis analyzes the problem of temporal transcoding for real-time video transmission over mobile networks. A temporal transcoding architecture and a new motion vector recomputation algorithm for the H.264 temporal transcoder are proposed. To cope with the constant reduction of wireless channel bandwidth in infrastructure networks, several frame skipping policies based on the sizing of the transcoder buffer are proposed to guarantee real-time communication. The motion of a frame and the number of consecutive skipped frames are also considered in order to improve the quality of the transcoded video. In addition, a video transmission system for vehicular networks using the IEEE 802.11 protocol, based on temporal transcoding, is proposed and studied. This system discards those frames whose transmission time exceeds a maximum admissible delay beyond which they would not be displayed anyway. The proposed system yields considerable bandwidth savings and improves video quality by preventing many consecutive frames from being discarded because of congestion.
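The delay-based discarding described in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the `Frame` type, the `select_frames` function, the constant-bandwidth channel and the fixed frame interval are all assumptions made for illustration, and the sketch ignores the motion vector recomputation that an H.264 temporal transcoder would need after skipping a frame.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int      # position in the sequence (hypothetical field)
    size_bits: int  # encoded size in bits (hypothetical field)


def select_frames(frames, bandwidth_bps, frame_interval_s, max_delay_s):
    """Return the indices of frames that can be delivered within max_delay_s
    of their capture time; all other frames are skipped."""
    sent = []
    channel_free_at = 0.0  # time at which the channel finishes the previous frame
    for f in frames:
        capture_time = f.index * frame_interval_s
        start = max(channel_free_at, capture_time)
        finish = start + f.size_bits / bandwidth_bps
        if finish - capture_time <= max_delay_s:
            sent.append(f.index)
            channel_free_at = finish
        # A skipped frame consumes no channel time, so later frames
        # get a better chance of meeting their deadline.
    return sent


frames = [Frame(i, 1000) for i in range(4)]
print(select_frames(frames, bandwidth_bps=10_000,
                    frame_interval_s=0.05, max_delay_s=0.12))
```

In this toy setting each frame takes 0.1 s to send while frames arrive every 0.05 s, so every other frame misses its deadline and is dropped before it can occupy the channel.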

    Adaptive video delivery using semantics

    The diffusion of network appliances such as cellular phones, personal digital assistants and hand-held computers has created the need to personalize the way media content is delivered to the end user. Moreover, recent devices, such as digital radio receivers with graphics displays, and new applications, such as intelligent visual surveillance, require novel forms of video analysis for content adaptation and summarization. To cope with these challenges, we propose an automatic method for the extraction of semantics from video, and we present a framework that exploits these semantics in order to provide adaptive video delivery. First, an algorithm that relies on motion information to extract multiple semantic video objects is proposed. The algorithm operates in two stages. In the first stage, a statistical change detector produces the segmentation of moving objects from the background. This process is robust with regard to camera noise and does not need manual tuning along a sequence or for different sequences. In the second stage, feedback between an object partition and a region partition is used to track individual objects along the frames. These interactions allow us to cope with multiple, deformable objects, occlusions, splitting, appearance and disappearance of objects, and complex motion. Subsequently, semantics are used to prioritize visual data in order to improve the performance of adaptive video delivery. The idea behind this approach is to organize the content so that a particular network or device does not inhibit the main content message. Specifically, we propose two new video adaptation strategies. The first strategy combines semantic analysis with a traditional frame-based video encoder. Background simplifications resulting from this approach do not penalize overall quality at low bitrates. The second strategy uses metadata to efficiently encode the main content message.
The metadata-based representation of the objects' shape and motion suffices to convey the meaning and action of a scene when the objects are familiar. The impact of different video adaptation strategies is then quantified with subjective experiments. We ask a panel of human observers to rate the quality of adapted video sequences on a normalized scale. From these results, we further derive an objective quality metric, the semantic peak signal-to-noise ratio (SPSNR), that accounts for different image areas and for their relevance to the observer in order to reflect the focus of attention of the human visual system. Finally, we determine the adaptation strategy that provides maximum value for the end user by maximizing the SPSNR for given client resources at the time of delivery. By combining semantic video analysis and adaptive delivery, the solution presented in this dissertation permits the distribution of video in complex media environments and supports a large variety of content-based applications.
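The abstract does not give the SPSNR formula, so the following is only a sketch of the general idea of a region-weighted PSNR, not the dissertation's definition. The function name `weighted_psnr`, the binary object mask and the weight `w_object` are assumptions: errors inside the semantic object are weighted more heavily than errors in the background.

```python
import math


def weighted_psnr(ref, test, mask, w_object=0.8, peak=255.0):
    """Region-weighted PSNR sketch.

    ref, test: 2D lists of pixel values with identical shape.
    mask: 2D list of booleans, True where a pixel belongs to a semantic object.
    w_object: assumed relevance weight of the object region (background gets 1 - w_object).
    """
    def region_mse(in_object):
        errors = [(r - t) ** 2
                  for ref_row, test_row, mask_row in zip(ref, test, mask)
                  for r, t, m in zip(ref_row, test_row, mask_row)
                  if m == in_object]
        return sum(errors) / len(errors) if errors else 0.0

    combined = w_object * region_mse(True) + (1.0 - w_object) * region_mse(False)
    if combined == 0:
        return math.inf  # identical images under this measure
    return 10.0 * math.log10(peak ** 2 / combined)


ref = [[100, 100], [100, 100]]
mask = [[True, False], [False, False]]
object_error = [[110, 100], [100, 100]]      # distortion on the object pixel
background_error = [[100, 110], [100, 100]]  # same distortion on a background pixel
print(weighted_psnr(ref, object_error, mask), weighted_psnr(ref, background_error, mask))
```

With these weights, the same pixel error scores lower when it falls on the object than when it falls on the background, which is the behavior a relevance-aware metric is meant to capture.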

    Digital Watermarking for Verification of Perception-based Integrity of Audio Data

    In certain application fields digital audio recordings contain sensitive content. Examples are historical archival material in public archives that preserve our cultural heritage, or digital evidence in the context of law enforcement and civil proceedings. Because of the powerful capabilities of modern multimedia editing tools, such material is vulnerable to doctoring of the content and forgery of its origin with malicious intent. Inadvertent data modification and mistaken origin can also be caused by human error. Hence, the credibility and provenance of such audio content, in terms of an unadulterated and genuine state, and the confidence about its origin are critical factors. To address this issue, this PhD thesis proposes a mechanism for verifying the integrity and authenticity of digital sound recordings. It is designed and implemented to be insensitive to common post-processing operations on the audio data that influence the subjective acoustic perception only marginally (if at all). Examples of such operations include lossy compression that maintains a high sound quality of the audio media, or lossless format conversions. The objective is to avoid the de facto false alarms that standard cryptography-based authentication protocols would be expected to raise in the presence of such legitimate post-processing. To achieve this, a feasible combination of the techniques of digital watermarking and audio-specific hashing is investigated. First, a suitable secret-key-dependent audio hashing algorithm is developed. It incorporates and enhances so-called audio fingerprinting technology from the state of the art in content-based audio identification. The presented algorithm (denoted the "rMAC" message authentication code) allows "perception-based" verification of integrity. This means that integrity breaches are classified as such only once they become audible.
As another objective, this rMAC is embedded and stored silently inside the audio media by means of audio watermarking technology. This approach allows maintaining the authentication code across the above-mentioned admissible post-processing operations and making it available for integrity verification at a later date. For this, an existing secret-key-dependent audio watermarking algorithm is used and enhanced in this thesis work. To some extent, the dependency of the rMAC and of the watermarking processing on a secret key also allows authenticating the origin of a protected audio file. To elaborate on this security aspect, this work also estimates the brute-force effort required of an adversary attacking this combined rMAC-watermarking approach. The experimental results show that the proposed method distinguishes and classifies authentic versus doctored audio content well. It also allows the temporal localization of audible data modification within a protected audio file. The experimental evaluation finally provides recommendations about technical configuration settings of the combined watermarking-hashing approach. Beyond the main topic of perception-based data integrity and data authenticity for audio, this PhD work provides new general findings in the fields of audio fingerprinting and digital watermarking. The main contributions of this PhD were published and presented mainly at conferences about multimedia security. These publications were cited by a number of other authors and hence had some impact on their work.

    Personalizing quality aspects for video communication in constrained heterogeneous environments

    The world of multimedia communication has evolved drastically over the past few years. Advanced compression formats for audiovisual information arise, new types of wired and wireless networks are developed, and a broad range of different types of devices capable of multimedia communication appear on the market. The era where multimedia applications available on the Internet were the exclusive domain of PC users has passed. The next generation of multimedia applications will be characterized by heterogeneity: differences in terms of the networks, devices and user expectations. This heterogeneity causes some new challenges: transparent consumption of multimedia content is needed in order to be able to reach a broad audience. Recently, two important technologies have appeared that can assist in realizing such transparent Universal Multimedia Access. The first technology consists of new scalable or layered content representation schemes. Such schemes are needed so that a multimedia stream can be consumed by devices with different capabilities and transmitted over network connections with different characteristics. The second technology does not focus on the content representation itself, but rather on linking information about the content, so-called metadata, to the content itself. One of the possible uses of metadata is in the automatic selection and adaptation of multimedia presentations. This is one of the main goals of the MPEG-21 Multimedia Framework. Within the MPEG-21 standard, two formats were developed that can be used for bitstream descriptions. Such descriptions can act as an intermediate layer between a scalable bitstream and the adaptation process. This way, format-independent bitstream adaptation engines can be built. Furthermore, it is straightforward to add metadata information to the bitstream description, and use this information later on during the adaptation process.
Because of the efforts spent on bitstream descriptions during our research, a lot of attention is devoted to this topic in this thesis. We describe both frameworks for bitstream descriptions that were standardized by MPEG. Furthermore, we focus on our own contributions in this domain: we developed a number of bitstream schemas and transformation examples for different types of multimedia content. The most important objective of this thesis is to describe a content negotiation process that uses scalable bitstreams in a generic way. In order to be able to express such an application, we felt the need for a better understanding of the data structures, in particular scalable bitstreams, on which this content negotiation process operates. Therefore, this thesis introduces a formal model we developed that is capable of describing the fundamental concepts of scalable bitstreams and their relations. Apart from the definition of the theoretical model itself, we demonstrate its correctness by applying it to a number of existing formats for scalable bitstreams. Furthermore, we attempt to formulate a content negotiation process as a constrained optimization problem, by means of the notations defined in the abstract model. In certain scenarios, the representation of a content negotiation process as a constrained optimization problem does not sufficiently reflect reality, especially when scalable bitstreams with multiple quality dimensions are involved. In such a case, several versions of the same original bitstream can meet all constraints imposed by the system. Sometimes one version clearly offers a better quality towards the end user than another one, but in some cases, it is not possible to objectively compare two versions without additional information. In such a situation, a trade-off will have to be made between the different quality aspects.
We use Pareto's theory of multi-criteria optimization for formally describing the characteristics of a content negotiation process for scalable bitstreams with multiple quality dimensions. This way, we can modify our definition of a content negotiation process into a multi-criteria optimization problem. One of the most important problems with multi-criteria optimization problems is that multiple candidate optimal solutions may exist. Additional information, e.g. user preferences, is needed if a single optimal solution has to be selected. Such multi-criteria optimization problems are not new. Unfortunately, existing solutions for selecting one optimal version are not suitable in a content negotiation scenario, because they expect detailed understanding of the problem from the decision maker, in our case the end user. In this thesis, we propose a scenario in which a so-called content negotiation agent would give some sample video sequences to the end user, asking him to select which sequence he liked the most. This information would be used for training the agent: a model would be built representing the preferences of the end user, and this model can be used later on for selecting one solution from a set of candidate optimal solutions. Based on a literature study, we propose two candidate algorithms in this thesis that can be used in such a content negotiation agent. It is possible to use these algorithms for constructing a model of the user's preferences by means of a number of examples, and to use this model when selecting an optimal version. The first algorithm considers the quality of a video sequence as a weighted sum of a number of independent quality aspects, and derives a system of linear inequalities from the example decisions. The second algorithm, called 1ARC, is actually a nearest-neighbor approach, where predictions are made based on the similarity with the example decisions entered by the user. 
This thesis analyzes the strengths and weaknesses of both algorithms from multiple points of view. It discusses the computational complexity of both algorithms, the parameters that can influence their reliability, and the reliability itself. To measure this kind of performance, we set up a test in which human subjects are asked to make a number of pairwise decisions between two versions of the same original video sequence. The reliability of the two algorithms we proposed is tested by selecting a part of these decisions for training a model, and by observing whether this model is able to predict other decisions entered by the same user. We not only compare both algorithms, but also observe the result of modifying several parameters on both algorithms. Ultimately, we conclude that the 1ARC algorithm has an acceptable performance, certainly if the training set is sufficiently large. The reliability is better than what would be theoretically achievable by any other algorithm that selects one optimal version from a set of candidate versions, but does not try to capture the user's preferences. Still, the results that we achieve are not as good as what we initially hoped. One possible cause may be the fact that the algorithms we proposed currently do not take sequence characteristics (e.g. the amount of motion) into account. Other improvements may be possible by means of a more accurate description of the quality aspects that we take into account, in particular the spatial resolution, the amount of distortion and the smoothness of a video sequence. Despite the limitations of the algorithms we proposed, in their performance as well as in their application area, we think that this thesis contains an initial and original contribution to the emerging objective of realizing Quality of Experience in multimedia applications.
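The multi-criteria view of content negotiation in this abstract can be sketched as follows. This is an illustrative sketch, not the thesis's formal model or its two algorithms: the tuple encoding of a version as (spatial resolution, fidelity, smoothness) scores, the convention that higher is better on every axis, and the names `dominates`, `pareto_front` and `pick_by_weights` are all assumptions. The weights stand in for a learned preference model; how they would be derived from a user's example decisions is not shown.

```python
def dominates(a, b):
    """True if version a is at least as good as b on every quality aspect
    and strictly better on at least one (higher scores are better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(versions):
    """Keep only the versions that no other version dominates."""
    return [v for v in versions
            if not any(dominates(other, v) for other in versions)]


def pick_by_weights(candidates, weights):
    """Select one candidate by a weighted sum of its quality aspects.
    The weights are a stand-in for a user preference model."""
    return max(candidates,
               key=lambda v: sum(w * x for w, x in zip(weights, v)))


# Hypothetical versions: (spatial resolution, fidelity, smoothness), each in [0, 1].
versions = [(1.0, 0.2, 0.5), (0.5, 0.9, 0.5), (0.4, 0.1, 0.5), (0.5, 0.9, 0.4)]
front = pareto_front(versions)
print(front)
print(pick_by_weights(front, weights=(0.2, 0.7, 0.1)))
```

Two versions survive the Pareto filter because neither is better on every aspect, which is exactly the situation where additional information such as user preferences is needed to choose between them.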