Low-complexity and high-quality frame-skipping transcoder for continuous presence multipoint video conferencing
2003-2004 > Academic research: refereed > Publication in refereed journal > Version of Record > Published
Dynamic region of interest transcoding for multipoint video conferencing
This paper presents a region-of-interest transcoding scheme that enhances visual quality in multipoint video conferencing. In a multipoint videoconference, usually only one or two conferees are active at any one time, and these are the regions of interest to the other conferees. We propose a Dynamic Sub-Window Skipping (DSWS) scheme that first identifies the active participants among the multiple incoming encoded video streams by calculating the motion activity of each sub-window, and then reduces the frame rates of the motion-inactive participants by skipping their less important sub-windows. The bits saved by skipping are reallocated to the active sub-windows to enhance the regions of interest. We also propose a low-complexity scheme to compose and trace, with good accuracy, the motion vectors that become unavailable in the dropped inactive sub-windows after DSWS is performed. Simulation results show that the proposed methods not only significantly improve the visual quality of the active sub-windows without seriously degrading the inactive ones, but also reduce computational complexity and avoid whole-frame skipping. Moreover, the proposed algorithm is fully compatible with the H.263 video coding standard.
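The two DSWS steps described above (rank sub-windows by motion activity, then give the bits saved on skipped sub-windows to the active ones) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not give the activity measure or the reallocation rule, so the mean of |dx| + |dy| per motion vector, the proportional split, and the names `motion_activity` and `allocate_bits` are all assumptions.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[int, int]


def motion_activity(vectors: Sequence[MotionVector]) -> float:
    """Mean of |dx| + |dy| over one sub-window's motion vectors (assumed measure)."""
    if not vectors:
        return 0.0
    return sum(abs(dx) + abs(dy) for dx, dy in vectors) / len(vectors)


def allocate_bits(sub_windows: Sequence[Sequence[MotionVector]],
                  frame_budget_bits: int,
                  max_active: int = 2) -> List[int]:
    """Return one bit budget per sub-window; a budget of 0 means 'skip it'."""
    if not sub_windows:
        return []
    activities = [motion_activity(v) for v in sub_windows]
    ranked = sorted(range(len(sub_windows)),
                    key=lambda i: activities[i], reverse=True)
    active = [i for i in ranked[:max_active] if activities[i] > 0]
    if not active:
        # No motion anywhere: split evenly instead of skipping the whole frame.
        return [frame_budget_bits // len(sub_windows)] * len(sub_windows)
    budgets = [0] * len(sub_windows)
    total = sum(activities[i] for i in active)
    for i in active:
        # Bits saved on skipped sub-windows go to the active ones,
        # in proportion to their motion activity (assumed rule).
        budgets[i] = int(frame_budget_bits * activities[i] / total)
    return budgets
```

With four sub-windows of which two show motion, only those two receive bits for the current frame; the other two are skipped and would rely on the motion-vector composition step that the abstract mentions but this sketch does not cover.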
On the architecture of H.264 to H.264 homogeneous transcoding platform
2007-2008 > Academic research: refereed > Invited conference paper > Version of Record > Published
Temporal Video Transcoding in Mobile Systems
This thesis analyzes the problem of temporal transcoding for real-time video transmission over mobile networks. It proposes a temporal transcoding architecture and a new motion vector recomputation algorithm for the H.264 temporal transcoder. To cope with the steady reduction of wireless channel bandwidth in infrastructure networks, several frame skipping policies based on the sizing of the transcoder buffer are proposed to guarantee real-time communication. The motion of a frame and the number of consecutive skipped frames are also taken into account to improve the quality of the transcoded video. In addition, a video transmission system for vehicular networks using the IEEE 802.11 protocol, based on temporal transcoding, is proposed and studied. This system discards frames whose transmission time exceeds a maximum admissible delay, beyond which those frames would not be displayed anyway. The proposed system yields considerable bandwidth savings and improves video quality by preventing many consecutive frames from being discarded because of congestion.
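The delay-based discard rule for the vehicular system can be illustrated with a small simulation. This is a sketch under stated assumptions: a single constant-rate channel, frames arriving at a fixed interval, and the hypothetical names `Frame` and `schedule`. The thesis's buffer-sizing policies and motion-aware refinements are not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    index: int      # position in the source sequence
    size_bits: int  # encoded size


def schedule(frames: Sequence[Frame], channel_bps: float,
             frame_interval_s: float, max_delay_s: float) -> List[int]:
    """Return the indices of frames that are sent.

    A frame is skipped when waiting for the channel plus its own
    transmission time would exceed max_delay_s after its arrival.
    """
    sent = []
    channel_free_at = 0.0
    for frame in frames:
        arrival = frame.index * frame_interval_s
        start = max(arrival, channel_free_at)
        finish = start + frame.size_bits / channel_bps
        if finish - arrival > max_delay_s:
            continue  # too late to be displayed: do not spend bandwidth on it
        channel_free_at = finish
        sent.append(frame.index)
    return sent
```

Skipping a late frame frees the channel immediately, so the next frame is more likely to meet its deadline, which is one way to read the abstract's claim that the scheme avoids long runs of consecutive drops.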
QoS framework for video streaming in home networks
In this thesis we present a new SNR scalable video coding scheme. An important advantage of the proposed scheme is that it requires just a standard video decoder for processing each layer. The quality of the delivered video depends on the allocation of bit rates to the base and enhancement layers. For a given total bit rate, the combination with a bigger base layer delivers higher quality. The absence of dependencies between frames in enhancement layers makes the system resilient to losses of arbitrary frames from an enhancement layer. Furthermore, that property can be used in a more controlled fashion. An important characteristic of any video streaming scheme is the ability to handle network bandwidth fluctuations. We developed a streaming technique that observes the network conditions and, based on these observations, reconfigures the layers in order to achieve the best possible quality. A change in the network conditions forces a change in the number of layers or in the bit rate of these layers. Knowledge of the network conditions allows delivery of higher-quality video by choosing an optimal layer configuration. When the network degrades, the amount of data transmitted per second is decreased by skipping frames from an enhancement layer on the sender side. The presented video coding scheme allows skipping any frame from an enhancement layer, thus enabling efficient real-time control over transmission at the network level and fine-grained control over the decoding of video data. The proposed methodology is not MPEG-2 specific and can be applied to other coding standards. We also developed a terminal resource manager that enables trade-offs between quality and resource consumption through the use of scalable video coding in combination with scalable video algorithms. The controller developed for the decoding process optimizes the perceived quality with respect to the available CPU power and the amount of input data. The controller does not depend on the type of scalability technique and can therefore be used with any scalable video. The controller uses a strategy that is created offline by means of a Markov Decision Process (MDP). During the evaluation it was found that the correctness of the controller's behavior depends on the correctness of the parameter settings for the MDP, so user tests should be employed to find the optimal settings.
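The layer reconfiguration step can be illustrated with a small selection function. This is a hedged sketch, not the thesis's method: the candidate configurations, the name `choose_configuration`, and the rule "prefer the highest total bit rate that fits" are assumptions. Only the tie-break (a bigger base layer wins at equal total bit rate) comes from the abstract.

```python
from typing import Optional, Sequence, Tuple

# A configuration lists layer bit rates in kbit/s: (base, enh1, enh2, ...).
Configuration = Tuple[int, ...]


def choose_configuration(candidates: Sequence[Configuration],
                         available_kbps: int) -> Optional[Configuration]:
    """Pick the configuration to stream at the observed bandwidth.

    Among configurations that fit, prefer the highest total bit rate
    (assumption), then the biggest base layer (stated in the abstract).
    """
    feasible = [c for c in candidates if sum(c) <= available_kbps]
    if not feasible:
        return None  # even the smallest configuration does not fit
    return max(feasible, key=lambda c: (sum(c), c[0]))
```

Calling the function again whenever the bandwidth estimate changes gives the reconfiguration loop in outline; frame skipping inside an enhancement layer would then handle shorter-term drops between reconfigurations.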
Dominant speaker detection in multipoint video communication using Markov chain with non-linear weights and dynamic transition window
This paper proposes an enhanced discrete-time Markov chain algorithm for predicting the dominant speaker(s) in a multipoint video communication system in the presence of transient speech. The proposed algorithm exploits statistical properties of past speech patterns to accurately predict the dominant speaker for the next time state. Non-linear weight-based coefficients are employed in the enhanced Markov chain for both the initial state vector and the transition probability matrix. These weights significantly reduce the time taken to predict a new dominant speaker during a conference session. In addition, a mechanism that dynamically modifies the size of the transition probability matrix window/container is introduced to improve the adaptability of the Markov chain to the variability of speech characteristics. Simulation results indicate that, for a test scenario with 11 conference participants, the enhanced Markov chain prediction algorithm achieved 85% accuracy in predicting the dominant speaker when compared to an ideal case with no transient speech. Misclassification of dominant speakers due to transient speech was also reduced by 87%.
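A weighted Markov-chain predictor of this kind can be sketched in a few lines. The sketch below is an illustration only: the exponential recency weight, the fixed `window` (the paper adapts the window size dynamically), and the name `predict_next_speaker` are assumptions, since the abstract does not specify the non-linear weights.

```python
import math
from typing import Sequence


def predict_next_speaker(history: Sequence[int], num_speakers: int,
                         window: int = 20, decay: float = 0.3) -> int:
    """Predict the next dominant speaker from recent dominant-speaker states.

    Transitions inside the window are counted with exponentially decaying
    weights, so recent transitions dominate the estimated transition row.
    """
    recent = list(history)[-window:]
    if not recent:
        raise ValueError("history must contain at least one state")
    counts = [[0.0] * num_speakers for _ in range(num_speakers)]
    transitions = len(recent) - 1
    for k in range(transitions):
        age = transitions - 1 - k  # 0 for the most recent transition
        counts[recent[k]][recent[k + 1]] += math.exp(-decay * age)
    current = recent[-1]
    row = counts[current]
    total = sum(row)
    if total == 0:
        return current  # no evidence for this state: assume no change
    probabilities = [c / total for c in row]
    return max(range(num_speakers), key=lambda j: probabilities[j])
```

A single short interruption in the history barely shifts the weighted row for the current speaker, which is the intuition behind using such a model to ride out transient speech.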
Error Resilience in Heterogeneous Visual Communications
A critical and challenging aspect of visual communication technologies is protecting visual information against transmission errors. To protect visual content effectively, the various kinds of heterogeneity involved in multimedia delivery need to be considered, such as heterogeneity in compressed stream characteristics, in channel conditions, and across multiple users and hops. The main theme of this dissertation is to explore these heterogeneities in error-resilient visual communications in order to deliver different visual content over heterogeneous networks with good visual quality.
Concurrently transmitting multiple video streams in an error-prone environment faces many challenges: video content characteristics are heterogeneous, transmission bandwidth is limited, and user device capabilities vary. These challenges prompt the need for an integrated approach to error protection and resource allocation. One motivation of this dissertation is to develop such an integrated approach for an emerging application of multi-stream video aggregation, namely multi-point video conferencing. We propose a distributed multi-point video conferencing system that employs packet division multiplexing access (PDMA)-based error protection and resource allocation, and explores multi-hop awareness to deliver good and fair visual quality of video streams to end users.
When a transport-layer mechanism such as forward error correction (FEC) cannot provide sufficient error protection for the payload stream, the unrecovered transmission errors may lead to visual distortions at the decoder. To mitigate these distortions, concealment techniques can be applied at the decoder to provide an approximation of the original content. Because image characteristics are heterogeneous, different concealment approaches are needed to suit the different nature of the lost image content. We address this heterogeneity and propose a classification framework that adaptively selects a suitable error concealment technique for each damaged image area.
The analysis and extensive experimental results in this dissertation demonstrate that the proposed integrated approach of FEC and resource allocation, as well as the new classification-based error concealment approach, can significantly outperform conventional error-resilient approaches.
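The idea of classifying a damaged area before choosing how to conceal it can be shown with a toy one-dimensional example. Everything specific here is an assumption made for illustration: the variance-based classifier, the threshold, the two concealment methods, and the name `conceal_row`. The dissertation's actual classification framework is not described in the abstract.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def conceal_row(left: Sequence[int], right: Sequence[int], width: int,
                smooth_threshold: float = 25.0) -> Tuple[str, List[int]]:
    """Conceal `width` lost pixels on one scan line.

    `left` and `right` are the correctly received pixels on either side.
    A low-variance neighbourhood is classified as smooth and filled with
    the mean; otherwise the gap is linearly interpolated across the edge.
    """
    border = list(left) + list(right)
    if pvariance(border) < smooth_threshold:
        fill = round(sum(border) / len(border))
        return "mean", [fill] * width
    a, b = left[-1], right[0]
    filled = [round(a + (b - a) * (i + 1) / (width + 1)) for i in range(width)]
    return "interpolate", filled
```

The point of the sketch is the dispatch, not the two fill methods: a real system would classify on richer features and choose among more capable spatial or temporal concealment techniques.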
Incrustation d'un logo dans un ficher vidéo codé avec le standard MPEG-2 [Embedding a logo in a video file encoded with the MPEG-2 standard]
This thesis is the culmination of Patrick Keroulas's research project and addresses video compression, a field in full expansion with the democratization of video equipment and telecommunication networks. The initial question is whether the content of the image can be modified directly in a bitstream produced from a compressed video sequence. Such a capability would allow modifications to be added at any point in a network while avoiding decoding and re-encoding the data stream, two processes that are very costly in terms of computation. Briefly presented in the first part, several works have already proposed a fairly wide range of methods for filtering, denoising, image resizing, and so on. All the publications encountered on this subject focus on transposing image processing from the spatial domain to the frequency domain. It was decided to center the problem on an application potentially usable in broadcasting: embedding a logo, adjustable in position and opacity, into a video file encoded with the MPEG-2 standard, which is still in common use. The transform applied by this compression algorithm is the DCT (Discrete Cosine Transform). An article published in 1995 dealing with video compositing in general is covered in more detail because it serves as the basis for this study. Some of the proposed tools, which rely on the linearity and orthogonality of the transform, are reused in this project, but the approach proposed to solve the temporal problems is different. Next, the essential elements of the MPEG-2 standard are presented in order to understand its mechanisms and also to lay out the structure of an encoded file since, in practice, this would be the only data available. The fourth chapter of the study presents the technical solution implemented, by way of an article submitted to IEEE Transactions on Broadcasting. This part deals with all the subtleties related to coding: the pixel block structure, spatial prediction, half-pixel motion compensation, and whether or not inverse quantization is necessary. In view of the satisfactory results, the final discussion concerns the limits of the system: the trade-off between its efficiency, its degrees of freedom, and the degree to which the stream is decoded.
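The property the abstract relies on, linearity of the DCT, means that an opacity blend computed on coefficients equals the blend computed on pixels. The sketch below checks this on a single 8x8 block using an orthonormal DCT-II written in plain Python. It is an illustration only: it ignores quantization, motion compensation, and bitstream parsing, which the thesis treats separately, and the names `dct2`, `idct2`, and `blend_in_dct_domain` are hypothetical.

```python
import math
from typing import List

Block = List[List[float]]
N = 8  # MPEG-2 uses 8x8 DCT blocks


def _dct_matrix(n: int = N) -> Block:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def _matmul(a: Block, b: Block) -> Block:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = _dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose, which is also the inverse


def dct2(block: Block) -> Block:
    return _matmul(_matmul(C, block), CT)


def idct2(coeffs: Block) -> Block:
    return _matmul(_matmul(CT, coeffs), C)


def blend_in_dct_domain(image_coeffs: Block, logo_coeffs: Block,
                        opacity: float) -> Block:
    """Alpha-blend two blocks directly on their DCT coefficients."""
    return [[(1 - opacity) * x + opacity * g for x, g in zip(row_x, row_g)]
            for row_x, row_g in zip(image_coeffs, logo_coeffs)]
```

Because the blend is a weighted sum, no inverse transform is needed to apply it to an intra-coded block; the difficulty the thesis discusses lies in predicted blocks, where the reference data has also changed.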
Content-Aware Multimedia Communications
The demands for fast, economic and reliable dissemination of multimedia
information are steadily growing within our society. While people and
economy increasingly rely on communication technologies, engineers still
struggle with their growing complexity.
Complexity in multimedia communication originates from several sources. The
most prominent is the unreliability of packet networks like the Internet.
Recent advances in scheduling and error control mechanisms for streaming
protocols have shown that the quality and robustness of multimedia delivery
can be improved significantly when protocols are aware of the content they
deliver. However, the proposed mechanisms require close cooperation between
transport systems and application layers which increases the overall system
complexity. Current approaches also require expensive metrics and focus on
special encoding formats only. A general and efficient model is missing so
far.
This thesis presents efficient and format-independent solutions to support
cross-layer coordination in system architectures. In particular, the first
contribution of this work is a generic dependency model that enables
transport layers to access content-specific properties of media streams,
such as dependencies between data units and their importance. The second
contribution is the design of a programming model for streaming
communication and its implementation as a middleware architecture. The
programming model hides the complexity of protocol stacks behind simple
programming abstractions, but exposes cross-layer control and monitoring
options to application programmers. For example, our interfaces allow
programmers to choose appropriate failure semantics at design time while
they can refine error protection and visibility of low-level errors at
run-time.
Based on some examples we show how our middleware simplifies the
integration of stream-based communication into large-scale application
architectures. An important result of this work is that despite cross-layer
cooperation, neither application nor transport protocol designers
experience an increase in complexity. Application programmers can even
reuse existing streaming protocols which effectively increases system
robustness.
Our society's demand for low-cost and reliable communication is growing
steadily. While we make ourselves ever more dependent on modern
communication technologies, the engineers of these technologies must both
satisfy the demand for rapid introduction of new products and master the
growing complexity of the systems.
The transmission of multimedia content such as video and audio data in
particular is not trivial. One of the most prominent reasons is the
unreliability of today's networks, such as the Internet. Packet loss and
fluctuating delays can severely impair presentation quality. As recent
developments in streaming protocols show, however, the quality and
robustness of transmission can be controlled efficiently when streaming
protocols exploit information about the content of the transported data.
Existing approaches that describe the content of multimedia data streams
are, however, mostly specialized for individual compression methods and use
computationally intensive metrics. This considerably reduces their
practical usefulness. In addition, the information exchange requires close
cooperation between applications and transport layers. Since the interfaces
of current system architectures are not prepared for this, either the
interfaces must be extended or alternative architectural concepts must be
created. The danger of both variants, however, is that the complexity of a
system may increase further as a result.
The central goal of this dissertation is therefore to achieve cross-layer
coordination while simultaneously reducing complexity. Here the work makes
two contributions to the current state of research. First, it defines a
universal model for describing content attributes, such as importance and
dependency relations within a data stream. Transport layers can use this
knowledge for efficient error control. Second, the work describes the Noja
programming model for multimedia middleware. Noja defines abstractions for
the transmission and control of multimedia streams that enable the
coordination of streaming protocols with applications. For example,
programmers can select suitable failure semantics and communication
topologies and then refine and control the concrete error protection at
run-time.
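A format-independent dependency model of the kind described above can be pictured as data units that carry an importance value and a list of the units they depend on. The sketch below is hypothetical and is not the Noja API: `DataUnit` and `usable_units` are illustrative names, and the frame labels in the usage note are only an example of how a video stream might map onto the model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class DataUnit:
    uid: str
    importance: int                  # higher means more important
    depends_on: Tuple[str, ...] = ()  # uids this unit needs to be useful


def usable_units(units: Sequence[DataUnit], lost: Set[str]) -> List[str]:
    """Return the uids still worth delivering, in stream order.

    A unit is usable only if it arrived and every unit it depends on is
    usable too, so a transport layer can stop protecting or retransmitting
    units whose dependencies are already lost.
    """
    usable: List[str] = []
    seen: Set[str] = set()
    for unit in units:  # units are assumed to be listed in decoding order
        if unit.uid not in lost and all(d in seen for d in unit.depends_on):
            usable.append(unit.uid)
            seen.add(unit.uid)
    return usable
```

For a stream such as `I1`, `P2` (depends on `I1`), `B3` (depends on `I1` and `P2`), and `P4` (depends on `P2`), losing `P2` leaves only `I1` usable, while losing `B3` costs nothing else. The transport layer needs no knowledge of the codec to draw that conclusion, which is the point of keeping the model format-independent.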