939 research outputs found
Shape representation and coding of visual objets in multimedia applications — An overview
Emerging multimedia applications have created the need for new functionalities in digital communications. Whereas existing compression standards only deal with the audio-visual scene at a frame level, it is now necessary to handle individual objects separately, thus allowing scalable transmission as well as interactive scene recomposition by the receiver. The future MPEG-4 standard aims at providing compression tools addressing these functionalities. Unlike existing frame-based standards, the corresponding coding schemes need to encode shape information explicitly. This paper reviews existing solutions to the problem of shape representation and coding. Region and contour coding techniques are presented and their performance is discussed, considering coding efficiency and rate-distortion control capability, as well as flexibility to application requirements such as progressive transmission, low-delay coding, and error robustnes
Content-Aware Multimedia Communications
The demands for fast, economic and reliable dissemination of multimedia
information are steadily growing within our society. While people and
economy increasingly rely on communication technologies, engineers still
struggle with their growing complexity.
Complexity in multimedia communication originates from several sources. The
most prominent is the unreliability of packet networks like the Internet.
Recent advances in scheduling and error control mechanisms for streaming
protocols have shown that the quality and robustness of multimedia delivery
can be improved significantly when protocols are aware of the content they
deliver. However, the proposed mechanisms require close cooperation between
transport systems and application layers which increases the overall system
complexity. Current approaches also require expensive metrics and focus on
special encoding formats only. A general and efficient model is missing so
far.
This thesis presents efficient and format-independent solutions to support
cross-layer coordination in system architectures. In particular, the first
contribution of this work is a generic dependency model that enables
transport layers to access content-specific properties of media streams,
such as dependencies between data units and their importance. The second
contribution is the design of a programming model for streaming
communication and its implementation as a middleware architecture. The
programming model hides the complexity of protocol stacks behind simple
programming abstractions, but exposes cross-layer control and monitoring
options to application programmers. For example, our interfaces allow
programmers to choose appropriate failure semantics at design time while
they can refine error protection and visibility of low-level errors at
run-time.
Based on some examples we show how our middleware simplifies the
integration of stream-based communication into large-scale application
architectures. An important result of this work is that despite cross-layer
cooperation, neither application nor transport protocol designers
experience an increase in complexity. Application programmers can even
reuse existing streaming protocols which effectively increases system
robustness.Der Bedarf unsere Gesellschaft nach kostengĂĽnstiger und
zuverlässiger
Kommunikation wächst stetig. Während wir uns selbst immer mehr von modernen
Kommunikationstechnologien abhängig machen, müssen die Ingenieure dieser
Technologien sowohl den Bedarf nach schneller EinfĂĽhrung neuer Produkte
befriedigen als auch die wachsende Komplexität der Systeme beherrschen.
Gerade die Ăśbertragung multimedialer Inhalte wie Video und Audiodaten ist
nicht trivial. Einer der prominentesten GrĂĽnde dafĂĽr ist die
Unzuverlässigkeit heutiger Netzwerke, wie z.B.~dem Internet. Paketverluste
und schwankende Laufzeiten können die Darstellungsqualität massiv
beeinträchtigen. Wie jüngste Entwicklungen im Bereich der
Streaming-Protokolle zeigen, sind jedoch Qualität und Robustheit der
Ăśbertragung effizient kontrollierbar, wenn Streamingprotokolle
Informationen ĂĽber den Inhalt der transportierten Daten ausnutzen.
Existierende Ansätze, die den Inhalt von Multimediadatenströmen
beschreiben, sind allerdings meist auf einzelne Kompressionsverfahren
spezialisiert und verwenden berechnungsintensive Metriken. Das reduziert
ihren praktischen Nutzen deutlich. AuĂźerdem erfordert der
Informationsaustausch eine enge Kooperation zwischen Applikationen und
Transportschichten. Da allerdings die Schnittstellen aktueller
Systemarchitekturen nicht darauf vorbereitet sind, mĂĽssen entweder die
Schnittstellen erweitert oder alternative Architekturkonzepte geschaffen
werden. Die Gefahr beider Varianten ist jedoch, dass sich die Komplexität
eines Systems dadurch weiter erhöhen kann.
Das zentrale Ziel dieser Dissertation ist es deshalb,
schichtenĂĽbergreifende Koordination bei gleichzeitiger Reduzierung der
Komplexität zu erreichen. Hier leistet die Arbeit zwei Beträge zum
aktuellen Stand der Forschung. Erstens definiert sie ein universelles
Modell zur Beschreibung von Inhaltsattributen, wie Wichtigkeiten und
Abhängigkeitsbeziehungen innerhalb eines Datenstroms. Transportschichten
können dieses Wissen zur effizienten Fehlerkontrolle verwenden. Zweitens
beschreibt die Arbeit das Noja Programmiermodell fĂĽr multimediale
Middleware. Noja definiert Abstraktionen zur Ăśbertragung und Kontrolle
multimedialer Ströme, die die Koordination von Streamingprotokollen mit
Applikationen ermöglichen. Zum Beispiel können Programmierer geeignete
Fehlersemantiken und Kommunikationstopologien auswählen und den konkreten
Fehlerschutz dann zur Laufzeit verfeinern und kontrolliere
Efficient error control in 3D mesh coding
Our recently proposed wavelet-based L-infinite-constrained coding approach for meshes ensures that the maximum error between the vertex positions in the original and decoded meshes is guaranteed to be lower than a given upper bound. Instantiations of both L-2 and L-infinite coding approaches are demonstrated for MESHGRID, which is a scalable 3D object encoding system, part of MPEG-4 AFX. In this survey paper, we compare the novel L-infinite distortion estimator against the L-2 distortion estimator which is typically employed in 3D mesh coding systems. In addition, we show that, under certain conditions, the L-infinite estimator can be exploited to approximate the Hausdorff distance in real-time implementation
Network streaming and compression for mixed reality tele-immersion
Bulterman, D.C.A. [Promotor]Cesar, P.S. [Copromotor
Scalable and efficient video coding using 3D modeling
In this document we present a 3D model-based video coding scheme for streaming static scene video in a compact way but also enabling time and spatial scalability according to network or terminal capability and providing 3D functionalities. The proposed format is based on encoding the sequence of reconstructed models using second generation wavelets, and efficiently multiplexing the resulting geometric, topological, texture and camera motion binary representations. The wavelets decomposition can be adaptive in order to fit to images and scene contents. To ensure time scalability, this representation is based on a common connectivity for all 3D models, which also allows straightforward morphing between successive models ensuring visual continuity at no additional cost. The method proves to be better than previous methods for video encoding of static scenes, even better than state-of-the-art video coders such as H264 (also known as MPEG AVC). Another application of our approach is the fast transmission and real-time visualization of virtual environments obtained by video capture, for virtual or augmented reality, free walk-through in photo-realistic 3D environments, and numerous other image-base applications. / Nous présentons dans ce document un schéma de codage vidéo basé sur des modèles 3D qui permet de compresser efficacement des vidéos de scènes statiques tout en garantissant une scalabilité temporelle et spatiale afin de s'adapter aux capacités du réseau et des terminaux. Le passage par des modèles 3D permettent d'ajouter des fonctionnalités à la vidéo. Le format proposé se base sur l'encodage d'une séquence de modèles 3D extraits à partir de la vidéo en utilisant des ondelettes de seconde génération, et en multiplexant efficacement les représentations binaires résultaants pour la géométrie, la connectivité, la texture et les positions de caméra. La décomposition par ondelettes peut être aadptative afin de s'adapter au contenu des images et de la scène. Afin d'assurer la scalabilité temporelle, cette représentation et basée sur une connectivité commune pour tous les modèles qui permet de plus uu morphing implicite entre les modèles successifs assurant une continuité visuelle. La méthode a permis d'obtenir de meilleurs résultats pour le codage de vidéos de scènes statiques que le codeur vidéo référence de l'état de l'art H264 (également connu sous le nom de MPEG/AVC). Une autre application de notre approche est la transmission rapide et la visualisation temps réel d'environnements virtuels obtenus partir de vidéos pour les réalités augmentée et virtuelle, la navigation photoréalistique dans des environnements 3D et de nombreuses autres applications basées sur les images
Representation and coding of 3D video data
Livrable D4.1 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D4.1 du projet
Geometric Prior Based Deep Human Point Cloud Geometry Compression
The emergence of digital avatars has raised an exponential increase in the
demand for human point clouds with realistic and intricate details. The
compression of such data becomes challenging with overwhelming data amounts
comprising millions of points. Herein, we leverage the human geometric prior in
geometry redundancy removal of point clouds, greatly promoting the compression
performance. More specifically, the prior provides topological constraints as
geometry initialization, allowing adaptive adjustments with a compact parameter
set that could be represented with only a few bits. Therefore, we can envisage
high-resolution human point clouds as a combination of geometric priors and
structural deviations. The priors could first be derived with an aligned point
cloud, and subsequently the difference of features is compressed into a compact
latent code. The proposed framework can operate in a play-and-plug fashion with
existing learning based point cloud compression methods. Extensive experimental
results show that our approach significantly improves the compression
performance without deteriorating the quality, demonstrating its promise in a
variety of applications
- …