Multimedia delivery in the future internet
The term “Networked Media” implies that all kinds of media including text, image, 3D graphics, audio
and video are produced, distributed, shared, managed and consumed on-line through various networks,
like the Internet, Fiber, WiFi, WiMAX, GPRS, 3G and so on, in a convergent manner [1]. This white
paper is the contribution of the Media Delivery Platform (MDP) cluster and aims to cover the
challenges of Networked Media in the transition to the Future Internet.
The Internet has evolved and changed the way we work and live. End users of the Internet have been confronted
with a bewildering range of media, services and applications and of technological innovations concerning
media formats, wireless networks, terminal types and capabilities. And there is little evidence that the pace
of this innovation is slowing. Today, over one billion users access the Internet on a regular basis, more
than 100 million users have downloaded at least one (multi)media file and over 47 million of them do so
regularly, searching in more than 160 Exabytes of content. In the near future these numbers are expected
to rise exponentially. Internet content is expected to increase by at least a factor of 6, rising
to more than 990 Exabytes before 2012, fuelled mainly by the users themselves. Moreover, it is envisaged
that in a near- to mid-term future, the Internet will provide the means to share and distribute (new)
multimedia content and services with superior quality and striking flexibility, in a trusted and personalized
way, improving citizens’ quality of life, working conditions, edutainment and safety.
In this evolving environment, new transport protocols, new multimedia encoding schemes, cross-layer
in-the-network adaptation, machine-to-machine communication (including RFIDs), rich 3D content as well as
community networks and the use of peer-to-peer (P2P) overlays are expected to generate new models of
interaction and cooperation, and be able to support enhanced perceived quality-of-experience (PQoE) and
innovative applications “on the move”, like virtual collaboration environments, personalised services/
media, virtual sport groups, on-line gaming, edutainment. In this context, the interaction with content
combined with interactive/multimedia search capabilities across distributed repositories, opportunistic P2P
networks and the dynamic adaptation to the characteristics of diverse mobile terminals are expected to
contribute towards such a vision.
Based on work that has taken place in a number of EC co-funded projects in Framework Program 6 (FP6)
and Framework Program 7 (FP7), a group of experts and technology visionaries have voluntarily
contributed to this white paper, which aims to describe the status, the state of the art, the challenges and the way
ahead in the area of Content Aware media delivery platforms.
Photorealistic retrieval of occluded facial information using a performance-driven face model
Facial occlusions can cause both human observers and computer algorithms
to fail in a variety of important tasks such as facial action analysis and
expression classification. This is because the missing information is not
reconstructed accurately enough for the purpose of the task in hand. Most
current computer methods that are used to tackle this problem implement
complex three-dimensional polygonal face models that are generally time-consuming
to produce and unsuitable for photorealistic reconstruction of
missing facial features and behaviour.
In this thesis, an image-based approach is adopted to solve the occlusion
problem. A dynamic computer model of the face is used to retrieve the
occluded facial information from the driver faces. The model consists of a
set of orthogonal basis actions obtained by application of principal
component analysis (PCA) on image changes and motion fields extracted
from a sequence of natural facial motion (Cowe 2003). Examples of
occlusion affected facial behaviour can then be projected onto the model to
compute coefficients of the basis actions and thus produce photorealistic
performance-driven animations.
Visual inspection shows that the PCA face model recovers aspects of
expressions in those areas occluded in the driver sequence, but the expression is generally muted. To further investigate this finding, a database
of test sequences affected by a considerable set of artificial and natural
occlusions is created. A number of suitable metrics are developed to measure
the accuracy of the reconstructions. Regions of the face that are most
important for performance-driven mimicry and that seem to carry the best
information about global facial configurations are revealed using Bubbles,
thus in effect identifying facial areas that are most sensitive to occlusions.
Recovery of occluded facial information is enhanced by applying an
appropriate scaling factor to the respective coefficients of the basis actions
obtained by PCA. This method improves the reconstruction of the facial
actions emanating from the occluded areas of the face. However, due to the
fact that PCA produces bases that encode composite, correlated actions,
such an enhancement also tends to affect actions in non-occluded areas of
the face. To avoid this, more localised controls for facial actions are
produced using independent component analysis (ICA). Simple projection
of the data onto an ICA model is not viable due to the non-orthogonality of
the extracted bases. Thus occlusion-affected mimicry is first generated using
the PCA model and then enhanced by accordingly manipulating the
independent components that are subsequently extracted from the mimicry.
This combination of methods yields significant improvements and results in
photorealistic reconstructions of occluded facial actions.
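The projection and coefficient-scaling steps described in the abstract above can be illustrated with a minimal Python sketch. It assumes that an orthonormal basis is already available (a hypothetical two-vector toy basis stands in for the PCA basis actions), that occluded pixels are set to the mean, and that the helper names `project` and `reconstruct` are illustrative only. It is not the thesis's implementation, and the gain values are chosen by hand for the toy data.

```python
# Sketch only: a toy orthonormal basis stands in for the PCA "basis actions".

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(sample, mean, basis):
    """Coefficients of a mean-centred sample on an orthonormal basis."""
    centred = [s - m for s, m in zip(sample, mean)]
    return [dot(centred, b) for b in basis]

def reconstruct(coeffs, mean, basis, gains=None):
    """Rebuild a sample from coefficients, optionally scaling each one."""
    if gains is None:
        gains = [1.0] * len(coeffs)
    out = list(mean)
    for c, g, b in zip(coeffs, gains, basis):
        for i, bi in enumerate(b):
            out[i] += g * c * bi
    return out

# Hypothetical 4-pixel "face" with two orthonormal basis actions.
mean = [0.0, 0.0, 0.0, 0.0]
basis = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
full = [2.0, 0.0, 2.0, 0.0]      # unoccluded driver frame
occluded = [2.0, 0.0, 0.0, 0.0]  # last two pixels hidden (set to the mean)

coeffs = project(occluded, mean, basis)                 # smaller than for `full`
muted = reconstruct(coeffs, mean, basis)                # weakened expression
boosted = reconstruct(coeffs, mean, basis, [2.0, 2.0])  # scaled coefficients
```

In this toy case the occlusion halves both coefficients, so the plain reconstruction is muted and a gain of 2 on each coefficient restores the unoccluded frame. Real data would need gains estimated per coefficient, and, as the abstract notes, scaling composite PCA actions can also change non-occluded regions.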
No-reference depth map quality evaluation model based on depth map edge confidence measurement in immersive video applications
When it comes to evaluating perceptual quality of digital media for overall quality of
experience assessment in immersive video applications, typically two main approaches stand out:
Subjective and objective quality evaluation. On one hand, subjective quality evaluation offers the
best representation of perceived video quality as assessed by real viewers. On the other hand, it
consumes a significant amount of time and effort, due to the involvement of real users with lengthy
and laborious assessment procedures. Thus, it is essential that an objective quality evaluation model
is developed. The speed-up advantage offered by an objective quality evaluation model, which can
predict the quality of rendered virtual views based on the depth maps used in the rendering process,
allows for faster quality assessments for immersive video applications. This is particularly
important given the lack of a suitable reference or ground truth for comparing the available depth
maps, especially when live content services are offered in those applications. This paper presents a
no-reference depth map quality evaluation model based on a proposed depth map edge confidence
measurement technique to assist with accurately estimating the quality of rendered (virtual) views
in immersive multi-view video content. The model is applied for depth image-based rendering in
multi-view video format, providing comparable evaluation results to those existing in the literature,
and often exceeding their performance.
Camera positioning for 3D panoramic image rendering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Virtual camera realisation and the proposition of a trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple cameras and their arrangement constitute a critical component which affects the integrity of visual content acquisition for multi-view video. Currently, linear, convergence, and divergence arrays are the prominent camera topologies adopted. However, the large number of cameras required and their synchronisation are two of the prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known
camera structures, hence adequately reducing some of the other implementation issues. This thesis explores the use of image-based rendering with and without geometry in the implementations leading to the realisation of virtual cameras. The virtual camera implementation was carried out from the perspective of depth map (geometry) and use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth maps was investigated using region match measures widely known for solving the image point correspondence problem. The constructed depth maps have been compared with the ones generated
using the dynamic programming approach. In both the geometry and no geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, construction of 3D panoramic image of a scene by stitching multiple image samples and performing superposition on them, and computation
of virtual scene from a stereo pair of panoramic images. The quality of these rendered images was assessed through the use of either objective or subjective analysis in Imatest software. Furthermore, metric reconstruction of a scene was performed by re-projection of the pixel points from multiple image samples with
a single centre of projection. This was done using the sparse bundle adjustment algorithm. The statistical summary obtained after the application of this algorithm provides a gauge for the efficiency of the optimisation step. The optimised data was then visualised in the Meshlab software environment, hence providing the reconstructed scene. Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Therefore, occlusion becomes an extremely challenging problem, and a robust camera set-up is required in order to reliably resolve the hidden parts of any scene objects.
To adequately meet the visibility condition for scene objects and given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. Therefore, this thesis also explores a trapezoidal camera structure for image acquisition. The approach here is to assess the feasibility and potential
of several physical cameras of the same model being sparsely arranged on the edge of an efficient trapezoid graph. This is implemented in both Matlab and Maya. The quality of the depth maps rendered in Matlab are better in Quality
SELF-IMAGE MULTIMEDIA TECHNOLOGIES FOR FEEDFORWARD OBSERVATIONAL LEARNING
This dissertation investigates the development and use of self-images in augmented reality systems for learning and learning-based activities. This work focuses on self-modeling, a particular form of learning, actively employed in various settings for therapy or teaching. In particular, this work aims to develop novel multimedia systems to support the display and rendering of augmented self-images. It aims to use interactivity (via games) as a means of obtaining imagery for use in creating augmented self-images. Two multimedia systems are developed, discussed and analyzed. The proposed systems are validated in terms of their technical innovation and their clinical efficacy in delivering behavioral interventions for young children on the autism spectrum.
Use of Depth Perception for the Improved Understanding of Hydrographic Data
This thesis has reviewed how increased depth perception can be used to increase the
understanding of hydrographic data. First, visual cues and various visual displays and
techniques were investigated. From this investigation, 3D stereoscopic techniques proved to
be superior in improving the depth perception and understanding of spatially related data,
and a further investigation of current 3D stereoscopic visualisation techniques was carried
out. After reviewing how hydrographic data is currently visualised, the
chromo stereoscopic visualisation technique was chosen for further research on
selected hydrographic data models. A novel chromo stereoscopic application was
developed and the results from the evaluation on selected hydrographic data models clearly
show an improved depth perception and understanding of the data models.
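As a rough illustration of the colour-for-depth idea behind chromostereoscopic displays, the Python sketch below maps a depth value to a colour on a red-to-blue hue ramp. The linear ramp, the choice of red for the shallowest value and blue for the deepest, and the function name `depth_to_rgb` are assumptions made for this example; the abstract does not describe the mapping used in the thesis's application.

```python
import colorsys

def depth_to_rgb(depth, d_min, d_max):
    """Map a depth to an 8-bit RGB colour: red at d_min, blue at d_max."""
    t = (depth - d_min) / (d_max - d_min)  # 0 = shallowest, 1 = deepest
    t = min(max(t, 0.0), 1.0)              # clamp out-of-range soundings
    hue = t * (240.0 / 360.0)              # hue 0 is red, hue 2/3 is blue
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(255 * c) for c in (r, g, b))

# Hypothetical soundings between 0 m and 100 m.
shallow = depth_to_rgb(0.0, 0.0, 100.0)
middle = depth_to_rgb(50.0, 0.0, 100.0)
deep = depth_to_rgb(100.0, 0.0, 100.0)
```

A fully saturated ramp is used because the depth effect relies on strongly separated hues; a production colour map would also need to account for display gamut and for viewers with colour-vision deficiencies.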
High-Level Synthesis Based VLSI Architectures for Video Coding
High Efficiency Video Coding (HEVC) is a state-of-the-art video coding standard. Emerging applications like free-viewpoint video, 360-degree video, augmented reality, 3D movies etc. require standardized extensions of HEVC. The standardized extensions of HEVC include HEVC Scalable Video Coding (SHVC), HEVC Multiview Video Coding (MV-HEVC), MV-HEVC + Depth (3D-HEVC) and HEVC Screen Content Coding. 3D-HEVC is used for applications like view synthesis generation and free-viewpoint video. Coding and transmission of depth maps in 3D-HEVC is used for virtual view synthesis by algorithms like Depth Image Based Rendering (DIBR). As a first step, we performed the profiling of the 3D-HEVC standard. Computationally intensive parts of the standard are identified for efficient hardware implementation. One of the computationally intensive parts of 3D-HEVC, HEVC and H.264/AVC is the Interpolation Filtering used for Fractional Motion Estimation (FME). The hardware implementation of the interpolation filtering is carried out using High-Level Synthesis (HLS) tools. Xilinx Vivado Design Suite is used for the HLS implementation of the interpolation filters of HEVC and H.264/AVC. The complexity of digital systems has greatly increased. High-Level Synthesis is a methodology which offers great benefits: late architectural or functional changes can be made without time-consuming rewriting of RTL code, algorithms can be tested and evaluated early in the design cycle, and accurate models can be developed against which the final hardware can be verified.
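To make the interpolation filtering step concrete, the Python sketch below computes half-sample luma values along one row using the 8-tap HEVC half-sample coefficients and the 6-tap H.264/AVC half-sample coefficients. It is a software illustration only, not the HLS design from the abstract: the function name `half_pel_row`, the edge handling by sample replication, and the direct rounding to output precision in a single 1-D pass are simplifying assumptions (the standards define a two-stage 2-D process with higher intermediate precision).

```python
HEVC_HALF = (-1, 4, -11, 40, 40, -11, 4, -1)  # taps sum to 64
H264_HALF = (1, -5, 20, 20, -5, 1)            # taps sum to 32

def half_pel_row(row, taps, bit_depth=8):
    """Half-sample values between row[i] and row[i + 1] for one row."""
    left = len(taps) // 2 - 1           # samples used to the left of row[i]
    shift = sum(taps).bit_length() - 1  # 64 -> 6, 32 -> 5
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(len(row) - 1):
        acc = 0
        for k, tap in enumerate(taps):
            # Clamp the index so that edge samples are replicated (assumption).
            j = min(max(i - left + k, 0), len(row) - 1)
            acc += tap * row[j]
        val = (acc + (1 << (shift - 1))) >> shift  # round, then normalise
        out.append(min(max(val, 0), max_val))      # clip to the sample range
    return out

flat = [100] * 8
step = [0, 0, 0, 0, 255, 255, 255, 255]

flat_hevc = half_pel_row(flat, HEVC_HALF)
step_hevc = half_pel_row(step, HEVC_HALF)
step_h264 = half_pel_row(step, H264_HALF)
```

The inner multiply-accumulate loop has a fixed trip count and integer-only arithmetic, which is the kind of kernel that HLS tools can typically unroll and pipeline; an actual HLS version would be written in C or C++ with tool-specific directives.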