Temporal shape super-resolution by intra-frame motion encoding using high-fps structured light
One solution for depth imaging of a moving scene is to project a static
pattern onto the object and use just a single image for reconstruction. However,
if the motion of the object is too fast with respect to the exposure time of
the image sensor, patterns on the captured image are blurred and reconstruction
fails. In this paper, we encode multiple projection patterns into each single
captured image to realize temporal super-resolution of the depth image
sequences. With our method, multiple patterns are projected onto the object
with higher fps than possible with a camera. In this case, the observed pattern
varies depending on the depth and motion of the object, so we can extract
temporal information of the scene from each single image. The decoding process
is realized using a learning-based approach where no geometric calibration is
needed. Experiments confirm the effectiveness of our method, with sequential
shapes reconstructed from a single image. Quantitative evaluations and
comparisons with recent techniques were also conducted.
Comment: 9 pages, published at the International Conference on Computer Vision (ICCV 2017).
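As a toy illustration of why a single exposure can carry temporal information, the sketch below integrates several high-fps patterns into one frame. This is a hypothetical model with invented names, not the paper's actual optics or learned decoder; pixel shifts merely stand in for depth- and motion-dependent pattern deformation.

```python
import numpy as np

def capture_single_exposure(patterns, shifts):
    """Simulate one camera exposure that integrates several high-fps
    projected patterns (toy model; shifts stand in for depth/motion).

    patterns : list of K 1-D arrays, the structured-light patterns
               projected during the exposure.
    shifts   : list of K integer pixel shifts, one per sub-frame.
    """
    acc = np.zeros_like(patterns[0], dtype=float)
    for pat, s in zip(patterns, shifts):
        acc += np.roll(pat, s)          # each sub-frame lands shifted
    return acc / len(patterns)          # sensor averages over the exposure

# Two binary stripe patterns projected within one exposure.
p0 = np.tile([1.0, 0.0], 8)
p1 = np.tile([1.0, 1.0, 0.0, 0.0], 4)
static = capture_single_exposure([p0, p1], [0, 0])
moving = capture_single_exposure([p0, p1], [0, 2])
# The integrated image changes with the motion, so a learned decoder can
# extract temporal (sub-exposure) information from a single frame.
print(np.array_equal(static, moving))  # False when the object moves
```

The point of the sketch is only that the single captured image is a different mixture for different intra-exposure motions, which is what makes per-sub-frame reconstruction possible in principle.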
TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System
We introduce the Temporally Incremental Disparity Estimation Network (TIDE-Net),
a learning-based technique for disparity computation in mono-camera structured
light systems. In our hardware setting, a static pattern is projected onto a
dynamic scene and captured by a monocular camera. Different from most former
disparity estimation methods that operate in a frame-wise manner, our network
acquires disparity maps in a temporally incremental way. Specifically, we
exploit the deformation of the projected pattern (named pattern flow) on captured
image sequences, to model the temporal information. Notably, this newly
proposed pattern flow formulation reflects the disparity changes along the
epipolar line, which is a special form of optical flow. Tailored for pattern
flow, the TIDE-Net, a recurrent architecture, is proposed and implemented. For
each incoming frame, our model fuses correlation volumes (from current frame)
and disparity (from former frame) warped by pattern flow. From fused features,
the final stage of TIDE-Net estimates the residual disparity rather than the
full disparity, as conducted by many previous methods. Interestingly, this
design brings clear empirical advantages in terms of efficiency and
generalization ability. Using only synthetic data for training, our extensive
evaluation results (w.r.t. both accuracy and efficiency metrics) show superior
performance over several SOTA models on unseen real data. The code is available
at https://github.com/CodePointer/TIDENet
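The incremental update described above can be sketched in a few lines: warp the previous frame's disparity by the pattern flow along the epipolar line, then add the flow (which reflects the disparity change) and a predicted residual. This is an illustrative toy with invented names, not TIDE-Net itself, which is a recurrent CNN fusing correlation volumes.

```python
import numpy as np

def tide_step(prev_disp, pattern_flow, residual):
    """One temporally incremental disparity update (toy sketch).

    prev_disp    : H x W disparity of the previous frame.
    pattern_flow : H x W horizontal motion of the projected pattern,
                   i.e. the disparity change along the epipolar line.
    residual     : H x W correction, in the real system predicted by
                   the network head from fused features.
    """
    h, w = prev_disp.shape
    cols = np.arange(w)[None, :].repeat(h, axis=0)
    # Warp last frame's disparity by the pattern flow (nearest neighbour).
    src = np.clip(np.rint(cols - pattern_flow).astype(int), 0, w - 1)
    warped = np.take_along_axis(prev_disp, src, axis=1)
    # Pattern flow reflects the disparity change, so add it, then the residual.
    return warped + pattern_flow + residual

prev = np.full((2, 4), 5.0)
flow = np.ones((2, 4))
updated = tide_step(prev, flow, np.zeros((2, 4)))
print(updated)  # every entry becomes 6.0
```

Estimating only the residual on top of the warped prediction is what keeps the per-frame problem small, which matches the efficiency and generalization advantages claimed in the abstract.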
Online Adaptive Disparity Estimation for Dynamic Scenes in Structured Light Systems
In recent years, deep neural networks have shown remarkable progress in dense
disparity estimation from dynamic scenes in monocular structured light systems.
However, their performance significantly drops when applied in unseen
environments. To address this issue, self-supervised online adaptation has been
proposed as a solution to bridge this performance gap. Unlike traditional
fine-tuning processes, online adaptation performs test-time optimization to
adapt networks to new domains. Therefore, achieving fast convergence during the
adaptation process is critical for attaining satisfactory accuracy. In this
paper, we propose an unsupervised loss function based on long sequential
inputs. It ensures better gradient directions and faster convergence. Our loss
function is designed using a multi-frame pattern flow, which comprises a set of
sparse trajectories of the projected pattern along the sequence. We estimate
the sparse pseudo ground truth with a confidence mask using a filter-based
method, which guides the online adaptation process. Our proposed framework
significantly improves the online adaptation speed and achieves superior
performance on unseen data.
Comment: Accepted by the 36th IEEE/RSJ International Conference on Intelligent
Robots and Systems, 202
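The abstract above describes a loss guided by sparse pseudo ground truth and a confidence mask. A minimal sketch of such a masked objective follows; the function, shapes, and use of a plain L1 penalty are illustrative assumptions, while the paper's actual loss is built from multi-frame pattern-flow trajectories.

```python
import numpy as np

def adaptation_loss(pred_disp, pseudo_gt, confidence):
    """Confidence-masked L1 loss against sparse pseudo ground truth
    (illustrative stand-in for the paper's pattern-flow-based loss).

    pred_disp  : H x W predicted disparity.
    pseudo_gt  : H x W sparse pseudo ground truth from tracked pattern
                 trajectories; pixels without a trajectory are ignored.
    confidence : H x W mask in [0, 1]; 0 means "no trajectory here".
    """
    err = np.abs(pred_disp - pseudo_gt) * confidence
    denom = confidence.sum()
    return err.sum() / denom if denom > 0 else 0.0

pred = np.array([[3.0, 0.0], [0.0, 0.0]])
gt   = np.array([[4.0, 9.0], [9.0, 9.0]])
conf = np.array([[1.0, 0.0], [0.0, 0.0]])
print(adaptation_loss(pred, gt, conf))  # 1.0: only the confident pixel counts
```

Masking by confidence is what lets gradients come only from pixels with reliable trajectory evidence, which is the mechanism behind the faster test-time convergence claimed above.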
Error resilience and concealment techniques for high-efficiency video coding
This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study on error robustness revealed that HEVC has weak protection against network losses, with a significant impact on video quality. Based on this evidence, the first contribution of this work is a new method to reduce the temporal dependencies between motion vectors, which improves the decoded video quality without compromising compression efficiency. The second contribution is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm, based on spatial dependencies, selects a reduced set of motion vectors to be transmitted as side information, to reduce mismatched motion predictions at the decoder. The problem of error-concealment-aware video coding is also investigated to enhance the overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, where the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case. A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains in comparison with other existing methods, including those implemented in the HEVC reference software. Furthermore, the trade-off between coding efficiency and error robustness is also better in the proposed methods.
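The constrained Lagrangian rate-distortion optimization mentioned above rests on the standard RD cost J = D + lambda * R: among the available options (here, reference pictures), pick the one with the lowest Lagrangian cost. A minimal, generic sketch follows; the candidate names and numbers are invented, not taken from the thesis.

```python
def rd_select(candidates, lam):
    """Pick the option minimising the Lagrangian cost J = D + lambda * R
    (the generic RD decision used, e.g., when distributing reference
    pictures at the encoder).

    candidates : list of (name, distortion, rate) tuples.
    lam        : Lagrange multiplier trading distortion against rate.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical reference-picture candidates: (name, distortion D, rate R).
opts = [("ref0", 10.0, 2.0), ("ref1", 6.0, 5.0), ("ref2", 7.0, 3.0)]
print(rd_select(opts, lam=1.0)[0])   # ref2: J = 7 + 1*3 = 10, the lowest
print(rd_select(opts, lam=10.0)[0])  # ref0: at high lambda, cheap rate wins
```

Sweeping lambda is what turns the constrained problem (best quality under a rate budget) into a sequence of unconstrained minimisations.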
Spatio-temporal feature representations of reactivated memories
How does the human brain recover memories of past events? The neural processes of memory retrieval are still not fully uncovered. This doctoral thesis is concerned with the spatio-temporal feature representations of reactivated episodic memories. Classical theories and empirical evidence suggest that the revival of memory representations in the brain is initiated in the hippocampus, before activity patterns in cortical regions reactivate to represent previously experienced events. The current doctoral project tests the assumption that the neural processing cascade during retrieval is reversed with respect to perception. This general framework predicts that semantic concepts and modality-independent information are reconstructed before modality-specific sensory details. This backward information flow is also assumed to affect the neural representations when memories are recalled repeatedly, enhancing the integration of new information into existing conceptual networks. The first two studies investigate the neural information flow during retrieval with respect to the reactivated mnemonic representations. First, simultaneous EEG-fMRI is used to track the presumed reversed reconstruction from abstract modality-independent to sensory-specific visual and auditory memory representations. The second EEG-fMRI project then zooms in on the recall of visual memories, testing whether the visual retrieval process propagates backwards along the ventral visual stream, transferring from abstract conceptual to detailed perceptual representations. The reverse reconstruction framework predicts that conceptual information, due to its prioritisation, should benefit more from repeated recall than perceptual information. Hence, the last, behavioural study investigates whether retrieval strengthens conceptual representations over perceptual ones and thus promotes the semanticisation of episodic memories. Altogether, the findings offer novel insights into retrieval-related processing cascades, in terms of their temporal and spatial dynamics and the nature of the reactivated representations. The results also provide an understanding of memory transformations during the consolidation processes that are amplified through repeated retrieval.
From features to concepts: tracking the neural dynamics of visual perception
The visual system is thought to accomplish categorization through a series of hierarchical feature extraction steps, ending with the formation of high-level category representations in occipitotemporal cortex; however, recent evidence has challenged these assumptions. The experiments described in this thesis address the question of categorization in face and scene perception using magnetoencephalography and multivariate analysis methods.
The first three chapters investigate neural responses to emotional faces from different perspectives, by varying their relevance to the task. First, in a passive viewing paradigm, angry faces elicit differential patterns within 100 ms in visual cortex, consistent with a threat-related bias in feedforward processing. The next chapter looks at rapid face perception in the context of an expression discrimination task which also manipulates subjective awareness. A neural response to faces, but not expressions, is detected outside awareness. Furthermore, neural patterns and behavioural responses are shown to reflect both facial features and facial configuration. Finally, the third chapter employs emotional faces as distractors during an orientation discrimination task, but finds no evidence of expression processing outside of attention.
The fourth chapter focuses on natural scene perception, using a passive viewing paradigm to study the contribution of low-level features and high-level categories to MEG patterns. Multivariate analyses reveal a categorical response to scenes emerging within 200 ms, despite ongoing processing of low-level features.
Together, these results suggest that feature-based coding of categories, optimized for both stimulus relevance and task demands, underpins dynamic high-level representations in the visual system. The findings highlight new avenues in vision research, which may be best pursued by bridging the neural and behavioural levels within a common computational framework.
Scalable Video Streaming with Prioritised Network Coding on End-System Overlays
Distribution over the internet is destined to become a standard approach for live broadcasting of TV or events of nation-wide interest. The demand for high-quality live video with personal requirements is set to grow exponentially over the next few years. End-system multicast is a desirable option for relieving the content server from bandwidth bottlenecks and computational load, by allowing decentralised allocation of resources to the users and distributed service management. Network coding provides innovative solutions for a multitude of issues related to multi-user content distribution, such as the coupon-collection problem and allocation and scheduling procedures. This thesis tackles the problem of streaming scalable video on end-system multicast overlays with prioritised push-based streaming.
We analyse the characteristics arising from a random coding process as a linear channel operator, and present a novel error detection and correction system for error-resilient decoding, providing one of the first practical frameworks for Joint Source-Channel-Network coding. Our system outperforms both network error correction and traditional FEC coding when performed separately. We then present a content distribution system based on end-system multicast. Our data exchange protocol makes use of network coding as a way to collaboratively deliver data to several peers. Prioritised streaming is performed by means of hierarchical network coding and a dynamic chunk selection for optimised rate allocation based on goodput statistics at the application layer. We demonstrate, through simulated experiments, the efficient allocation of resources for adaptive video delivery. Finally, we describe the implementation of our coding system. We highlight the use of rateless coding properties, discuss the application in collaborative and distributed coding systems, and provide an optimised implementation of the decoding algorithm with advanced CPU instructions. We analyse computational load and packet loss protection via lab tests and simulations, complementing the overall analysis of the video streaming system in all its components.
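The random coding process treated above as a linear channel operator can be illustrated with random linear network coding. The toy sketch below works over GF(2), where coding is plain XOR; this is an assumption for brevity (practical systems, and likely the thesis's, often use GF(2^8), and the hierarchical/prioritised scheme is richer than shown).

```python
import numpy as np

rng = np.random.default_rng(7)

def encode_gf2(packets, n_coded):
    """Random linear network coding over GF(2): each coded packet is the
    XOR of a random subset of source packets, selected by a random 0/1
    coefficient vector that travels with the packet."""
    k = len(packets)
    coeffs = rng.integers(0, 2, size=(n_coded, k), dtype=np.uint8)
    coded = (coeffs @ np.asarray(packets, dtype=np.uint8)) % 2
    return coeffs, coded

def decodable(coeffs):
    """A receiver can decode once the coefficient matrix has full column
    rank over GF(2) (Gaussian elimination with XOR arithmetic)."""
    m = coeffs.copy() % 2
    rank, rows, cols = 0, m.shape[0], m.shape[1]
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r, c]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]
        for r in range(rows):
            if r != rank and m[r, c]:
                m[r] ^= m[rank]
        rank += 1
    return rank == cols

k = 4
packets = rng.integers(0, 2, size=(k, 8), dtype=np.uint8)  # 4 source packets
coeffs, coded = encode_gf2(packets, n_coded=6)  # a few extra combinations
print(decodable(np.eye(k, dtype=np.uint8)))  # True: identity is trivially full rank
```

Sending a few more combinations than source packets, as here, is what sidesteps the coupon-collection problem: any full-rank subset of k coded packets suffices, regardless of which ones arrive.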
Quality-aware Content Adaptation in Digital Video Streaming
User-generated video has attracted a lot of attention due to the success of video sharing sites such as YouTube and online social networks. Recently, a shift towards live consumption of these videos has become observable. The content is captured and instantly shared over the Internet using smart mobile devices such as smartphones. Large-scale platforms such as YouTube Live, YouNow or Facebook Live have arisen which enable the smartphones of users to livestream to the public. These platforms achieve the distribution of tens of thousands of low-resolution videos to remote viewers in parallel. Nonetheless, the providers are not able to guarantee an efficient collection and distribution of high-quality video streams. As a result, the user experience is often degraded, and the required infrastructure investments are huge. Efficient methods are required to cope with the increasing demand for these video streams, and an understanding is needed of how to capture, process and distribute the videos to guarantee a high-quality experience for viewers. This thesis addresses the quality awareness of user-generated videos by leveraging the concept of content adaptation. Two types of content adaptation, adaptive video streaming and video composition, are discussed in this thesis. Then, a novel approach for the given scenario of a live upload from mobile devices, the processing of video streams and their distribution is presented. This thesis demonstrates that content adaptation applied to each step of this scenario, ranging from upload to consumption, can significantly improve the quality for the viewer. At the same time, if content adaptation is planned wisely, the data traffic can be reduced while keeping the quality for the viewers high. The first contribution of this thesis is a better understanding of the perceived quality of user-generated video and its influencing factors.
Subjective studies are performed to understand what affects human perception, leading to first-of-their-kind quality models. The developed quality models are used for the second contribution of this work: novel quality assessment algorithms. A unique attribute of these algorithms is the use of multiple features from different sensors. Whereas classical video quality assessment algorithms focus on the visual information, the proposed algorithms reduce the runtime by an order of magnitude by using data from other sensors in video capturing devices. Still, the scalability of quality assessment is limited when algorithms execute on a single server. This is solved with the proposed placement and selection component, which allows the distribution of quality assessment tasks to mobile devices and thus increases the scalability of existing approaches by up to 33.71% when using the resources of only 15 mobile devices. These three contributions are required to provide a real-time understanding of the perceived quality of the video streams produced on mobile devices. The upload of video streams is the fourth contribution of this work. It relies on content and mechanism adaptation. The thesis introduces the first prototypically evaluated adaptive video upload protocol (LiViU), which transcodes multiple video representations in real time and copes with changing network conditions. In addition, a mechanism adaptation is integrated into LiViU to react to changing application scenarios, such as streaming high-quality videos to remote viewers or distributing video with minimal delay to nearby recipients. A second type of content adaptation is discussed in the fifth contribution of this work: an automatic video composition application which enables live composition from multiple user-generated video streams.
The proposed application is the first of its kind, allowing the in-time composition of high-quality video streams by inspecting the quality of the individual video streams, their recording locations and cinematographic rules. As a last contribution, the content-aware adaptive distribution of video streams to mobile devices is introduced by the Video Adaptation Service (VAS). The VAS analyzes the streamed video content to understand which adaptations are most beneficial for a viewer. It maximizes the perceived quality for each video stream individually and at the same time tries to produce as little data traffic as possible, achieving a data traffic reduction of more than 80%.
Understanding user interactivity for the next-generation immersive communication: design, optimisation, and behavioural analysis
Recent technological advances have opened the gate to a novel way to communicate remotely while still feeling connected. In these immersive communications, humans are at the centre of virtual or augmented reality, with a full sense of immersion and the possibility to interact with the new environment as well as with other humans virtually present. These next-generation communication systems hold huge potential that can benefit major economic sectors. However, they also pose many new technical challenges, mainly due to the new role of the final user: from merely passive to fully active in requesting and interacting with the content. Thus, we need to go beyond traditional quality of experience research and develop user-centric solutions, in which the whole multimedia experience is tailored to the final interactive user. With this goal in mind, a better understanding of how people interact with immersive content is needed, and it is the focus of this thesis.
In this thesis, we study the behaviour of interactive users in immersive experiences and its impact on next-generation multimedia systems. The thesis covers a deep literature review on immersive services and user-centric solutions, before developing three main research strands. First, we implement novel tools for behavioural analysis of users navigating in a 3-DoF Virtual Reality (VR) system. In detail, we study behavioural similarities among users by proposing a novel clustering algorithm. We also introduce information-theoretic metrics for quantifying similarities for the same viewer across contents. As a second direction, we show the impact and advantages of taking user behaviour into account in immersive systems. Specifically, we formulate optimal user-centric solutions i) from a server-side perspective and ii) as a navigation-aware adaptation logic for VR streaming platforms. We conclude by exploiting the aforementioned behavioural studies towards a more interactive immersive technology: 6-DoF VR. Overall, experimental results based on real navigation trajectories show key advantages of understanding the hidden patterns of user interactivity, to be eventually exploited in engineering user-centric solutions for immersive systems.