34 research outputs found

    Distributed Video Coding for Multiview and Video-plus-depth Coding


    Error-resilient multi-view video plus depth based 3-D video coding

    Three-dimensional (3-D) video is, by definition, a collection of signals that can provide depth perception of a 3-D scene. With the development of 3-D display technologies and interactive multimedia systems, 3-D video has attracted significant interest from both industry and academia across a variety of applications. To provide the desired services in various 3-D video applications, the multiview video plus depth (MVD) representation, which facilitates the generation of virtual views, has been determined to be the preferred format for 3-D video data. Like 2-D video, compressed 3-D video is highly sensitive to transmission errors, because errors propagate from the current frame to subsequently predicted frames. Moreover, since the virtual views required for auto-stereoscopic displays are rendered from the compressed texture videos and depth maps, transmission errors in the texture videos and depth maps propagate further into the virtual views. In addition, texture and depth distortions affect the rendered views differently. Compared with reliable transmission of 2-D video, error-resilient coding of texture videos and depth maps therefore faces major new challenges.

This research concentrates on improving the error-resilience performance of MVD-based 3-D video in packet-loss scenarios. Based on an analysis of how transmission errors propagate, a Wyner-Ziv (WZ)-based error-resilient algorithm is first designed for coding multi-view video or depth data. In this scheme, an auxiliary redundant stream encoded according to the WZ principle protects a primary stream encoded with a standard multi-view video coding codec. Then, considering that different combinations of texture and depth coding modes exhibit varying robustness to transmission errors, a rate-distortion optimized mode switching scheme is proposed to strike the optimal trade-off between robustness and compression efficiency. In this approach, the texture and depth modes are jointly optimized by minimizing the overall distortion of both the coded and synthesized views subject to a given bit rate. Finally, this study extends the research to the reliable transmission of view synthesis prediction (VSP)-based 3-D video. To mitigate the prediction position error caused by packet losses in the depth map, a novel disparity vector correction algorithm is developed, in which the corrected disparity vector is calculated from the depth error. To facilitate decoder error concealment, the depth error is recursively estimated at the decoder.

The contributions of this dissertation are threefold. First, the proposed WZ-based error-resilient algorithm accurately characterizes the effect of transmission errors on multi-view distortion in the transform domain, accounting for both temporal and inter-view error propagation; based on the estimated distortion, it performs optimal WZ bit allocation at the encoder through an explicitly developed rate allocation strategy. The proposed algorithm provides fine-grained rate adaptation and unequal error protection for multi-view data, not only at the frame level but also at the bit-plane level.
Second, in the proposed mode switching scheme, a new analytic model is formulated to estimate the view synthesis distortion due to packet losses, in which the compound impact of the transmission distortions of both the texture video and the depth map on the quality of the synthesized view is mathematically analysed. The accuracy of this view synthesis distortion model is demonstrated via simulation results, and the estimated distortion is further integrated into a rate-distortion framework for optimal mode switching, achieving substantial performance gains over state-of-the-art algorithms. Last but not least, this dissertation provides a preliminary investigation of VSP-based 3-D video over unreliable channels. In the proposed disparity vector correction algorithm, the pixel-level depth map error can be precisely estimated at the decoder without deterministic knowledge of the error-free reconstructed depth, and the approximation of the innovation term involved in the depth error estimation is proved theoretically. This algorithm is particularly useful for concealing position-erroneous pixels whose disparity vectors are correctly received.
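To make the rate-distortion optimized mode switching idea concrete, here is a minimal Python sketch that picks a joint texture/depth coding mode for a block by minimizing a Lagrangian cost J = D + λ·R under a given packet loss rate. The mode set, the toy distortion model, and all numerical values are illustrative assumptions, not the dissertation's actual codec integration or distortion estimator.

```python
from itertools import product

# Toy per-mode statistics: (source distortion, error-propagation sensitivity, bits).
TEXTURE_MODES = {"intra": (4.0, 0.2, 900), "inter": (3.0, 1.0, 300)}
DEPTH_MODES = {"intra": (2.0, 0.1, 400), "inter": (1.5, 0.8, 150)}

def expected_distortion(tex, dep, loss_rate):
    """Toy end-to-end distortion: source distortion plus channel-induced distortion
    that grows with the packet loss rate, for the texture and depth components;
    the synthesized view is assumed to inherit a weighted sum of both."""
    d_tex = tex[0] + loss_rate * tex[1] * 100.0
    d_dep = dep[0] + loss_rate * dep[1] * 100.0
    return d_tex + 0.5 * d_dep

def select_modes(loss_rate, lam=0.01):
    """Return the (texture, depth) mode pair minimizing J = D + lam * R."""
    best, best_j = None, float("inf")
    for (t_name, t), (d_name, d) in product(TEXTURE_MODES.items(), DEPTH_MODES.items()):
        j = expected_distortion(t, d, loss_rate) + lam * (t[2] + d[2])
        if j < best_j:
            best, best_j = (t_name, d_name), j
    return best

print(select_modes(loss_rate=0.0))  # error-free channel: the cheap inter/inter pair wins
print(select_modes(loss_rate=0.2))  # heavy loss: selection switches towards robust intra modes
```

Under these toy numbers, the cheaper inter modes win on a clean channel, while heavier losses push the selection towards the more expensive but more robust intra modes, which is the qualitative behaviour the proposed mode switching scheme exploits.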

    Rate-distortion analysis and traffic modeling of scalable video coders

    In this work, we focus on two important goals in the transmission of scalable video over the Internet: providing high-quality video to end users, and properly designing networks and predicting network performance for video transmission based on the characteristics of existing video traffic. Rate-distortion (R-D) based schemes are often applied to improve and stabilize video quality; however, the lack of R-D modeling for scalable coders limits their application in scalable streaming. Thus, in the first part of this work, we analyze the R-D curves of scalable video coders and propose a novel operational R-D model. We evaluate and demonstrate the accuracy of our R-D function on various scalable coders, such as Fine Granular Scalable (FGS) and Progressive FGS coders. Furthermore, given the time-constrained nature of Internet streaming, we propose another operational R-D model that is accurate yet computationally inexpensive, and apply it to streaming applications for quality control. The Internet is a changing environment; however, most quality-control approaches consider only constant bit rate (CBR) channels, and no specific studies have been conducted for quality control over variable bit rate (VBR) channels. To fill this void, we examine an asymptotically stable congestion control mechanism and combine it with our R-D model to present smooth visual quality to end users under various network conditions.

Our second focus concerns the modeling and analysis of video traffic, which is crucial to protocol design and efficient network utilization for video transmission. Although scalable video traffic is expected to be an important source of Internet traffic, we find that little work has been done on analyzing or modeling it. We therefore develop a frame-level hybrid framework for modeling multi-layer VBR video traffic, in which the base layer is modeled using a combination of wavelet- and time-domain methods and the enhancement layer is linearly predicted from the base layer using the cross-layer correlation.
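As a rough illustration of what an operational R-D model can look like in practice, the sketch below fits a simple parametric distortion-rate curve to hypothetical (bit rate, MSE) measurements and then inverts it to choose a rate for a target quality. The exponential-plus-offset functional form and the sample data are assumptions for demonstration only, not the specific model proposed in this work.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: bit rate in kbps, distortion as MSE.
rate_kbps = np.array([128.0, 256.0, 384.0, 512.0, 768.0, 1024.0])
mse = np.array([80.0, 51.0, 34.0, 23.5, 13.5, 10.0])

def rd_model(r, a, b, c):
    """Assumed operational form D(R) = a * exp(-b * R) + c."""
    return a * np.exp(-b * r) + c

params, _ = curve_fit(rd_model, rate_kbps, mse, p0=(100.0, 0.005, 5.0))
a, b, c = params

# Quality control: invert the fitted curve to find the rate that hits a target MSE.
target_mse = 20.0
rate_needed = -np.log((target_mse - c) / a) / b
print(f"fitted a={a:.1f}, b={b:.4f}, c={c:.1f}")
print(f"rate for target MSE {target_mse}: {rate_needed:.0f} kbps")
```

A streaming quality controller could re-fit such a curve per scene or per coder and use the inverse mapping to hold distortion roughly constant as the available channel rate varies.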

    Max-Planck-Institute for Psycholinguistics: Annual Report 2001


    Investigating the Cognitive and Neural Mechanisms underlying Multisensory Perceptual Decision-Making in Humans

    In everyday life, we encounter situations that require decisions to be formed from ambiguous and often incomplete sensory information. Perceptual decision-making is the process by which sensory information is consolidated and accumulated towards one of multiple possible choice alternatives, which inform our behavioural responses. It can be understood, both theoretically and neurally, as a process of stochastic sensory evidence accumulation towards some choice threshold; once this threshold is exceeded, a response is facilitated, informing the overt actions undertaken. Considerable progress has been made towards understanding the cognitive and neural mechanisms underlying perceptual decision-making. Analyses of reaction times (RTs; typically measured in milliseconds) and choice accuracy, which reflect decision-making behaviour, can be coupled with neuroimaging methodologies, notably electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI), to identify spatiotemporal components representing the neural signatures of such accumulation-to-bound decision formation on a single-trial basis. Taken together, these provide an experimental framework conceptualising the key computations underlying perceptual decision-making.

Despite this, relatively little is known about the enhancements of, or alterations to, perceptual decision-making that arise from the integration of information across multiple sensory modalities. Consolidating the available sensory evidence requires processing information presented in more than one sensory modality, often near-simultaneously, to exploit the salient percepts, in what we term multisensory (perceptual) decision-making. Specifically, multisensory integration must be considered within the perceptual decision-making framework in order to understand how information is stochastically accumulated to inform overt sensory-motor choice behaviours. Recently, substantial progress has been made through the application of behaviourally-informed and/or neurally-informed modelling approaches to advance our understanding of multisensory decision-making. In particular, these approaches fit a number of model parameters to behavioural and/or neuroimaging datasets in order to (a) dissect the constituent cognitive and neural processes underlying perceptual decision-making with both multisensory and unisensory information, and (b) mechanistically infer how multisensory enhancements arise from the integration of information across sensory modalities to benefit perceptual decision formation. Despite this, the spatiotemporal locus of the neural and cognitive underpinnings of these multisensory enhancements remains subject to debate; in particular, which brain regions predict such enhancements, where they arise, and how they influence decision-making behaviour all require further exploration. The current thesis outlines empirical findings from three studies aimed at providing a more complete characterisation of multisensory perceptual decision-making, utilising EEG and accumulation-to-bound modelling methodologies that incorporate both behaviourally-informed and neurally-informed modelling approaches, to investigate where, when, and how perceptual improvements arise during multisensory perceptual decision-making.
Specifically, these modelling approaches sought to probe the modulatory influence of three factors, namely cross-modal associations formed from unisensory stimuli (Chapter 2), natural ageing (Chapter 3), and perceptual learning (Chapter 4), on the cognitive and neural mechanisms underlying observable benefits to multisensory decision formation. Chapter 2 outlines secondary analyses, utilising a neurally-informed modelling approach, that characterise the spatiotemporal dynamics of neural activity underlying auditory pitch-visual size cross-modal associations; in particular, how associations driven by unisensory auditory pitch benefit perceptual decision formation was functionally probed. EEG was recorded from participants during performance of an Implicit Association Test (IAT), a two-alternative forced-choice (2AFC) paradigm that presents one unisensory stimulus feature per trial for participants to categorise, but manipulates the stimulus feature-response key mappings of auditory pitch-visual size cross-modal associations using unisensory stimuli alone, thereby overcoming the issue of mixed selectivity in recorded neural activity that is prevalent in previous cross-modal associative research, in which multisensory stimuli were presented near-simultaneously. Categorisations were faster (i.e., lower RTs) when stimulus feature-response key mappings were associatively congruent between the two associative counterparts than when they were incongruent, demonstrating a behavioural benefit to perceptual decision formation. Multivariate Linear Discriminant Analysis (LDA) was used to characterise the spatiotemporal dynamics of the EEG activity underpinning IAT performance, identifying two EEG components that discriminated the neural activity underlying the benefits of associative congruency of stimulus feature-response key mappings. Application of a neurally-informed Hierarchical Drift Diffusion Model (HDDM) demonstrated early sensory processing benefits, with increases in the duration of non-decisional processes under incongruent stimulus feature-response key mappings, and late post-sensory alterations to decision dynamics, with congruent mappings decreasing the quantity of evidence required to facilitate a decision. Hence, trial-by-trial variability in perceptual decision formation arising from cross-modal associations facilitated by unisensory stimuli could be predicted by neural activity within our neurally-informed modelling approach.
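As an illustration of the single-trial multivariate LDA step described for Chapter 2, the toy Python sketch below trains an LDA classifier on simulated trials-by-channels EEG amplitudes to discriminate congruent from incongruent mappings, and derives a per-trial discriminant amplitude of the kind that can feed a neurally-informed model. The simulated data, time window, and cross-validation scheme are illustrative assumptions, not the analysis pipeline used in the thesis.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels = 200, 64

# Simulated single-trial EEG amplitudes in a post-stimulus window (trials x channels):
# congruent trials (label 1) carry a small additive topographic pattern.
labels = rng.integers(0, 2, n_trials)
pattern = rng.normal(0.0, 1.0, n_channels)           # assumed discriminative topography
X = rng.normal(0.0, 1.0, (n_trials, n_channels)) + 0.4 * np.outer(labels, pattern)

# Shrinkage-regularised LDA is a common choice when channels are many relative to trials.
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
accuracy = cross_val_score(clf, X, labels, cv=5).mean()

# The projection onto the discriminant axis gives a per-trial neural measure that a
# neurally-informed model (e.g., an HDDM regressor) could ingest.
clf.fit(X, labels)
discriminant_y = X @ clf.coef_.ravel()
print(f"cross-validated decoding accuracy: {accuracy:.2f}")
```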
Next, Chapter 3 outlines cognitive research investigating age-related impacts on the behavioural indices of multisensory perceptual decision-making (i.e., RTs and choice accuracy). Natural ageing has been shown to affect multisensory perceptual decision-making dynamics in diverse ways, yet the constituent cognitive processes affected remain unclear; in particular, a mechanistic account of why older adults may exhibit preserved multisensory integrative benefits while displaying generalised perceptual deficits relative to younger adults remains inconclusive. To address this limitation, 212 participants performed an online variant of a well-established audiovisual object categorisation paradigm, in which age-related differences in RTs and choice accuracy (binary responses) between audiovisual (AV), visual (V), and auditory (A) trial types were assessed between Younger Adults (YAs; Mean ± Standard Deviation = 27.95 ± 5.82 years) and Older Adults (OAs; Mean ± Standard Deviation = 60.96 ± 10.35 years). Hierarchical Drift Diffusion Modelling (HDDM) was fitted to participants’ RTs and binary responses to probe age-related impacts on the latent processes underlying multisensory decision formation. Behaviourally, OAs were typically slower (i.e., ↑ RTs) and less accurate (i.e., ↓ choice accuracy) than YAs across all sensory trial types, yet they exhibited greater differences in RTs between AV and V trials (i.e., ↑ AV-V RT difference) with no significant effects on choice accuracy, implicating preserved benefits of multisensory integration towards perceptual decision formation. HDDM provided parsimonious fits characterising these behavioural discrepancies between YAs and OAs. Notably, we found slower rates of sensory evidence accumulation (i.e., ↓ drift rates) for OAs across all sensory trial types, together with (1) higher rates of sensory evidence accumulation (i.e., ↑ drift rates) for OAs for AV versus V trial types irrespective of stimulus difficulty, (2) increased response caution (i.e., ↑ decision boundaries) for AV versus V trial types, and (3) decreased non-decisional processing duration (i.e., ↓ non-decision times) for AV versus V trial types for stimuli of increased difficulty. Our findings suggest that older adults trade multisensory decision-making speed for accuracy to preserve enhancements towards perceptual decision formation relative to younger adults; they display an increased reliance on integrating multimodal information, through the principle of inverse effectiveness, as a compensatory mechanism for generalised cognitive slowing when processing unisensory information. Overall, our findings demonstrate how computational modelling can reconcile contrasting hypotheses of age-related changes in the processes underlying multisensory perceptual decision-making behaviour.
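To give a sense of how the HDDM parameters reported for Chapter 3 map onto behaviour, the following toy drift-diffusion simulation shows how a lower drift rate slows responses and reduces accuracy, while a wider decision boundary trades speed for accuracy. The parameter values and the simulation scheme are arbitrary illustrations, not the fitted estimates from the thesis.

```python
import numpy as np

def simulate_ddm(v, a, t0, n_trials=2000, dt=0.001, noise=1.0, seed=0):
    """Accumulate noisy evidence from a starting point of a/2 until it hits 0 or a;
    return mean reaction time (s) and accuracy (fraction reaching the upper bound)."""
    rng = np.random.default_rng(seed)
    rts, correct = [], []
    for _ in range(n_trials):
        x, t = a / 2.0, 0.0
        while 0.0 < x < a:
            x += v * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(t0 + t)
        correct.append(x >= a)
    return np.mean(rts), np.mean(correct)

# Lower drift rate (slower evidence accumulation) lengthens RTs and lowers accuracy;
# a wider boundary (greater response caution) trades speed for accuracy.
print(simulate_ddm(v=1.5, a=2.0, t0=0.3))  # faster accumulation
print(simulate_ddm(v=0.8, a=2.0, t0=0.3))  # slower accumulation: longer RTs, lower accuracy
print(simulate_ddm(v=0.8, a=2.6, t0=0.3))  # higher boundary: slower still, but more accurate
```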
Finally, Chapter 4 outlines research probing the influence of perceptual learning on multisensory perceptual decision-making. Views of unisensory perceptual learning imply that improvements in perceptual sensitivity may be due to enhancements in early sensory representations and/or modulations of post-sensory decision dynamics. We sought to assess whether these views could account for improvements in perceptual sensitivity for multisensory stimuli, or even exacerbations of multisensory enhancements towards decision formation, by consolidating the spatiotemporal locus of where and when in the brain they may be observed. We recorded EEG activity from participants who completed the same audiovisual object categorisation paradigm as outlined in Chapter 3 over three consecutive days, and used single-trial multivariate LDA to characterise the spatiotemporal trajectory of the decision dynamics underlying any observed multisensory benefits both (a) within and (b) between visual, auditory, and audiovisual trial types. While significant decreases in RTs and increases in choice accuracy were found over testing days, we did not find any significant effects of perceptual learning on multisensory or unisensory perceptual decision formation. Similarly, the EEG analysis did not reveal any neural components indicative of early or late modulatory effects of perceptual learning on brain activity, which we attribute to (1) the long duration of stimulus presentations (300 ms) and (2) a lack of sufficient statistical power for our LDA classifier to discriminate face-versus-car trial types. We end this chapter with considerations for discerning multisensory benefits towards perceptual decision formation, and with recommendations for altering our experimental design to observe the effects of perceptual learning as a decision neuromodulator. These findings contribute to the literature justifying the increasing relevance of behaviourally-informed and/or neurally-informed modelling approaches for investigating multisensory perceptual decision-making. In particular, discussion of the cognitive and/or neural mechanisms that can be attributed to the benefits of multisensory integration towards perceptual decision formation, as well as of the modulatory impact of the decision modulators in question, contributes to a theoretical reconciliation that multisensory integrative benefits are not tied to specific spatiotemporal neural dynamics nor to particular cognitive processes.

    Attention Restraint, Working Memory Capacity, and Mind Wandering: Do Emotional Valence or Intentionality Matter?

    Attention restraint appears to mediate the relationship between working memory capacity (WMC) and mind wandering (Kane et al., 2016). Prior work has identified two dimensions of mind wandering: emotional valence and intentionality. However, less is known about how WMC and attention restraint correlate with these dimensions. The current study examined the relationship between WMC, attention restraint, and mind wandering by emotional valence and intentionality. A confirmatory factor analysis demonstrated that WMC and attention restraint were strongly correlated, but only attention restraint was related to overall mind wandering, consistent with prior findings. However, when examining the emotional valence of mind wandering, attention restraint and WMC were related to negatively and positively valenced, but not neutral, mind wandering. Attention restraint was also related to intentional but not unintentional mind wandering. These results suggest that WMC and attention restraint predict some, but not all, types of mind wandering.

    Sensorimotor Differences in Autism Spectrum Disorder: An evaluation of potential mechanisms.

    This thesis examined the aetiology of sensorimotor impairments in Autism Spectrum Disorder: a neurodevelopmental condition that affects an individual’s socio-behavioural preferences, personal independence, and quality of life. Issues relating to clumsiness and movement coordination are common features of autism that contribute to wide-ranging daily living difficulties. However, these characteristics are relatively understudied and there is an absence of evidence-based practical interventions. To pave the way for new, scientifically-focused programmes, a series of studies investigated the mechanistic underpinnings of sensorimotor differences in autism. Following a targeted review of previous research, study one explored links between autistic-like traits and numerous conceptually-significant movement control functions. Eye-tracking analyses were integrated with force transducers and motion capture technology to examine how participants interacted with uncertain lifting objects. Upon identifying a link between autistic-like traits and context-sensitive predictive action control, study two replicated these procedures with a sample of clinically-diagnosed participants. Results illustrated that autistic people are able to use predictions to guide object interactions, but that uncertainty-related adjustments in sensorimotor integration are atypical. Such findings were advanced within a novel virtual-reality paradigm in study three, which systematically manipulated environmental uncertainty during naturalistic interception actions. Here, data supported proposals that precision weighting functions are aberrant in autistic people, and suggested that these individuals have difficulties with processing volatile sensory information. These difficulties were not alleviated by the experimental provision of explicit contextual cues in study four. Together, these studies implicate the role of implicit neuromodulatory mechanisms that regulate dynamic sensorimotor behaviours. Results support the development of evidence-based programmes that ‘make the world more predictable’ for autistic people, with various theoretical and practical implications presented. Possible applications of these findings are discussed in relation to recent multi-disciplinary research and conceptual advances in the field, which could help improve daily living skills and functional quality of life.
Economic and Social Research Council (ESRC)