1,650 research outputs found

    An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement

    Full text link
    Video enhancement is a challenging problem, more than that of stills, mainly due to high computational cost, larger data volumes and the difficulty of achieving consistency in the spatio-temporal domain. In practice, these challenges are often coupled with the lack of example pairs, which inhibits the application of supervised learning strategies. To address these challenges, we propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples. In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information. The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high complexity networks. Our setting enables learning from unpaired videos in a cyclic adversarial manner, where the proposed recurrent units are employed in all architectures. Efficient training is accomplished by introducing one single discriminator that learns the joint distribution of source and target domain simultaneously. The enhancement results demonstrate clear superiority of the proposed video enhancer over the state-of-the-art methods, in all terms of visual quality, quantitative metrics, and inference speed. Notably, our video enhancer is capable of enhancing over 35 frames per second of FullHD video (1080x1920)

    A switchable light field camera architecture with Angle Sensitive Pixels and dictionary-based sparse coding

    Get PDF
    We propose a flexible light field camera architecture that is at the convergence of optics, sensor electronics, and applied mathematics. Through the co-design of a sensor that comprises tailored, Angle Sensitive Pixels and advanced reconstruction algorithms, we show that-contrary to light field cameras today-our system can use the same measurements captured in a single sensor image to recover either a high-resolution 2D image, a low-resolution 4D light field using fast, linear processing, or a high-resolution light field using sparsity-constrained optimization.National Science Foundation (U.S.) (NSF Grant IIS-1218411)National Science Foundation (U.S.) (NSF Grant IIS-1116452)MIT Media Lab ConsortiumNational Science Foundation (U.S.) (NSF Graduate Research Fellowship)Natural Sciences and Engineering Research Council of Canada (NSERC Postdoctoral Fellowship)Alfred P. Sloan Foundation (Research Fellowship)United States. Defense Advanced Research Projects Agency (DARPA Young Faculty Award

    Decoding face categories in diagnostic subregions of primary visual cortex

    Get PDF
    Higher visual areas in the occipitotemporal cortex contain discrete regions for face processing, but it remains unclear if V1 is modulated by top-down influences during face discrimination, and if this is widespread throughout V1 or localized to retinotopic regions processing task-relevant facial features. Employing functional magnetic resonance imaging (fMRI), we mapped the cortical representation of two feature locations that modulate higher visual areas during categorical judgements – the eyes and mouth. Subjects were presented with happy and fearful faces, and we measured the fMRI signal of V1 regions processing the eyes and mouth whilst subjects engaged in gender and expression categorization tasks. In a univariate analysis, we used a region-of-interest-based general linear model approach to reveal changes in activation within these regions as a function of task. We then trained a linear pattern classifier to classify facial expression or gender on the basis of V1 data from ‘eye’ and ‘mouth’ regions, and from the remaining non-diagnostic V1 region. Using multivariate techniques, we show that V1 activity discriminates face categories both in local ‘diagnostic’ and widespread ‘non-diagnostic’ cortical subregions. This indicates that V1 might receive the processed outcome of complex facial feature analysis from other cortical (i.e. fusiform face area, occipital face area) or subcortical areas (amygdala)

    Three-dimensional media for mobile devices

    Get PDF
    Cataloged from PDF version of article.This paper aims at providing an overview of the core technologies enabling the delivery of 3-D Media to next-generation mobile devices. To succeed in the design of the corresponding system, a profound knowledge about the human visual system and the visual cues that form the perception of depth, combined with understanding of the user requirements for designing user experience for mobile 3-D media, are required. These aspects are addressed first and related with the critical parts of the generic system within a novel user-centered research framework. Next-generation mobile devices are characterized through their portable 3-D displays, as those are considered critical for enabling a genuine 3-D experience on mobiles. Quality of 3-D content is emphasized as the most important factor for the adoption of the new technology. Quality is characterized through the most typical, 3-D-specific visual artifacts on portable 3-D displays and through subjective tests addressing the acceptance and satisfaction of different 3-D video representation, coding, and transmission methods. An emphasis is put on 3-D video broadcast over digital video broadcasting-handheld (DVB-H) in order to illustrate the importance of the joint source-channel optimization of 3-D video for its efficient compression and robust transmission over error-prone channels. The comparative results obtained identify the best coding and transmission approaches and enlighten the interaction between video quality and depth perception along with the influence of the context of media use. Finally, the paper speculates on the role and place of 3-D multimedia mobile devices in the future internet continuum involving the users in cocreation and refining of rich 3-D media content
    • …
    corecore