    Live User-guided Intrinsic Video For Static Scenes

    Get PDF
    We present a novel real-time approach for user-guided intrinsic decomposition of static scenes captured by an RGB-D sensor. In the first step, we acquire a three-dimensional representation of the scene using a dense volumetric reconstruction framework. The obtained reconstruction serves as a proxy to densely fuse reflectance estimates and to store user-provided constraints in three-dimensional space. User constraints, in the form of constant-shading and constant-reflectance strokes, can be placed directly on the real-world geometry using an intuitive touch-based interaction metaphor, or using interactive mouse strokes. Fusing the decomposition results and constraints in three-dimensional space allows for robust propagation of this information to novel views by re-projection. We leverage this information to improve the decomposition quality of existing intrinsic video decomposition techniques by further constraining the ill-posed decomposition problem. In addition to improved decomposition quality, we show a variety of live augmented reality applications, such as recoloring of objects, relighting of scenes, and editing of material appearance.
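
    Although the paper's full pipeline is much richer, the re-projection step can be illustrated with a minimal sketch (hypothetical function and parameter names, standard pinhole projection assumed): user strokes stored on the 3D reconstruction are projected into a novel camera pose to yield per-pixel constraints for that frame.

    import numpy as np

    def reproject_constraints(points_3d, labels, K, R, t, image_shape):
        """Project 3D-anchored user strokes into a novel view.

        points_3d : (N, 3) world-space positions of stroke samples
        labels    : (N,) constraint type per sample, e.g. 0 = constant
                    reflectance, 1 = constant shading
        K, R, t   : intrinsics and extrinsics of the novel camera
        """
        cam = (R @ points_3d.T + t.reshape(3, 1)).T      # world -> camera
        uv = (K @ cam.T).T
        uv = uv[:, :2] / uv[:, 2:3]                      # perspective divide
        h, w = image_shape
        constraint_map = np.full((h, w), -1, dtype=np.int32)
        valid = ((cam[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w)
                 & (uv[:, 1] >= 0) & (uv[:, 1] < h))
        u = uv[valid, 0].astype(int)
        v = uv[valid, 1].astype(int)
        constraint_map[v, u] = labels[valid]
        return constraint_map  # per-pixel constraint labels for the new frame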

    Video Frame Differentiation for Streamed Multimedia over Heavily Loaded IEEE 802.11e WLAN using TXOP

    Get PDF
    In this paper we perform an experimental investigation of using video frame differentiation in conjunction with the TXOP facility to enhance the transmission of parallel multimedia streaming sessions in IEEE 802.11e. The delay constraints associated with the audio and video streams that comprise a multimedia session pose the greatest challenge, since real-time multimedia packets require a strictly bounded end-to-end delay. Video streaming applications are considered to be bursty. This burstiness is due to the frame rate of the video and the intrinsic hierarchical structure of the constituent video frame types. The TXOP facility is particularly suited to dealing efficiently with this burstiness, since it can be used to reserve bandwidth for the duration of the packet burst associated with a packetised video frame. Through experimental investigation, we show that there is a significant performance improvement for video streaming applications under heavily loaded conditions when differentiating between the constituent video frame types. The results show that video frame differentiation reduces the mean loss rate by 12% and increases the mean PSNR by 13.1 dB.
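
    As a rough illustration of why TXOP fits this burstiness, the sketch below estimates the TXOP duration needed to send one packetised video frame as a single burst; the function name, overhead figure, and PHY rate are illustrative assumptions, not values taken from the paper or the 802.11e standard.

    import math

    def txop_for_frame(frame_bytes, mtu=1472, phy_rate_mbps=54.0,
                       per_packet_overhead_us=50.0):
        """Rough TXOP limit needed to burst one packetised video frame.

        A large frame is fragmented into ceil(frame_bytes / mtu) packets;
        the TXOP must cover their airtime plus per-packet overhead
        (headers, SIFS gaps, ACKs).
        """
        n_packets = math.ceil(frame_bytes / mtu)
        payload_airtime_us = (frame_bytes * 8) / phy_rate_mbps
        return payload_airtime_us + n_packets * per_packet_overhead_us

    # I-frames are typically far larger than P- or B-frames, so per-type
    # TXOP limits reserve the channel only as long as each burst needs:
    for name, size in [("I", 30000), ("P", 8000), ("B", 3000)]:
        print(f"{name}-frame: {txop_for_frame(size):.0f} us")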

    Rotation-invariant features for multi-oriented text detection in natural images.

    Get PDF
    Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human-computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts of varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on texts of varying orientations in complex natural scenes.
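
    As a toy illustration of rotation invariance (not the paper's actual feature set), the sketch below rotates a candidate text component's pixel coordinates into their principal-axis frame before computing shape statistics, so the resulting descriptor is unchanged under in-plane rotation of the text.

    import numpy as np

    def rotation_normalized_stats(component_pixels):
        """Toy rotation-invariant descriptor for a candidate component.

        component_pixels : (N, 2) pixel coordinates of the component.
        Rotating the coordinates into the component's principal-axis
        frame makes statistics computed afterwards orientation-invariant.
        """
        pts = component_pixels - component_pixels.mean(axis=0)
        _, vecs = np.linalg.eigh(np.cov(pts.T))   # principal axes of the shape
        aligned = pts @ vecs                      # rotate into canonical frame
        extent = aligned.max(axis=0) - aligned.min(axis=0)
        aspect_ratio = extent.max() / max(extent.min(), 1e-9)
        fill = pts.shape[0] / max(extent.prod(), 1e-9)
        return np.array([aspect_ratio, fill])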

    Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

    Get PDF
    We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose a novel approach that models future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. Future frame synthesis is challenging, as it involves low- and high-level image and motion understanding. We propose a novel network structure, namely a Cross Convolutional Network, to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, as well as on real-world videos. We also show that our model can be applied to tasks such as visual analogy-making, and present an analysis of the learned network representations. Comment: The first two authors contributed equally to this work.
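
    The core cross-convolution operation, with appearance encoded as feature maps and motion as per-sample kernels, can be rendered as a minimal PyTorch sketch; the tensor shapes and depthwise application are our assumptions, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def cross_convolution(feature_maps, motion_kernels):
        """Convolve each sample's feature maps with its own kernels.

        feature_maps   : (B, C, H, W) appearance features from the image
        motion_kernels : (B, C, k, k) kernels predicted from a motion code
        """
        B, C, H, W = feature_maps.shape
        k = motion_kernels.shape[-1]
        x = feature_maps.reshape(1, B * C, H, W)   # fold batch into channels
        w = motion_kernels.reshape(B * C, 1, k, k)
        out = F.conv2d(x, w, padding=k // 2, groups=B * C)
        return out.reshape(B, C, H, W)

    # e.g. features from an image encoder, kernels from a decoder fed z ~ N(0, I):
    out = cross_convolution(torch.randn(2, 8, 32, 32), torch.randn(2, 8, 5, 5))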

    Characterization of Zero-Bias Microwave Diode Power Detectors at Cryogenic Temperature

    Full text link
    We present the characterization of commercial tunnel diode low-level microwave power detectors at room and cryogenic temperatures. The sensitivity as well as the output voltage noise of the tunnel diodes are measured as functions of the applied microwave power, the signal frequency being 10 GHz. We highlight strong variations of the diode characteristics when the applied microwave power is higher than a few microwatts. For a diode operating at 4 K, the differential gain increases from 1,000 V/W to about 4,500 V/W when the power passes from -30 dBm to -20 dBm. The diode presents a white noise floor equivalent to an NEP of 0.8 pW/√Hz and 8 pW/√Hz at 4 K and 300 K, respectively. Its flicker noise is equivalent to a relative amplitude noise power spectral density Sα(1 Hz) = -120 dB/Hz at 4 K. Flicker noise is 10 dB higher at room temperature. Comment: 8 pages and 16 figures.
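
    A short back-of-the-envelope check ties the quoted figures together via NEP = output voltage noise density / differential gain; pairing the 4 K gain with the 4 K noise floor is our assumption for illustration.

    # Values quoted in the abstract (4 K operation):
    gain_4k = 4500.0             # V/W differential gain near -20 dBm
    nep_4k = 0.8e-12             # W/sqrt(Hz) white-noise floor
    v_noise = nep_4k * gain_4k   # implied output voltage noise density
    print(f"{v_noise * 1e9:.1f} nV/sqrt(Hz)")   # ~3.6 nV/sqrt(Hz)

    # dBm -> microwatts for the quoted drive levels:
    for dbm in (-30, -20):
        print(f"{dbm} dBm = {10 ** (dbm / 10) * 1000:.1f} uW")   # 1 uW, 10 uW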

    Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks

    Full text link
    We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing that the network implicitly learns a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation. Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first two authors contributed equally to this work. Project page: http://visualdynamics.csail.mit.ed
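
    The stochastic aspect, many plausible futures from one image, can be sketched by re-sampling the motion code and decoding it into kernels; the tiny modules below are illustrative stand-ins, not the paper's networks.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # The motion code z is the only source of randomness, so drawing a new
    # z ~ N(0, I) per pass yields a different future for the same image.
    z_dim, C, k = 16, 8, 5
    kernel_decoder = nn.Linear(z_dim, C * k * k)   # stand-in kernel decoder

    image_feats = torch.randn(1, C, 32, 32)        # fixed appearance features
    futures = []
    for _ in range(4):
        z = torch.randn(1, z_dim)                  # one motion hypothesis
        w = kernel_decoder(z).reshape(C, 1, k, k)
        # depthwise conv: each feature map filtered by its sampled kernel
        futures.append(F.conv2d(image_feats, w, padding=k // 2, groups=C))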