Video Propagation Networks
We propose a technique that propagates information forward through video data. The method is conceptually simple and can be applied to tasks that require the propagation of structured information, such as semantic labels, based on video content. We propose a 'Video Propagation Network' that processes video frames in an adaptive manner. The model is applied online: it propagates information forward without the need to access future frames. In particular, we combine two components: a temporal bilateral network for dense and video-adaptive filtering, followed by a spatial network that refines features and increases flexibility. We present experiments on video object segmentation and semantic video segmentation and show improved performance compared to the best previous task-specific methods, while having favorable runtime. Additionally, we demonstrate our approach on an example regression task of color propagation in a grayscale video.
Comment: Appearing in Computer Vision and Pattern Recognition, 2017 (CVPR'17).
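The two-stage structure described above can be illustrated with a small, self-contained sketch. The snippet below is not the paper's learned bilateral network; it is a toy numpy version with hypothetical function names and parameters (propagate_labels, spatial_refine, sigma_color, sigma_t), showing a temporal bilateral step that pulls labels from past frames weighted by color similarity and temporal proximity, followed by a simple spatial refinement, applied online one frame at a time.

```python
import numpy as np

def propagate_labels(prev_frames, prev_labels, cur_frame,
                     sigma_color=0.1, sigma_t=1.0):
    """Temporal bilateral step (toy): for every pixel, average the labels of
    past frames at the same location, weighted by color similarity and
    temporal distance.  prev_frames: (T,H,W,3) in [0,1],
    prev_labels: (T,H,W,C) soft labels, cur_frame: (H,W,3)."""
    t_idx = np.arange(prev_frames.shape[0])[:, None, None]      # (T,1,1)
    color_d2 = ((prev_frames - cur_frame) ** 2).sum(-1)         # (T,H,W)
    time_d2 = (prev_frames.shape[0] - t_idx) ** 2               # newer frames are closer
    w = np.exp(-color_d2 / (2 * sigma_color ** 2)
               - time_d2 / (2 * sigma_t ** 2))
    w = w / (w.sum(0, keepdims=True) + 1e-8)                    # normalize over time
    return (w[..., None] * prev_labels).sum(0)                  # (H,W,C)

def spatial_refine(labels, k=3):
    """Spatial step (toy): box-filter each label channel to clean up the
    bilateral output; the paper uses a learned spatial network here."""
    pad = k // 2
    padded = np.pad(labels, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(labels)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + labels.shape[0], dx:dx + labels.shape[1]]
    return out / (k * k)

# usage: online propagation, one frame at a time, never touching future frames
T, H, W, C = 4, 32, 32, 2
frames = np.random.rand(T, H, W, 3)
labels0 = np.random.rand(H, W, C)            # labels given for the first frame only
history_f, history_l = [frames[0]], [labels0]
for t in range(1, T):
    soft = propagate_labels(np.stack(history_f), np.stack(history_l), frames[t])
    soft = spatial_refine(soft)
    history_f.append(frames[t]); history_l.append(soft)
```

In the actual model both stages are trained end-to-end on the target task; the sketch only mirrors the data flow of the bilateral-then-spatial design.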
Propagation Networks for Model-Based Control Under Partial Observation
There has been an increasing interest in learning dynamics simulators for
model-based control. Compared with off-the-shelf physics engines, a learnable
simulator can quickly adapt to unseen objects, scenes, and tasks. However,
existing models like interaction networks only work for fully observable
systems; they also only consider pairwise interactions within a single time
step, both restricting their use in practical systems. We introduce Propagation
Networks (PropNet), a differentiable, learnable dynamics model that handles
partially observable scenarios and enables instantaneous propagation of signals
beyond pairwise interactions. Experiments show that our propagation networks
not only outperform current learnable physics engines in forward simulation,
but also achieve superior performance on various control tasks. Compared with
existing model-free deep reinforcement learning algorithms, model-based control
with propagation networks is more accurate, efficient, and generalizable to
new, partially observable scenes and tasks.
Comment: Accepted to ICRA 2019. Project Page: http://propnet.csail.mit.edu Video: https://youtu.be/ZAxHXegkz4
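As a rough illustration of propagation beyond pairwise interactions, the sketch below implements a single PropNet-style dynamics step in numpy. The layer sizes, random stand-in "MLPs", and function names are illustrative assumptions rather than the authors' architecture; the point is that edge and node effects are updated for several rounds within one time step, so a signal can travel several hops before the next state is predicted.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Tiny random two-layer perceptron; stands in for a trained encoder."""
    W1 = rng.normal(scale=0.1, size=(sizes[0], sizes[1]))
    W2 = rng.normal(scale=0.1, size=(sizes[1], sizes[2]))
    return lambda x: np.maximum(x @ W1, 0) @ W2

def propnet_step(obj_states, edges, n_prop=3, d_hidden=32):
    """One dynamics step in the spirit of PropNet.  obj_states: (N, d) array,
    edges: list of (receiver, sender) index pairs."""
    N, d = obj_states.shape
    rel_enc = mlp([2 * d, d_hidden, d_hidden])
    obj_enc = mlp([d, d_hidden, d_hidden])
    rel_prop = mlp([3 * d_hidden, d_hidden, d_hidden])
    obj_prop = mlp([2 * d_hidden, d_hidden, d_hidden])
    predictor = mlp([d_hidden, d_hidden, d])

    c_rel = np.stack([rel_enc(np.concatenate([obj_states[r], obj_states[s]]))
                      for r, s in edges])                 # per-edge code
    c_obj = obj_enc(obj_states)                           # per-object code
    h_obj = np.zeros((N, d_hidden))                       # propagated effects
    for _ in range(n_prop):
        # edge update: condition each relation on its endpoints' current effects
        h_rel = np.stack([rel_prop(np.concatenate([c_rel[k], h_obj[r], h_obj[s]]))
                          for k, (r, s) in enumerate(edges)])
        # node update: aggregate incoming edge effects at each receiver
        agg = np.zeros((N, d_hidden))
        for k, (r, _) in enumerate(edges):
            agg[r] += h_rel[k]
        h_obj = np.stack([obj_prop(np.concatenate([c_obj[i], agg[i]]))
                          for i in range(N)])
    return obj_states + predictor(h_obj)                  # predicted next state

# usage: 4 objects in a chain; after 3 propagation rounds the state of the
# leftmost object can influence the rightmost one within this single step
states = rng.normal(size=(4, 6))
chain = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
print(propnet_step(states, chain).shape)                  # (4, 6)
```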
Reliable Video Streaming over mmWave with Multi Connectivity and Network Coding
The next generation of multimedia applications will require the
telecommunication networks to support a higher bitrate than today, in order to
deliver virtual reality and ultra-high quality video content to the users. Most
of the video content will be accessed from mobile devices, prompting the
provision of very high data rates by next generation (5G) cellular networks. A
possible enabler in this regard is communication at mmWave frequencies, given
the vast amount of available spectrum that can be allocated to mobile users;
however, the harsh propagation environment at such high frequencies makes it
hard to provide a reliable service. This paper presents a reliable video streaming architecture for mmWave networks, based on multi-connectivity and network coding, and evaluates its performance using a novel combination of the ns-3 mmWave module, real video traces, and the network coding library Kodo. The results show that it is indeed possible to reliably stream video over cellular mmWave links, and that the combination of multi-connectivity and network coding can support high video quality with low latency.
Comment: To be presented at the 2018 IEEE International Conference on Computing, Networking and Communications (ICNC), March 2018, Maui, Hawaii, USA (invited paper). 6 pages, 4 figures.
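To make the network-coding ingredient concrete, here is a minimal sketch of systematic random linear network coding over GF(2). It is not the Kodo API used in the paper (Kodo typically works over larger fields such as GF(2^8) and is far more efficient); it only illustrates how coded repair packets let a receiver recover a block of video data from any sufficiently large set of linearly independent packets, which is what makes combining network coding with multi-connectivity over lossy mmWave links attractive.

```python
import random

def rlnc_encode(packets, n_coded, seed=0):
    """Systematic RLNC over GF(2): the first k coded packets are the sources
    themselves, later ones are XORs of a random subset; each coded packet
    carries its coefficient vector so the receiver can decode."""
    rng, k = random.Random(seed), len(packets)
    coded = []
    for i in range(n_coded):
        if i < k:
            coeffs = [1 if j == i else 0 for j in range(k)]   # systematic phase
        else:
            coeffs = [rng.randint(0, 1) for _ in range(k)]    # random repair packet
            if not any(coeffs):
                coeffs[rng.randrange(k)] = 1                  # avoid all-zero combination
        payload = bytes(len(packets[0]))
        for c, p in zip(coeffs, packets):
            if c:
                payload = bytes(a ^ b for a, b in zip(payload, p))
        coded.append((coeffs, payload))
    return coded

def rlnc_decode(received, k):
    """Gaussian elimination over GF(2); recovers the k source packets from any
    k linearly independent received packets (systematic or repair)."""
    rows = [(list(c), bytearray(p)) for c, p in received]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            raise ValueError("not enough linearly independent packets")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           bytearray(a ^ b for a, b in zip(rows[r][1], rows[col][1])))
    return [bytes(rows[i][1]) for i in range(k)]

# usage: a video block split into 4 source packets plus 3 repair packets,
# e.g. spread over two mmWave links; the receiver decodes from what arrives
src = [bytes([i] * 8) for i in range(4)]
coded = rlnc_encode(src, 7)
assert rlnc_decode(coded, 4) == src
```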
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
Video captioning is, in essence, a complex natural process that is affected by various uncertainties stemming from video content, subjective judgment, etc. In this paper we build on recent progress in using the encoder-decoder framework for video captioning and address what we find to be a critical deficiency of existing methods: most decoders propagate deterministic hidden states, and such complex uncertainty cannot be modeled efficiently by deterministic models. We therefore propose a generative approach, referred to as the multi-modal stochastic RNN network (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables. As a result, MS-RNN can improve the performance of video captioning and generate multiple sentences to describe a video under different random factors. Specifically, a multi-modal LSTM (M-LSTM) is first proposed to interact with both visual and textual features to capture a high-level representation. Then, a backward stochastic LSTM (S-LSTM) is proposed to support uncertainty propagation by introducing latent variables. Experimental results on the challenging MSVD and MSR-VTT datasets show that the proposed MS-RNN approach outperforms state-of-the-art video captioning methods.
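The stochastic ingredient of MS-RNN can be illustrated with a toy decoding step. The sketch below is not the authors' model: the fusion, the recurrence, and all layer sizes are placeholder assumptions; it only shows how a latent variable z, sampled with the reparameterization trick and fed to the word predictor, makes the output distribution stochastic so that different samples yield different sentences for the same video.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(d_in, d_out):
    """Random affine layer standing in for a trained weight matrix."""
    W, b = rng.normal(scale=0.1, size=(d_in, d_out)), np.zeros(d_out)
    return lambda x: x @ W + b

d_vis, d_txt, d_h, d_z, vocab = 16, 8, 32, 4, 50

fuse_v, fuse_t = dense(d_vis, d_h), dense(d_txt, d_h)   # multi-modal fusion (toy M-LSTM stand-in)
to_mu, to_logvar = dense(d_h, d_z), dense(d_h, d_z)     # posterior parameters for z
to_logits = dense(d_h + d_z, vocab)                     # word predictor conditioned on (h, z)

def decode_step(visual, word_emb, h_prev):
    """One illustrative decoding step: fuse the two modalities, update a toy
    hidden state, sample z via reparameterization, and predict the next word."""
    x = np.tanh(fuse_v(visual) + fuse_t(word_emb))
    h = 0.5 * h_prev + 0.5 * x                          # toy recurrence, not a real LSTM cell
    mu, logvar = to_mu(h), to_logvar(h)
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps                 # reparameterized latent sample
    logits = to_logits(np.concatenate([h, z], axis=-1))
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), h

# usage: two stochastic passes over the same inputs give different word distributions
visual, word_emb, h = rng.normal(size=d_vis), rng.normal(size=d_txt), np.zeros(d_h)
p1, _ = decode_step(visual, word_emb, h)
p2, _ = decode_step(visual, word_emb, h)
print(np.abs(p1 - p2).max() > 0)                        # True: sampling z changes the output
```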
Neural network-based colonoscopic diagnosis using on-line learning and differential evolution
In this paper, on-line training of neural networks is investigated in the context of computer-assisted colonoscopic diagnosis. A memory-based adaptation of the learning rate for on-line back-propagation (BP) is proposed and used to seed an on-line evolution process that applies a differential evolution (DE) strategy to (re-)adapt the neural network to modified environmental conditions. Our approach looks at on-line training from the perspective of tracking the changing location of an approximate solution of a pattern-based, and thus dynamically changing, error function. The proposed hybrid strategy is compared with other standard training methods that have traditionally been used for training neural networks off-line. Results in interpreting colonoscopy images and frames of video sequences are promising and suggest that networks trained with this strategy accurately detect malignant regions of interest.
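A compact sketch of the evolutionary half of this hybrid strategy is given below: a standard DE/rand/1/bin loop, seeded with the weight vector produced by on-line BP, re-adapts a tiny network to data drawn from shifted "environmental conditions". The network architecture, data, and hyperparameters are illustrative assumptions, not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6                                       # toy network: 4 inputs, 6 hidden units

def forward(w, X):
    """Tiny one-hidden-layer classifier whose weights live in a flat vector w."""
    W1 = w[:n_in * n_hid].reshape(n_in, n_hid)
    b1 = w[n_in * n_hid:n_in * n_hid + n_hid]
    off = n_in * n_hid + n_hid
    W2, b2 = w[off:off + n_hid].reshape(n_hid, 1), w[off + n_hid:]
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

def error(w, X, y):
    return np.mean((forward(w, X).ravel() - y) ** 2)     # pattern-based error function

def de_readapt(w_bp, X, y, pop=20, F=0.5, CR=0.9, gens=30):
    """DE/rand/1/bin: seed the population around the on-line BP weights, then
    let differential evolution track the shifted error surface."""
    P = w_bp + 0.1 * rng.normal(size=(pop, w_bp.size))
    P[0] = w_bp                                          # keep the BP solution itself
    fit = np.array([error(p, X, y) for p in P])
    for _ in range(gens):
        for i in range(pop):
            a, b, c = P[rng.choice(pop, 3, replace=False)]
            trial = np.where(rng.random(w_bp.size) < CR, a + F * (b - c), P[i])
            f = error(trial, X, y)
            if f < fit[i]:                               # greedy selection
                P[i], fit[i] = trial, f
    return P[fit.argmin()]

# usage: pretend the input distribution shifted; re-adapt the BP weights with DE
dim = n_in * n_hid + n_hid + n_hid + 1
w_bp = rng.normal(scale=0.3, size=dim)                   # stand-in for the on-line BP weights
X = rng.normal(size=(64, n_in)) + 0.5                    # "new environmental conditions"
y = (X[:, 0] > 0.5).astype(float)
w_new = de_readapt(w_bp, X, y)
print(error(w_bp, X, y), "->", error(w_new, X, y))       # error does not increase
```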