137 research outputs found
Massively Parallel Video Networks
We introduce a class of causal video understanding models that aims to
improve efficiency of video processing by maximising throughput, minimising
latency, and reducing the number of clock cycles. Leveraging operation
pipelining and multi-rate clocks, these models perform a minimal amount of
computation (e.g. as few as four convolutional layers) for each frame per
timestep to produce an output. The models are still very deep, with dozens of
such operations being performed but in a pipelined fashion that enables
depth-parallel computation. We illustrate the proposed principles by applying
them to existing image architectures and analyse their behaviour on two video
tasks: action recognition and human keypoint localisation. The results show
that a significant degree of parallelism, and implicitly speedup, can be
achieved with little loss in performance. Comment: Fixed typos in densenet model definition in appendi
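The depth-parallel pipelining described in this abstract can be illustrated with a toy sketch. It is not the authors' model: `run_pipelined` and `toy_layers` are hypothetical names, the "layers" are plain arithmetic functions, and multi-rate clocks are left out. The sketch only shows the general idea that, at each timestep, every layer consumes what the previous layer produced one timestep earlier, so all layers could run concurrently.

```python
def run_pipelined(frames, layers):
    """Toy depth-parallel pipeline (illustrative assumption, not the paper's code).

    At each timestep, layer i reads the output that layer i-1 produced on the
    previous timestep, so no layer waits on another within a timestep.
    """
    state = [None] * len(layers)  # state[i] holds the latest output of layer i
    outputs = []
    for frame in frames:
        # Inputs for this timestep: the new frame plus last timestep's outputs.
        inputs = [frame] + state[:-1]
        # Each layer depends only on `inputs`, so these calls are independent.
        state = [None if x is None else f(x) for f, x in zip(layers, inputs)]
        outputs.append(state[-1])
    return outputs


# Hypothetical three-layer "network" made of simple arithmetic steps.
toy_layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

print(run_pipelined([1, 2, 3, 4, 5], toy_layers))
```

In this sketch the first `len(layers) - 1` outputs are `None` while the pipeline fills; after that, each timestep yields the full-depth result for a frame that arrived `len(layers) - 1` timesteps earlier.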
Self-Supervised Relative Depth Learning for Urban Scene Understanding
As an agent moves through the world, the apparent motion of scene elements is
(usually) inversely proportional to their depth. It is natural for a learning
agent to associate image patterns with the magnitude of their displacement over
time: as the agent moves, faraway mountains don't move much; nearby trees move
a lot. This natural relationship between the appearance of objects and their
motion is a rich source of information about the world. In this work, we start
by training a deep network, using fully automatic supervision, to predict
relative scene depth from single images. The relative depth training images are
automatically derived from simple videos of cars moving through a scene, using
recent motion segmentation techniques, and no human-provided labels. This proxy
task of predicting relative depth from a single image induces features in the
network that result in large improvements in a set of downstream tasks
including semantic segmentation, joint road segmentation and car detection, and
monocular (absolute) depth estimation, over a network trained from scratch. The
improvement on the semantic segmentation task is greater than those produced by
any other automatically supervised methods. Moreover, for monocular depth
estimation, our unsupervised pre-training method even outperforms supervised
pre-training with ImageNet. In addition, we demonstrate benefits from learning
to predict (unsupervised) relative depth in the specific videos associated with
various downstream tasks. We adapt to the specific scenes in those tasks in an
unsupervised manner to improve performance. In summary, for semantic
segmentation, we present state-of-the-art results among methods that do not use
supervised pre-training, and we even exceed the performance of supervised
ImageNet pre-trained models for monocular depth estimation, achieving results
that are comparable with state-of-the-art methods
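The inverse relationship between apparent motion and depth mentioned in this abstract can be made concrete with a small sketch. It assumes an idealised pinhole camera translating sideways, parallel to the image plane; the function name and the numbers are illustrative assumptions and do not come from the paper.

```python
def apparent_displacement(focal_length_px, camera_shift_m, depth_m):
    """Image displacement (in pixels) of a static point under sideways motion.

    Assumes a pinhole camera translating parallel to the image plane, where
    displacement = focal_length * shift / depth.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * camera_shift_m / depth_m


# Hypothetical values: a nearby tree at 5 m and a faraway mountain at 5000 m.
print(apparent_displacement(1000, 0.5, 5))
print(apparent_displacement(1000, 0.5, 5000))
```

Under these assumptions the nearby point shifts a thousand times further in the image than the distant one, which is the cue the abstract describes.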
Survey on Vision-based Path Prediction
Path prediction is a fundamental task for estimating how pedestrians or
vehicles are going to move in a scene. Because path prediction as a task of
computer vision uses video as input, various information used for prediction,
such as the environment surrounding the target and the internal state of the
target, need to be estimated from the video in addition to predicting paths.
Many prediction approaches that include understanding the environment and the
internal state have been proposed. In this survey, we systematically summarize
methods of path prediction that take video as input and extract features
from the video. Moreover, we introduce datasets used to evaluate path
prediction methods quantitatively. Comment: DAPI 201
Exploring the Fundamental Dynamics of Error-Based Motor Learning Using a Stationary Predictive-Saccade Task
The maintenance of movement accuracy uses prior performance errors to correct future motor plans; this motor-learning process ensures that movements remain quick and accurate. The control of predictive saccades, in which anticipatory movements are made to future targets before visual stimulus information becomes available, serves as an ideal paradigm to analyze how the motor system utilizes prior errors to drive movements to a desired goal. Predictive saccades constitute a stationary process (the mean and to a rough approximation the variability of the data do not vary over time, unlike a typical motor adaptation paradigm). This enables us to study inter-trial correlations, both on a trial-by-trial basis and across long blocks of trials. Saccade errors are found to be corrected on a trial-by-trial basis in a direction-specific manner (the next saccade made in the same direction will reflect a correction for errors made on the current saccade). Additionally, there is evidence for a second, modulating process that exhibits long memory. That is, performance information, as measured via inter-trial correlations, is strongly retained across a large number of saccades (about 100 trials). Together, this evidence indicates that the dynamics of motor learning exhibit complexities that must be carefully considered, as they cannot be fully described with current state-space (ARMA) modeling efforts
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
Recently, models based on deep neural networks have dominated the fields of
scene text detection and recognition. In this paper, we investigate the problem
of scene text spotting, which aims at simultaneous text detection and
recognition in natural images. An end-to-end trainable neural network model for
scene text spotting is proposed. The proposed model, named Mask TextSpotter,
is inspired by the newly published work Mask R-CNN. Unlike previous
methods that also accomplish text spotting with end-to-end trainable deep
neural networks, Mask TextSpotter takes advantage of a simple and smooth
end-to-end learning procedure, in which precise text detection and recognition
are acquired via semantic segmentation. Moreover, it is superior to previous
methods in handling text instances of irregular shapes, for example, curved
text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the
proposed method achieves state-of-the-art results in both scene text detection
and end-to-end text recognition tasks. Comment: To appear in ECCV 201
Measurement of the Bottom-Strange Meson Mixing Phase in the Full CDF Data Set
We report a measurement of the bottom-strange meson mixing phase \beta_s
using the time evolution of B0_s -> J/\psi (->\mu+\mu-) \phi (-> K+ K-) decays
in which the quark-flavor content of the bottom-strange meson is identified at
production. This measurement uses the full data set of proton-antiproton
collisions at sqrt(s)= 1.96 TeV collected by the Collider Detector experiment
at the Fermilab Tevatron, corresponding to 9.6 fb-1 of integrated luminosity.
We report confidence regions in the two-dimensional space of \beta_s and the
B0_s decay-width difference \Delta\Gamma_s, and measure \beta_s in [-\pi/2,
-1.51] U [-0.06, 0.30] U [1.26, \pi/2] at the 68% confidence level, in
agreement with the standard model expectation. Assuming the standard model
value of \beta_s, we also determine \Delta\Gamma_s = 0.068 +- 0.026 (stat) +-
0.009 (syst) ps-1 and the mean B0_s lifetime, \tau_s = 1.528 +- 0.019 (stat) +-
0.009 (syst) ps, which are consistent and competitive with determinations by
other experiments. Comment: 8 pages, 2 figures, Phys. Rev. Lett 109, 171802 (2012
The clinical significance of serum and bronchoalveolar lavage inflammatory cytokines in patients at risk for Acute Respiratory Distress Syndrome
BACKGROUND: The predictive role of many cytokines has not been well defined in Acute Respiratory Distress Syndrome (ARDS). METHODS: We prospectively measured IL-4, IL-6, IL-6 receptor, IL-8, and IL-10 in the serum and bronchoalveolar lavage fluid (BALF) in 59 patients who were admitted to the ICU in order to identify predictive factors for the course and outcome of ARDS. The patients were divided into three groups: those fulfilling the criteria for ARDS (n = 20, group A), those at risk for ARDS who developed ARDS within 48 hours (n = 12, group B), and those at risk for ARDS who never developed ARDS (n = 27, group C). RESULTS: An excellent negative predictive value for ARDS development was found for IL-6 in BALF and serum (100% and 95%, respectively). IL-8 in BALF and IL-8 and IL-10 serum levels were higher in non-survivors in all studied groups, and were associated with a high negative predictive value. A significant correlation was found between IL-8 and APACHE score (r = 0.60, p < 0.0001). Similarly, IL-6 and IL-6r were highly correlated with PaO2/FiO2 (r = -0.27, p < 0.05 and r = -0.55, p < 0.0001, respectively). CONCLUSIONS: BALF and serum levels of the studied cytokines on admission may provide valuable information for ARDS development in patients at risk, and outcome in patients either with ARDS or at risk for ARDS
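For readers unfamiliar with the negative predictive value reported in this abstract, the sketch below shows the standard definition: the fraction of negative test results that are true negatives. The function name and the counts are hypothetical and are not taken from the study.

```python
def negative_predictive_value(true_negatives, false_negatives):
    """NPV = TN / (TN + FN): the share of negative tests that are correct."""
    total_negative_tests = true_negatives + false_negatives
    if total_negative_tests == 0:
        raise ValueError("no negative test results to evaluate")
    return true_negatives / total_negative_tests


# Hypothetical counts, for illustration only: 19 true and 1 false negative.
print(negative_predictive_value(19, 1))
```

A value close to 1 means that a negative result (here, a low cytokine level under some chosen cut-off) rarely precedes the condition.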
Strength of baseline inter-trial correlations forecasts adaptive capacity in the vestibulo-ocular reflex
Individual differences in sensorimotor adaptability may permit customized training protocols for optimum learning. Here, we sought to forecast individual adaptive capabilities in the vestibulo-ocular reflex (VOR). Subjects performed 400 head-rotation steps (400 trials) during a baseline test, followed by 20 min of VOR gain adaptation. All subjects exhibited mean baseline VOR gain of approximately 1.0, variable from trial to trial, and showed desired reductions in gain following adaptation with variation in extent across individuals. The extent to which a given subject adapted was inversely proportional to a measure of the strength and duration of baseline inter-trial correlations (β). β is derived from the decay of the autocorrelation of the sequence of VOR gains, and describes how strongly correlated past gain values are; it thus indicates how much the VOR gain on any given trial is informed by performance on previous trials. To maximize the time that images are stabilized on the retina, the VOR should maintain a gain close to 1.0 that is adjusted predominantly according to the most recent error; hence, it is not surprising that individuals who exhibit smaller β (weaker inter-trial correlations) also exhibited the best adaptation. Our finding suggests that the temporal structure of baseline behavioral data contains important information that may aid in forecasting adaptive capacities. This has significant implications for the development of personalized physical therapy protocols for patients, and for other cases when it is necessary to adjust motor programs to maintain movement accuracy in response to pathological and environmental changes
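The inter-trial correlations discussed in this abstract rest on the sample autocorrelation of a trial sequence. The sketch below computes it at a range of lags. It is a generic textbook estimator and not the authors' procedure: the abstract does not specify how β is fitted to the decay, so that step is omitted, and the function name and example sequence are assumptions.

```python
def autocorrelation(values, max_lag):
    """Sample autocorrelation of a sequence at lags 0..max_lag.

    Uses the common biased estimator: the lagged sum of products of
    mean-centred values, divided by the lag-0 sum of squares.
    """
    n = len(values)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    sum_sq = sum(c * c for c in centred)
    if sum_sq == 0:
        raise ValueError("sequence is constant; autocorrelation is undefined")
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / sum_sq
        for lag in range(max_lag + 1)
    ]


# Hypothetical sequence that alternates around its mean on every trial.
gains = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(autocorrelation(gains, 2))
```

A sequence whose autocorrelation falls off slowly with lag carries information from many past trials, which is the sense in which the abstract speaks of strong and long-lasting inter-trial correlations.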