137 research outputs found

    Massively Parallel Video Networks

    We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and multi-rate clocks, these models perform a minimal amount of computation (e.g. as few as four convolutional layers) for each frame per timestep to produce an output. The models are still very deep, with dozens of such operations being performed, but in a pipelined fashion that enables depth-parallel computation. We illustrate the proposed principles by applying them to existing image architectures and analyse their behaviour on two video tasks: action recognition and human keypoint localisation. The results show that a significant degree of parallelism, and implicitly speedup, can be achieved with little loss in performance. Comment: Fixed typos in densenet model definition in appendix

    Self-Supervised Relative Depth Learning for Urban Scene Understanding

    As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, faraway mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we start by training a deep network, using fully automatic supervision, to predict relative scene depth from single images. The relative depth training images are automatically derived from simple videos of cars moving through a scene, using recent motion segmentation techniques and no human-provided labels. This proxy task of predicting relative depth from a single image induces features in the network that result in large improvements in a set of downstream tasks, including semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation, over a network trained from scratch. The improvement on the semantic segmentation task is greater than that produced by any other automatically supervised method. Moreover, for monocular depth estimation, our unsupervised pre-training method even outperforms supervised pre-training with ImageNet. In addition, we demonstrate benefits from learning to predict (unsupervised) relative depth in the specific videos associated with various downstream tasks. We adapt to the specific scenes in those tasks in an unsupervised manner to improve performance. In summary, for semantic segmentation, we present state-of-the-art results among methods that do not use supervised pre-training, and we even exceed the performance of supervised ImageNet pre-trained models for monocular depth estimation, achieving results that are comparable with state-of-the-art methods.
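
The inverse relation between apparent motion and depth described in this abstract can be illustrated with a small sketch. It assumes an idealised pinhole camera translating sideways, where image displacement is focal length times translation divided by depth. The function name and the numbers are hypothetical and are not taken from the paper.

```python
def apparent_displacement(focal_px: float, translation_m: float, depth_m: float) -> float:
    """Image displacement in pixels of a static point at depth_m.

    Assumption (not from the paper): pinhole camera, pure sideways
    translation, so displacement = focal_px * translation_m / depth_m.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * translation_m / depth_m


# Hypothetical scene: the camera moves 0.5 m sideways, focal length 1000 px.
near_tree = apparent_displacement(1000.0, 0.5, 5.0)      # tree 5 m away
far_mountain = apparent_displacement(1000.0, 0.5, 5000.0)  # mountain 5 km away
print(near_tree, far_mountain)
```

Under these assumptions the nearby tree shifts by a thousand times more pixels than the mountain, which is the cue the abstract describes.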

    Survey on Vision-based Path Prediction

    Path prediction is a fundamental task for estimating how pedestrians or vehicles are going to move in a scene. Because path prediction as a computer vision task uses video as input, the various kinds of information used for prediction, such as the environment surrounding the target and the internal state of the target, need to be estimated from the video in addition to predicting paths. Many prediction approaches that include understanding the environment and the internal state have been proposed. In this survey, we systematically summarize methods of path prediction that take video as input and extract features from the video. Moreover, we introduce datasets used to evaluate path prediction methods quantitatively. Comment: DAPI 201

    Exploring the Fundamental Dynamics of Error-Based Motor Learning Using a Stationary Predictive-Saccade Task

    The maintenance of movement accuracy uses prior performance errors to correct future motor plans; this motor-learning process ensures that movements remain quick and accurate. The control of predictive saccades, in which anticipatory movements are made to future targets before visual stimulus information becomes available, serves as an ideal paradigm to analyze how the motor system utilizes prior errors to drive movements to a desired goal. Predictive saccades constitute a stationary process (the mean and, to a rough approximation, the variability of the data do not vary over time, unlike a typical motor adaptation paradigm). This enables us to study inter-trial correlations, both on a trial-by-trial basis and across long blocks of trials. Saccade errors are found to be corrected on a trial-by-trial basis in a direction-specific manner (the next saccade made in the same direction will reflect a correction for errors made on the current saccade). Additionally, there is evidence for a second, modulating process that exhibits long memory. That is, performance information, as measured via inter-trial correlations, is strongly retained across a large number of saccades (about 100 trials). Together, this evidence indicates that the dynamics of motor learning exhibit complexities that must be carefully considered, as they cannot be fully described with current state-space (ARMA) modeling efforts.
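
As a rough illustration of what an inter-trial correlation is, the sketch below computes the sample autocorrelation of a sequence of trial errors at a given lag. It is a generic textbook estimator offered under the assumption that errors are stored as a plain list of numbers. It is not the analysis used in the study, and the example data are made up.

```python
def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation of series at the given lag (biased estimator)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Sum of squared deviations, used to normalise so that lag 0 gives 1.0.
    total = sum((x - mean) ** 2 for x in series)
    if total == 0:
        raise ValueError("series has zero variance")
    # Sum of products of deviations for trials that are `lag` apart.
    cross = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cross / total


# Made-up errors that flip sign on every trial, as if each error were
# over-corrected on the next trial.
errors = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(autocorrelation(errors, 1))
```

A strongly negative value at lag 1 would suggest trial-by-trial correction, while values that stay positive out to large lags would be one sign of the long memory the abstract mentions.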

    Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

    Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network model for scene text spotting is proposed. The proposed model, named Mask TextSpotter, is inspired by the newly published work Mask R-CNN. Different from previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter takes advantage of a simple and smooth end-to-end learning procedure, in which precise text detection and recognition are acquired via semantic segmentation. Moreover, it is superior to previous methods in handling text instances of irregular shapes, for example, curved text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition tasks. Comment: To appear in ECCV 201

    Measurement of the Bottom-Strange Meson Mixing Phase in the Full CDF Data Set

    We report a measurement of the bottom-strange meson mixing phase $\beta_s$ using the time evolution of $B^0_s \to J/\psi(\to \mu^+\mu^-)\,\phi(\to K^+K^-)$ decays in which the quark-flavor content of the bottom-strange meson is identified at production. This measurement uses the full data set of proton-antiproton collisions at $\sqrt{s} = 1.96$ TeV collected by the Collider Detector experiment at the Fermilab Tevatron, corresponding to 9.6 fb$^{-1}$ of integrated luminosity. We report confidence regions in the two-dimensional space of $\beta_s$ and the $B^0_s$ decay-width difference $\Delta\Gamma_s$, and measure $\beta_s \in [-\pi/2, -1.51] \cup [-0.06, 0.30] \cup [1.26, \pi/2]$ at the 68% confidence level, in agreement with the standard model expectation. Assuming the standard model value of $\beta_s$, we also determine $\Delta\Gamma_s = 0.068 \pm 0.026$ (stat) $\pm 0.009$ (syst) ps$^{-1}$ and the mean $B^0_s$ lifetime, $\tau_s = 1.528 \pm 0.019$ (stat) $\pm 0.009$ (syst) ps, which are consistent and competitive with determinations by other experiments. Comment: 8 pages, 2 figures, Phys. Rev. Lett. 109, 171802 (2012)

    The clinical significance of serum and bronchoalveolar lavage inflammatory cytokines in patients at risk for Acute Respiratory Distress Syndrome

    BACKGROUND: The predictive role of many cytokines has not been well defined in Acute Respiratory Distress Syndrome (ARDS). METHODS: We prospectively measured IL-4, IL-6, IL-6 receptor, IL-8, and IL-10 in the serum and bronchoalveolar lavage fluid (BALF) of 59 patients admitted to the ICU in order to identify predictive factors for the course and outcome of ARDS. The patients were divided into three groups: those fulfilling the criteria for ARDS (n = 20, group A), those at risk for ARDS who developed ARDS within 48 hours (n = 12, group B), and those at risk for ARDS who never developed ARDS (n = 27, group C). RESULTS: An excellent negative predictive value for ARDS development was found for IL-6 in BALF and serum (100% and 95%, respectively). IL-8 in BALF and IL-8 and IL-10 serum levels were higher in non-survivors in all studied groups, and were associated with a high negative predictive value. A significant correlation was found between IL-8 and APACHE score (r = 0.60, p < 0.0001). Similarly, IL-6 and IL-6r were highly correlated with PaO2/FiO2 (r = -0.27, p < 0.05 and r = -0.55, p < 0.0001, respectively). CONCLUSIONS: BALF and serum levels of the studied cytokines on admission may provide valuable information for ARDS development in patients at risk, and for outcome in patients either with ARDS or at risk for ARDS.
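
For readers unfamiliar with the term, negative predictive value (NPV) is the fraction of negative test results that are truly negative: true negatives divided by the sum of true negatives and false negatives. The sketch below applies that standard definition. The counts are hypothetical, chosen only to reproduce percentages of the same size as those quoted, and are not the study's data.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN), the share of negative results that are correct."""
    if true_negatives < 0 or false_negatives < 0:
        raise ValueError("counts must be non-negative")
    total_negative_results = true_negatives + false_negatives
    if total_negative_results == 0:
        raise ValueError("no negative test results to evaluate")
    return true_negatives / total_negative_results


# Hypothetical counts (not from the study): 19 patients with a low marker
# level never developed the condition, and 1 patient with a low level did.
print(negative_predictive_value(19, 1))   # 0.95
# With no false negatives at all, the NPV is 1.0 (100%).
print(negative_predictive_value(20, 0))   # 1.0
```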