
    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields

    This work presents a first evaluation of using spatio-temporal receptive fields from a recently proposed time-causal spatio-temporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition. The experimental evaluation demonstrates competitive performance compared to the state of the art. In particular, it is shown that binary versions of our dynamic texture descriptors achieve improved performance compared to a large range of similar methods using different primitives, either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives. Comment: 29 pages, 16 figures
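    The descriptor idea above can be illustrated in miniature: quantize the response of each receptive field into a few bins and count joint bin occurrences over a region. This is only an illustrative sketch of a joint response histogram, not the authors' implementation; the filter responses, bin range, and function name here are our own toy choices.

    ```python
    def joint_histogram(responses, n_bins=3, lo=-1.0, hi=1.0):
        """Joint histogram over quantized filter responses.
        `responses` is a list of equally long response sequences,
        one per receptive field; each sample is binned and the
        joint bin-index tuples are counted."""
        width = (hi - lo) / n_bins
        hist = {}
        for sample in zip(*responses):
            idx = tuple(min(n_bins - 1, max(0, int((v - lo) / width)))
                        for v in sample)
            hist[idx] = hist.get(idx, 0) + 1
        return hist

    # Two toy "receptive field" responses over the same region.
    r1 = [-0.9, 0.1, 0.8, 0.2]
    r2 = [0.5, -0.5, 0.9, 0.0]
    h = joint_histogram([r1, r2])
    ```

    In the paper's setting the responses would come from the time-causal spatio-temporal filters, and the histogram (or its binarised version) serves as the regional texture descriptor.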

    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. Comment: 14 pages, 7 figures
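    The core ModDrop mechanism, dropping whole modality channels at random during training so the fused representation tolerates missing inputs, can be sketched as follows. This is a minimal stand-in, not the authors' network: the function name, the simple concatenation fusion, and the drop probability are our assumptions.

    ```python
    import random

    def moddrop_fuse(modalities, p_drop=0.5, training=True, rng=random):
        """ModDrop-style fusion sketch: during training, each modality's
        feature vector is zeroed out with probability p_drop before the
        vectors are concatenated, forcing the fused layers to cope with
        missing channels. At inference all modalities are kept."""
        fused = []
        for feats in modalities:
            keep = (not training) or (rng.random() >= p_drop)
            fused.extend(feats if keep else [0.0] * len(feats))
        return fused

    random.seed(0)
    video = [0.2, 0.7]
    depth = [0.5, 0.1]
    audio = [0.9, 0.3]
    fused = moddrop_fuse([video, depth, audio], p_drop=0.5)
    ```

    In the paper this dropping is applied inside a deep network during gradual fusion; here the same idea is shown at the level of plain feature vectors.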

    Regular and stochastic behavior of Parkinsonian pathological tremor signals

    Regular and stochastic behavior in the time series of Parkinsonian pathological tremor velocity is studied on the basis of the statistical theory of discrete non-Markov stochastic processes and flicker-noise spectroscopy. We have developed a new method of analyzing and diagnosing Parkinson's disease (PD) by taking into consideration discreteness, fluctuations, long- and short-range correlations, regular and stochastic behavior, Markov and non-Markov effects and dynamic alternation of relaxation modes in the initial time signals. The spectrum of the statistical non-Markovity parameter reflects Markovity and non-Markovity in the initial time series of tremor. The relaxation and kinetic parameters used in the method allow us to estimate the relaxation scales of diverse scenarios of the time signals produced by the patient in various dynamic states. The local time behavior of the initial time correlation function and the first point of the non-Markovity parameter give detailed information about the variation of pathological tremor in the local regions of the time series. The obtained results can be used to find the most effective method of reducing or suppressing pathological tremor in each individual case of a PD patient. Generally, the method allows one to assess the efficacy of the medical treatment for a group of PD patients. Comment: 39 pages, 10 figures, 1 table. Physica A, in press
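    The "initial time correlation function" mentioned above is, in discrete form, a normalized autocorrelation of the tremor signal. The following is only a rough illustrative sketch of that ingredient (not the full non-Markov formalism of the paper); the function name and toy signal are ours.

    ```python
    def time_correlation(x, max_lag):
        """Normalized autocorrelation a(lag) of a discrete signal,
        a stand-in for the initial time correlation function:
        a(0) = 1, and the decay pattern reflects the signal's memory."""
        n = len(x)
        mean = sum(x) / n
        dev = [v - mean for v in x]
        var = sum(d * d for d in dev) / n
        return [sum(dev[i] * dev[i + lag] for i in range(n - lag))
                / ((n - lag) * var)
                for lag in range(max_lag + 1)]

    # A perfectly regular (period-4) toy "tremor" oscillation:
    # its correlation function oscillates without decaying.
    signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
    acf = time_correlation(signal, 4)
    ```

    For a strongly stochastic signal this function would decay quickly toward zero, which is the kind of local behavior the method exploits to characterise tremor regimes.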

    Hand tracking and bimanual movement understanding

    Bimanual movements are a subset of human movements in which the two hands move together in order to perform a task or convey a meaning. A bimanual movement appearing in a sequence of images must be understood in order to enable computers to interact with humans in a natural way. This problem includes two main phases: hand tracking and movement recognition. We approach the problem of hand tracking from a neuroscience point of view. First, the hands are extracted and labelled by colour detection and blob analysis algorithms. In the presence of the two hands, one hand may occlude the other occasionally. Therefore, hand occlusions must be detected in an image sequence. A dynamic model is proposed to model the movement of each hand separately. Using this model in a Kalman filtering process, the exact starting and end points of hand occlusions are detected. We exploit neuroscience phenomena to understand the behaviour of the hands during occlusion periods. Based on this, we propose a general hand tracking algorithm to track and reacquire the hands over a movement including hand occlusion. The advantages of the algorithm and its generality are demonstrated in the experiments.
In order to recognise the movements, first we recognise the movement of a hand. Using statistical pattern recognition methods (such as Principal Component Analysis and Nearest Neighbour), the static shape of each hand appearing in an image is recognised. A graph-matching algorithm and Discrete Hidden Markov Models (DHMM), as two spatio-temporal pattern recognition techniques, are investigated for recognising a dynamic hand gesture. For recognising bimanual movements we consider two general forms of these movements: single and concatenated periodic. We introduce three Bayesian networks for recognising the movements. The networks are designed to recognise and combine the gestures of the hands in order to understand the whole movement. Experiments on different types of movement demonstrate the advantages and disadvantages of each network.
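    The occlusion-detection step, predicting each hand's position under a dynamic model and flagging frames where the measurement no longer fits, can be sketched with a simplified alpha-beta (Kalman-style) tracker. This is an illustrative 1D stand-in, not the thesis's actual filter; the gate threshold and gains are our assumptions.

    ```python
    def track_hand(measurements, gate=2.0, alpha=0.85, beta=0.3):
        """Alpha-beta tracker for one hand's 1D position under a
        constant-velocity model. A measurement whose residual exceeds
        `gate` is treated as the onset of an occlusion (e.g. the two
        hand blobs merged), and the tracker coasts on its prediction."""
        pos, vel = measurements[0], 0.0
        occluded = []
        for k, z in enumerate(measurements[1:], start=1):
            pred = pos + vel          # constant-velocity prediction
            r = z - pred              # measurement residual
            if abs(r) > gate:         # residual too large: occlusion
                occluded.append(k)
                pos = pred            # keep the prediction only
            else:
                pos = pred + alpha * r
                vel = vel + beta * r
        return pos, occluded

    # Steady motion at about +1 per frame, with an outlier
    # (merged blob measurement) at frame 4.
    zs = [0.0, 1.0, 2.0, 3.0, 9.0, 5.0]
    pos, occluded = track_hand(zs)
    ```

    The same gating logic, run forward and backward over the sequence, is how the exact start and end points of an occlusion interval can be localised.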

    On the Inverse Problem of Binocular 3D Motion Perception

    It is shown that existing processing schemes of 3D motion perception such as interocular velocity difference, changing disparity over time, as well as joint encoding of motion and disparity, do not offer a general solution to the inverse optics problem of local binocular 3D motion. Instead we suggest that local velocity constraints in combination with binocular disparity and other depth cues provide a more flexible framework for the solution of the inverse problem. In the context of the aperture problem we derive predictions from two plausible default strategies: (1) the vector normal prefers slow motion in 3D whereas (2) the cyclopean average is based on slow motion in 2D. Predicting perceived motion directions for ambiguous line motion provides an opportunity to distinguish between these strategies of 3D motion processing. Our theoretical results suggest that velocity constraints and disparity from feature tracking are needed to solve the inverse problem of 3D motion perception. It seems plausible that motion and disparity input is processed in parallel and integrated late in the visual processing hierarchy.
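    The vector-normal strategy has a simple 2D analogue worth making explicit: under the aperture problem a moving line only constrains the velocity component along its normal, and the slowest velocity satisfying that constraint is the normal velocity itself. The sketch below shows this 2D analogue only (the paper's analysis is in 3D); the function name is ours.

    ```python
    import math

    def vector_normal_velocity(theta, normal_speed):
        """Slowest 2D velocity consistent with the aperture constraint
        v . n = s, for a line whose unit normal n makes angle `theta`
        with the x-axis: the minimum-norm solution is v = s * n."""
        n = (math.cos(theta), math.sin(theta))
        return (normal_speed * n[0], normal_speed * n[1])

    # Line tilted 45 degrees, measured normal speed 1.0.
    v = vector_normal_velocity(math.pi / 4, 1.0)
    ```

    Any velocity of the form v + t * (line direction) satisfies the same constraint; the "prefer slow motion" default picks the minimum-norm member of that family.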

    Neural Dynamics of Motion Processing and Speed Discrimination

    A neural network model of visual motion perception and speed discrimination is presented. The model shows how a distributed population code of speed tuning, that realizes a size-speed correlation, can be derived from the simplest mechanisms whereby activations of multiple spatially short-range filters of different size are transformed into speed-tuned cell responses. These mechanisms use transient cell responses to moving stimuli, output thresholds that covary with filter size, and competition. These mechanisms are proposed to occur in the V1→MT cortical processing stream. The model reproduces empirically derived speed discrimination curves and simulates data showing how visual speed perception and discrimination can be affected by stimulus contrast, duration, dot density and spatial frequency. Model motion mechanisms are analogous to mechanisms that have been used to model 3-D form and figure-ground perception. The model forms the front end of a larger motion processing system that has been used to simulate how global motion capture occurs, and how spatial attention is drawn to moving forms. It provides a computational foundation for an emerging neural theory of 3-D form and motion perception. Office of Naval Research (N00014-92-J-4015, N00014-91-J-4100, N00014-95-1-0657, N00014-95-1-0409, N00014-94-1-0597, N00014-95-1-0409); Air Force Office of Scientific Research (F49620-92-J-0499); National Science Foundation (IRI-90-00530)

    Modelling the dynamics of motion integration with a new luminance-gated diffusion mechanism

    The dynamics of motion integration show striking similarities when observed at neuronal, psychophysical, and oculomotor levels. Based on the inter-relation and complementary insights given by those dynamics, our goal was to test how basic mechanisms of dynamical cortical processing can be incorporated in a dynamical model to solve several aspects of 2D motion integration and segmentation. Our model is inspired by the hierarchical processing stages of the primate visual cortex: we describe the interactions between several layers processing local motion and form information through feedforward, feedback, and inhibitive lateral connections. Also, following perceptual studies concerning contour integration and physiological studies of receptive fields, we postulate that motion estimation takes advantage of another low level cue, which is luminance smoothness along edges or surfaces, in order to gate recurrent motion diffusion. With such a model, we successfully reproduced the temporal dynamics of motion integration on a wide range of simple motion stimuli: line segments, rotating ellipses, plaids, and barber poles. Furthermore, we showed that the proposed computational rule of luminance-gated diffusion of motion information is sufficient to explain a large set of contextual modulations of motion integration and segmentation in more elaborated stimuli such as chopstick illusions, simulated aperture problems, or rotating diamonds. As a whole, in this paper we proposed a new basal luminance-driven motion integration mechanism as an alternative to less parsimonious models, we carefully investigated the dynamics of motion integration, and we established a distinction between simple and complex stimuli according to the kind of information required to solve their ambiguities.
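    The luminance-gating rule can be illustrated in one dimension: motion estimates diffuse toward their neighbours, but the coupling strength is modulated by luminance similarity, so diffusion halts at luminance edges. This is a minimal sketch of the principle only, not the paper's cortical model; the gating function, rate, and toy stimulus are our assumptions.

    ```python
    import math

    def luminance_gated_diffusion(motion, luminance, steps=50,
                                  rate=0.25, sigma=0.1):
        """1D luminance-gated motion diffusion: each interior motion
        estimate relaxes toward its neighbours, with the coupling
        gated by a Gaussian of the luminance difference, so motion
        does not diffuse across luminance edges."""
        v = list(motion)
        for _ in range(steps):
            new = v[:]
            for i in range(1, len(v) - 1):
                for j in (i - 1, i + 1):
                    gate = math.exp(-((luminance[i] - luminance[j]) / sigma) ** 2)
                    new[i] += rate * gate * (v[j] - v[i])
            v = new
        return v

    # Two surfaces (luminance 0 vs 1); reliable motion is known only
    # at the outer pixels, which act as fixed sources.
    lum = [0, 0, 0, 1, 1, 1]
    mot = [2.0, 0.0, 0.0, 0.0, 0.0, -1.0]
    out = luminance_gated_diffusion(mot, lum)
    ```

    Each surface fills in with its own motion while the sharp luminance edge between them blocks mixing, which is the segmentation behavior the paper builds on.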

    The General Flow-Adaptive Filter: With Applications to Ultrasound Image Sequences

    While image filtering is limited to two dimensions, the filtering of image sequences can utilize three dimensions: two spatial and one temporal. Unfortunately, simple extensions of common two-dimensional filters into three dimensions yield undesirable motion blurring of the images. This thesis addresses this problem and introduces a novel filtering approach termed the general flow-adaptive filter. Most often a three-dimensional filter can be visualized as a cubic lattice shifted over the data, and at each point the element corresponding to the central coordinate is replaced with a new value based entirely on the values inside the lattice. The general principle of the flow-adaptive approach is to spatially adapt the entire filter lattice to possibly complex spatial movements in the temporal domain by incorporating local flow-field estimates. Results using the flow-adaptive technique on five filters (the temporal discontinuity filter, a tensor-based adaptive filter, the average, the median, and a Gaussian-shaped convolution filter) are presented. Both ultrasound image sequences and synthetic data sets were filtered. An edge-adaptive normalized mean-squared error is used as performance metric on the filtered synthetic sets, and the error is shown to be substantially reduced using the flow-adaptive technique, as much as halved in many instances. There are even indications that simple Gaussian-shaped convolution filters can outperform larger and more complex adaptive filters by implementing the flow-adaptive procedure. For the ultrasound image sequences, the filters adopting the flow-adaptive principles had outputs with less motion blur and sharper contrast compared to the outputs of the non-flow-adaptive filters. At the cost of flow estimation, the flow-adaptive approach substantially improves the performance of all the filters included in this study.
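    The lattice-shifting principle can be illustrated with a 1D temporal mean: instead of averaging the same pixel across frames, the samples follow the estimated motion trajectory. This is a minimal sketch of the idea, not the thesis's general filter; the 1D frames, constant flow, and function name are our simplifications.

    ```python
    def flow_adaptive_temporal_mean(frames, flow, x, t, radius=1):
        """Flow-adaptive temporal averaging on 1D frames: the temporal
        filter lattice is shifted by the estimated per-frame
        displacement `flow`, so samples are taken along the motion
        trajectory and moving structures are not blurred."""
        samples = []
        for k in range(-radius, radius + 1):
            xi = x + k * flow          # follow the trajectory
            ti = t + k
            if 0 <= ti < len(frames) and 0 <= xi < len(frames[ti]):
                samples.append(frames[ti][xi])
        return sum(samples) / len(samples)

    # A bright spot moving one pixel per frame (flow estimate = +1).
    frames = [
        [0, 9, 0, 0],
        [0, 0, 9, 0],
        [0, 0, 0, 9],
    ]
    static = sum(f[2] for f in frames) / 3            # plain temporal mean
    adaptive = flow_adaptive_temporal_mean(frames, 1, 2, 1)
    ```

    The plain temporal mean smears the moving spot, while the flow-adapted lattice keeps it intact; the same shift applies to medians, Gaussians, and the other filters studied in the thesis.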