3,588 research outputs found

    Object Detection by Spatio-Temporal Analysis and Tracking of the Detected Objects in a Video with Variable Background

    In this paper we propose a novel approach for detecting and tracking objects in videos with variable background, i.e., videos captured by moving cameras without any additional sensor. In a video captured by a moving camera, both the background and the foreground change in each frame of the image sequence. For these videos, modeling a single background with traditional background modeling methods is infeasible, and detecting the actual moving object in a variable background is therefore a challenging task. To detect the actual moving object, spatio-temporal blobs are generated in each frame by spatio-temporal analysis of the image sequence using a three-dimensional Gabor filter. Individual blobs that are parts of one object are then merged using a Minimum Spanning Tree to form the moving object in the variable background. The height, width, and four-bin gray-value histogram of the object are calculated as its features, and the object is tracked in each frame using these features to generate its trajectory through the video sequence. Data association during tracking is formulated as a Linear Assignment Problem, and occlusion is handled by applying a Kalman filter. The major advantage of our method over most existing tracking algorithms is that it requires neither initialization in the first frame nor training on sample data. The algorithm has been tested on benchmark videos with very satisfactory results, and its performance is comparable or superior to that of some benchmark algorithms.
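
    The data-association step described in this abstract can be illustrated with a minimal sketch. It assumes each object is summarized by a feature tuple (height, width, four histogram bins) and that the matching cost is a plain L1 distance between tuples; the cost function, the function names, and the sample values are hypothetical and are not taken from the paper. For brevity the assignment problem is solved by exhaustive search over permutations, which is only practical for a handful of objects; a real system would more likely use a dedicated solver.

```python
from itertools import permutations


def feature_cost(a, b):
    """L1 distance between two feature tuples (assumed cost, not from the paper)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def associate(tracks, detections):
    """Solve a small linear assignment problem by brute force.

    Returns (assignment, total_cost), where assignment[i] is the index of the
    detection matched to track i. Assumes len(tracks) <= len(detections).
    """
    best, best_cost = None, float("inf")
    # Try every way of giving each track a distinct detection.
    for perm in permutations(range(len(detections)), len(tracks)):
        cost = sum(feature_cost(tracks[i], detections[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


# Hypothetical features: (height, width, hist_bin_1..hist_bin_4).
tracks = [
    (40, 20, 0.1, 0.4, 0.4, 0.1),
    (80, 30, 0.5, 0.3, 0.1, 0.1),
]
detections = [
    (82, 31, 0.5, 0.3, 0.1, 0.1),
    (41, 19, 0.1, 0.4, 0.4, 0.1),
    (10, 10, 0.25, 0.25, 0.25, 0.25),
]

assignment, total = associate(tracks, detections)
print(assignment, total)
```

    In this toy example the first track is matched to the second detection and the second track to the first, while the third detection stays unmatched and would start a new track in a full tracker.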

    Unsupervised learning of depth and motion

    We present a model for the joint estimation of disparity and motion. The model is based on learning about the interrelations between images from multiple cameras, multiple frames in a video, or the combination of both. We show that learning depth and motion cues, as well as their combinations, from data is possible within a single type of architecture and a single type of learning algorithm, by using biologically inspired "complex cell"-like units, which encode correlations between the pixels across image pairs. Our experimental results show that learning depth and motion makes it possible to achieve state-of-the-art performance in 3-D activity analysis, and to outperform existing hand-engineered 3-D motion features by a very large margin.

    BIT: Biologically Inspired Tracker

    Visual tracking is challenging due to image variations caused by factors such as object deformation, scale change, illumination change, and occlusion. Given the superior tracking performance of the human visual system (HVS), an ideal design of a biologically inspired model is expected to improve computer visual tracking. This is, however, a difficult task because the working mechanism of neurons in the HVS is incompletely understood. This paper aims to address this challenge based on an analysis of the visual cognitive mechanism of the ventral stream in the visual cortex: the proposed tracker simulates shallow neurons (S1 units and C1 units) to extract low-level biologically inspired features for the target appearance and imitates an advanced learning mechanism (S2 units and C2 units) to combine generative and discriminative models for target location. In addition, fast Gabor approximation (FGA) and fast Fourier transform (FFT) are adopted for real-time learning and detection in this framework. Extensive experiments on large-scale benchmark datasets show that the proposed biologically inspired tracker performs favorably against state-of-the-art methods in terms of efficiency, accuracy, and robustness. The acceleration technique in particular ensures that BIT maintains a speed of approximately 45 frames per second.

    Learning vector representation of local content and matrix representation of local motion, with implications for V1

    This paper proposes a representational model for image pairs, such as consecutive video frames, that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples two components: (1) vector representations of the local contents of images, and (2) matrix representations of the local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene. When the image frame undergoes changes due to local pixel displacements, the vectors are multiplied by the matrices that represent the local displacements. Our experiments show that our model can learn to infer local motions. Moreover, the model can learn Gabor-like filter pairs of quadrature phases.

    Bio-Inspired Human Action Recognition using Hybrid Max-Product Neuro-Fuzzy Classifier and Quantum-Behaved PSO

    Studies in computational neuroscience using functional magnetic resonance imaging (fMRI), and the biologically inspired systems that followed them, indicate that human action recognition in the mammalian brain follows two distinct pathways, specialized for the analysis of motion (optic flow) and of form information. We define novel and robust form features by applying the active basis model as the form extractor in the form pathway of the biologically inspired model. An unbalanced synergetic neural network classifies the shapes and structures of human objects, with its attention parameter tuned by quantum particle swarm optimization (QPSO) initialized via Centroidal Voronoi Tessellations. These tools are used, and justified, as strong tools for following the biological system model in the form pathway. The final decision is made by combining the outcomes of both pathways via fuzzy inference, which adds to the novelty of the proposed model. The two pathways are combined by representing each feature set with Gaussian membership functions and applying the fuzzy product inference method. Two configurations are proposed for the form pathway: the first applies multi-prototype human action templates, using a two-time synergetic neural network to obtain a uniform template for each action; the second abstracts each human action into four key frames. Experimental results show promising accuracy on different datasets (KTH and Weizmann). Comment: author's version, SWJ 201

    Video Primal Sketch: A Unified Middle-Level Representation for Video

    This paper presents a middle-level video representation named Video Primal Sketch (VPS), which integrates two regimes of models: i) a sparse coding model using static or moving primitives to explicitly represent moving corners, lines, feature points, etc.; and ii) a FRAME/MRF model reproducing feature statistics extracted from the input video to implicitly represent textured motion, such as water and fire. The feature statistics include histograms of spatio-temporal filters and velocity distributions. This paper makes three contributions to the literature: i) learning a dictionary of video primitives using parametric generative models; ii) proposing the Spatio-Temporal FRAME (ST-FRAME) and Motion-Appearance FRAME (MA-FRAME) models for modeling and synthesizing textured motion; and iii) developing a parsimonious hybrid model for generic video representation. Given an input video, VPS selects the proper models automatically for different motion patterns and is compatible with high-level action representations. In the experiments, we synthesize a number of textured motions; reconstruct real videos using the VPS; report a series of human perception experiments to verify the quality of the reconstructed videos; demonstrate how the VPS changes over the scale transition in videos; and present the close connection between VPS and high-level action models.

    Vision-based Human Gender Recognition: A Survey

    Gender is an important demographic attribute of people. This paper provides a survey of human gender recognition in computer vision. A review of approaches exploiting information from the face and the whole body (either from a still image or a gait sequence) is presented. We highlight the challenges faced and survey the representative methods of these approaches. Based on the results, good performance has been achieved on datasets captured under controlled environments, but much work remains to improve the robustness of gender recognition under real-life environments. Comment: 30 page

    Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications

    Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging, and much research is needed on how they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal, and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion of trends, important questions, and future lines of research.

    Density Weighted Connectivity of Grass Pixels in Image Frames for Biomass Estimation

    Accurate estimation of the biomass of roadside grasses plays a significant role in applications such as fire-prone region identification. Current solutions depend heavily on field surveys, remote sensing measurements, and image processing using reference markers, which often demand large investments of time, effort, and cost. This paper proposes Density Weighted Connectivity of Grass Pixels (DWCGP) to automatically estimate grass biomass from roadside image data. The DWCGP calculates the length of continuously connected grass pixels along a vertical orientation in each image column, and then weights the length by the grass density in a surrounding region of the column. Grass pixels are classified using feedforward artificial neural networks, and the dominant texture orientation at every pixel is computed using a multi-orientation Gabor wavelet filter vote. Evaluations on a field survey dataset show that the DWCGP reduces Root-Mean-Square Error from 5.84 to 5.52 by additionally considering grass density on top of grass height. The DWCGP is robust to non-vertical grass stems and to changes in both Gabor filter parameters and surrounding region widths. Its performance is close to human observation and higher than that of eight baseline approaches, and it shows promising results for classifying low vs. high fire risk and identifying fire-prone road regions. Comment: 28 pages, accepted manuscript, Expert Systems with Application
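
    The column-wise computation described in this abstract can be sketched roughly as follows. The sketch assumes a binary grass mask is already available (the abstract obtains it from a neural network classifier, which is not reproduced here), takes the "length" of a column to be its longest vertical run of grass pixels, and takes the "density" to be the fraction of grass pixels in a window of neighbouring columns. These definitions, the function names, and the `half_width` parameter are illustrative assumptions rather than the paper's actual formulation.

```python
def longest_vertical_run(mask, col):
    """Longest run of consecutive grass pixels (1s) going down one column."""
    best = run = 0
    for row in mask:
        run = run + 1 if row[col] else 0
        best = max(best, run)
    return best


def grass_density(mask, col, half_width):
    """Fraction of grass pixels in the columns within half_width of col."""
    lo = max(0, col - half_width)
    hi = min(len(mask[0]), col + half_width + 1)
    cells = [row[c] for row in mask for c in range(lo, hi)]
    return sum(cells) / len(cells)


def dwcgp_sketch(mask, half_width=1):
    """Per-column run length weighted by local grass density (assumed form)."""
    return [
        longest_vertical_run(mask, c) * grass_density(mask, c, half_width)
        for c in range(len(mask[0]))
    ]


# Hypothetical 4x4 binary mask: 1 = grass pixel, 0 = background.
mask = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
]

scores = dwcgp_sketch(mask)
print(scores)
```

    Columns with tall, dense grass receive the highest scores; how the paper aggregates per-column values into a biomass estimate is not stated in the abstract, so the sketch stops at the per-column scores.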

    Introduction To The Monogenic Signal

    The monogenic signal is an image analysis methodology that was introduced by Felsberg and Sommer in 2001 and has been employed for a variety of purposes in image processing and computer vision research. In particular, it has been found to be useful in the analysis of ultrasound imagery in several research scenarios, mostly in work done within the BioMedIA lab at Oxford. However, the literature on the monogenic signal can be difficult to penetrate due to the lack of a single resource that explains the various principles from the basics. The purpose of this document is therefore to introduce the principles, purpose, applications, and limitations of the methodology. It assumes some background knowledge from the fields of image and signal processing, in particular a good knowledge of Fourier transforms as applied to signals and images. We will not attempt to provide a thorough mathematical description or derivation of the monogenic signal, but rather focus on developing an intuition for understanding and using the methodology, and refer the reader elsewhere for a more mathematical treatment.