Object Detection by Spatio-Temporal Analysis and Tracking of the Detected Objects in a Video with Variable Background
In this paper we propose a novel approach for detecting and tracking objects in videos with a variable background, i.e., videos captured by moving cameras without any additional sensor. In a video captured by a moving camera, both the background and the foreground change in each frame of the image sequence, so modeling a single background with traditional background modeling methods is infeasible, and detecting the actual moving objects against a variable background is a challenging task. To detect the actual moving objects, spatio-temporal blobs are generated in each frame by spatio-temporal analysis of the image sequence using a three-dimensional Gabor filter. Individual blobs that are parts of one object are then merged using a Minimum Spanning Tree to form the moving object in the variable background. The height, width, and four-bin gray-value histogram of each object are computed as its features, and the object is tracked in each frame using these features to generate its trajectory through the video sequence. The problem of data association during tracking is solved as a Linear Assignment Problem, and occlusion is handled with a Kalman filter. The major advantage of our method over most existing tracking algorithms is that it requires neither initialization in the first frame nor training on sample data. The algorithm has been tested on benchmark videos with very satisfactory results, and its performance is comparable or superior to that of several benchmark algorithms.
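
As a rough illustration of the data-association step described in this abstract, the sketch below builds a cost matrix from the stated features (height, width, and a four-bin gray-value histogram) and solves it as a Linear Assignment Problem with SciPy; the particular cost weighting is an assumption for illustration, not the authors' exact formulation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections):
    """Match existing tracks to new detections by solving a
    Linear Assignment Problem over feature distances.

    Each track/detection is a dict with 'height', 'width' and
    'hist' (a four-bin gray-value histogram), as in the paper;
    the cost weighting below is an illustrative assumption.
    """
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            size_diff = abs(t["height"] - d["height"]) + abs(t["width"] - d["width"])
            hist_diff = np.abs(np.asarray(t["hist"]) - np.asarray(d["hist"])).sum()
            cost[i, j] = size_diff + hist_diff
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return list(zip(rows, cols))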
Unsupervised learning of depth and motion
We present a model for the joint estimation of disparity and motion. The
model is based on learning about the interrelations between images from
multiple cameras, multiple frames in a video, or the combination of both. We
show that learning depth and motion cues, as well as their combinations, from
data is possible within a single type of architecture and a single type of
learning algorithm, by using biologically inspired "complex cell"-like units,
which encode correlations between the pixels across image pairs. Our
experimental results show that the learning of depth and motion makes it
possible to achieve state-of-the-art performance in 3-D activity analysis, and
to outperform existing hand-engineered 3-D motion features by a very large
margin.
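
The "complex cell"-like units can be read as multiplicative units that correlate filter responses across an image pair (left/right views or consecutive frames). The sketch below is a minimal energy-model reading of that idea, not the paper's learned architecture.

import numpy as np

def complex_cell_responses(patch_a, patch_b, filters):
    """'Complex cell'-like units encoding correlations between two
    image patches (e.g. a stereo pair or consecutive frames).

    Each unit multiplies the responses of a quadrature filter pair,
    one response per patch, and sums the two products -- a simplified
    reading of the abstract; filters has shape (n_units, 2, n_pixels).
    """
    a = patch_a.ravel()
    b = patch_b.ravel()
    responses = []
    for f_even, f_odd in filters:
        responses.append((f_even @ a) * (f_even @ b) + (f_odd @ a) * (f_odd @ b))
    return np.array(responses)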
BIT: Biologically Inspired Tracker
Visual tracking is challenging due to image variations caused by various
factors, such as object deformation, scale change, illumination change and
occlusion. Given the superior tracking performance of the human visual system (HVS), a well-designed biologically inspired model is expected to improve computer visual tracking. This is, however, a difficult task due to the incomplete understanding of how neurons in the HVS work. This paper addresses this challenge by analyzing the visual cognitive mechanism of the ventral stream in the visual cortex: the proposed tracker simulates shallow neurons (S1 and C1 units) to extract low-level biologically inspired features for the target appearance, and imitates an advanced learning mechanism (S2 and C2 units) to combine generative and discriminative models for target localization. In
addition, fast Gabor approximation (FGA) and fast Fourier transform (FFT) are
adopted for real-time learning and detection in this framework. Extensive
experiments on large-scale benchmark datasets show that the proposed
biologically inspired tracker performs favorably against state-of-the-art
methods in terms of efficiency, accuracy, and robustness. The acceleration
technique in particular ensures that BIT maintains a speed of approximately 45
frames per second.
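
The FFT acceleration mentioned here amounts to evaluating the correlation between a learned appearance template and the search region in the Fourier domain. A generic sketch of that step follows; BIT's S1/C1 feature extraction and S2/C2 learning stages are omitted, so this is not the authors' full pipeline.

import numpy as np

def fft_detect(search_region, template):
    """Locate the target by cross-correlating an appearance template
    with the search region via the FFT (a generic acceleration, not
    BIT's complete detection stage).
    """
    F = np.fft.fft2(search_region)
    H = np.fft.fft2(template, s=search_region.shape)
    response = np.real(np.fft.ifft2(F * np.conj(H)))  # correlation computed in the Fourier domain
    return np.unravel_index(np.argmax(response), response.shape)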
Learning vector representation of local content and matrix representation of local motion, with implications for V1
This paper proposes a representational model for image pairs, such as consecutive video frames, that are related by local pixel displacements, in the
hope that the model may shed light on motion perception in primary visual
cortex (V1). The model couples the following two components. (1) The vector
representations of local contents of images. (2) The matrix representations of
local pixel displacements caused by the relative motions between the agent and
the objects in the 3D scene. When the image frame undergoes changes due to
local pixel displacements, the vectors are multiplied by the matrices that
represent the local displacements. Our experiments show that our model can
learn to infer local motions. Moreover, the model can learn Gabor-like filter
pairs of quadrature phases.
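
The core update of this representational model, as described, is that the content vector of a frame is multiplied by a matrix indexed by the local displacement. A minimal sketch, assuming a lookup table of learned matrices keyed by discretized displacements:

import numpy as np

def predict_next_vector(v_t, displacement, matrices):
    """Vector-matrix motion model sketched from the abstract: the
    vector v_t representing local image content is multiplied by the
    matrix associated with the local pixel displacement to predict
    the representation of the next frame.

    'matrices' maps a discretized displacement tuple (dx, dy) to a
    learned matrix; the discretization and learning are assumptions.
    """
    M = matrices[displacement]  # matrix representing the local motion
    return M @ v_t              # v_{t+1} is approximately M(dx, dy) v_t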
Bio-Inspired Human Action Recognition using Hybrid Max-Product Neuro-Fuzzy Classifier and Quantum-Behaved PSO
Studies in computational neuroscience based on functional magnetic resonance imaging (fMRI), together with biologically inspired systems, indicate that human action recognition in the mammalian brain follows two distinct pathways, specialized for the analysis of motion (optic flow) and of form information. We define novel and robust form features by applying the active basis model as the form extractor in the form pathway of the biologically inspired model. An unbalanced synergetic neural network classifies the shapes and structures of human objects, with its attention parameter tuned by quantum-behaved particle swarm optimization (QPSO) initialized via Centroidal Voronoi Tessellations. These tools prove effective for modeling the form pathway of the biological system. The final decision is made by combining the outputs of both pathways via fuzzy inference, which adds to the novelty of the proposed model: the two brain pathways are combined by mapping each feature set through Gaussian membership functions and applying the fuzzy product inference method. Two configurations are proposed for the form pathway: the first applies multi-prototype human action templates, using the synergetic neural network twice to obtain a uniform template for each action; the second abstracts each human action into four key frames. Experimental results show promising accuracy on different datasets (KTH and Weizmann).
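
A minimal sketch of the fusion step described above: each feature is passed through a Gaussian membership function and the memberships are multiplied per class (the fuzzy product inference method). The per-class parameter layout is an assumption made for illustration.

import numpy as np

def gaussian_membership(x, mean, sigma):
    """Gaussian membership value of a feature x."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def fuzzy_product_decision(form_features, motion_features, params):
    """Combine the form and motion pathways with a fuzzy product
    inference rule: multiply the Gaussian memberships of all features
    per class and pick the class with the highest product.

    'params[c]' holds per-class lists of (mean, sigma) pairs for the
    form and motion features; this layout is an assumption.
    """
    scores = {}
    for c, (form_ms, motion_ms) in params.items():
        score = 1.0
        for x, (mu, s) in zip(form_features, form_ms):
            score *= gaussian_membership(x, mu, s)
        for x, (mu, s) in zip(motion_features, motion_ms):
            score *= gaussian_membership(x, mu, s)
        scores[c] = score
    return max(scores, key=scores.get)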
Video Primal Sketch: A Unified Middle-Level Representation for Video
This paper presents a middle-level video representation named Video Primal
Sketch (VPS), which integrates two regimes of models: i) a sparse coding model using static or moving primitives to explicitly represent moving corners, lines, feature points, etc.; ii) a FRAME/MRF model reproducing feature
statistics extracted from input video to implicitly represent textured motion,
such as water and fire. The feature statistics include histograms of
spatio-temporal filters and velocity distributions. This paper makes three
contributions to the literature: i) Learning a dictionary of video primitives
using parametric generative models; ii) Proposing the Spatio-Temporal FRAME
(ST-FRAME) and Motion-Appearance FRAME (MA-FRAME) models for modeling and
synthesizing textured motion; and iii) Developing a parsimonious hybrid model
for generic video representation. Given an input video, VPS selects the proper
models automatically for different motion patterns and is compatible with
high-level action representations. In the experiments, we synthesize a number of textured motions; reconstruct real videos using the VPS; report a series of human perception experiments to verify the quality of the reconstructed videos; demonstrate how the VPS changes over scale transitions in videos; and present the close connection between the VPS and high-level action models.
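
The feature statistics used by the ST-FRAME regime are histograms of spatio-temporal filter responses. The sketch below computes such histograms for a small 3-D filter bank; the bin range and filter bank are illustrative assumptions rather than the paper's exact choices.

import numpy as np
from scipy.ndimage import convolve

def st_filter_histograms(video, filters, n_bins=16):
    """Histograms of spatio-temporal filter responses over a clip,
    the kind of feature statistics matched by an ST-FRAME model.

    'video' is a (T, H, W) array and each filter a small 3-D kernel;
    the bin range below is an assumption for illustration.
    """
    hists = []
    for k in filters:
        resp = convolve(video.astype(float), k, mode="nearest")
        h, _ = np.histogram(resp, bins=n_bins, range=(-1.0, 1.0), density=True)
        hists.append(h)
    return np.array(hists)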
Vision-based Human Gender Recognition: A Survey
Gender is an important demographic attribute of people. This paper provides a
survey of human gender recognition in computer vision. A review of approaches
exploiting information from face and whole body (either from a still image or
gait sequence) is presented. We highlight the challenges faced and survey the
representative methods of these approaches. Based on the results, good performance has been achieved on datasets captured under controlled environments, but there is still much work to be done to improve the robustness of gender recognition in real-life environments.
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging and much research is
needed about the way they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion about trends, important questions, and future lines of research.
Density Weighted Connectivity of Grass Pixels in Image Frames for Biomass Estimation
Accurate estimation of the biomass of roadside grasses plays a significant
role in applications such as fire-prone region identification. Current
solutions heavily depend on field surveys, remote sensing measurements, and image processing using reference markers, which often demand substantial time, effort, and cost. This paper proposes Density Weighted Connectivity of
Grass Pixels (DWCGP) to automatically estimate grass biomass from roadside
image data. The DWCGP calculates the length of continuously connected grass
pixels along a vertical orientation in each image column, and then weights the
length by the grass density in a surrounding region of the column. Grass pixels
are classified using feedforward artificial neural networks and the dominant
texture orientation at every pixel is computed using multi-orientation Gabor
wavelet filter vote. Evaluations on a field survey dataset show that the DWCGP
reduces Root-Mean-Square Error from 5.84 to 5.52 by additionally considering
grass density on top of grass height. The DWCGP shows robustness to
non-vertical grass stems and to changes of both Gabor filter parameters and
surrounding region widths. It also has performance close to human observation
and higher than eight baseline approaches, as well as promising results for
classifying low vs. high fire risk and identifying fire-prone road regions.
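
A minimal sketch of the DWCGP measure as described in the abstract: per-column lengths of vertically connected grass pixels, weighted by the grass density in a surrounding band of columns. The run-length definition, region width, and image-level averaging are assumptions for illustration.

import numpy as np

def dwcgp(grass_mask, region_half_width=20):
    """Density Weighted Connectivity of Grass Pixels, sketched from
    the abstract: for each image column, measure the longest run of
    vertically connected grass pixels, then weight it by the grass
    density of a surrounding region of columns.

    'grass_mask' is a binary (H, W) array from the pixel classifier;
    the region width and final averaging are assumptions.
    """
    H, W = grass_mask.shape
    run_lengths = np.zeros(W)
    for c in range(W):
        best = cur = 0
        for r in range(H):
            cur = cur + 1 if grass_mask[r, c] else 0
            best = max(best, cur)
        run_lengths[c] = best
    weighted = np.zeros(W)
    for c in range(W):
        lo, hi = max(0, c - region_half_width), min(W, c + region_half_width + 1)
        density = grass_mask[:, lo:hi].mean()  # grass density around the column
        weighted[c] = run_lengths[c] * density
    return weighted.mean()                     # image-level biomass estimate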
Introduction To The Monogenic Signal
The monogenic signal is an image analysis methodology that was introduced by
Felsberg and Sommer in 2001 and has been employed for a variety of purposes in
image processing and computer vision research. In particular, it has been found
to be useful in the analysis of ultrasound imagery in several research
scenarios mostly in work done within the BioMedIA lab at Oxford. However, the
literature on the monogenic signal can be difficult to penetrate due to the
lack of a single resource to explain the various principles from basics. The
purpose of this document is therefore to introduce the principles, purpose,
applications, and limitations of the methodology. It assumes some background
knowledge from the fields of image and signal processing, in particular a good
knowledge of Fourier transforms as applied to signals and images. We will not
attempt to provide a thorough mathematical description or derivation of the monogenic signal, but rather focus on developing an intuition for understanding and using the methodology, and refer the reader elsewhere for a more mathematical treatment.
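
For readers who want the computational gist, the monogenic signal of a (typically band-passed) image is obtained from the two Riesz transform components, which are easy to compute in the Fourier domain; local amplitude, phase, and orientation then follow. A compact sketch, omitting the band-pass filtering that is normally applied first:

import numpy as np

def monogenic_signal(image):
    """Monogenic signal of a 2-D image via the Riesz transform in the
    Fourier domain, following the standard Felsberg-Sommer construction.
    Band-pass filtering, usually applied beforehand, is omitted here.
    """
    H, W = image.shape
    u = np.fft.fftfreq(W)[None, :]
    v = np.fft.fftfreq(H)[:, None]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                                    # avoid division by zero at DC
    F = np.fft.fft2(image)
    r1 = np.real(np.fft.ifft2(F * (-1j * u / radius)))    # first Riesz component
    r2 = np.real(np.fft.ifft2(F * (-1j * v / radius)))    # second Riesz component
    amplitude = np.sqrt(image ** 2 + r1 ** 2 + r2 ** 2)   # local amplitude
    phase = np.arctan2(np.sqrt(r1 ** 2 + r2 ** 2), image) # local phase
    orientation = np.arctan2(r2, r1)                      # local orientation
    return amplitude, phase, orientation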