Object detection and tracking in video image
In recent years, capturing high-quality images has become easy thanks to rapid improvements in capture devices that offer superior technology at lower cost. A video is a collection of sequential images separated by a constant time interval, so it can provide more information about an object as the scene changes over time. Handling videos manually is impractical, so an automated system is needed to process them. This thesis makes one such attempt to track objects in videos. Many algorithms and technologies have been developed to automate the monitoring of objects in a video file. Object detection and tracking is one of the challenging tasks in computer vision. There are three basic steps in video analysis: detection of objects of interest among moving objects, tracking of those objects across consecutive frames, and analysis of the resulting tracks to understand their behavior. Simple object detection compares a static background frame with the current video frame at the pixel level. Existing methods in this domain first try to detect the object of interest in the video frames. One of the main difficulties in object tracking, among many others, is choosing suitable features and models for recognizing and tracking the object of interest in a video. Common choices of features for categorizing visual objects are intensity, shape, color and feature points. In this thesis we study mean shift tracking based on the color pdf, optical flow tracking based on intensity and motion, and SIFT tracking based on scale-invariant local feature points. Preliminary experimental results show that the adopted methods are able to track targets under translation, rotation, partial occlusion and deformation.
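The mode-seeking step at the heart of mean shift tracking can be illustrated with a minimal sketch (all names and parameters here are illustrative, not taken from the thesis): sample points stand in for pixels weighted by the target's color pdf, and the window center is repeatedly shifted to the kernel-weighted mean until it settles on a density mode.

```python
import numpy as np

def mean_shift_mode(samples, start, bandwidth=1.0, n_iters=50, tol=1e-5):
    """Shift `start` toward the nearest density mode of `samples` using a
    Gaussian kernel -- the core step of mean shift tracking, where `samples`
    would be pixel coordinates weighted by the target's color pdf."""
    x = np.asarray(start, dtype=float)
    for _ in range(n_iters):
        d2 = np.sum((samples - x) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))     # Gaussian kernel weights
        x_new = (w[:, None] * samples).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Synthetic "target": points clustered around (5, 5) stand in for pixels
# that match the target's color histogram.
rng = np.random.default_rng(0)
pts = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2))
mode = mean_shift_mode(pts, start=[3.0, 3.0])
```

In a real tracker the same iteration runs on a back-projected color-probability image rather than on explicit sample points, but the update rule is identical.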
Hand tracking and bimanual movement understanding
Bimanual movements are a subset of human movements in which the two hands move together to perform a task or convey a meaning. A bimanual movement appearing in a sequence of images must be understood in order to enable computers to interact with humans in a natural way. This problem includes two main phases: hand tracking and movement recognition.
We approach the problem of hand tracking from a neuroscience point of view. First, the hands are extracted and labelled by colour detection and blob analysis algorithms. In the presence of the two hands, one hand may occasionally occlude the other; therefore, hand occlusions must be detected in an image sequence. A dynamic model is proposed to model the movement of each hand separately. Using this model in a Kalman filtering process, the exact starting and end points of hand occlusions are detected. We exploit neuroscience phenomena to understand the behaviour of the hands during occlusion periods. Based on this, we propose a general hand tracking algorithm to track and reacquire the hands over a movement including hand occlusion. The advantages of the algorithm and its generality are demonstrated in the experiments.
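The per-hand dynamic model used in the Kalman filtering step can be sketched under a constant-velocity assumption (the thesis's actual model and noise settings may differ; the values below are illustrative). A large innovation in the update step is the kind of signal that could mark the start of an occlusion, when the measurement stops fitting the model.

```python
import numpy as np

# Hypothetical constant-velocity model for one hand: state = [x, y, vx, vy].
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only position is observed
Q = 0.01 * np.eye(4)                        # process noise
R = 0.25 * np.eye(2)                        # measurement noise

def kalman_step(x, P, z):
    """One predict/update cycle; returns state, covariance and innovation."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                        # innovation (residual)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P, y

x = np.array([0.0, 0.0, 1.0, 1.0])       # start at origin with unit velocity
P = np.eye(4)
for t in range(1, 11):
    z = np.array([t * 1.0, t * 1.0])     # measurements moving along the diagonal
    x, P, innov = kalman_step(x, P, z)
```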
In order to recognise the movements, we first recognise the movement of a single hand. Using statistical pattern recognition methods (such as Principal Component Analysis and Nearest Neighbour), the static shape of each hand appearing in an image is recognised. A Graph-Matching algorithm and Discrete Hidden Markov Models (DHMM), as two spatio-temporal pattern recognition techniques, are investigated for recognising a dynamic hand gesture.
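The PCA-plus-Nearest-Neighbour shape recognition idea can be sketched on toy "hand shape" vectors (the data, dimensions and class names below are invented for illustration): shapes are projected onto the top principal components and a query is assigned the label of its nearest training sample in that subspace.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and top principal directions of the rows of X."""
    mu = X.mean(axis=0)
    # SVD of the centred data gives the principal axes in Vt
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def nn_classify(query, train_feats, train_labels):
    """1-nearest-neighbour classification in the projected shape space."""
    d = np.linalg.norm(train_feats - query, axis=1)
    return train_labels[int(np.argmax(-d))]

# Toy "hand shapes": two classes of 6-pixel patterns plus noise.
rng = np.random.default_rng(1)
open_hand = np.array([1, 1, 1, 0, 0, 0], dtype=float)
closed_fist = np.array([0, 0, 0, 1, 1, 1], dtype=float)
X = np.vstack([open_hand + 0.1 * rng.normal(size=6) for _ in range(10)] +
              [closed_fist + 0.1 * rng.normal(size=6) for _ in range(10)])
labels = np.array([0] * 10 + [1] * 10)   # 0 = open hand, 1 = closed fist

mu, W = pca_fit(X, n_components=2)
feats = (X - mu) @ W.T                   # project training shapes
test_shape = open_hand + 0.1 * rng.normal(size=6)
pred = nn_classify((test_shape - mu) @ W.T, feats, labels)
```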
For recognising bimanual movements, we consider two general forms of these movements: single and concatenated periodic. We introduce three Bayesian networks for recognising the movements. The networks are designed to recognise and combine the gestures of the hands in order to understand the whole movement. Experiments on different types of movement demonstrate the advantages and disadvantages of each network.
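The discrete-HMM side of gesture recognition can be illustrated with the standard forward algorithm, which scores an observation sequence under competing gesture models; the models below are invented for illustration, not taken from the thesis.

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """Forward algorithm: log-probability of a discrete observation sequence
    under an HMM with initial distribution pi, transitions A, emissions B."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return np.log(alpha.sum())

# Two hypothetical 2-state gesture models over 3 observation symbols.
pi = np.array([0.5, 0.5])
A_wave = np.array([[0.1, 0.9],
                   [0.9, 0.1]])   # alternating states: a "waving" gesture
A_hold = np.array([[0.9, 0.1],
                   [0.1, 0.9]])   # sticky states: a "holding" gesture
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1]])

seq = [0, 1, 0, 1, 0, 1]          # alternating symbols suggest "wave"
ll_wave = hmm_log_likelihood(seq, pi, A_wave, B)
ll_hold = hmm_log_likelihood(seq, pi, A_hold, B)
```

Classification picks the model with the higher log-likelihood; in practice the forward recursion is run in log space or with scaling to avoid underflow on long sequences.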
Modelling, tracking and generating human interaction behaviours in video
Intelligent virtual characters are becoming increasingly popular in entertainment, educational and simulation software. A virtual character is the creation or re-creation of a human being in an image, using computer-generated imagery. It must act and react in the environment, drawing on the disciplines of automated reasoning and planning. Creating characters with human-like behaviours that respond interactively to a real person in a video is still a serious challenge. There are several major reasons for this. First, human motion is very complex, which makes it particularly difficult to simulate. Second, the human form is also not straightforward to design due to the large number of degrees of freedom of the motion. Third, creating novel contextual movements for virtual characters in real time is a new research area.
State Estimation and Smoothing for the Probability Hypothesis Density Filter
Tracking multiple objects is a challenging problem for an automated system,
with applications in many domains. Typically the system must be able to
represent the posterior distribution of the state of the targets, using a recursive
algorithm that takes information from noisy measurements. However, in
many important cases the number of targets is also unknown, and must also
be estimated from the data.
The Probability Hypothesis Density (PHD) filter is an effective approach
for this problem. The method uses a first-order moment approximation to
develop a recursive algorithm for the optimal Bayesian filter. The PHD
recursion can be implemented in closed form in some restricted cases, and more
generally using Sequential Monte Carlo (SMC) methods. The assumptions
made in the PHD filter are appealing for computational reasons in real-time
tracking implementations. These assumptions are justifiable only when the
signal-to-noise ratio (SNR) of a single target is high enough to compensate
for the loss of information introduced by the approximation.
Although the original derivation of the PHD filter is based on functional
expansions of belief-mass functions, it can also be developed by exploiting elementary
constructions of Poisson processes. This thesis presents novel strategies
for improving the Sequential Monte Carlo implementation of PHD filter
using the point process approach. Firstly, we propose a post-processing state
estimation step for the PHD filter, using Markov Chain Monte Carlo methods
for mixture models. Secondly, we develop recursive Bayesian smoothing
algorithms using the approximations of the filter backwards in time. The
purpose of both strategies is to overcome the problems arising from the PHD
filter assumptions. As a motivating example, we analyze the performance of
the methods for the difficult problem of person tracking in crowded environments.
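The closed-form Gaussian-mixture measurement update of the PHD filter can be sketched as follows; the sum of the updated mixture weights estimates the expected number of targets. This is a generic textbook-style sketch (identity observation model, fixed clutter intensity, invented parameter values), not the implementation developed in the thesis.

```python
import numpy as np

def gaussian(z, m, S):
    """Multivariate normal density N(z; m, S)."""
    d = z - m
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(
        (2 * np.pi) ** len(z) * np.linalg.det(S))

def gm_phd_update(weights, means, covs, Z, p_d=0.9, clutter=0.01, R=None):
    """Measurement update of a Gaussian-mixture PHD intensity.
    Returns the updated mixture; the sum of its weights is the
    expected number of targets."""
    R = 0.1 * np.eye(2) if R is None else R
    new_w = [(1 - p_d) * w for w in weights]   # missed-detection terms
    new_m = list(means)
    new_c = list(covs)
    for z in Z:
        qs = [w * p_d * gaussian(z, m, c + R)
              for w, m, c in zip(weights, means, covs)]
        denom = clutter + sum(qs)              # clutter + detection terms
        for q, m, c in zip(qs, means, covs):
            K = c @ np.linalg.inv(c + R)       # Kalman gain
            new_w.append(q / denom)
            new_m.append(m + K @ (z - m))
            new_c.append((np.eye(2) - K) @ c)
    return new_w, new_m, new_c

# Two well-separated predicted targets and two matching measurements.
weights = [1.0, 1.0]
means = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
covs = [np.eye(2), np.eye(2)]
Z = [np.array([0.1, -0.1]), np.array([9.9, 10.2])]
w, m, c = gm_phd_update(weights, means, covs, Z)
n_hat = sum(w)   # expected number of targets, close to 2 here
```

The state-estimation problem the thesis addresses is visible even in this sketch: `n_hat` is a first-order moment, and extracting the actual target states from the mixture requires a further step such as the MCMC post-processing proposed above.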
Text Extraction From Natural Scene: Methodology And Application
With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications of image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction is a challenging problem, due to the cluttered backgrounds of natural scenes and the multiple patterns of scene text itself. To solve these problems, this dissertation proposes a framework for scene text extraction.
Scene text extraction in our framework is divided into two components: detection and recognition. Scene text detection finds the regions containing text in camera-captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidate text strings from the cluttered background of a natural scene. Then text structural analysis is performed to design effective structural features that distinguish text from non-text outliers among the candidate strings. Scene text recognition transforms the image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification over a set of character categories. We design robust and discriminative feature representations of STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed framework, which obtains better performance than previously published methods.
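The gradient-based part of the layout analysis can be illustrated with a toy sketch that flags high-gradient-density windows as text candidates; the window size, threshold and synthetic image are invented for illustration and stand in for the richer gradient-and-color analysis described.

```python
import numpy as np

def edge_density_candidates(img, win=4, thresh=0.3):
    """Flag windows with high mean gradient magnitude as text candidates --
    a toy version of gradient-based layout analysis on a grayscale array.
    Returns (row, col, height, width) boxes."""
    gy, gx = np.gradient(img.astype(float))   # gradients along rows, cols
    mag = np.hypot(gx, gy)
    h, w = img.shape
    boxes = []
    for r in range(0, h - win + 1, win):
        for c in range(0, w - win + 1, win):
            if mag[r:r + win, c:c + win].mean() > thresh:
                boxes.append((r, c, win, win))
    return boxes

# Synthetic scene: flat background with a high-contrast "text" stripe
# made of alternating two-pixel strokes.
img = np.zeros((16, 16))
img[4:8, 2:14] = np.tile([0.0, 0.0, 1.0, 1.0], (4, 3))
cands = edge_density_candidates(img, win=4, thresh=0.3)
```

Real scene text detectors follow this coarse localisation with the structural analysis described above, since edge density alone also fires on foliage, fences and other textured non-text regions.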
Our proposed scene text extraction framework is applied to 4 scenarios: 1) reading printed labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage for assistive navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scenes. The proposed prototype systems and associated evaluation results show that our framework is able to solve the challenges in real applications.
Novel data association methods for online multiple human tracking
Video-based multiple human tracking has played a crucial role in many applications
such as intelligent video surveillance, human behavior analysis, and
health-care systems. The detection-based tracking framework has become
the dominant paradigm in this research field, and the major task is to accurately
perform the data association between detections across the frames.
However, online multiple human tracking, which merely relies on the detections
given up to the present time for the data association, becomes more
challenging with noisy detections, missed detections, and occlusions. To
address these challenging problems, three novel data association methods
for online multiple human tracking are presented in this thesis:
online group-structured dictionary learning, enhanced detection
reliability, and multi-level cooperative fusion.
The first proposed method aims to address the noisy detections and
occlusions. In this method, sequential Monte Carlo probability hypothesis
density (SMC-PHD) filtering is the core element for accomplishing the
tracking task, where the measurements are produced by the detection-based
tracking framework. To enhance the measurement model, a novel adaptive
gating strategy is developed to aid the classification of measurements. In
addition, online group-structured dictionary learning with a maximum voting
method is proposed to estimate robustly the target birth intensity. It
enables the new-born targets in the tracking process to be accurately initialized
from noisy sensor measurements. To improve the adaptability of the
group-structured dictionary to target appearance changes, the simultaneous
codeword optimization (SimCO) algorithm is employed for the dictionary
update.
The second proposed method relates to accurate measurement selection
of detections, which further refines the noisy detections prior to the tracking
pipeline. In order to achieve more reliable measurements in the Gaussian
mixture (GM)-PHD filtering process, a global-to-local enhanced confidence
rescoring strategy is proposed by exploiting the classification power of a mask
region-convolutional neural network (R-CNN). Then, an improved pruning
algorithm, namely soft-aggregated non-maximal suppression (Soft-ANMS), is
devised to further enhance the selection step. In addition, to avoid the misuse
of ambiguous measurements in the tracking process, person re-identification
(ReID) features driven by convolutional neural networks (CNNs) are integrated
to model the target appearances.
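The Soft-ANMS pruning step is described as an improvement on soft non-maximal suppression; a sketch of the standard Gaussian soft-NMS baseline (not the Soft-ANMS variant itself, whose details are not given here) shows the underlying idea: overlapping boxes have their scores decayed rather than being deleted outright, so heavily occluded people are less likely to be suppressed.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.1):
    """Gaussian soft-NMS: decay the scores of boxes overlapping the current
    best detection by exp(-iou^2 / sigma) instead of removing them."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        i = int(np.argmax(scores))
        best_box, best_s = boxes.pop(i), scores.pop(i)
        if best_s < score_thresh:
            break
        keep.append((best_box, best_s))
        scores = [s * np.exp(-iou(best_box, b) ** 2 / sigma)
                  for b, s in zip(boxes, scores)]
    return keep

# Two heavily overlapping detections and one distant detection.
dets = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(dets, scores)   # all three survive, the overlapped one decayed
```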
The third proposed method focuses on addressing the issues of missed
detections and occlusions. This method integrates two human detectors
with different characteristics (full-body and body-parts) in the GM-PHD
filter, and investigates their complementary benefits for tracking multiple
targets. For each detector domain, a novel discriminative correlation matching
(DCM) model is proposed for integration in the feature-level fusion, and
together with spatio-temporal information it is used to reduce the ambiguous
identity associations in the GM-PHD filter. Moreover, a robust fusion
center is proposed within the decision-level fusion to mitigate the sensitivity
of missed detections in the fusion process, thereby improving the fusion
performance and tracking consistency.
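Matching detections across the two detector domains can be sketched as mutual-best normalized-correlation (cosine-similarity) matching of appearance features; this is a toy stand-in for the DCM model, with all features, dimensions and thresholds invented for illustration.

```python
import numpy as np

def correlation_match(feats_a, feats_b, thresh=0.8):
    """Pair detections from two detector domains by normalized correlation
    (cosine similarity) of their appearance features, keeping only mutual
    best matches above a threshold to suppress ambiguous associations."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                                   # pairwise cosine similarity
    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= thresh and int(np.argmax(sim[:, j])) == i:
            pairs.append((i, j))                    # mutual best match
    return pairs

# Synthetic appearance features: the body-parts detector sees the same
# three people in a different order, with small feature noise.
rng = np.random.default_rng(2)
full_body = rng.normal(size=(3, 8))                 # detector 1 features
body_parts = full_body[[2, 0, 1]] + 0.05 * rng.normal(size=(3, 8))
matches = correlation_match(full_body, body_parts)  # recovers the permutation
```

The mutual-best constraint plays the role the thesis assigns to spatio-temporal information: it discards one-sided, ambiguous associations before they reach the filter.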
The effectiveness of these proposed methods is investigated using the
MOTChallenge benchmark, a framework for the standardized evaluation
of multiple object tracking methods. Detailed evaluations on challenging
video datasets, as well as comparisons with recent state-of-the-art
techniques, confirm the improved multiple human tracking performance.
- …