110 research outputs found

    Object detection and tracking in video image

    Get PDF
    In recent days, capturing images with high quality and good size is so easy because of rapid improvement in quality of capturing device with less costly but superior technology. Videos are a collection of sequential images with a constant time interval. So video can provide more information about our object when scenarios are changing with respect to time. Therefore, manually handling videos are quite impossible. So we need an automated devise to process these videos. In this thesis one such attempt has been made to track objects in videos. Many algorithms and technology have been developed to automate monitoring the object in a video file. Object detection and tracking is a one of the challenging task in computer vision. Mainly there are three basic steps in video analysis: Detection of objects of interest from moving objects, Tracking of that interested objects in consecutive frames, and Analysis of object tracks to understand their behavior. Simple object detection compares a static background frame at the pixel level with the current frame of video. The existing method in this domain first tries to detect the interest object in video frames. One of the main difficulties in object tracking among many others is to choose suitable features and models for recognizing and tracking the interested object from a video. Some common choice to choose suitable feature to categories, visual objects are intensity, shape, color and feature points. In this thesis, we studied about mean shift tracking based on the color pdf, optical flow tracking based on the intensity and motion; SIFT tracking based on scale invariant local feature points. Preliminary results from experiments have shown that the adopted method is able to track targets with translation, rotation, partial occlusion and deformation

    Hand tracking and bimanual movement understanding

    Get PDF
    Bimanual movements are a subset ot human movements in which the two hands move together in order to do a task or imply a meaning A bimanual movement appearing in a sequence of images must be understood in order to enable computers to interact with humans in a natural way This problem includes two main phases, hand tracking and movement recognition. We approach the problem of hand tracking from a neuroscience point ot view First the hands are extracted and labelled by colour detection and blob analysis algorithms In the presence of the two hands one hand may occlude the other occasionally Therefore, hand occlusions must be detected in an image sequence A dynamic model is proposed to model the movement of each hand separately Using this model in a Kalman filtering proccss the exact starting and end points of hand occlusions are detected We exploit neuroscience phenomena to understand the beha\ tour of the hands during occlusion periods Based on this, we propose a general hand tracking algorithm to track and reacquire the hands over a movement including hand occlusion The advantages of the algorithm and its generality are demonstrated in the experiments. In order to recognise the movements first we recognise the movement of a hand Using statistical pattern recognition methods (such as Principal Component Analysis and Nearest Neighbour) the static shape of each hand appearing in an image is recognised A Graph- Matching algorithm and Discrete Midden Markov Models (DHMM) as two spatio-temporal pattern recognition techniques are imestigated tor recognising a dynamic hand gesture For recognising bimanual movements we consider two general forms ot these movements, single and concatenated periodic We introduce three Bayesian networks for recognising die movements The networks are designed to recognise and combinc the gestures of the hands in order to understand the whole movement Experiments on different types ot movement demonstrate the advantages and disadvantages of each network

    Modelling, tracking and generating human interaction behaviours in video

    Get PDF
    Intelligent virtual characters are becoming increasingly popular in en­ tertainment, educational and simulation software. A virtual charac­ ter is the creation or re-creation of a human being in an image, using computer-generated imagery. It must act and react in the environment, drawing on the disciplines of automated reasoning and planning. Creating characters with human-like behaviours that respond interactively to a real person in a video, is still a serious challenge. There are several major reasons for this. First, human motion is very complex, which makes it particularly difficult to simulate. Second, the human form is also not straightforward to design due to the large number of degrees of freedom of the motion. Third, creating novel contextual movements for virtual characters in real time is a new research area.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    State Estimation and Smoothing for the Probability Hypothesis Density Filter

    No full text
    Tracking multiple objects is a challenging problem for an automated system, with applications in many domains. Typically the system must be able to represent the posterior distribution of the state of the targets, using a recursive algorithm that takes information from noisy measurements. However, in many important cases the number of targets is also unknown, and has also to be estimated from data. The Probability Hypothesis Density (PHD) filter is an effective approach for this problem. The method uses a first-order moment approximation to develop a recursive algorithm for the optimal Bayesian filter. The PHD recursion can implemented in closed form in some restricted cases, and more generally using Sequential Monte Carlo (SMC) methods. The assumptions made in the PHD filter are appealing for computational reasons in real-time tracking implementations. These are only justifiable when the signal to noise ratio (SNR) of a single target is high enough that remediates the loss of information from the approximation. Although the original derivation of the PHD filter is based on functional expansions of belief-mass functions, it can also be developed by exploiting elementary constructions of Poisson processes. This thesis presents novel strategies for improving the Sequential Monte Carlo implementation of PHD filter using the point process approach. Firstly, we propose a post-processing state estimation step for the PHD filter, using Markov Chain Monte Carlo methods for mixture models. Secondly, we develop recursive Bayesian smoothing algorithms using the approximations of the filter backwards in time. The purpose of both strategies is to overcome the problems arising from the PHD filter assumptions. As a motivating example, we analyze the performance of the methods for the difficult problem of person tracking in crowded environment

    Modelling, tracking and generating human interaction behaviours in video.

    Get PDF
    Intelligent virtual characters are becoming increasingly popular in en­ tertainment, educational and simulation software. A virtual charac­ ter is the creation or re-creation of a human being in an image, using computer-generated imagery. It must act and react in the environment, drawing on the disciplines of automated reasoning and planning. Cre­ ating characters with human-like behaviours that respond interactively to a real person in a video, is still a serious challenge. There are several major reasons for this. First, human motion is very complex, which makes it particularly difficult to simulate. Second, the human form is also not straightforward to design due to the large number of degrees of freedom of the motion. Third, creating novel contextual movements for virtual characters in real time is a new research area

    Text Extraction From Natural Scene: Methodology And Application

    Full text link
    With the popularity of the Internet and the smart mobile device, there is an increasing demand for the techniques and applications of image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scene. However, scene text extraction is a challenging problem to be solved, due to cluttered background of natural scene and multiple patterns of scene text itself. To solve these problems, this dissertation proposes a framework of scene text extraction. Scene text extraction in our framework is divided into two components, detection and recognition. Scene text detection is to find out the regions containing text from camera captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidates of text strings from cluttered background in natural scene. Then text structural analysis is performed to design effective text structural features for distinguishing text from non-text outliers among the candidates of text strings. Scene text recognition is to transform image-based text in detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification among a set of text character categories. We design robust and discriminative feature representations for STC structure, by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results in benchmark datasets demonstrate the effectiveness and robustness of our proposed framework, which obtains better performance than previously published methods. Our proposed scene text extraction framework is applied to 4 scenarios, 1) reading print labels in grocery package for hand-held object recognition; 2) combining with car detection to localize license plate in camera captured natural scene image; 3) reading indicative signage for assistant navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scene. The proposed prototype systems and associated evaluation results show that our framework is able to solve the challenges in real applications

    Novel data association methods for online multiple human tracking

    Get PDF
    PhD ThesisVideo-based multiple human tracking has played a crucial role in many applications such as intelligent video surveillance, human behavior analysis, and health-care systems. The detection based tracking framework has become the dominant paradigm in this research eld, and the major task is to accurately perform the data association between detections across the frames. However, online multiple human tracking, which merely relies on the detections given up to the present time for the data association, becomes more challenging with noisy detections, missed detections, and occlusions. To address these challenging problems, there are three novel data association methods for online multiple human tracking are presented in this thesis, which are online group-structured dictionary learning, enhanced detection reliability and multi-level cooperative fusion. The rst proposed method aims to address the noisy detections and occlusions. In this method, sequential Monte Carlo probability hypothesis density (SMC-PHD) ltering is the core element for accomplishing the tracking task, where the measurements are produced by the detection based tracking framework. To enhance the measurement model, a novel adaptive gating strategy is developed to aid the classi cation of measurements. In addition, online group-structured dictionary learning with a maximum voting method is proposed to estimate robustly the target birth intensity. It enables the new-born targets in the tracking process to be accurately initialized from noisy sensor measurements. To improve the adaptability of the group-structured dictionary to target appearance changes, the simultaneous codeword optimization (SimCO) algorithm is employed for the dictionary update. The second proposed method relates to accurate measurement selection of detections, which is further to re ne the noisy detections prior to the tracking pipeline. In order to achieve more reliable measurements in the Gaussian mixture (GM)-PHD ltering process, a global-to-local enhanced con dence rescoring strategy is proposed by exploiting the classi cation power of a mask region-convolutional neural network (R-CNN). Then, an improved pruning algorithm namely soft-aggregated non-maximal suppression (Soft-ANMS) is devised to further enhance the selection step. In addition, to avoid the misuse of ambiguous measurements in the tracking process, person re-identi cation (ReID) features driven by convolutional neural networks (CNNs) are integrated to model the target appearances. The third proposed method focuses on addressing the issues of missed detections and occlusions. This method integrates two human detectors with di erent characteristics (full-body and body-parts) in the GM-PHD lter, and investigates their complementary bene ts for tracking multiple targets. For each detector domain, a novel discriminative correlation matching (DCM) model for integration in the feature-level fusion is proposed, and together with spatio-temporal information is used to reduce the ambiguous identity associations in the GM-PHD lter. Moreover, a robust fusion center is proposed within the decision-level fusion to mitigate the sensitivity of missed detections in the fusion process, thereby improving the fusion performance and tracking consistency. The e ectiveness of these proposed methods are investigated using the MOTChallenge benchmark, which is a framework for the standardized evaluation of multiple object tracking methods. Detailed evaluations on challenging video datasets, as well as comparisons with recent state-of-the-art techniques, con rm the improved multiple human tracking performance
    corecore