
    Computer vision in target pursuit using a UAV

    Research in target pursuit using Unmanned Aerial Vehicles (UAVs) has gained attention in recent years, primarily due to the decreasing cost of, and increasing demand for, small UAVs in many sectors. In computer vision, target pursuit is a complex problem because it involves solving several sub-problems, typically concerned with detecting, tracking and following the object of interest. At present, the majority of related methods are developed in computer simulation under the assumption of ideal environmental conditions, while the few practical methods are mainly designed to track and follow simple objects of monochromatic colour with very little texture variation. Current research in this topic lacks practical vision-based approaches. The aim of this research is therefore to fill that gap by developing a real-time algorithm capable of following a person continuously given only a photo as input.

    Because this research treats the whole procedure as an autonomous system, the drone is activated automatically upon receiving a photo of a person over Wi-Fi; the whole system can thus be triggered simply by emailing a single photo from any device, anywhere. This is done by first implementing image fetching to connect automatically to Wi-Fi, download the image and decode it. Human detection is then performed to extract a template from the upper body of the person, and the intended target is acquired using both human detection and template matching. Finally, target pursuit is achieved by tracking the template continuously while sending motion commands to the drone.

    In the target pursuit system, detection is mainly accomplished using a proposed human detection method capable of detecting, extracting and segmenting the human body figure robustly from the background without prior training; it detects the face, head and shoulders separately, mainly using gradient maps. Tracking is mainly accomplished using a proposed generic, non-learning template matching method that combines intensity template matching with a colour histogram model and employs a three-tier system for template management. A flight controller is also developed; it supports three types of control: keyboard, mouse and text messages. Furthermore, the drone is programmed with three different modes: standby, sentry and search.

    To improve the detection and tracking of coloured objects, this research also proposes several colour-related methods. One of them is a colour model for colour detection consisting of three components: hue, purity and brightness. Hue represents the colour angle, purity represents the colourfulness, and brightness represents the intensity. The model can be represented in three different geometric shapes (sphere, hemisphere and cylinder), each of which has two variations.

    Experimental results show that the target pursuit algorithm is capable of identifying and following the target person robustly given only a photo as input, as evidenced by live tracking and mapping of the intended targets wearing different clothing in both indoor and outdoor environments. Additionally, the various methods developed in this research could enhance the performance of practical vision-based applications, especially in detecting and tracking objects.
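    The tracking step above combines intensity template matching with a colour histogram model. The following Python/OpenCV fragment is a minimal sketch of that idea under stated assumptions, not the thesis implementation: the hue/saturation histogram, the blending weight alpha and all function names are illustrative choices made here.

        import cv2
        import numpy as np

        def hsv_histogram(patch_bgr, bins=(30, 32)):
            """Hue/saturation histogram of a BGR patch, normalised to unit sum."""
            hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, list(bins), [0, 180, 0, 256])
            return cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)

        def match_with_colour(frame_bgr, template_bgr, template_hist, alpha=0.6):
            """Locate the template by normalised cross-correlation on intensity,
            then re-weight the best candidate with a colour-histogram similarity."""
            frame_gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            templ_gray = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2GRAY)
            response = cv2.matchTemplate(frame_gray, templ_gray, cv2.TM_CCOEFF_NORMED)
            _, ncc_score, _, top_left = cv2.minMaxLoc(response)
            h, w = templ_gray.shape
            x, y = top_left
            candidate = frame_bgr[y:y + h, x:x + w]
            colour_score = cv2.compareHist(hsv_histogram(candidate), template_hist,
                                           cv2.HISTCMP_CORREL)
            # Blend the intensity and colour cues into a single confidence value.
            combined = alpha * ncc_score + (1.0 - alpha) * colour_score
            return (x, y, w, h), combined

    A tracker built on this sketch would keep the template histogram fixed (or refresh it under a template-management policy such as the three-tier system described above) and use the combined score to decide whether the match is reliable enough to drive the drone's motion commands.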

    Improved Multi-resolution Analysis of the Motion Patterns in Video for Human Action Classification

    The automatic recognition of human actions in video is of great interest in many applications such as automated surveillance, content-based video summarisation, video search and indexing. The problem is challenging due to the wide range of variation within the motion pattern of a given action, such as walking, across different subjects, and the low variation between similar motions such as running and jogging. This thesis makes three contributions, within a discriminative bottom-up framework, to improve the multi-resolution analysis of motion patterns in video for better recognition of human actions.

    The first contribution is a novel approach for robust local motion feature detection in video. To this end, four multi-resolution, temporally causal and asymmetric filters are introduced: log Gaussian, scale-derivative Gaussian, Poisson and asymmetric sinc. Their performance is compared with the widely used multi-resolution Gabor filter in a common framework for the detection of local salient motions. The features obtained from asymmetric filtering are more precise and more robust under geometric deformations such as view changes or affine transformations. Moreover, they provide higher classification accuracy when used with a standard bag-of-words representation of actions and a single discriminative classifier. The experimental results show that the asymmetric sinc performs best; the Poisson and scale-derivative Gaussian filters perform better than the log Gaussian, which in turn performs better than the symmetric temporal Gabor filter.

    The second contribution is an efficient action representation. The observation is that salient features at different spatial and temporal scales characterise different motion information, so a multi-resolution analysis of the motion characteristics should be representative of different actions; a multi-resolution action signature provides a more discriminative video representation.

    The third contribution concerns the classification of different human actions. To this end, an ensemble of classifiers in a multiple classifier system (MCS) framework with a parallel topology is utilised; this framework can fully benefit from the multi-resolution characteristics of the motion patterns in human actions. The classifier combination concept of MCS is then extended to address two problems in the configuration of a recognition framework, namely the choice of distance metric for comparing action representations and the size of the codebook by which an action is represented. This application of MCS at multiple stages of the recognition pipeline yields a multi-stage MCS framework which outperforms existing methods that use a single classifier. Based on the experimental results of the local feature detection and the action classification, the multi-stage MCS framework using multi-scale features obtained from temporal asymmetric sinc filtering is recommended for the task of human action recognition in video.
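    The parallel-topology MCS idea can be sketched briefly. The Python fragment below assumes that bag-of-words histograms have already been computed per video at each temporal scale (feature detection and codebook construction are omitted), trains one SVM per scale, and fuses the per-scale posteriors by averaging; the class name, the RBF kernel choice and the averaging fusion rule are assumptions made for illustration, not the configuration used in the thesis.

        import numpy as np
        from sklearn.svm import SVC

        class MultiScaleMCS:
            """Parallel multiple-classifier system: one SVM per temporal scale,
            fused by averaging the per-class probability estimates."""

            def __init__(self, n_scales):
                self.classifiers = [SVC(kernel="rbf", probability=True)
                                    for _ in range(n_scales)]

            def fit(self, histograms_per_scale, labels):
                # histograms_per_scale: one (n_videos, codebook_size) bag-of-words
                # matrix per temporal scale, all rows aligned with `labels`.
                for clf, X in zip(self.classifiers, histograms_per_scale):
                    clf.fit(X, labels)
                return self

            def predict(self, histograms_per_scale):
                # Average the posterior estimates of the per-scale classifiers
                # and return the most likely action class for each video.
                probs = np.mean([clf.predict_proba(X)
                                 for clf, X in zip(self.classifiers,
                                                   histograms_per_scale)], axis=0)
                return self.classifiers[0].classes_[np.argmax(probs, axis=1)]

    Other fusion rules (majority voting, weighted sums favouring the scales that perform best on validation data) fit the same structure; the point of the sketch is only that each resolution contributes its own classifier whose outputs are combined in parallel.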