
    An Approach for Object Tracking in Video Sequences

    In the recent past there has been a significant increase in the number of applications that effectively utilize digital video, owing to cheaper yet superior acquisition devices. This upsurge in video acquisition has produced volumes of data that are impossible to handle manually, so an automated means of processing these videos is indispensable. In this thesis one such attempt has been made to track objects in videos. Object tracking comprises two closely related processes: object detection followed by tracking of the detected objects. Algorithms for both processes are proposed in this thesis. Simple object detection algorithms compare a static background frame at pixel level with the current frame of a video. Existing methods in this domain first detect objects and then remove the shadows associated with them, a two-stage process; the proposed approach combines both stages into one. Two algorithms are proposed for object detection: the first models the background and the second extracts the objects and removes their shadows. Initially, from the first few frames, each pixel is classified as stationary or non-stationary, and a background model is developed from the stationary pixels alone. Subsequently, a local thresholding technique is used to extract objects and discard shadows. Once all the foreground objects are detected, two further algorithms are proposed for tracking the objects and updating the background model. The first is a centroid-searching technique, in which the centroid in the current frame is estimated from the previous frame. Its accuracy is verified by comparing the entropy of the dual-tree complex wavelet coefficients in the bounding boxes of both frames. If the estimate proves inaccurate, a dynamic window is used to search for the true centroid. The second algorithm updates the background using a randomized updating scheme. Both stages of the proposed tracking model are simulated with various recorded videos, and the simulation results are compared with recent schemes to demonstrate the superiority of the model.
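
    A minimal sketch of the two-stage detection idea described above: pixels with low temporal variance over the first few frames are treated as stationary and used to build the background, and foreground is then extracted with a local threshold. The variance threshold, window size, and k factor are illustrative assumptions, not the thesis's exact parameters.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def build_background(frames, var_thresh=25.0):
            # Classify pixels as stationary (low temporal variance) over the
            # first few frames; model the background from those pixels only.
            stack = np.stack(frames).astype(np.float32)   # (N, H, W) grayscale
            stationary = stack.var(axis=0) < var_thresh
            background = np.median(stack, axis=0)
            return background, stationary

        def extract_foreground(frame, background, k=2.5, win=15):
            # Local thresholding: a pixel is foreground when its absolute
            # difference from the background exceeds the local mean by k
            # local standard deviations, which tends to suppress soft
            # shadows of low local contrast.
            diff = np.abs(frame.astype(np.float32) - background)
            mu = uniform_filter(diff, win)
            sigma = np.sqrt(np.maximum(uniform_filter(diff ** 2, win) - mu ** 2, 0.0))
            return diff > mu + k * sigma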

    Sonar image interpretation for sub-sea operations

    Mine Counter-Measure (MCM) missions are conducted to neutralise underwater explosives. Automatic Target Recognition (ATR) assists operators by increasing the speed and accuracy of data review. ATR embedded on vehicles enables adaptive missions which increase the speed of data acquisition. This thesis addresses three challenges: the speed of data processing, the robustness of ATR to environmental conditions, and the large quantities of data required to train an algorithm. The main contribution of this thesis is a novel ATR algorithm. The algorithm uses features derived from the projection of 3D boxes to produce a set of 2D templates. The template responses are independent of grazing angle, range and target orientation. Integer skewed integral images are derived to accelerate the calculation of the template responses. The algorithm is compared to the Haar cascade algorithm. For a single sonar model and cylindrical targets, the algorithm reduces the Probability of False Alarm (PFA) by 80% at a Probability of Detection (PD) of 85%. When the algorithm is trained on target data from a different sonar model, the PD is only 6% lower, even though no representative target data was used for training. The second major contribution is an adaptive ATR algorithm that uses local sea-floor characteristics to address the problem of ATR robustness with respect to the local environment. A dual-tree wavelet decomposition of the sea-floor and a Markov Random Field (MRF) based graph-cut algorithm are used to segment the terrain. A Neural Network (NN) is then trained to filter ATR results based on the local sea-floor context. It is shown, for the Haar cascade algorithm, that the PFA can be reduced by 70% at a PD of 85%. Speed of data processing is addressed using novel pre-processing techniques. The standard three-class MRF for sonar image segmentation is formulated using graph cuts; consequently, a 1.2 million pixel image is segmented in 1.2 seconds. Additionally, local estimation of class models is introduced to remove range-dependent segmentation quality. Finally, an A* graph search is developed to remove the surface return, a line of saturated pixels often detected as false alarms by ATR. The A* search identifies the surface return in 199 of 220 images tested, with a runtime of 2.1 seconds. The algorithm is robust to the presence of ripples and rocks.
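
    The template responses above are accelerated with integral images. A minimal sketch of the standard axis-aligned variant, on which schemes such as the thesis's integer skewed integral images build (the skewed variant generalizes the same four-lookup idea to slanted regions):

        import numpy as np

        def integral_image(img):
            # Each entry holds the sum of all pixels from (0, 0) to that
            # position inclusive, built with two cumulative sums.
            return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

        def box_sum(ii, r0, c0, r1, c1):
            # Sum over the inclusive box [r0..r1] x [c0..c1] with four
            # lookups, independent of box size.
            total = ii[r1, c1]
            if r0 > 0:
                total -= ii[r0 - 1, c1]
            if c0 > 0:
                total -= ii[r1, c0 - 1]
            if r0 > 0 and c0 > 0:
                total += ii[r0 - 1, c0 - 1]
            return total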

    Improved License Plate Localization Algorithm Based on Morphological Operations

    Automatic License Plate Recognition (ALPR) systems have become an important tool for tracking stolen cars, controlling access, and monitoring traffic. An ALPR system consists of locating the license plate in an image, followed by character detection and recognition. Since the license plate can exist anywhere within an image, localization is the most important part of ALPR and consumes the most processing time. Most ALPR systems are computationally intensive and require a high-performance computer. The proposed algorithm differs significantly from those utilized in previous ALPR technologies by offering a fast algorithm whose structuring elements conduct morphological operations within an image more precisely, and which can be implemented on portable devices with low computational capability. The proposed algorithm is able to accurately detect and differentiate license plates in complex images. The method was first tested in MATLAB on a public online database of Greek license plates, a popular benchmark used in previous work. The proposed algorithm was 100% accurate on all clear images, and achieved 98.45% accuracy on the entire database, which includes complex backgrounds and license plates obscured by shadow and dirt. The algorithm's efficiency was then tested on devices with low computational processing power by translating the code to Python, where it ran 300% faster than previous work.
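
    A hedged sketch of a generic morphology-based plate localization pipeline of the kind described, using OpenCV; the structuring-element size and aspect-ratio limits are illustrative assumptions rather than the thesis's exact values.

        import cv2

        def locate_plates(bgr, min_ar=2.0, max_ar=6.0):
            gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
            # Black-hat emphasises dark characters on a bright plate background.
            rect = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
            blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rect)
            _, binary = cv2.threshold(blackhat, 0, 255,
                                      cv2.THRESH_BINARY | cv2.THRESH_OTSU)
            # Closing merges adjacent characters into one plate-shaped blob.
            closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, rect)
            contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            # Keep regions whose aspect ratio is plausible for a plate.
            plates = []
            for c in contours:
                x, y, w, h = cv2.boundingRect(c)
                if h > 0 and min_ar <= w / h <= max_ar:
                    plates.append((x, y, w, h))
            return plates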

    Detecting, Tracking, And Recognizing Activities In Aerial Video

    In this dissertation, we address the problem of detecting humans and vehicles, tracking them in crowded scenes, and finally determining their activities in aerial video. Even though this is a well-explored problem in the field of computer vision, many challenges still remain when one is presented with realistic data. These challenges include large camera motion, strong scene parallax, fast object motion, large object density, strong shadows, and insufficiently large action datasets. Therefore, we propose a number of novel methods based on exploiting scene constraints from the imagery itself to aid in the detection and tracking of objects. We show, via experiments on several datasets, that superior performance is achieved with the use of the proposed constraints.

    First, we tackle the problem of detecting moving, as well as stationary, objects in scenes that contain parallax and shadows. We do this on both regular aerial video and the new and challenging domain of wide area surveillance. This problem poses several challenges: large camera motion, strong parallax, a large number of moving objects, a small number of pixels on target, single-channel data, and the low frame rate of the video. We propose a method for detecting moving and stationary objects that overcomes these challenges, and evaluate it on the CLIF and VIVID datasets. In order to find moving objects, we use median background modelling, which requires few frames to obtain a workable model and is very robust when there is a large number of moving objects in the scene while the model is being constructed. We then remove false detections caused by parallax and registration errors using gradient information from the background image.

    Relying merely on motion to detect objects in aerial video may not be sufficient to provide complete information about the observed scene. First of all, objects that are permanently stationary may be of interest as well, for example to determine how long a particular vehicle has been parked at a certain location. Secondly, moving vehicles that are being tracked through the scene may sometimes stop and remain stationary at traffic lights and railroad crossings. These prolonged periods of non-motion make it very difficult for the tracker to maintain the identities of the vehicles. Therefore, there is a clear need for a method that can detect stationary pedestrians and vehicles in UAV imagery. This is a challenging problem due to the small number of pixels on the target, which makes it difficult to distinguish objects from background clutter and results in a much larger search space. We propose a method for constraining the search based on a number of geometric constraints obtained from the metadata. Specifically, we obtain the orientation of the ground plane normal, the orientation of the shadows cast by out-of-plane objects in the scene, and the relationship between object heights and the size of their corresponding shadows. We utilize this information in a geometry-based shadow and ground plane normal blob detector, which provides an initial estimate of the locations of shadow casting out of plane (SCOOP) objects in the scene. These SCOOP candidate locations are then classified as either human or clutter using a combination of wavelet features and a Support Vector Machine. Additionally, we combine regular SCOOP and inverted SCOOP candidates to obtain vehicle candidates. We show impressive results on sequences from the VIVID and CLIF datasets, and provide comparative quantitative and qualitative analysis. We also show that we can extend the SCOOP detection method to automatically estimate the orientation of the shadow in the image without relying on metadata. This is useful in cases where metadata is either unavailable or erroneous.

    Simply detecting objects in every frame does not provide sufficient understanding of the nature of their existence in the scene. It may be necessary to know how the objects have travelled through the scene over time and which areas they have visited. Hence, there is a need to maintain the identities of the objects across different time instances. The task of object tracking can be very challenging in videos that have low frame rate, high density, and a very large number of objects, as is the case in the WAAS data. Therefore, we propose a novel method for tracking a large number of densely moving objects in aerial video. In order to keep the complexity of the tracking problem manageable when dealing with a large number of objects, we divide the scene into grid cells, solve the tracking problem optimally within each cell using bipartite graph matching, and then link the tracks across the cells. Besides tractability, grid cells also allow us to define a set of local scene constraints, such as road orientation and object context. We use these constraints as part of the cost function for the tracking problem, which allows us to track fast-moving objects in low frame rate videos.

    In addition to moving through the scene, the humans that are present may be performing individual actions that should be detected and recognized by the system. A number of different approaches exist for action recognition in both aerial and ground-level video. One of the requirements for the majority of these approaches is the existence of a sizeable dataset of examples of a particular action from which a model of the action can be constructed. Such a luxury is not always possible in aerial scenarios, since it may be difficult to fly a large number of missions to observe a particular event multiple times. Therefore, we propose a method for recognizing human actions in aerial video from as few examples as possible (a single example in the extreme case). We use the bag-of-words action representation and a 1vsAll multi-class classification framework. We assume that most of the classes have many examples, and construct Support Vector Machine models for each class. Then, we use the Support Vector Machines that were trained on classes with many examples to improve the decision function of the Support Vector Machine that was trained using few examples, via late weighted fusion of decision values.
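
    As one concrete illustration of the grid-cell tracking step, the sketch below solves the per-cell assignment of detections between consecutive frames with bipartite graph matching. It uses plain Euclidean distance as the cost; the dissertation's cost function additionally encodes local constraints such as road orientation and object context, which are not reproduced here.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def match_detections(prev_pts, curr_pts, max_dist=30.0):
            # Optimal one-to-one assignment of object centroids between two
            # frames inside a single grid cell (Hungarian algorithm).
            if len(prev_pts) == 0 or len(curr_pts) == 0:
                return []
            prev_pts = np.asarray(prev_pts, dtype=float)   # (M, 2) centroids
            curr_pts = np.asarray(curr_pts, dtype=float)   # (N, 2) centroids
            cost = np.linalg.norm(prev_pts[:, None, :] - curr_pts[None, :, :], axis=2)
            rows, cols = linear_sum_assignment(cost)
            # Reject pairs that moved implausibly far between frames.
            return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]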

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely:
    1. moving object detection and recognition;
    2. correction of colours in video frames and recognition of the colours of moving objects;
    3. make and model recognition of vehicles and identification of their type;
    4. detection and recognition of text information in outdoor scenes.
    To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex backgrounds. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouettes of foreground objects.

    To address the second issue, a framework for the correction and recognition of the true colours of objects in videos is presented, with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects across multiple frames. The proposed framework is specifically designed to perform robustly on videos of poor quality caused by surrounding illumination, camera sensor imperfections and artefacts due to high compression.

    In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images was developed. The technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain.

    The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image to identify text regions. Apart from detection, the colour information is also used to segment characters from words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique in various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
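
    A hedged sketch of the detection stage only, with OpenCV's stock MOG2 background subtractor standing in for the thesis's background modelling technique; the novel edge-segment contour refinement and the texture-based feature descriptor are not reproduced here.

        import cv2

        subtractor = cv2.createBackgroundSubtractorMOG2(history=300, detectShadows=True)

        def detect_moving_objects(frame, min_area=400):
            mask = subtractor.apply(frame)
            mask[mask == 127] = 0           # MOG2 marks shadow pixels as 127
            kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            # Return bounding boxes of sufficiently large foreground blobs.
            return [cv2.boundingRect(c) for c in contours
                    if cv2.contourArea(c) >= min_area]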

    Multi-Modal Enhancement Techniques for Visibility Improvement of Digital Images

    Image enhancement techniques for visibility improvement of 8-bit color digital images based on spatial domain, wavelet transform domain, and multiple image fusion approaches are investigated in this dissertation research. In the spatial domain category, two enhancement algorithms are developed to deal with problems associated with images captured from scenes with high dynamic ranges. The first technique is based on an illuminance-reflectance (I-R) model of the scene irradiance. The dynamic range compression of the input image is achieved by a nonlinear transformation of the estimated illuminance based on a windowed inverse sigmoid transfer function. A single-scale neighborhood-dependent contrast enhancement process is proposed to enhance the high-frequency components of the illuminance, which compensates for the contrast degradation of the mid-tone frequency components caused by dynamic range compression. The intensity image obtained by integrating the enhanced illuminance and the extracted reflectance is then converted to an RGB color image through linear color restoration utilizing the color components of the original image. The second technique, named AINDANE, is a two-step approach comprising adaptive luminance enhancement and adaptive contrast enhancement. An image-dependent nonlinear transfer function is designed for dynamic range compression, and a multiscale image-dependent neighborhood approach is developed for contrast enhancement. Real-time processing of video streams is realized with the I-R model based technique due to its high-speed processing capability, while AINDANE produces higher quality enhanced images due to its multi-scale contrast enhancement property. Both algorithms exhibit balanced luminance and contrast enhancement, higher robustness, and better color consistency when compared with conventional techniques. In the transform domain approach, wavelet transform based image denoising and contrast enhancement algorithms are developed. Denoising is treated as a maximum a posteriori (MAP) estimation problem; a bivariate probability density function model is introduced to exploit the inter-level dependency among the wavelet coefficients. In addition, an approximate solution to the MAP estimation problem is proposed to avoid the complex iterative computations required to find a numerical solution. This relatively low-complexity image denoising algorithm, implemented with the dual-tree complex wavelet transform (DT-CWT), produces high quality denoised images.
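
    A minimal sketch of the illuminance-reflectance decomposition and dynamic range compression described above. A Gaussian low-pass estimate stands in for the illuminance, and a plain logistic curve stands in for the windowed inverse sigmoid transfer function; the sigma and alpha parameters are illustrative assumptions.

        import numpy as np
        import cv2

        def ir_enhance(bgr, sigma=30.0, alpha=10.0):
            img = bgr.astype(np.float32) / 255.0
            intensity = img.mean(axis=2) + 1e-6
            # Estimated illuminance (low-frequency) and extracted reflectance.
            illum = cv2.GaussianBlur(intensity, (0, 0), sigma)
            reflect = intensity / (illum + 1e-6)
            # Nonlinear transfer compresses the illuminance dynamic range.
            illum_enh = 1.0 / (1.0 + np.exp(-alpha * (illum - 0.5)))
            enhanced = np.clip(illum_enh * reflect, 0.0, 1.0)
            # Linear color restoration from the original chromatic ratios.
            out = img * (enhanced / intensity)[..., None]
            return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)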

    Fusion-based impairment modelling for an intelligent radar sensor architecture

    An intelligent radar sensor concept has been developed using a modelling approach for the prediction of sensor performance, based on the application of sensor and environment models. Land clutter significantly impacts the operation of radar sensors operating at low grazing angles. The clutter modelling technique developed in this thesis for the prediction of land clutter forms the clutter model of the intelligent radar sensor. Fusion of remote sensing data is integral to the clutter modelling approach and is addressed by considering the fusion of radar remote sensing data and the mitigation of speckle noise and data transmission impairments. The advantages of the intelligent sensor approach for predicting radar performance are demonstrated for several applications using measured data. Predicting site-specific land radar performance is an important task, complicated by the peculiarities and characteristics of the radar sensor, electromagnetic wave propagation, and the environment in which the radar is deployed. Airborne remote sensing data can provide information about the environment and terrain, which can be used to predict land radar performance more accurately. This thesis investigates how fusion of remote sensing data can be used in conjunction with a sensor modelling approach to enable site-specific prediction of land radar performance. The application of a radar sensor model and a priori information about the environment gives rise to the notion of an intelligent radar sensor which can adapt to dynamically changing environments through intelligent processing of this a priori knowledge. This thesis advances the field of intelligent radar sensor design through an approach based on fusion of a priori knowledge provided by remote sensing data and application of a modelling approach to enable prediction of radar sensor performance. Original contributions are made in the areas of intelligent radar sensor development, improved estimation of land surface clutter intensity for site-specific low-grazing-angle radar, and fusion and mitigation of sensor and data transmission impairments in radar remote sensing data.
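
    The abstract does not spell out the speckle mitigation method; as a hedged illustration only, the classic Lee filter below is a standard baseline for speckle reduction in radar remote sensing imagery, not necessarily the technique used in the thesis.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def lee_filter(img, size=7, noise_var=None):
            # Adaptive local-statistics filter: smooth in flat areas,
            # preserve detail where the local variance is high.
            img = img.astype(np.float32)
            mean = uniform_filter(img, size)
            sq_mean = uniform_filter(img ** 2, size)
            var = np.maximum(sq_mean - mean ** 2, 0.0)
            if noise_var is None:
                noise_var = var.mean()      # crude global noise estimate
            gain = var / (var + noise_var + 1e-12)
            return mean + gain * (img - mean)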

    Biometric iris image segmentation and feature extraction for iris recognition

    The continued threat to security in our interconnected world today begs for urgent solutions. Iris biometrics, like many other biometric systems, provide an alternative solution to this lingering problem. Although iris recognition has been extensively studied, it is nevertheless not a fully solved problem, and this is the factor inhibiting its implementation in real-world situations today. Existing iris recognition systems face three main problems: 1) lack of robustness to non-ideal iris images, 2) slow algorithm speed, and 3) limited applicability to existing systems in real-world situations. In this thesis, six novel approaches were derived and implemented to address these current limitations of existing iris recognition systems. A novel fast and accurate segmentation approach based on the combination of graph-cut optimization and an active contour model is proposed to define the irregular boundaries of the iris in a hierarchical two-level approach. In the first hierarchy, the approximate boundary of the pupil/iris is estimated using a method based on the Hough transform for the pupil and an adapted starburst algorithm for the iris. Subsequently, in the second hierarchy, the final irregular boundary of the pupil/iris is refined and segmented using the graph-cut based active contour (GCBAC) model proposed in this work. The segmentation is performed in two levels, whereby the pupil is segmented before the iris. In order to detect and eliminate noise and reflection artefacts which might introduce errors into the algorithm, a preprocessing technique based on adaptive weighted edge detection and high-pass filtering is used to detect reflections in the high-intensity areas of the image, while exemplar-based image inpainting is used to eliminate the reflections. After the segmentation of the iris boundaries, a post-processing operation based on a combination of a block classification method and a statistical prediction approach is used to detect any superimposed occluding eyelashes/eyeshadows. The normalization of the iris image is achieved through the rubber sheet model. In the second stage, an approach based on the construction of complex wavelet filters and rotation of the filters to the principal texture direction is used for the extraction of important iris information, while a modified particle swarm optimization (PSO) is used to select the most prominent iris features for iris encoding. Classification of the iriscode is performed using adaptive support vector machines (ASVM). Experimental results demonstrate that the proposed approach achieves an accuracy of 98.99% and is computationally about 2 times faster than the best existing approach.
    Ebonyi State University and Education Task Fund, Nigeria.
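
    A minimal sketch of the rubber sheet normalization step named above (Daugman's model), assuming the pupil and iris boundaries are approximated by circles; the thesis itself refines these to irregular boundaries before normalization.

        import numpy as np

        def rubber_sheet(gray, pupil, iris, radial=64, angular=256):
            # Map the annular iris region between two circles onto a fixed
            # rectangular grid (rows: pupil -> iris, columns: angle).
            (px, py, pr), (ix, iy, ir) = pupil, iris   # (cx, cy, radius)
            thetas = np.linspace(0.0, 2.0 * np.pi, angular, endpoint=False)
            radii = np.linspace(0.0, 1.0, radial)
            out = np.zeros((radial, angular), dtype=gray.dtype)
            for j, t in enumerate(thetas):
                x0, y0 = px + pr * np.cos(t), py + pr * np.sin(t)  # pupil edge
                x1, y1 = ix + ir * np.cos(t), iy + ir * np.sin(t)  # iris edge
                xs = np.clip((x0 + radii * (x1 - x0)).round().astype(int),
                             0, gray.shape[1] - 1)
                ys = np.clip((y0 + radii * (y1 - y0)).round().astype(int),
                             0, gray.shape[0] - 1)
                out[:, j] = gray[ys, xs]
            return out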