    Use of Coherent Point Drift in computer vision applications

    This thesis presents the novel use of Coherent Point Drift (CPD) in improving the robustness of a number of computer vision applications. The CPD approach includes two methods for registering two images, rigid and non-rigid point set registration, which differ in the transformation model used. The key characteristic of a rigid transformation is that the distance between points is preserved up to a uniform scale factor, which means it can be used in the presence of translation, rotation, and scaling. Non-rigid transformations, such as affine transforms, provide the opportunity of registering under non-uniform scaling and skew. The idea is to move one point set coherently to align with the second point set. The CPD method finds both the non-rigid transformation and the correspondence distance between two point sets at the same time without requiring an a priori declaration of the transformation model used. The first part of this thesis is focused on speaker identification in video conferencing. A real-time, audio-coupled, video-based approach is presented, which focuses more on the video analysis side than on the audio analysis, which is known to be prone to errors. CPD is effectively utilised for lip movement detection, and a temporal face detection approach is used to minimise false positives if the face detection algorithm fails to perform. The second part of the thesis is focused on multi-exposure and multi-focus image fusion with compensation for camera shake. The Scale Invariant Feature Transform (SIFT) is first used to detect keypoints in the images being fused. Subsequently this point set is reduced to remove outliers using RANSAC (RANdom SAmple Consensus), and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a Contourlet-based image fusion algorithm that makes use of a novel alpha blending and filtering technique to minimise artefacts. 
The thesis evaluates the performance of the algorithm in comparison to a number of state-of-the-art approaches, including the key commercial products available in the market at present, showing significantly improved subjective quality in the fused images. The final part of the thesis presents a novel approach to Vehicle Make & Model Recognition (VMMR) in CCTV video footage. CPD is used to effectively remove the skew of detected vehicles, as CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approaching angles. A LESH (Local Energy Shape Histogram) feature-based approach is used for vehicle make and model recognition, with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximise the reliability of the final outcome. Experimental results show that the proposed system achieves an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration.
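To make the rigid case described above concrete, the following is a minimal sketch of a least-squares fit of translation, rotation, and uniform scale between two 2D point sets. It is not CPD: it assumes the point correspondences are already known (for example after outlier removal), whereas CPD estimates correspondences and transformation together. The function names `fit_similarity`, `apply_similarity`, and `scale_and_angle` are hypothetical and do not come from the thesis.

```python
import math


def fit_similarity(source, target):
    """Least-squares 2D similarity transform (scale, rotation, translation).

    Assumes source[i] corresponds to target[i]. Points are (x, y) tuples.
    Returns complex numbers (a, b) such that target ~= a * source + b,
    where abs(a) is the scale and the argument of a is the rotation angle.
    """
    n = len(source)
    z = [complex(x, y) for x, y in source]
    w = [complex(x, y) for x, y in target]
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    zc = [p - z_mean for p in z]  # centred source points
    wc = [p - w_mean for p in w]  # centred target points
    energy = sum(abs(p) ** 2 for p in zc)
    if energy == 0:
        raise ValueError("source points must not all coincide")
    # Closed-form minimiser of sum |wc - a * zc|^2 over complex a.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / energy
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    """Apply the fitted transform to a list of (x, y) points."""
    moved = []
    for x, y in points:
        p = a * complex(x, y) + b
        moved.append((p.real, p.imag))
    return moved


def scale_and_angle(a):
    """Decompose the complex factor into (scale, rotation in radians)."""
    return abs(a), math.atan2(a.imag, a.real)
```

Representing 2D points as complex numbers keeps the sketch short, because multiplying by a single complex number applies rotation and uniform scaling at once. Non-uniform scaling and skew, which the abstract assigns to the non-rigid case, cannot be expressed this way.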

    Vehicle make and model recognition in CCTV footage

    This paper presents a novel approach to Vehicle Make & Model Recognition (VMMR) in CCTV video footage. CPD (Coherent Point Drift) is used to effectively remove the skew of detected vehicles, as CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approaching angles. A novel ROI (Region Of Interest) segmentation is also proposed. A LESH (Local Energy Shape Histogram) feature-based approach is used for vehicle make and model recognition, with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximize the reliability of the final outcome. Experimental results show that the proposed system achieves an accuracy of over 95% when tested on real CCTV footage with no prior camera calibration.

    Advances in Multi-Sensor Data Fusion: Algorithms and Applications

    With the development of satellite and remote sensing techniques, more and more image data from airborne/satellite sensors have become available. Multi-sensor image fusion seeks to combine information from different images to obtain more inferences than can be derived from a single sensor. In image-based application fields, image fusion has emerged as a promising research area since the end of the last century. The paper presents an overview of recent advances in multi-sensor satellite image fusion. Firstly, the most popular existing fusion algorithms are introduced, with emphasis on their recent improvements. Advances in the main application fields in remote sensing, including object identification, classification, change detection and maneuvering target tracking, are described. Both advantages and limitations of those applications are then discussed. Recommendations are made, including: (1) improvement of fusion algorithms; (2) development of “algorithm fusion” methods; (3) establishment of an automatic quality assessment scheme.

    Human object annotation for surveillance video forensics

    A system that can automatically annotate surveillance video in a manner useful for locating a person with a given description of clothing is presented. Each human is annotated based on two appearance features: primary colors of clothes and the presence of text/logos on clothes. The annotation occurs after a robust foreground extraction stage employing a modified Gaussian mixture model-based approach. The proposed pipeline consists of a preprocessing stage where color appearance of an image is improved using a color constancy algorithm. In order to annotate color information for human clothes, we use the color histogram feature in HSV space and find local maxima to extract dominant colors for different parts of a segmented human object. To detect text/logos on clothes, we begin with the extraction of connected components of enhanced horizontal, vertical, and diagonal edges in the frames. These candidate regions are classified as text or nontext on the basis of their local energy-based shape histogram features. Further, to detect humans, a novel technique has been proposed that uses contourlet transform-based local binary pattern (CLBP) features. In the proposed method, we extract the uniform direction invariant LBP feature descriptor for contourlet transformed high-pass subimages from vertical and diagonal directional bands. In the final stage, extracted CLBP descriptors are classified by a trained support vector machine. Experimental results illustrate the superiority of our method on large-scale surveillance video data
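The dominant-colour step described above (a colour histogram in HSV space followed by a search for local maxima) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the bin count, the saturation cut-off, the minimum peak size, and the function names `hue_histogram` and `dominant_hue_bins` are all hypothetical, and only the hue channel is histogrammed.

```python
import colorsys


def hue_histogram(pixels, bins=12, min_saturation=0.2):
    """Histogram of hue values for a list of (r, g, b) pixels in 0..255.

    Pixels with low saturation (greys) carry no reliable hue, so they are
    skipped; this cut-off is an assumption made for the sketch.
    """
    hist = [0] * bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue
        hist[min(int(h * bins), bins - 1)] += 1
    return hist


def dominant_hue_bins(hist, min_fraction=0.1):
    """Indices of local maxima that hold at least min_fraction of the mass.

    Hue is circular, so the neighbours of the first and last bins wrap.
    """
    total = sum(hist)
    n = len(hist)
    peaks = []
    for i, count in enumerate(hist):
        left = hist[(i - 1) % n]
        right = hist[(i + 1) % n]
        if total and count >= min_fraction * total and count > left and count >= right:
            peaks.append(i)
    return peaks
```

In use, the pixels of one body part of a segmented person (for example the torso region) would be passed to `hue_histogram`, and each returned peak index would be mapped to a colour name for annotation.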

    Car make and model recognition under limited lighting conditions at night

    Car make and model recognition (CMMR) has become an important part of intelligent transport systems. Information provided by CMMR can be utilized when license plate numbers cannot be identified or fake number plates are used. CMMR can also be used when a certain model of a vehicle is required to be automatically identified by cameras. The majority of existing CMMR methods are designed to be used only in daytime, when most of the car features can be easily seen. Few methods have been developed to cope with limited lighting conditions at night, where many vehicle features cannot be detected. The aim of this work was to identify car make and model at night by using available rear view features. This paper presents a one-class classifier ensemble designed to distinguish a particular car model of interest from other models. A combination of salient geographical and shape features of taillights and license plates from the rear view is extracted and used in the recognition process. The majority vote from a support vector machine, a decision tree, and k-nearest neighbors is applied to verify a target model in the classification process. Experiments on 421 car makes and models captured under limited lighting conditions at night show a classification accuracy of about 93%.
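The majority-vote verification step mentioned above can be illustrated with a small sketch. The three classifiers are represented here by arbitrary callables that return `True` when they accept a feature vector as the target model; the threshold rules in the usage example are placeholders, not the trained support vector machine, decision tree, or k-nearest-neighbour models from the paper, and the name `verify_target` is hypothetical.

```python
def verify_target(classifiers, features):
    """Accept the target model if a strict majority of classifiers accept it.

    classifiers: list of callables, each mapping a feature vector to a bool.
    features: the feature vector extracted from one rear-view image.
    """
    votes = [bool(clf(features)) for clf in classifiers]
    return sum(votes) * 2 > len(votes)


# Placeholder decision rules standing in for the trained models.
def stub_svm(features):
    return features[0] > 0.5


def stub_tree(features):
    return features[1] > 0.5


def stub_knn(features):
    return features[0] + features[1] > 1.2
```

With an odd number of voters, as here, ties cannot occur, which is one practical reason to combine three classifiers rather than two or four.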

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: (1) moving object detection and recognition; (2) correction of colours in the video frames and recognition of colours of moving objects; (3) make and model recognition of vehicles and identification of their type; and (4) detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex background. The object detection part of the framework relies on a background modelling technique and a novel post-processing step where the contours of the foreground regions (i.e. moving objects) are refined by the classification of edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. 
The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. 
The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used in various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.

    Driver Distraction Identification with an Ensemble of Convolutional Neural Networks

    The World Health Organization (WHO) reported 1.25 million deaths yearly due to road traffic accidents worldwide, and the number has been continuously increasing over the last few years. Nearly a fifth of these accidents are caused by distracted drivers. Existing work on distracted driver detection is concerned with a small set of distractions (mostly cell phone usage). Unreliable ad-hoc methods are often used. In this paper, we present the first publicly available dataset for driver distraction identification with more distraction postures than existing alternatives. In addition, we propose a reliable deep learning-based solution that achieves 90% accuracy. The system consists of a genetically weighted ensemble of convolutional neural networks; we show that weighting an ensemble of classifiers using a genetic algorithm yields better classification confidence. We also study the effect of different visual elements in distraction detection by means of face and hand localization and skin segmentation. Finally, we present a thinned version of our ensemble that achieves 84.64% classification accuracy and can operate in a real-time environment. Comment: arXiv admin note: substantial text overlap with arXiv:1706.0949
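The idea of a weighted ensemble can be sketched as a weighted average of the per-class probabilities produced by each network. The sketch below assumes the weights are already given; in the paper they are found with a genetic algorithm, which is not reproduced here. The function name `weighted_ensemble` and the example numbers are hypothetical.

```python
def weighted_ensemble(prob_vectors, weights):
    """Fuse per-classifier class probabilities with a weighted average.

    prob_vectors: one list of class probabilities per classifier.
    weights: one non-negative weight per classifier (normalised here).
    Returns (index of the winning class, fused probability vector).
    """
    if len(prob_vectors) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_vectors[0])
    fused = [0.0] * n_classes
    for probs, weight in zip(prob_vectors, weights):
        for cls, p in enumerate(probs):
            fused[cls] += (weight / total) * p
    best = max(range(n_classes), key=lambda cls: fused[cls])
    return best, fused
```

The choice of weights matters: with the same three probability vectors, equal weights and weights that favour the first classifier can select different classes, which is why searching for good weights can improve on plain averaging.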

    An Efficient Method for Traffic Image Denoising

    In this paper, a novel method for traffic image denoising based on low-rank decomposition is proposed. First, the low-rank decomposition is carried out. Under its sparse and low-rank constraints, the foreground images with flat background and moving vehicles and the background images with similar road scenes are obtained. Then the foreground image is segmented into blocks of a certain size. The variance of each block is calculated, and the minimum variance is taken as the estimate of the noise power. The KSVD algorithm is then applied to denoise the foreground image. Furthermore, a noisy pixel discrimination algorithm is used to distinguish noisy pixels from noiseless ones, and an eight-neighborhood weighted interpolation algorithm is used to reconstruct the noisy pixels, with weights inversely proportional to the Euclidean distances between the pixels. PCA recovery combined with noisy pixel discrimination and eight-neighborhood weighted interpolation is adopted for background image denoising. Finally, the proposed method is evaluated on traffic videos obtained under the same view and angle, and compared with several state-of-the-art denoising methods including BM3D, KSVD and PCA recovery. The experimental results illustrate that the proposed method removes noise more effectively, preserves useful information and achieves better performance in terms of both PSNR and visual quality.
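Two of the steps described above lend themselves to a short sketch: estimating the noise power as the minimum block variance, and rebuilding a noisy pixel from its eight neighbours with weights inversely proportional to distance. The block size, the use of non-overlapping blocks, the exclusion of neighbours that are themselves flagged as noisy, and the function names are assumptions made for illustration; images are plain nested lists of grey levels.

```python
import math


def estimate_noise_variance(image, block=4):
    """Minimum variance over non-overlapping block x block tiles."""
    height, width = len(image), len(image[0])
    best = None
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            values = [image[r][c]
                      for r in range(top, top + block)
                      for c in range(left, left + block)]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            if best is None or variance < best:
                best = variance
    return best


def interpolate_pixel(image, noisy, row, col):
    """Inverse-distance weighted mean of the clean eight-neighbours.

    noisy is a boolean mask of the same shape as image. Neighbours flagged
    as noisy or lying outside the image are ignored (an assumption).
    """
    height, width = len(image), len(image[0])
    numerator = 0.0
    denominator = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width and not noisy[r][c]:
                weight = 1.0 / math.hypot(dr, dc)  # 1 or 1/sqrt(2)
                numerator += weight * image[r][c]
                denominator += weight
    # Fall back to the original value if no clean neighbour exists.
    return numerator / denominator if denominator else image[row][col]
```

Taking the minimum block variance relies on at least one block being nearly flat, so that its variance is due almost entirely to noise; this fits the flat foreground background that the abstract describes.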