    Plant Seed Identification

    Plant seed identification is routinely performed for seed certification in the seed trade, phytosanitary certification for the import and export of agricultural commodities, and regulatory monitoring, surveillance, and enforcement. Current identification is performed manually by seed analysts with limited supporting tools. Extensive expertise and time are required, especially for small, morphologically similar seeds. Computers are, however, especially good at recognizing subtle differences that humans find difficult to perceive. In this thesis, a 2D, image-based computer-assisted approach is proposed. Plant seeds are extremely small compared with everyday objects, and microscopic images of them are usually degraded by defocus blur due to the high magnification of the imaging equipment. It is necessary and beneficial to differentiate the in-focus and blurred regions, given that usually only sharp regions carry distinctive information for identification. If the object of interest, the plant seed in this case, is in focus within a single image frame, the amount of defocus blur can be employed as a cue to separate the object from the cluttered background. If the defocus blur is strong enough to obscure the object itself, sharp regions of multiple image frames acquired at different focal distances can be merged to make an all-in-focus image. This thesis describes a novel non-reference sharpness metric which exploits the difference in the distribution of uniform LBP patterns between blurred and non-blurred image regions. It runs in real time on a single-core CPU and responds much better to low-contrast sharp regions than competing metrics. Its benefits are shown in both defocus segmentation and focal stacking. With the obtained all-in-focus seed image, a scale-wise pooling method is proposed to construct its feature representation. Since the imaging settings in lab testing are well constrained, the seed objects in the acquired image can be assumed to have measurable scale and controllable scale variance. The proposed method utilizes real pixel scale information and allows for accurate comparison of seeds across scales. In cross-validation on our high-quality seed image dataset, a better identification rate (95%) was achieved than with pre-trained convolutional-neural-network-based models (93.6%). It offers an alternative method for image-based identification with all-in-focus object images of limited scale variance. The very first digital seed identification tool of its kind was built and deployed for testing in the seed laboratory of the Canadian Food Inspection Agency (CFIA). The proposed focal stacking algorithm was employed to create all-in-focus images, whereas the scale-wise pooling feature representation was used as the image signature. Throughput, workload, and identification rate were evaluated, and seed analysts reported significantly lower mental demand (p = 0.00245) when using the provided tool compared with manual identification. Although the identification rate in the practical test was only around 50%, I have demonstrated common mistakes made in the imaging process and possible ways to deploy the tool to improve the recognition rate.
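
The abstract does not give the formula of the sharpness metric, so the following is only a minimal sketch of its stated ingredient: identifying uniform LBP patterns in a greyscale patch. It assumes the common 8-neighbour, radius-1 LBP with the usual "at most two circular 0/1 transitions" definition of uniformity. The function names and the `uniform_fraction` summary are hypothetical illustrations, not the thesis's metric.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c); img is a list of rows of grey levels."""
    # Neighbours are visited in circular order, starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def is_uniform(code):
    """A pattern is uniform if its circular bit string has at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return transitions <= 2


def uniform_fraction(img):
    """Fraction of interior pixels whose LBP code is uniform (hypothetical summary)."""
    rows, cols = len(img), len(img[0])
    codes = [lbp_code(img, r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    return sum(is_uniform(code) for code in codes) / len(codes)
```

How the distribution of these patterns is turned into a per-region sharpness score is not specified in the abstract and is left open here.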

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This increases the importance of video content analysis applications, for either real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: (1) moving object detection and recognition; (2) correction of colours in the video frames and recognition of the colours of moving objects; (3) make and model recognition of vehicles and identification of their type; and (4) detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex backgrounds. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of the true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfections and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used in various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.

    Nonlinear Adaptive Diffusion Models for Image Denoising

    Most digital image applications demand high image quality. Unfortunately, images are often degraded by noise during the formation, transmission, and recording processes. Hence, image denoising is an essential processing step preceding visual and automated analyses. Image denoising methods can, however, reduce image contrast and create blocking or ringing artifacts in the process of denoising. In this dissertation, we develop high-performance nonlinear-diffusion-based image denoising methods capable of preserving edges and maintaining high visual quality. This is attained by different approaches. First, a nonlinear diffusion is presented with robust M-estimators as diffusivity functions. Second, the knowledge of textons derived from Local Binary Patterns (LBP), which unify divergent statistical and structural models of region analysis, is utilized to adjust the time step of the diffusion process. Next, the role of nonlinear diffusion that is adaptive to the local context in the wavelet domain is investigated, and the stationary wavelet context based diffusion (SWCD) is developed for performing the iterative shrinkage. Finally, we develop a locally- and feature-adaptive diffusion (LFAD) method, where each image patch/region is diffused individually, and the diffusivity function is modified to incorporate the Inverse Difference Moment as a local estimate of the gradient. Experiments have been conducted to evaluate the performance of each of the developed methods and to compare them with the reference group and with state-of-the-art methods.
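
To make the first approach concrete, here is a minimal sketch of one explicit nonlinear diffusion update with a robust edge-stopping function. The abstract does not say which M-estimators are used. A Tukey-biweight-shaped diffusivity (scaled to 1 at zero) is assumed here purely for illustration, and the names `tukey_diffusivity`, `diffusion_step`, `sigma` and `step` are hypothetical.

```python
def tukey_diffusivity(diff, sigma):
    """Edge-stopping weight: 1 at diff = 0, falling to 0 once |diff| >= sigma."""
    if abs(diff) >= sigma:
        return 0.0
    return (1.0 - (diff / sigma) ** 2) ** 2


def diffusion_step(img, sigma, step=0.2):
    """One explicit 4-neighbour diffusion update; out-of-range neighbours are skipped."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            flow = 0.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    diff = img[rr][cc] - img[r][c]
                    # Small differences (noise) diffuse; large ones (edges) are blocked.
                    flow += tukey_diffusivity(diff, sigma) * diff
            out[r][c] = img[r][c] + step * flow
    return out
```

For example, `diffusion_step([[10.0, 12.0, 10.0, 100.0, 100.0]], sigma=20.0)` pulls the noisy value 12.0 towards its neighbours while leaving the jump to 100.0 untouched, because differences of magnitude `sigma` or more receive zero weight.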

    Automated Resolution Selection for Image Segmentation

    It is well known in image processing in general, and hence in image segmentation in particular, that computational cost increases rapidly with the number and dimensions of the images to be processed. Several fields, such as astronomy, remote sensing, and medical imaging, use very large images, which might also be 3D and/or captured at several frequency bands, all adding to the computational expense. Multiresolution analysis is one method of increasing the efficiency of the segmentation process. One multiresolution approach is the coarse-to-fine segmentation strategy, whereby the segmentation starts at a coarse resolution and is then fine-tuned during subsequent steps. Until now, the starting resolution for segmentation has been selected arbitrarily with no clear selection criteria. The research conducted for this thesis showed that starting from different resolutions for image segmentation results in different accuracies and speeds, even for images from the same dataset. An automated method for resolution selection for an input image would thus be beneficial. This thesis introduces a framework for the selection of the best resolution for image segmentation. First proposed is a measure for defining the best resolution based on user/system criteria, which offers a trade-off between accuracy and time. A learning approach is then described for the selection of the resolution, whereby extracted image features are mapped to the previously determined best resolution. In the learning process, class (i.e., resolution) distribution is imbalanced, making effective learning from the data difficult. A variant of AdaBoost, called RAMOBoost, is therefore used in this research for the learning-based selection of the best resolution for image segmentation. RAMOBoost is designed specifically for learning from imbalanced data. Two sets of features are used: Local Binary Patterns (LBP) and statistical features. Experiments conducted with four datasets using three different segmentation algorithms show that the resolutions selected through learning enable much faster segmentation than the original ones, while retaining at least the original accuracy. For three of the four datasets used, the segmentation results obtained with the proposed framework were significantly better than with the original resolution with respect to both accuracy and time.

    Human action recognition using spatial-temporal analysis.

    Masters Degree. University of KwaZulu-Natal, Durban. In the past few decades, human action recognition (HAR) from video has gained a lot of attention in the computer vision domain. The analysis of human activities in videos spans a variety of applications including security and surveillance, entertainment, and the monitoring of the elderly. The task of recognizing human actions in any scenario is a difficult and complex one, characterized by challenges such as self-occlusion, noisy backgrounds and variations in illumination. However, the literature provides various techniques and approaches for action recognition which deal with these challenges. This dissertation focuses on a holistic approach to the human action recognition problem with specific emphasis on spatial-temporal analysis. Spatial-temporal analysis is achieved by using the Motion History Image (MHI) approach to solve the human action recognition problem. Three variants of MHI are investigated: Original MHI, Modified MHI and Timed MHI. An MHI is a single image describing a silhouette's motion over a period of time. Brighter pixels in the resultant MHI show the most recent movement/motion. One of the key problems of MHI is that it is not easy to know the conditions needed to obtain an MHI silhouette that will result in a high recognition rate for action recognition. These conditions are often neglected and thus pose a problem for human action recognition systems, as they could affect their overall performance. Two methods are proposed to solve the human action recognition problem and to show the conditions needed to obtain high recognition rates using the MHI approach. The first uses the concept of MHI with the Bag of Visual Words (BOVW) approach to recognize human actions. The second approach combines MHI with Local Binary Patterns (LBP). The Weizmann and KTH datasets are then used to validate the proposed methods. Results from experiments show promising recognition rates when compared to some existing methods. The BOVW approach used in combination with the three variants of MHI achieved higher recognition rates than the LBP method. The Original MHI method resulted in the highest recognition rate of 87% on the Weizmann dataset, and an 81.6% recognition rate was achieved on the KTH dataset using the Modified MHI approach.
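
The description of an MHI above (a single image in which brighter pixels mark more recent motion) can be sketched as follows. This assumes the classic update rule, where a moving pixel is set to a maximum value `tau` and every other pixel fades by a fixed amount per frame. The abstract does not define the Modified and Timed variants, so they are not shown, and the function names and the `decay` parameter are hypothetical.

```python
def update_mhi(mhi, motion_mask, tau, decay=1):
    """Return the next MHI: moving pixels are set to tau, all others fade by `decay`."""
    return [
        [tau if moving else max(0, value - decay) for value, moving in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, motion_mask)
    ]


def motion_history(masks, tau):
    """Fold a sequence of binary motion masks into a single MHI."""
    mhi = [[0] * len(masks[0][0]) for _ in masks[0]]
    for mask in masks:
        mhi = update_mhi(mhi, mask, tau)
    return mhi
```

With three one-row masks in which a blob moves from left to right, `motion_history([[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]], tau=3)` yields `[[1, 2, 3]]`: the most recent position is the brightest.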

    High Performance Video Stream Analytics System for Object Detection and Classification

    Due to recent advances in cameras, cell phones and camcorders, particularly the resolution at which they can record an image/video, large amounts of data are generated daily. This video data is often so large that manually inspecting it for object detection and classification can be time-consuming and error-prone; it therefore requires automated analysis to extract useful information and metadata. Automated analysis of video streams also comes with numerous challenges, such as blurred content and variation in illumination conditions and poses. In this thesis, we investigate an automated video analytics system which takes into account characteristics from both the shallow and deep learning domains. We propose the fusion of features from the spatial frequency domain to perform highly accurate blur- and illumination-invariant object classification using deep learning networks. We also propose the tuning of the hyper-parameters associated with the deep learning network through a mathematical model. The mathematical model used to support hyper-parameter tuning improved the performance of the proposed system during training. The effects of various hyper-parameters on the system's performance are compared. The parameters that contribute to the best performance are selected for video object classification. The proposed video analytics system has been demonstrated to process a large number of video streams, and the underlying infrastructure is able to scale based on the number and size of the video stream(s) being processed. Extensive experimentation on publicly available image and video datasets reveals that the proposed system is significantly more accurate and scalable and can be used as a general-purpose video analytics system.

    Content-based retrieval of visual information

    In this dissertation, I investigate new approaches relevant to content-based image retrieval techniques. First, the MOD paradigm is proposed, a method for detecting salient points in images. These salient points are specifically designed to enhance image retrieval accuracy by maximizing distinctiveness. Second, the multi-dimensional maximum likelihood similarity measure is presented, which removes a critical limitation in prior research in this area and provides an improved method of comparing image features. Third, a texture classification method based on low-dimensional constructed texture features is introduced, which has very low computational complexity and would be suitable for real-time video understanding or interactive search of very large image databases. The new approaches are tested on well-respected international test sets containing representative imagery.