579 research outputs found

    Refined Gaussian weighted histogram intersection and its application in number plate categorization

    This paper proposes a refined Gaussian weighted histogram intersection for content-based image matching and applies the method to number plate categorization. Number plate images are classified into two groups based on their colour similarity to the model image of each group. The similarity of images is measured by the matching rate between their colour histograms. Histogram intersection (HI) is used to calculate the matching rates of histograms. Since the conventional histogram intersection algorithm is strictly based on the matching between bins of identical colours, the final matching rate could easily be affected by colour variation caused by various environment changes. In our recent paper [9], a Gaussian weighted histogram intersection (GWHI) algorithm was proposed to facilitate histogram matching by taking into account the matching of both identical colours and similar colours. The weight is determined by the distance between two colours. When applied to number plate categorization, the GWHI algorithm proves more robust to colour variations and produces a classification with much lower intra-class distance and much higher inter-class distance than previous HI algorithms. However, the processing speed of the GWHI method is still not satisfactory. In this paper, the GWHI method is further refined: a colour quantization method is used to reduce the number of colours without introducing apparent perceptual colour distortion. New experimental results demonstrate that, using the refined GWHI method, image categorization can be done more efficiently. © 2006 IEEE
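The contrast between matching only identical colours and also matching similar colours can be illustrated with a minimal sketch. The abstract does not give the GWHI formula, so the weighted variant below is an assumption made purely for illustration: each bin's mass is spread to nearby colours with a row-normalised Gaussian of the colour distance, and plain intersection is then applied. The function names, the `centers` array of bin colours, and `sigma` are hypothetical.

```python
import numpy as np

def histogram_intersection(h, m):
    """Conventional HI: sum of bin-wise minima, normalised by the model histogram m."""
    h = np.asarray(h, dtype=float)
    m = np.asarray(m, dtype=float)
    return float(np.minimum(h, m).sum() / m.sum())

def gaussian_weighted_intersection(h, m, centers, sigma=1.0):
    """Illustrative weighted variant (an assumption, not the paper's formula).

    centers has shape (n_bins, n_channels) and holds the colour of each bin.
    """
    h = np.asarray(h, dtype=float)
    m = np.asarray(m, dtype=float)
    c = np.asarray(centers, dtype=float)
    # Pairwise squared colour distances between bin centres.
    d2 = ((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)  # each bin redistributes exactly its own mass
    # Smooth both histograms, then intersect as usual.
    return histogram_intersection(h @ w, m @ w)

# Two histograms whose mass sits in neighbouring, non-identical colour bins.
h = [10, 0, 0]
m = [0, 10, 0]
centers = [[0.0], [1.0], [10.0]]
plain = histogram_intersection(h, m)
weighted = gaussian_weighted_intersection(h, m, centers, sigma=1.0)
```

In this example the plain intersection is zero because no bin is shared, while the weighted variant gives a positive score because the two occupied bins hold similar colours.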

    A comparison on histogram based image matching methods

    Colour histograms have been widely used for object recognition as a representation that is stable under changes in view. In this paper, three newly proposed histogram-based methods are compared with three other popular methods: the conventional histogram intersection (HI) method, Wong and Cheung's merged palette histogram matching (MPHM) method, and Gevers' colour ratio gradient (CRG) method. These methods are tested on vehicle number plate images for number plate classification. Experimental results show that the CRG method is the best choice in terms of speed, and the GWHI method gives the best classification results. Overall, the CECH method produces the best performance when both speed and classification performance are considered. © 2006 IEEE

    Applying local cooccurring patterns for object detection from aerial images

    Developing a spatial searching tool to enhance the search capabilities of large spatial repositories for Geographical Information System (GIS) update has attracted more and more attention. Typically, objects to be detected are represented by many local features or local parts. Test images are processed by extracting local features, which are then matched with the object's model image. Most existing work that uses local features assumes that the local features are independent of each other. However, in many cases, this is not true. In this paper, a method of applying local cooccurring patterns to disclose the cooccurring relationships between local features for object detection is presented. Features of the object of interest, including colour features and edge-based shape features, are collected. To reveal the cooccurring patterns among multiple local features, a colour cooccurrence histogram is constructed and used to search for objects of interest in target images. The method is demonstrated by detecting swimming pools in aerial images. Our experimental results show the feasibility of using this method to effectively reduce the manual labour of finding man-made objects of interest in aerial images. © Springer-Verlag Berlin Heidelberg 2007

    Registration and categorization of camera captured documents

    Camera captured document image analysis concerns the processing of documents captured with hand-held sensors, smart phones, or other capturing devices using advanced image processing, computer vision, pattern recognition, and machine learning techniques. As capture in the real world is unconstrained, the captured documents suffer from illumination variation, viewpoint variation, highly variable scale/resolution, background clutter, occlusion, and non-rigid deformations, e.g., folds and crumples. Document registration is a problem where the image of a template document whose layout is known is registered with a test document image. Literature in camera captured document mosaicing addressed the registration of captured documents under the assumption of a considerable amount of overlapping content in a single chunk. These methods cannot be directly applied to the registration of forms, bills, and other commercial documents where the fixed content is distributed in tiny portions across the document. On the other hand, most of the existing document image registration methods work with scanned documents under affine transformation. Literature in document image retrieval addressed categorization of documents based on text, figures, etc. However, the scalability of existing document categorization methodologies based on logo identification is very limited. This dissertation focuses on two problems: (i) registration of captured documents where the overlapping content is distributed in tiny portions across the documents and (ii) categorization of captured documents into predefined logo classes that scales to large datasets using local invariant features. A novel methodology is proposed for the registration of user defined Regions Of Interest (ROI) using corresponding local features from their neighborhood.
The methodology enhances prior approaches in point pattern based registration, like RANdom SAmple Consensus (RANSAC) and Thin Plate Spline-Robust Point Matching (TPS-RPM), to enable registration of cell phone and camera captured documents under non-rigid transformations. Three novel aspects are embedded into the methodology: (i) histogram based uniformly transformed correspondence estimation, (ii) clustering of points located near the ROI to select only nearby regions for matching, and (iii) validation of the registration in RANSAC and TPS-RPM algorithms. Experimental results on a dataset of 480 images captured using iPhone 3GS and Logitech webcam Pro 9000 have shown an average registration accuracy of 92.75% using Scale Invariant Feature Transform (SIFT). Robust local features for logo identification are determined empirically by comparisons among SIFT, Speeded-Up Robust Features (SURF), Hessian-Affine, Harris-Affine, and Maximally Stable Extremal Regions (MSER). Two different matching methods are presented for categorization: matching all features extracted from the query document as a single set, and a segment-wise matching of query document features using a segmentation obtained by grouping the area under intersecting dense local affine covariant regions. The latter approach not only gives an approximate location of predicted logo classes in the query document but also helps to increase the prediction accuracies. In order to facilitate scalability to large data sets, inverted indexing of logo class features has been incorporated in both approaches. Experimental results on a dataset of real camera captured documents have shown a peak 13.25% increase in the F-measure accuracy using the latter approach as compared to the former.

    Similarity Measures for Automatic Defect Detection on Patterned Textures

    Similarity measures are widely used in various applications such as information retrieval, image and object recognition, text retrieval, and web data search. In this paper, we propose similarity-based methods for defect detection on patterned textures using five different similarity measures, viz., Normalized Histogram Intersection Coefficient, Bhattacharyya Coefficient, Pearson Product-moment Correlation Coefficient, Jaccard Coefficient and Cosine-angle Coefficient. Periodic blocks are extracted from each defective input image, and a similarity matrix is obtained from the similarity coefficient between the histogram of each periodic block and the histograms of itself and all other periodic blocks. Each similarity matrix is transformed into a dissimilarity matrix containing true-distance metrics, and Ward's hierarchical clustering is performed to distinguish defective from defect-free blocks. Performance of the proposed method is evaluated for each similarity measure based on precision, recall and accuracy for various real fabric images with defects such as broken end, hole, thin bar, thick bar, netting multiple, knot, and missing pick.
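The five coefficients named in this abstract can be sketched for a pair of histograms as follows. The abstract does not state the exact definitions, so standard textbook forms are assumed here. In particular, the Jaccard coefficient is taken in its common histogram generalisation (sum of minima over sum of maxima), which may differ from the authors' choice. The function name `similarity_measures` is hypothetical.

```python
import numpy as np

def similarity_measures(p, q):
    """Return five similarity coefficients between two histograms.

    Both inputs are normalised to sum to 1 first. The Pearson coefficient
    is undefined when either histogram is constant across all bins.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return {
        "histogram_intersection": float(np.minimum(p, q).sum()),
        "bhattacharyya": float(np.sqrt(p * q).sum()),
        "pearson": float(np.corrcoef(p, q)[0, 1]),
        # Assumed generalisation of Jaccard to real-valued histograms.
        "jaccard": float(np.minimum(p, q).sum() / np.maximum(p, q).sum()),
        "cosine": float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q))),
    }

same = similarity_measures([1, 2, 3, 4], [1, 2, 3, 4])
diff = similarity_measures([1, 2, 3, 4], [4, 3, 2, 1])
```

Evaluating such a function on every pair of block histograms would give the kind of similarity matrix the abstract describes. The conversion to a dissimilarity matrix and the clustering step are not sketched here.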

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely; 1. Moving object detection and recognition, 2. Correction of colours in the video frames and recognition of colours of moving objects, 3. Make and model recognition of vehicles and identification of their type, 4. Detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex background. The object detection part of the framework relies on background modelling technique and a novel post processing step where the contours of the foreground regions (i.e. moving object) are refined by the classification of edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. 
The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals.
The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild

    Matching sets of features for efficient retrieval and recognition

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 145-153). In numerous domains it is useful to represent a single example by the collection of local features or parts that comprise it. In computer vision in particular, local image features are a powerful way to describe images of objects and scenes. Their stability under variable image conditions is critical for success in a wide range of recognition and retrieval applications. However, many conventional similarity measures and machine learning algorithms assume vector inputs. Comparing and learning from images represented by sets of local features is therefore challenging, since each set may vary in cardinality and its elements lack a meaningful ordering. In this thesis I present computationally efficient techniques to handle comparisons, learning, and indexing with examples represented by sets of features. The primary goal of this research is to design and demonstrate algorithms that can effectively accommodate this useful representation in a way that scales with both the representation size as well as the number of images available for indexing or learning. I introduce the pyramid match algorithm, which efficiently forms an implicit partial matching between two sets of feature vectors. The matching has a linear time complexity, naturally forms a Mercer kernel, and is robust to clutter or outlier features, a critical advantage for handling images with variable backgrounds, occlusions, and viewpoint changes. I provide bounds on the expected error relative to the optimal partial matching. For very large databases, even extremely efficient pairwise comparisons may not offer adequately responsive query times. I show how to perform sub-linear time retrievals under the matching measure with randomized hashing techniques, even when input sets have varying numbers of features.
My results are focused on several important vision tasks, including applications to content-based image retrieval, discriminative classification for object recognition, kernel regression, and unsupervised learning of categories. I show how the dramatic increase in performance enables accurate and flexible image comparisons to be made on large-scale data sets, and removes the need to artificially limit the number of local descriptions used per image when learning visual categories. By Kristen Lorraine Grauman. Ph.D.
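The idea of forming an implicit partial matching from multi-resolution histograms can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the thesis implementation: features are scalars in the range [0, diameter), bin widths double at each level starting from 1, and matches first found at a level with bin width w are weighted by 1/w. The function name `pyramid_match` and its parameters are hypothetical, and the score is left unnormalised.

```python
import numpy as np

def pyramid_match(x, y, diameter):
    """Unnormalised 1-D pyramid match score between two feature sets.

    x, y: sequences of scalar features in [0, diameter).
    Matches first formed at a level with bin width w contribute 1/w each.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    levels = int(np.ceil(np.log2(diameter))) + 1
    score = 0.0
    prev = 0  # matches already counted at finer levels
    for i in range(levels):
        side = 2 ** i
        nbins = int(np.ceil(diameter / side))
        hx, _ = np.histogram(x, bins=nbins, range=(0, nbins * side))
        hy, _ = np.histogram(y, bins=nbins, range=(0, nbins * side))
        # Histogram intersection counts the points matched at this resolution.
        inter = int(np.minimum(hx, hy).sum())
        score += (inter - prev) / side  # only newly formed matches are added
        prev = inter
    return score

near = pyramid_match([0.5], [0.5], diameter=8)
far = pyramid_match([0.5], [7.5], diameter=8)
```

Points that fall in the same finest bin are matched at full weight, while distant points are only matched once the bins have grown large, at a correspondingly small weight. The two sets may have different sizes, since the intersection leaves unmatched points out.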

    Probabilistic and geometric shape based segmentation methods.

    Image segmentation is one of the most important problems in image processing, object recognition, computer vision, medical imaging, etc. In general, the objective of the segmentation is to partition the image into the meaningful areas using the existing (low level) information in the image and prior (high level) information which can be obtained using a number of features of an object. As stated in [1,2], the human vision system aims to extract and use as much information as possible in the image including but not limited to the intensity, possible motion of the object (in sequential images), spatial relations (interaction) as the existing information, and the shape of the object which is learnt from the experience as the prior information. The main objective of this dissertation is to couple the prior information with the existing information since the machine vision system cannot predict the prior information unless it is given. To label the image into meaningful areas, the chosen information is modelled to fit progressively in each of the regions by an optimization process. The intensity and spatial interaction (as the existing information) and shape (as the prior information) are modeled to obtain the optimum segmentation in this study. The intensity information is modelled using the Gaussian distribution. Spatial interaction that describes the relation between neighboring pixels/voxels is modelled by assuming that the pixel intensity depends on the intensities of the neighboring pixels. The shape model is obtained using occurrences of histogram of training shape pixels or voxels. The main objective is to capture the shape variation of the object of interest. Each pixel in the image will have three probabilities to be an object and a background class based on the intensity, spatial interaction, and shape models. These probabilistic values will guide the energy (cost) functionals in the optimization process. 
This dissertation proposes segmentation frameworks that have the following properties: i) original enough to solve some of the existing problems, ii) robust under various segmentation challenges, and iii) fast enough to be used in real applications. In this dissertation, the models are integrated into different methods to obtain the optimum segmentation: 1) variational methods (which can be considered spatially continuous), and 2) statistical methods (which can be considered spatially discrete). The proposed segmentation frameworks start by obtaining the initial segmentation using the intensity / spatial interaction models. The shape model, which is obtained using the training shapes, is registered to the image domain. Finally, the optimal segmentation is obtained by optimizing the energy functionals. Experiments show that the use of the shape prior considerably improves accuracy over alternative methods that use only the existing information or both kinds of information in the image. The proposed methods are tested on synthetic and clinical images/shapes, and they are shown to be robust under various noise levels, occlusions, and missing object information. Vertebral bodies (VBs) in clinical computed tomography (CT) are segmented using the proposed methods to support bone mineral density measurements and fracture analysis in bones. Experimental results show that the proposed solutions eliminate some of the existing problems in VB segmentation. One of the most important contributions of this study is to offer a segmentation framework that is suitable for clinical work.

    Learning coupled conditional random field for image decomposition : theory and application in object categorization

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 171-180). The goal of this thesis is to build a computational system that is able to identify object categories within images. To this end, this thesis proposes a computational model of "recognition-through-decomposition-and-fusion" based on the psychophysical theories of information dissociation and integration in human visual perception. At the lowest level, contour and texture processes are defined and measured. In the mid-level, a novel coupled Conditional Random Field model is proposed to model and decompose the contour and texture processes in natural images. Various matching schemes are introduced to match the decomposed contour and texture channels in a dissociative manner. As a counterpart to the integrative process in the human visual system, adaptive combination is applied to fuse the perception in the decomposed contour and texture channels. The proposed coupled Conditional Random Field model is shown to be an important extension of popular single-layer Random Field models for modeling image processes, by dedicating a separate layer of random field grid to each individual image process and capturing the distinct properties of multiple visual processes. The decomposition enables the system to fully leverage each decomposed visual stimulus to its full potential in discriminating different object classes. Adaptive combination of multiple visual cues well mirrors the fact that different visual cues play different roles in distinguishing various object classes.
Experimental results demonstrate that the proposed computational model of "recognition-through-decomposition-and-fusion" achieves better performance than most of the state-of-the-art methods in recognizing the objects in Caltech-101, especially when only a limited number of training samples are available, which conforms with the capability of learning to recognize a class of objects from a few sample images in the human visual system. By Xiaoxu Ma. Ph.D.

    Text localization and recognition in natural scene images

    Text localization and recognition (text spotting) in natural scene images is an interesting task that finds many practical applications. Algorithms for text spotting may be used to help visually impaired subjects navigate unknown environments; to build autonomous driving systems that automatically avoid collisions with pedestrians or automatically identify speed limits and warn the driver about possible infractions that are being committed; and to ease or solve some tedious and repetitive data entry tasks that are still manually carried out by humans. While Optical Character Recognition (OCR) from scanned documents is a solved problem, the same cannot be said for text spotting in natural images. In fact, this latter class of images contains plenty of difficult situations that algorithms for text spotting need to deal with in order to reach acceptable recognition rates. During my PhD research I focused my studies on the development of novel systems for text localization and recognition in natural scene images. The two main works that I have produced during these three years of PhD studies are presented in this thesis: (i) in my first work I propose a hybrid system which exploits the key ideas of region-based and connected components (CC)-based text localization approaches to localize uncommon fonts and writings in natural images; (ii) in my second work I describe a novel deep-based system which exploits Convolutional Neural Networks and enhanced stable CC to achieve good text spotting results on challenging data sets. During the development of both these methods, my focus has always been on maintaining an acceptable computational complexity and a high reproducibility of the achieved results.