    Effect of OCR errors on short documents

    This thesis presents a study of the effect of OCR errors on short documents. OCR recognizes text in images and translates it into ASCII format. When this data is retrieved in response to a query, retrieval performance depends on the accuracy of the OCR device used. Recall, precision and ranking were used to gauge retrieval performance. The information retrieval system used is SMART, which is based on the vector space model. Evaluating these measures showed that average precision and recall are not significantly affected when the OCR collection is compared with its corrected version. However, with more complex weighting schemes, the rankings of relevant documents diverged more. The effect of an automatic post-processing system on retrieval performance was also studied.
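    To illustrate why an OCR error can change vector-space retrieval, here is a minimal sketch assuming a plain tf-idf weighting (raw term frequency times log(N/df)) with cosine similarity. SMART supports many weighting schemes, and this is not its exact implementation. The function names and the toy collection, including the misrecognized token `retrieva1`, are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each tokenized document with raw term frequency * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return document indices sorted by decreasing cosine score, plus the scores."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the collection (e.g. never correctly recognized) get weight 0.
    qvec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(qvec, d) for d in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    clean = [["text", "retrieval", "system"], ["image", "annotation"], ["text", "recognition"]]
    # Hypothetical OCR output: "retrieval" misread as "retrieva1".
    ocr = [["text", "retrieva1", "system"], ["image", "annotation"], ["text", "recognition"]]
    print(rank(["retrieval"], clean))
    print(rank(["retrieval"], ocr))
```

    In this toy case, a single misread character removes the only matching term, so the relevant document's score drops to zero. In a longer document, other correctly recognized terms would usually still match, which fits the observation that average precision and recall change little.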

    Co-occurrence Models for Image Annotation and Retrieval

    We present two models for content-based automatic image annotation and retrieval in web image repositories, based on the co-occurrence of tags and visual features in the images. In particular, we show how additional measures can be taken to address the noisy and limited tagging problems in datasets such as Flickr, to improve performance. As in many state-of-the-art works, an image is represented as a bag of visual terms computed using edge and color information. The co-occurrence information of visual terms and tags is used to create models for image annotation and retrieval. The first model begins with a naive Bayes approach and then improves upon it by using image pairs as single documents to significantly reduce the noise and increase annotation performance. The second method models the visual terms and tags as a graph, and uses query expansion techniques to improve retrieval performance. We evaluate our methods on the commonly used 150-concept Corel dataset and a much harder 2000-concept Flickr dataset.
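    The naive Bayes starting point of the first model can be sketched as follows. The sketch assumes each image has already been quantized into discrete visual terms (the edge/color feature step is not shown), scores tags with a simple multinomial naive Bayes over tag/visual-term co-occurrence counts, and uses add-alpha smoothing. The image-pair refinement is not reproduced. Function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(images):
    """images: list of (visual_terms, tags) pairs, where visual_terms are
    quantized visual-word IDs. Counts tag frequencies and tag/visual-term
    co-occurrences."""
    tag_counts = Counter()
    cooc = defaultdict(Counter)  # cooc[tag][visual_term] -> co-occurrence count
    vocab = set()
    for terms, tags in images:
        vocab.update(terms)
        for tag in tags:
            tag_counts[tag] += 1
            cooc[tag].update(terms)
    return tag_counts, cooc, vocab, len(images)


def annotate(terms, model, k=2, alpha=1.0):
    """Rank tags by log P(tag) + sum_v log P(v | tag), with add-alpha smoothing."""
    tag_counts, cooc, vocab, n_images = model
    scores = {}
    for tag, count in tag_counts.items():
        total = sum(cooc[tag].values())
        score = math.log(count / n_images)
        for v in terms:
            score += math.log((cooc[tag][v] + alpha) / (total + alpha * len(vocab)))
        scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    images = [
        (["sky_blue", "edge_h"], ["sky", "sea"]),
        (["sky_blue", "edge_h", "sky_blue"], ["sky"]),
        (["green", "edge_v"], ["tree", "grass"]),
        (["green", "green"], ["grass"]),
    ]
    model = train(images)
    print(annotate(["green", "edge_v"], model))
```

    Working in log space avoids numeric underflow when an image contains many visual terms.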

    Similarity Measures for Automatic Defect Detection on Patterned Textures

    Similarity measures are widely used in applications such as information retrieval, image and object recognition, text retrieval and web data search. In this paper, we propose similarity-based methods for defect detection on patterned textures using five different similarity measures, viz., the Normalized Histogram Intersection Coefficient, Bhattacharyya Coefficient, Pearson Product-moment Correlation Coefficient, Jaccard Coefficient and Cosine-angle Coefficient. Periodic blocks are extracted from each input defective image, and a similarity matrix is obtained from the similarity coefficient between the histogram of each periodic block and the histograms of itself and all other periodic blocks. Each similarity matrix is transformed into a dissimilarity matrix containing true distance metrics, and Ward's hierarchical clustering is performed to discriminate between defective and defect-free blocks. The performance of the proposed method is evaluated for each similarity measure in terms of precision, recall and accuracy on various real fabric images with defects such as broken end, hole, thin bar, thick bar, netting multiple, knot and missing pick.
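    The five coefficients and the similarity-to-dissimilarity step can be sketched on per-block histograms as below. The abstract does not give the exact formulas, so several choices here are assumptions. The Jaccard coefficient uses the weighted (Ruzicka) form, sum of minima over sum of maxima. The dissimilarity transform is d = sqrt(1 - s). Histograms must be non-empty and, for Pearson, non-constant. All function names are hypothetical.

```python
import math


def _normalize(h):
    total = sum(h)
    return [x / total for x in h]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima after scaling each histogram to sum to 1."""
    return sum(min(a, b) for a, b in zip(_normalize(h1), _normalize(h2)))


def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient of the two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(_normalize(h1), _normalize(h2)))


def pearson(h1, h2):
    """Pearson product-moment correlation between bin counts."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in h1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in h2))
    return cov / (s1 * s2)


def jaccard(h1, h2):
    """Weighted (Ruzicka) Jaccard: sum of minima over sum of maxima."""
    return sum(map(min, h1, h2)) / sum(map(max, h1, h2))


def cosine(h1, h2):
    """Cosine of the angle between the two histogram vectors."""
    dot = sum(a * b for a, b in zip(h1, h2))
    return dot / (math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2)))


def dissimilarity_matrix(blocks, similarity):
    """Pairwise d = sqrt(1 - s) between block histograms (assumed transform);
    max() guards against tiny negative values from rounding."""
    n = len(blocks)
    return [
        [math.sqrt(max(0.0, 1.0 - similarity(blocks[i], blocks[j]))) for j in range(n)]
        for i in range(n)
    ]
```

    The resulting matrix could then be clustered. For example, SciPy's `scipy.cluster.hierarchy.linkage` accepts a condensed distance matrix (obtainable with `scipy.spatial.distance.squareform`) with `method="ward"`, although Ward's criterion is strictly defined for Euclidean distances.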

    Shape matching by curve modelling and alignment

    Automatic information retrieval in the field of shape recognition has been widely addressed in many research fields. Various techniques have been developed using different approaches, such as intensity-based, model-based and shape-based methods. Whatever the representation of objects in images, a recognition method should be robust to changes in scale, translation and rotation. In this paper we present a new recognition method for planar image contours based on a curve alignment technique. The method consists of several phases: extracting the outlines of images, detecting significant points and aligning curves. The dominant points can be detected manually or automatically. The matching phase uses overlapping indices between shapes as similarity measures. To evaluate the effectiveness of the algorithm, two databases of 216 and 99 images were used. A performance analysis and comparison are provided using precision-recall curves.
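    The precision-recall evaluation can be made concrete with a minimal sketch for a single query. It computes one (recall, precision) point after each position in a ranked result list, plus interpolated precision (the highest precision at any recall at or above a given level). The overlapping-index similarity itself is not reproduced, because the abstract does not define it. Item IDs and function names are hypothetical.

```python
from fractions import Fraction


def pr_curve(ranked, relevant):
    """Return exact (recall, precision) pairs after each position in a ranked list."""
    points, hits = [], 0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
        points.append((Fraction(hits, len(relevant)), Fraction(hits, k)))
    return points


def interpolated_precision(points, recall_level):
    """Highest precision at any cutoff whose recall is >= recall_level (0 if none)."""
    return max((p for r, p in points if r >= recall_level), default=Fraction(0))


if __name__ == "__main__":
    # Hypothetical query: shapes "a", "b", "c" are relevant; "x", "y" are not.
    pts = pr_curve(["a", "x", "b", "y"], {"a", "b", "c"})
    for recall, precision in pts:
        print(f"recall={recall}, precision={precision}")
```

    Averaging the interpolated precision over all queries at fixed recall levels gives the kind of curve typically used to compare retrieval methods.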

    Tagging and Retrieving Images with Co-Occurrence Models: from Corel to Flickr

    This paper presents two models for content-based automatic image annotation and retrieval in web image repositories, based on the co-occurrence of tags and visual features in the images. In particular, we show how additional measures can be taken to address the noisy and limited tagging problems in datasets such as Flickr, to improve performance. An image is represented as a bag of visual terms computed using edge and color information. The first model begins with a naive Bayes approach and then improves upon it by using image pairs as single documents to significantly reduce the noise and increase annotation performance. The second method models the visual features and tags as a graph, and uses query expansion techniques to improve retrieval performance. We evaluate our methods on the commonly used 150-concept Corel dataset and a much harder 2000-concept Flickr dataset.
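    The graph-based retrieval idea can be illustrated with a minimal sketch. It assumes a bipartite graph whose edge weights count how many images contain both a tag and a visual term. A text query is expanded into its most strongly connected visual terms, and images are scored by the expanded terms they contain. This is one simple form of query expansion chosen for illustration, not necessarily the authors'. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(images):
    """Weighted bipartite graph: graph[tag][visual_term] = number of images
    in which the tag and the visual term co-occur."""
    graph = defaultdict(Counter)
    for terms, tags in images:
        for tag in tags:
            for v in set(terms):
                graph[tag][v] += 1
    return graph


def expand_query(tag, graph, top_n=3):
    """Replace a text query with its top_n visual-term neighbours, weighted by
    their share of the tag's total edge weight."""
    neighbours = graph.get(tag, Counter())
    total = sum(neighbours.values())
    return {v: w / total for v, w in neighbours.most_common(top_n)} if total else {}


def retrieve(tag, images, graph, top_n=3):
    """Rank all images, including untagged ones, by expanded-term weight."""
    expanded = expand_query(tag, graph, top_n)
    scores = [sum(expanded.get(v, 0.0) for v in set(terms)) for terms, _ in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    images = [
        (["blue", "smooth"], ["sky"]),
        (["blue", "cloud"], ["sky"]),
        (["green", "rough"], ["tree"]),
        (["blue", "smooth", "cloud"], []),  # untagged image
    ]
    print(retrieve("sky", images, build_graph(images)))
```

    Because matching happens on visual terms, the untagged image can still be ranked first for the query "sky". This is the benefit query expansion targets when Flickr-style tags are missing or sparse.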

    Video matching using DC-image and local features

    This paper presents a framework for video matching based on local features extracted from the DC-image of MPEG-compressed videos, without decompression. The relevant arguments and supporting evidence are discussed for developing video similarity techniques that work directly on compressed videos, without decompression, and especially utilise small images. Two experiments are carried out to support this. The first compares the DC-image with the I-frame in terms of matching performance and the corresponding computational complexity. The second compares local and global features in video matching, especially in the compressed domain and with small images. The results confirm that the DC-image, despite its greatly reduced size, is promising, as it produces at least similar (if not better) matching precision compared with the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features in precision. On the other hand, its computational complexity is relatively higher, although it is still within the real-time margin. Various optimisations could also be applied to reduce this computational complexity.
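    In an intra-coded frame, each 8×8 luma block has a DC coefficient proportional to its mean intensity. The DC-image is therefore roughly a 1/8-scale thumbnail: a 352×288 frame yields a 44×36 DC-image. The paper reads these coefficients directly from the compressed stream. The sketch below instead approximates the same image from already-decoded pixels, purely to show the relationship, and assumes 8×8 blocks and an orthonormal 2-D DCT-II. Function names are hypothetical.

```python
def dc_image(frame, block=8):
    """Average each non-overlapping block x block tile of a 2-D luma frame.
    Height and width are assumed to be multiples of `block`."""
    return [
        [
            sum(frame[y][x] for y in range(by, by + block) for x in range(bx, bx + block))
            / (block * block)
            for bx in range(0, len(frame[0]), block)
        ]
        for by in range(0, len(frame), block)
    ]


def dct_dc_coefficient(tile):
    """DC term of an orthonormal N x N 2-D DCT-II: sum(tile) / N, i.e. N * mean."""
    n = len(tile)
    return sum(map(sum, tile)) / n


if __name__ == "__main__":
    # 16x16 toy frame: top-left 8x8 block is dark (10), the rest bright (20).
    frame = [[10 if (y < 8 and x < 8) else 20 for x in range(16)] for y in range(16)]
    print(dc_image(frame))
    print(dct_dc_coefficient([[10] * 8 for _ in range(8)]))
```

    For 8×8 blocks, dividing each DC coefficient by 8 recovers the block mean, so the compressed-domain and pixel-domain DC-images agree up to quantization.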