    Sealing Clay Text Segmentation Based on Radon-Like Features and Adaptive Enhancement Filters

    Text extraction is a key issue in sealing clay research. The traditional method, based on rubbings, increases the risk of damaging the sealing clay and is unfavorable to its protection. Therefore, this paper proposes a new text segmentation method for digital images of sealing clay based on Radon-like features and adaptive enhancement filters. First, an adaptive enhancement LM filter bank is used to obtain the maximum energy image; second, the edge image of the maximum energy image is calculated; finally, Radon-like feature images are generated by combining the maximum energy image and its edge image. The average of the Radon-like feature images is then segmented by image thresholding. Compared with 2D Otsu, GA, and FastFCM, the experimental results show that this method performs better in terms of the accuracy and completeness of the extracted text.
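The filtering, edge, and thresholding steps of this pipeline can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the paper's implementation: the filter-bank responses are taken as given (the adaptive LM filter bank and the Radon-like feature computation are not reproduced), the "maximum energy image" is assumed to be the pixel-wise maximum of the squared responses, a gradient magnitude stands in for the edge operator, and a plain global Otsu threshold stands in for the unspecified thresholding method. All function names are hypothetical.

```python
import numpy as np

def max_energy_image(responses):
    """Pixel-wise maximum of the squared filter responses.

    `responses` is a list of equally sized 2-D arrays, e.g. the outputs of a
    filter bank. Assumption: "energy" means the squared response.
    """
    stack = np.stack([np.asarray(r, dtype=float) ** 2 for r in responses])
    return stack.max(axis=0)

def edge_image(image):
    """Gradient-magnitude edge map (an assumed stand-in for the edge operator)."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    return np.hypot(gx, gy)

def otsu_threshold(image, bins=256):
    """Global 1-D Otsu threshold: the bin edge maximising between-class variance."""
    values = np.asarray(image, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)                 # probability of the lower (background) class
    w1 = 1.0 - w0                     # probability of the upper (foreground) class
    mu_cum = np.cumsum(p * centers)   # cumulative first moment
    mu_total = mu_cum[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * w0 - mu_cum) ** 2 / (w0 * w1)
    # Empty classes give 0/0 or x/0; treat those split points as worthless.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    k = int(np.argmax(sigma_b))
    return edges[k + 1]               # pixels above this value are foreground
```

Given an averaged feature image `avg`, the binary text mask would then be `avg > otsu_threshold(avg)`.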

    Edges Detection Based On Renyi Entropy with Split/Merge

    Most classical edge detection methods are based on the first- and second-order derivatives of the gray levels of the pixels in the original image. These processes incur a rapidly growing computational cost, especially for large images, and therefore require more processing time. This paper presents a new edge detection algorithm that combines the Rényi entropy and the Shannon entropy using a split-and-merge technique. The objective is to find the best edge representation while decreasing computation time. A set of edge detection experiments is presented. The system yields edge detection performance comparable to classic methods such as Canny, LOG, and Sobel. The experimental results show that this method performs better than LOG and Sobel, and that it requires less CPU time than all three of these methods. Another benefit is that the method is easy to implement. Keywords: Rényi Entropy, Information content, Edge detection, Thresholding
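The Rényi entropy of order α of a normalized histogram p is H_α = log(Σ p_i^α) / (1 − α), which tends to the Shannon entropy −Σ p_i log p_i as α → 1. The sketch below computes it and uses it to drive the split phase of a quadtree split/merge. It is an illustration under assumptions, not the paper's algorithm: the order α, the bin count, the minimum block size, and the entropy threshold are arbitrary, the merge phase is omitted, and the function names are hypothetical.

```python
import numpy as np

def renyi_entropy(hist, alpha):
    """Rényi entropy of order alpha (natural log) of a non-negative histogram.

    For alpha == 1 the Shannon entropy, the alpha -> 1 limit, is returned.
    """
    p = np.asarray(hist, dtype=float)
    p = p[p > 0] / p.sum()
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def split_blocks(image, threshold, alpha=2.0, min_size=4, bins=16):
    """Split phase of a quadtree split/merge driven by block entropy.

    A block is split into four quadrants while its gray-level Rényi entropy
    exceeds `threshold` and both children would be at least `min_size` wide.
    Returns leaf blocks as (row, col, height, width); high-entropy leaves are
    edge candidates. Gray values are assumed to lie in [0, 255].
    """
    image = np.asarray(image, dtype=float)
    leaves = []

    def visit(r, c, h, w):
        block = image[r:r + h, c:c + w]
        hist, _ = np.histogram(block, bins=bins, range=(0, 256))
        can_split = h >= 2 * min_size and w >= 2 * min_size
        if can_split and renyi_entropy(hist, alpha) > threshold:
            h2, w2 = h // 2, w // 2
            visit(r, c, h2, w2)
            visit(r, c + w2, h2, w - w2)
            visit(r + h2, c, h - h2, w2)
            visit(r + h2, c + w2, h - h2, w - w2)
        else:
            leaves.append((r, c, h, w))

    visit(0, 0, image.shape[0], image.shape[1])
    return leaves
```

A merge pass would then join adjacent leaves with similar statistics; uniform regions stay as large single blocks, so most of the work concentrates near edges.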

    Advanced document data extraction techniques to improve supply chain performance

    This thesis develops a novel machine learning technique for extracting text-based information from scanned images. The extraction is performed on scanned invoices and bills used in financial transactions. These transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis, and converting this data into a digital format is often time-consuming. Automation and data optimisation show promise for reducing the time and cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM), and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed with selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding object detection results, its performance is poor when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation: its generator is implemented with the help of the Faster R-CNN model, and its discriminator relies on PatchGAN. The output of the GAN model is text data with bounding boxes.
For text extraction from the bounding boxes, a novel data extraction framework was designed. It consists of XML processing (when an existing OCR engine is used), bounding box pre-processing, text clean-up, OCR error correction, spell checking, type checking, pattern-based matching and, finally, a learning mechanism for automating future data extraction. Whichever fields the system extracts successfully are returned in key-value format. The efficiency of the proposed system was validated on existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these companies send scanned invoices to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, after which a rule-based engine extracts the relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and therefore sought new, optimised solutions. To confirm the results, the existing engines were used to return XML files containing the identified text and metadata. This XML output was then fed into the new system for information extraction. The system uses the existing OCR engine together with a novel, self-adaptive, learning-based OCR engine that is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and spend classification analysis, additional data were provided by another company in London that specialises in reducing its clients' procurement costs. This data was fed into the system to obtain a deeper level of spend classification and categorisation.
This helped the company reduce its reliance on human effort and work more efficiently than when performing similar tasks manually with Excel spreadsheets and Business Intelligence (BI) tools. The development of this methodology had two aims: first, to develop and test a novel solution that does not depend on any specific OCR technology; second, to improve information extraction accuracy over existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. The method is generic and can extract text from any given invoice, making it a valuable tool for optimising SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
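The pattern-based matching, OCR error correction, type checking, and key-value output stages can be illustrated with regular expressions over OCR text. This is a minimal sketch, not the thesis's framework: the field names, regex patterns, and OCR-confusion table are hypothetical examples, and the GAN detector, spell checking, and learning mechanism are not modelled.

```python
import re

# Hypothetical field patterns; a real system would configure or learn these per layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.I),
    "date": re.compile(r"date\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})", re.I),
    "total": re.compile(r"total\s*(?:due|amount)?\s*[:\-]?\s*[$€£]?\s*([0-9OolI.,]+)", re.I),
}

# Illustrative OCR confusions inside numeric fields (letters misread for digits).
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

def clean_amount(raw):
    """OCR error correction plus type check: raises ValueError if not numeric."""
    fixed = raw.translate(DIGIT_FIXES).replace(",", "")
    return float(fixed)

def extract_fields(ocr_text):
    """Return only the fields that were matched and passed their type check."""
    result = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if not match:
            continue
        value = match.group(1).strip()
        if key == "total":
            try:
                value = clean_amount(value)
            except ValueError:
                continue  # failed type check: leave the field out
        result[key] = value
    return result
```

For example, the OCR text `"Invoice No: INV-0042\nDate: 12/03/2021\nTotal Due: $1,2O4.50"` yields the invoice number, the date, and a total of 1204.5 once the misread "O" is corrected.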

    Text detection and recognition in images and video sequences

    Text characters embedded in images and video sequences represent a rich source of information for content-based indexing and retrieval applications. However, these characters are difficult to detect and recognize because of their varying sizes, grayscale values, and complex backgrounds. This thesis investigates methods for building an efficient system for detecting and recognizing text of any grayscale value embedded in images and video sequences. Both empirical image processing methods and statistical machine learning and modeling approaches are studied in two sub-problems: text detection and text recognition. Applying machine learning methods to text detection is difficult because of variations in character size and grayscale and the heavy computational cost. To overcome these problems, we propose a two-step localization/verification approach. The first step quickly localizes candidate text lines, which allows the characters to be normalized to a uniform size. In the verification step, a trained support vector machine or multi-layer perceptron is applied to background-independent features to remove false alarms. Text recognition, even from the detected text lines, remains challenging due to the variety of fonts and colors, complex backgrounds, and the short length of the text strings. Two schemes are investigated for text recognition: a bi-modal enhancement scheme and a multi-modal segmentation scheme. In the bi-modal scheme, we propose a set of filters that enhance the contrast of black and white characters and produce a better binarization before recognition. For more general cases, text recognition is addressed by a text segmentation step followed by a traditional optical character recognition (OCR) algorithm within a multi-hypothesis framework. In the segmentation step, we model the distribution of pixel grayscale values using a Gaussian mixture model or a Markov random field.
The resulting segmentation hypotheses are post-processed by connected component analysis and a grayscale consistency constraint algorithm, and then passed to OCR software. A selection algorithm based on language modeling and OCR statistics chooses the final text from all the produced text strings. Methods that exploit the temporal information of video text are also investigated. A Monte Carlo video text segmentation method is proposed to adapt the segmentation parameters across temporal text frames. Furthermore, a ROVER (Recognizer Output Voting Error Reduction) algorithm is studied for improving the final recognized text string by voting on characters across temporal frames.
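Voting across frames can be sketched as a character-level majority vote over the per-frame OCR strings. This is a simplified illustration, not ROVER itself: ROVER first aligns the hypotheses (by dynamic programming) into a word transition network, whereas this sketch assumes that alignment is unnecessary and keeps only strings of the most common length, so that position i refers to the same character in every frame. `vote_text` is a hypothetical name.

```python
from collections import Counter

def vote_text(frame_results):
    """Character-level majority vote over per-frame OCR strings.

    Simplification (assumption): only strings of the most common length are
    kept, so no alignment step is needed. Ties go to the character seen first.
    """
    if not frame_results:
        return ""
    length = Counter(len(s) for s in frame_results).most_common(1)[0][0]
    aligned = [s for s in frame_results if len(s) == length]
    voted = []
    for i in range(length):
        char, _ = Counter(s[i] for s in aligned).most_common(1)[0]
        voted.append(char)
    return "".join(voted)
```

With frames read as `"HELLO"`, `"HE1LO"`, `"HELL0"`, and `"HELLO!"`, the single-frame errors are outvoted and the result is `"HELLO"`; the length-6 reading is discarded by the length filter.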

    Character Recognition

    Character recognition is one of the most widely used pattern recognition technologies in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction, and classification to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.

    Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion

    Semantic concepts, such as faces, buildings, and other real-world objects, are the preferred means by which humans navigate through and retrieve visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential for ease of access and use. Classifying images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, and people/no-people allows us to obtain semantic labels without full knowledge of all objects in the scene. Inferring the presence of high-level semantic concepts from low-level visual features has recently attracted significant research interest. However, low-level visual features alone have been shown to be of limited power for semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, the combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In digital photography, incorporating readily available camera metadata (information about the image capture conditions stored in the EXIF header of each image), along with GPS information, offers a way towards a better understanding of the imaged scene. In this thesis we focus on detecting semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusing low-level visual features with selected camera metadata, using a Support Vector Machine as the integration device, affects the performance of a building detector on a genuine personal photo collection. We implemented two approaches to building detection that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata.
An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge orientation-based features extracted at three different scales, was tested on a dataset of 1720 outdoor images, achieving a classification accuracy of 88.22%. The second approach integrates the edge orientation-based features with the camera metadata-based features at both the feature level and the decision level. The fusion approaches were evaluated on an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual-features-only approach by 2-3% on average regardless of the operating point chosen, while all performance measures remain approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures.
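The edge orientation-based features and feature-level (early) fusion can be sketched in NumPy. This is an illustration under assumptions, not the thesis's feature extractor: orientations are binned over [0, π) from image gradients, the three scales are approximated by plain subsampling by factors 1, 2, and 4, the magnitude threshold and bin count are arbitrary, and the metadata vector (for example an encoded exposure time or flash state) is hypothetical. The fused vector would then be passed to an SVM, which is not shown.

```python
import numpy as np

def edge_orientation_histogram(gray, n_bins=8, mag_threshold=10.0):
    """Normalized histogram of gradient orientations at strong-gradient pixels.

    Returns all zeros if no pixel exceeds `mag_threshold`.
    """
    gy, gx = np.gradient(np.asarray(gray, dtype=float))  # axis 0 = rows, axis 1 = cols
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # undirected orientation in [0, pi)
    strong = magnitude > mag_threshold
    hist, _ = np.histogram(angle[strong], bins=n_bins, range=(0.0, np.pi))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def multiscale_features(gray, scales=(1, 2, 4)):
    """Concatenate orientation histograms of the image subsampled by each factor."""
    gray = np.asarray(gray, dtype=float)
    return np.concatenate([edge_orientation_histogram(gray[::s, ::s]) for s in scales])

def early_fusion(visual, metadata):
    """Feature-level fusion: one joint vector for a single classifier."""
    return np.concatenate([np.asarray(visual, dtype=float),
                           np.asarray(metadata, dtype=float)])
```

Decision-level (late) fusion would instead train one classifier per modality and combine their scores; early fusion lets a single classifier model interactions between visual and metadata features directly.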

    Human-Centric Machine Vision

    Recently, algorithms for processing visual information have evolved greatly, providing efficient and effective solutions for coping with the variability and complexity of real-world environments. These achievements have led to Machine Vision systems that go beyond typical industrial applications, where environments are controlled and tasks are very specific, towards innovative solutions that address people's everyday needs. Human-Centric Machine Vision can help solve problems raised by the needs of our society, e.g. security and safety, health care, medical imaging, and human-machine interfaces. Such applications must handle changing, unpredictable, and complex situations, and must take the presence of humans into account.

    Semantic Image Segmentation via a Dense Parallel Network

    Image segmentation has been an important area of study in computer vision. It is a challenging task because it involves pixel-wise annotation, i.e. labeling each pixel according to the class to which it belongs. In an image classification task, by contrast, the goal is to predict the class to which an entire image belongs, so there is more focus on the abstract features extracted by Convolutional Neural Networks (CNNs) and less emphasis on spatial information. Image segmentation, on the other hand, needs abstract information and spatial information at the same time. One class of work in image segmentation focuses on "recovering" high-resolution features from low-resolution ones. These networks have an encoder-decoder structure, and spatial information is recovered by feeding the decoder with earlier high-resolution features through skip connections. Overall, these skip-connection strategies try to propagate features to deeper layers. The second class of work focuses on "maintaining" high-resolution features throughout the process. In this thesis, we first review related work on image segmentation and then introduce two new models, namely Unet-Laplacian and Dense Parallel Network (DensePN). The Unet-Laplacian is a series CNN model incorporating a Laplacian filter branch. This new branch applies a Laplacian filter to the input RGB image and feeds the output to the decoder. Experimental results show that the output of the Unet-Laplacian captures more of the ground truth mask and eliminates some false positives. We then describe the proposed DensePN, which was designed to balance extracting features through multiple layers against keeping spatial information. DensePN allows both keeping high-resolution feature maps and reusing features at deeper layers to solve the image segmentation problem.
We designed the Dense Parallel Network based on three main observations from our initial trials and preliminary studies. First, maintaining a high-resolution feature map provides good performance. Second, feature reuse is very efficient and allows deeper networks. Third, a parallel structure can provide better information flow. Experimental results on the CamVid dataset show that the proposed DensePN (1.1M parameters) outperforms FCDense56 (1.5M parameters) while using fewer parameters.
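The Laplacian branch of the Unet-Laplacian applies a Laplacian filter to the input RGB image. A minimal NumPy sketch, assuming the common 4-neighbour 3×3 kernel applied to each channel with edge-replicated padding; the abstract does not state the kernel or padding, and `laplacian_rgb` is a hypothetical name. In the network, the result would then be resized and combined with decoder features, which is not shown.

```python
import numpy as np

# Assumed 4-neighbour Laplacian kernel (symmetric, so correlation equals convolution).
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])

def laplacian_rgb(image):
    """Apply the 3x3 Laplacian to each channel of an H x W x C image.

    Edge-replicated padding keeps the output the same shape as the input.
    """
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Sum the shifted copies of the image weighted by the kernel taps.
    for dy in range(3):
        for dx in range(3):
            weight = LAPLACIAN[dy, dx]
            if weight:
                out += weight * padded[dy:dy + h, dx:dx + w, :]
    return out
```

Flat regions map to zero and intensity changes produce strong responses, so the branch hands the decoder an explicit edge signal alongside the learned features.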