107 research outputs found
EFFICIENT IMAGE COMPRESSION AND DECOMPRESSION ALGORITHMS FOR OCR SYSTEMS
This paper presents an efficient new image compression and decompression methods for document images, intended for usage in the pre-processing stage of an OCR system designed for needs of the “Nikola Tesla Museum” in Belgrade. Proposed image compression methods exploit the Run-Length Encoding (RLE) algorithm and an algorithm based on document character contour extraction, while an iterative scanline fill algorithm is used for image decompression. Image compression and decompression methods are compared with JBIG2 and JPEG2000 image compression standards. Segmentation accuracy results for ground-truth documents are obtained in order to evaluate the proposed methods. Results show that the proposed methods outperform JBIG2 compression regarding the time complexity, providing up to 25 times lower processing time at the expense of worse compression ratio results, as well as JPEG2000 image compression standard, providing up to 4-fold improvement in compression ratio. Finally, time complexity results show that the presented methods are sufficiently fast for a real time character segmentation system
OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model
In today's technological era, document images play an important and integral
part in our day to day life, and specifically with the surge of Covid-19,
digitally scanned documents have become key source of communication, thus
avoiding any sort of infection through physical contact. Storage and
transmission of scanned document images is a very memory intensive task, hence
compression techniques are being used to reduce the image size before archival
and transmission. To extract information or to operate on the compressed
images, we have two ways of doing it. The first way is to decompress the image
and operate on it and subsequently compress it again for the efficiency of
storage and transmission. The other way is to use the characteristics of the
underlying compression algorithm to directly process the images in their
compressed form without involving decompression and re-compression. In this
paper, we propose a novel idea of developing an OCR for CCITT (The
International Telegraph and Telephone Consultative Committee) compressed
machine printed TIFF document images directly in the compressed domain. After
segmenting text regions into lines and words, HMM is applied for recognition
using three coding modes of CCITT- horizontal, vertical and the pass mode.
Experimental results show that OCR on pass modes give a promising results.Comment: The paper has 14 figures and 1 tabl
Application of Stochastic Diffusion for Hiding High Fidelity Encrypted Images
Cryptography coupled with information hiding has received increased attention in recent years and has become a major research theme because of the importance of protecting encrypted information in any Electronic Data Interchange system in a way that is both discrete and covert. One of the essential limitations in any cryptography system is that the encrypted data provides an indication on its importance which arouses suspicion and makes it vulnerable to attack. Information hiding of Steganography provides a potential solution to this issue by making the data imperceptible, the security of the hidden information being a threat only if its existence is detected through Steganalysis. This paper focuses on a study methods for hiding encrypted information, specifically, methods that encrypt data before embedding in host data where the ‘data’ is in the form of a full colour digital image. Such methods provide a greater level of data security especially when the information is to be submitted over the Internet, for example, since a potential attacker needs to first detect, then extract and then decrypt the embedded data in order to recover the original information.
After providing an extensive survey of the current methods available, we present a new method of encrypting and then hiding full colour images in three full colour host images with out loss of fidelity following data extraction and decryption. The application of this technique, which is based on a technique called ‘Stochastic Diffusion’ are wide ranging and include covert image information interchange, digital image authentication, video authentication, copyright protection and digital rights management of image data in general
MINHLP: Module to Identify New Hampshire License Plates
A license plate, referred to simply as a plate or vehicle registration plate, is a small plastic or metal plate attached to a motor vehicle for official identification purposes. Most governments require a registration plate to be attached to both the front and rear of a vehicle, although certain jurisdictions or vehicle types, such as motorcycles, require only one plate, which is usually attached to the rear of the vehicle.
We present analysis of Automatic License Plate Recognition (ALPR) of New Hampshire (NH) plates using open source products. This thesis contains an implementation of a demonstrated model and analysis of the results. In this paper, OpenCV (computer vision library) and Tesseract (open source optical character reader) is presented as a core intelligent infrastructure. The thesis explains the mathematical principles and algorithms used for number plate detection, processes of proper characters segmentation, normalization and recognition. A description of the challenges involved in detecting and reading license plate in NH, previous studies done by others and the strategies adopted to solve them is also given
Recommended from our members
Secure digital documents using Steganography and QR Code
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University LondonWith the increasing use of the Internet several problems have arisen regarding the processing of electronic documents. These include content filtering, content retrieval/search. Moreover, document security has taken a centre stage including copyright protection, broadcast monitoring etc. There is an acute need of an effective tool which can find the identity, location and the time when the document was created so that it can be determined whether or not the contents of the document were tampered with after creation. Owing the sensitivity of the large amounts of data which is processed on a daily basis, verifying the authenticity and integrity of a document is more important now than it ever was. Unsurprisingly document authenticity verification has become the centre of attention in the world of research. Consequently, this research is concerned with creating a tool which deals with the above problem. This research proposes the use of a Quick Response Code as a message carrier for Text Key-print. The Text Key-print is a novel method which employs the basic element of the language (i.e. Characters of the alphabet) in order to achieve authenticity of electronic documents through the transformation of its physical structure into a logical structured relationship. The resultant dimensional matrix is then converted into a binary stream and encapsulated with a serial number or URL inside a Quick response Code (QR code) to form a digital fingerprint mark. For hiding a QR code, two image steganography techniques were developed based upon the spatial and the transform domains. In the spatial domain, three methods were proposed and implemented based on the least significant bit insertion technique and the use of pseudorandom number generator to scatter the message into a set of arbitrary pixels. These methods utilise the three colour channels in the images based on the RGB model based in order to embed one, two or three bits per the eight bit channel which results in three different hiding capacities. The second technique is an adaptive approach in transforming domain where a threshold value is calculated under a predefined location for embedding in order to identify the embedding strength of the embedding technique. The quality of the generated stego images was evaluated using both objective (PSNR) and Subjective (DSCQS) methods to ensure the reliability of our proposed methods. The experimental results revealed that PSNR is not a strong indicator of the perceived stego image quality, but not a bad interpreter also of the actual quality of stego images. Since the visual difference between the cover and the stego image must be absolutely imperceptible to the human visual system, it was logically convenient to ask human observers with different qualifications and experience in the field of image processing to evaluate the perceived quality of the cover and the stego image. Thus, the subjective responses were analysed using statistical measurements to describe the distribution of the scores given by the assessors. Thus, the proposed scheme presents an alternative approach to protect digital documents rather than the traditional techniques of digital signature and watermarking
Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods
This Special Issue is a book composed by collecting documents published through peer review on the research of various advanced technologies related to applications and theories of signal processing for multimedia systems using ML or advanced methods. Multimedia signals include image, video, audio, character recognition and optimization of communication channels for networks. The specific contents included in this book are data hiding, encryption, object detection, image classification, and character recognition. Academics and colleagues who are interested in these topics will find it interesting to read
Text detection and recognition in images and video sequences
Text characters embedded in images and video sequences represents a rich source of information for content-based indexing and retrieval applications. However, these text characters are difficult to be detected and recognized due to their various sizes, grayscale values and complex backgrounds. This thesis investigates methods for building an efficient application system for detecting and recognizing text of any grayscale values embedded in images and video sequences. Both empirical image processing methods and statistical machine learning and modeling approaches are studied in two sub-problems: text detection and text recognition. Applying machine learning methods for text detection encounters difficulties due to character size, grayscale variations and heavy computation cost. To overcome these problems, we propose a two-step localization/verification approach. The first step aims at quickly localizing candidate text lines, enabling the normalization of characters into a unique size. In the verification step, a trained support vector machine or multi-layer perceptrons is applied on background independent features to remove the false alarms. Text recognition, even from the detected text lines, remains a challenging problem due to the variety of fonts, colors, the presence of complex backgrounds and the short length of the text strings. Two schemes are investigated addressing the text recognition problem: bi-modal enhancement scheme and multi-modal segmentation scheme. In the bi-modal scheme, we propose a set of filters to enhance the contrast of black and white characters and produce a better binarization before recognition. For more general cases, the text recognition is addressed by a text segmentation step followed by a traditional optical character recognition (OCR) algorithm within a multi-hypotheses framework. In the segmentation step, we model the distribution of grayscale values of pixels using a Gaussian mixture model or a Markov Random Field. The resulting multiple segmentation hypotheses are post-processed by a connected component analysis and a grayscale consistency constraint algorithm. Finally, they are processed by an OCR software. A selection algorithm based on language modeling and OCR statistics chooses the text result from all the produced text strings. Additionally, methods for using temporal information of video text are investigated. A Monte Carlo video text segmentation method is proposed for adapting the segmentation parameters along temporal text frames. Furthermore, a ROVER (Recognizer Output Voting Error Reduction) algorithm is studied for improving the final recognition text string by voting the characters through temporal frames
Building the big picture : enhanced resolution from coding
Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994.Includes bibliographical references (leaves 105-107).by Roger George Kermode.M.S
- …