1,724 research outputs found
Design and Implementation Recognition System for Handwritten Hindi/Marathi Document
In the present scenario most of the importance is given for the “paperless office” there by more and more communication and storage of documents is performed digitally. Documents and files which are present in Hindi and Marathi languages that were once stored physically on paper are now being converted into electronic form in order to facilitate quicker additions, searches, and modifications, as well as to prolong the life of such records. Because of this, there is a great demand of such software, which automatically extracts, analyze, recognize and store information from physical documents for later retrieval. Skew detection is used for text line position determination in Digitized documents, automated page orientation, and skew angle detection for binary document images, skew detection in handwritten scripts, in compensation for Internet audio applications and in the correction of scanned documents
Recommended from our members
Use of colour for hand-filled form analysis and recognition
Colour information in form analysis is currently under utilised. As technology has advanced and computing costs have reduced, the processing of forms in colour has now become practicable. This paper describes a novel colour-based approach to the extraction of filled data from colour form images. Images are first quantised to reduce the colour complexity and data is extracted by examining the colour characteristics of the images. The improved performance of the proposed method has been verified by comparing the processing time, recognition rate, extraction precision and recall rate to that of an equivalent black and white system
BN-DRISHTI: Bangla Document Recognition through Instance-level Segmentation of Handwritten Text Images
Handwriting recognition remains challenging for some of the most spoken
languages, like Bangla, due to the complexity of line and word segmentation
brought by the curvilinear nature of writing and lack of quality datasets. This
paper solves the segmentation problem by introducing a state-of-the-art method
(BN-DRISHTI) that combines a deep learning-based object detection framework
(YOLO) with Hough and Affine transformation for skew correction. However,
training deep learning models requires a massive amount of data. Thus, we also
present an extended version of the BN-HTRd dataset comprising 786 full-page
handwritten Bangla document images, line and word-level annotation for
segmentation, and corresponding ground truths for word recognition. Evaluation
on the test portion of our dataset resulted in an F-score of 99.97% for line
and 98% for word segmentation. For comparative analysis, we used three external
Bangla handwritten datasets, namely BanglaWriting, WBSUBNdb_text, and ICDAR
2013, where our system outperformed by a significant margin, further justifying
the performance of our approach on completely unseen samples.Comment: Will be published under the Springer Springer Lecture Notes in
Computer Science (LNCS) series, as part of ICDAR WML 202
A Masked Bounding-Box Selection Based ResNet Predictor for Text Rotation Prediction
The existing Optical Character Recognition (OCR) systems are capable of
recognizing images with horizontal texts. However, when the rotation of the
texts increases, it becomes harder to recognizing these texts. The performance
of the OCR systems decreases. Thus predicting the rotations of the texts and
correcting the images are important. Previous work mainly uses traditional
Computer Vision methods like Hough Transform and Deep Learning methods like
Convolutional Neural Network. However, all of these methods are prone to
background noises commonly existing in general images with texts. To tackle
this problem, in this work, we introduce a new masked bounding-box selection
method, that incorporating the bounding box information into the system. By
training a ResNet predictor to focus on the bounding box as the region of
interest (ROI), the predictor learns to overlook the background noises.
Evaluations on the text rotation prediction tasks show that our method improves
the performance by a large margin
Statistics Oriented Preprocessing of Document Image
Old printed documents represent an important part of our cultural heritage. Their digitalization plays an important role in creating data and metadata. The paper proposed an algorithm for estimation of the global text skew. First, document image is binarized reducing the impact of noise and uneven illumination. The binary image is statistically analyzed and processed. Accordingly, redundant data have been excluded. Furthermore, the convex hulls are established encircling each text object. They are joined establishing connected components. Then, the connected components in complementary image are enlarged with morphological dilation. At the end, the biggest connected component is extracted. Its orientation is similar to the global orientation of text document which is calculated by the moments. Efficiency and correctness of the algorithm are verified by testing on a custom dataset
Algorithms for document image skew estimation
A new projection profile based skew estimation algorithm was developed. This algorithm extracts fiducial points representing character elements by decoding a JBIG compressed image without reconstructing the original image. These points are projected along parallel lines into an accumulator array to determine the maximum alignment and the corresponding skew angle. Methods for characterizing the performance of skew estimation techniques were also investigated. In addition to the new skew estimator, three projection based algorithms were implemented and tested using 1,246 single column text zones extracted from a sample of 460 page images. Linear regression analyses of the experimental results indicate that our new skew estimation algorithm performs competitively with the other three techniques. These analyses also show that estimators using connected components as a fiducial representation perform worse than the others on the entire set of text zones. It is shown that all of the algorithms are sensitive to typographical features. The number of text lines in a zone significantly affects the accuracy of the connected component based methods. We also developed two aggregate measures of skew for entire pages. Experiments performed on the 460 unconstrained pages indicate the need to filter non-text features from consideration. Graphic and noise elements from page images contribute a significant amount of the error for the JBIG algorithm
- …