146 research outputs found

    Arabic cursive text recognition from natural scene images

    Full text link
    © 2019 by the authors. This paper presents a comprehensive survey on Arabic cursive scene text recognition. The recent years' publications in this field have witnessed the interest shift of document image analysis researchers from recognition of optical characters to recognition of characters appearing in natural images. Scene text recognition is a challenging problem due to the text having variations in font styles, size, alignment, orientation, reflection, illumination change, blurriness and complex background. Among cursive scripts, Arabic scene text recognition is contemplated as a more challenging problem due to joined writing, same character variations, a large number of ligatures, the number of baselines, etc. Surveys on the Latin and Chinese script-based scene text recognition system can be found, but the Arabic like scene text recognition problem is yet to be addressed in detail. In this manuscript, a description is provided to highlight some of the latest techniques presented for text classification. The presented techniques following a deep learning architecture are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide insight about cursive scene text to researchers

    The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing

    Full text link
    [EN] This paper presents the `NoisyOffice¿ database. It consists of images of printed text documents with noise mainly caused by uncleanliness from a generic office, such as coffee stains and footprints on documents or folded and wrinkled sheets with degraded printed text. This corpus is intended to train and evaluate supervised learning methods for cleaning, binarization and enhancement of noisy images of grayscale text documents. As an example, several experiments of image enhancement and binarization are presented by using deep learning techniques. Also, double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images, using the database, is described in order to show its suitability for benchmarking of image processing systems.This research was undertaken as part of the project TIN2017-85854-C4-2-R, jointly funded by the Spanish MINECO and FEDER founds.Castro-Bleda, MJ.; España Boquera, S.; Pastor Pellicer, J.; Zamora Martínez, FJ. (2020). The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing. The Computer Journal. 63(11):1658-1667. https://doi.org/10.1093/comjnl/bxz098S165816676311Bozinovic, R. M., & Srihari, S. N. (1989). Off-line cursive script word recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1), 68-83. doi:10.1109/34.23114Plamondon, R., & Srihari, S. N. (2000). Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 63-84. doi:10.1109/34.824821Vinciarelli, A. (2002). A survey on off-line Cursive Word Recognition. Pattern Recognition, 35(7), 1433-1446. doi:10.1016/s0031-3203(01)00129-7Impedovo, S. (2014). More than twenty years of advancements on Frontiers in handwriting recognition. Pattern Recognition, 47(3), 916-928. doi:10.1016/j.patcog.2013.05.027Baird, H. S. (2007). The State of the Art of Document Image Degradation Modelling. Advances in Pattern Recognition, 261-279. doi:10.1007/978-1-84628-726-8_12Egmont-Petersen, M., de Ridder, D., & Handels, H. (2002). Image processing with neural networks—a review. Pattern Recognition, 35(10), 2279-2301. doi:10.1016/s0031-3203(01)00178-9Marinai, S., Gori, M., & Soda, G. (2005). Artificial neural networks for document analysis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(1), 23-35. doi:10.1109/tpami.2005.4Rehman, A., & Saba, T. (2012). Neural networks for document image preprocessing: state of the art. Artificial Intelligence Review, 42(2), 253-273. doi:10.1007/s10462-012-9337-zLazzara, G., & Géraud, T. (2013). Efficient multiscale Sauvola’s binarization. International Journal on Document Analysis and Recognition (IJDAR), 17(2), 105-123. doi:10.1007/s10032-013-0209-0Fischer, A., Indermühle, E., Bunke, H., Viehhauser, G., & Stolz, M. (2010). Ground truth creation for handwriting recognition in historical documents. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems - DAS ’10. doi:10.1145/1815330.1815331Belhedi, A., & Marcotegui, B. (2016). Adaptive scene‐text binarisation on images captured by smartphones. IET Image Processing, 10(7), 515-523. doi:10.1049/iet-ipr.2015.0695Kieu, V. C., Visani, M., Journet, N., Mullot, R., & Domenger, J. P. (2013). An efficient parametrization of character degradation model for semi-synthetic image generation. Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing - HIP ’13. doi:10.1145/2501115.2501127Fischer, A., Visani, M., Kieu, V. C., & Suen, C. Y. (2013). Generation of learning samples for historical handwriting recognition using image degradation. Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing - HIP ’13. doi:10.1145/2501115.2501123Journet, N., Visani, M., Mansencal, B., Van-Cuong, K., & Billy, A. (2017). DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images. Journal of Imaging, 3(4), 62. doi:10.3390/jimaging3040062Walker, D., Lund, W., & Ringger, E. (2012). A synthetic document image dataset for developing and evaluating historical document processing methods. Document Recognition and Retrieval XIX. doi:10.1117/12.912203Dong, C., Loy, C. C., He, K., & Tang, X. (2016). Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 295-307. doi:10.1109/tpami.2015.2439281Suzuki, K., Horiba, I., & Sugie, N. (2003). Neural edge enhancer for supervised edge enhancement from noisy images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12), 1582-1596. doi:10.1109/tpami.2003.1251151Hidalgo, J. L., España, S., Castro, M. J., & Pérez, J. A. (2005). Enhancement and Cleaning of Handwritten Data by Using Neural Networks. Lecture Notes in Computer Science, 376-383. doi:10.1007/11492429_46Pastor-Pellicer, J., España-Boquera, S., Zamora-Martínez, F., Afzal, M. Z., & Castro-Bleda, M. J. (2015). Insights on the Use of Convolutional Neural Networks for Document Image Binarization. Lecture Notes in Computer Science, 115-126. doi:10.1007/978-3-319-19222-2_10España-Boquera, S., Zamora-Martínez, F., Castro-Bleda, M. J., & Gorbe-Moya, J. (s. f.). Efficient BP Algorithms for General Feedforward Neural Networks. Lecture Notes in Computer Science, 327-336. doi:10.1007/978-3-540-73053-8_33Zamora-Martínez, F., España-Boquera, S., & Castro-Bleda, M. J. (s. f.). Behaviour-Based Clustering of Neural Networks Applied to Document Enhancement. Lecture Notes in Computer Science, 144-151. doi:10.1007/978-3-540-73007-1_18Graves, A., Fernández, S., & Schmidhuber, J. (2007). Multi-dimensional Recurrent Neural Networks. Artificial Neural Networks – ICANN 2007, 549-558. doi:10.1007/978-3-540-74690-4_56Sauvola, J., & Pietikäinen, M. (2000). Adaptive document image binarization. Pattern Recognition, 33(2), 225-236. doi:10.1016/s0031-3203(99)00055-2Pastor-Pellicer, J., Castro-Bleda, M. J., & Adelantado-Torres, J. L. (2015). esCam: A Mobile Application to Capture and Enhance Text Images. Lecture Notes in Computer Science, 601-604. doi:10.1007/978-3-319-19222-2_5

    MicroConceptBERT: concept-relation based document information extraction framework.

    Get PDF
    Extracting information from documents is a crucial task in natural language processing research. Existing information extraction methodologies often focus on specific domains, such as medicine, education or finance, and are limited by language constraints. However, more comprehensive approaches that transcend document types, languages, contexts, and structures would significantly advance the field proposed in recent research. This study addresses this challenge by introducing microConceptBERT: a concept-relations-based framework for document information extraction, which offers flexibility for various document processing tasks while accounting for hierarchical, semantic, and heuristic features. The proposed framework has been applied to a question-answering task on benchmark datasets: SQUAD 2.0 and DOCVQA. Notably, the F1 evaluation metric attains an outperforming 87.01 performance rate on the SQUAD 2.0 dataset compared to baseline models: BERT-base and BERT-large models

    Multilingual Text Detection with Nonlinear Neural Network

    Get PDF
    Multilingual text detection in natural scenes is still a challenging task in computer vision. In this paper, we apply an unsupervised learning algorithm to learn language-independent stroke feature and combine unsupervised stroke feature learning and automatically multilayer feature extraction to improve the representational power of text feature. We also develop a novel nonlinear network based on traditional Convolutional Neural Network that is able to detect multilingual text regions in the images. The proposed method is evaluated on standard benchmarks and multilingual dataset and demonstrates improvement over the previous work

    Rectification and Super-Resolution Enhancements for Forensic Text Recognition

    Get PDF
    [EN] Retrieving text embedded within images is a challenging task in real-world settings. Multiple problems such as low-resolution and the orientation of the text can hinder the extraction of information. These problems are common in environments such as Tor Darknet and Child Sexual Abuse images, where text extraction is crucial in the prevention of illegal activities. In this work, we evaluate eight text recognizers and, to increase the performance of text transcription, we combine these recognizers with rectification networks and super-resolution algorithms. We test our approach on four state-of-the-art and two custom datasets (TOICO-1K and Child Sexual Abuse (CSA)-text, based on text retrieved from Tor Darknet and Child Sexual Exploitation Material, respectively). We obtained a 0.3170 score of correctly recognized words in the TOICO-1K dataset when we combined Deep Convolutional Neural Networks (CNN) and rectification-based recognizers. For the CSA-text dataset, applying resolution enhancements achieved a final score of 0.6960. The highest performance increase was achieved on the ICDAR 2015 dataset, with an improvement of 4.83% when combining the MORAN recognizer and the Residual Dense resolution approach. We conclude that rectification outperforms super-resolution when applied separately, while their combination achieves the best average improvements in the chosen datasets.SIInstituto Nacional de CiberseguridadThis research has been funded with support from the European Commission under the 4NSEEK project with Grant Agreement 821966. This publication reflects the views only of the author, and the European Commission cannot be held responsible for any use that may be made of the information contained therein.This research has been supported by the grant ’Ayudas para la realización de estudios de doctorado en el marco del programa propio de investigación de la Universidad de León Convocatoria 2018’ and by the framework agreement between Universidad de León and INCIBE (Spanish National Cybersecurity Institute) under Addendum 01. We acknowledge NVIDIA Corporation with the donation of the Titan Xp GPU used for this research

    Efficient Scene Text Detection with Textual Attention Tower

    Full text link
    Scene text detection has received attention for years and achieved an impressive performance across various benchmarks. In this work, we propose an efficient and accurate approach to detect multioriented text in scene images. The proposed feature fusion mechanism allows us to use a shallower network to reduce the computational complexity. A self-attention mechanism is adopted to suppress false positive detections. Experiments on public benchmarks including ICDAR 2013, ICDAR 2015 and MSRA-TD500 show that our proposed approach can achieve better or comparable performances with fewer parameters and less computational cost.Comment: Accepted by ICASSP 202

    A Novel Deep Convolutional Neural Network Architecture Based on Transfer Learning for Handwritten Urdu Character Recognition

    Get PDF
    Deep convolutional neural networks (CNN) have made a huge impact on computer vision and set the state-of-the-art in providing extremely definite classification results. For character recognition, where the training images are usually inadequate, mostly transfer learning of pre-trained CNN is often utilized. In this paper, we propose a novel deep convolutional neural network for handwritten Urdu character recognition by transfer learning three pre-trained CNN models. We fine-tuned the layers of these pre-trained CNNs so as to extract features considering both global and local details of the Urdu character structure. The extracted features from the three CNN models are concatenated to train with two fully connected layers for classification. The experiment is conducted on UNHD, EMILLE, DBAHCL, and CDB/Farsi dataset, and we achieve 97.18% average recognition accuracy which outperforms the individual CNNs and numerous conventional classification methods
    corecore