Offline Pashto Characters Dataset for OCR Systems

Khan, Habib Ullah; Khan, Sulaiman; Nazir, Shah

Authors: Habib Ullah Khan, Sulaiman Khan, Shah Nazir
Publication date: 27 July 2021
Publisher: Hindawi Limited

Abstract

In computer vision and artificial intelligence, image-based text recognition and analysis play a key role in text retrieval. Training a machine learning model to recognize the handwritten characters of a specific language requires a standard dataset. Acceptable handwritten character datasets are available for many languages, including English and Arabic. However, the lack of a dataset for handwritten Pashto characters hinders the application of suitable machine learning algorithms to Pashto character recognition. To address this issue, this study presents the first handwritten Pashto characters image dataset (HPCID) for scientific research. The dataset consists of 14,784 samples: 336 samples for each of the 44 characters in the Pashto character set. The samples were collected on A4-sized paper from students of the Pashto Department at the University of Peshawar, Khyber Pakhtunkhwa, Pakistan. In total, 336 students and faculty members contributed to the data collection phase. The dataset contains multisize, multifont, and multistyle characters of varying structures.
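
The sample counts quoted in the abstract can be cross-checked with a few lines of Python. This is only an illustrative sketch: the function names and the per-character count mapping are hypothetical, and nothing here reflects how the dataset files are actually organized or distributed.

```python
# Figures quoted in the abstract.
NUM_CHARACTERS = 44           # characters in the Pashto character set
SAMPLES_PER_CHARACTER = 336   # handwritten samples per character


def expected_total(num_classes: int, samples_per_class: int) -> int:
    """Total image count for a perfectly balanced dataset."""
    return num_classes * samples_per_class


def is_balanced(counts: dict, samples_per_class: int) -> bool:
    """Return True if every class has exactly samples_per_class images.

    `counts` is a hypothetical mapping from a character label to the
    number of images found for it.
    """
    return all(n == samples_per_class for n in counts.values())


total = expected_total(NUM_CHARACTERS, SAMPLES_PER_CHARACTER)
print(total)  # 14784, matching the total stated in the abstract
```

A check like `is_balanced` would be a reasonable first step before training a classifier on a local copy, since the abstract describes every character as having the same number of samples.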

Available Versions

Qatar University Institutional Repository

oai:qspace.qu.edu.qa:10576/376...

Last time updated on 08/01/2023