20 research outputs found

    A Study of Techniques and Challenges in Text Recognition Systems

    Text recognition is a core system for Natural Language Processing (NLP) and digitalization. These systems are critical in bridging the gaps in digitization produced by non-editable documents, and they contribute to finance, health care, machine translation, digital libraries, and a variety of other fields. In addition, as a result of the pandemic, the amount of digital information in the education sector has increased, necessitating the deployment of text recognition systems to deal with it. Text recognition systems work on three different categories of text: (a) Machine Printed, (b) Offline Handwritten, and (c) Online Handwritten Texts. The major goal of this research is to examine the process of typewritten text recognition systems. The availability of historical documents and other traditional materials in many types of texts is another major challenge for convergence. Although this research examines a variety of languages, the Gurmukhi language receives the most focus. This paper presents an analysis of all prior text recognition algorithms for the Gurmukhi language. In addition, work on degraded texts in various languages is evaluated based on accuracy and F-measure.
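
    The abstract above compares systems by accuracy and F-measure without defining them. As a minimal sketch, assuming the standard definitions and character-level counts of true positives, false positives, and false negatives (the function names and the counting granularity are illustrative assumptions, not taken from the paper), the two metrics can be computed as:

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of items (e.g. characters) recognized correctly."""
    return correct / total if total else 0.0


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from raw counts; beta = 1 gives the usual F1 score.

    tp, fp, fn are true positives, false positives and false negatives.
    How these are counted (per character, per word, ...) is an assumption
    left to the caller.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

    For example, a system that finds 8 of 10 target characters and raises 2 false alarms has precision 0.8 and recall 0.8, so `f_measure(8, 2, 2)` is approximately 0.8.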

    Segmentation of touching characters in upper zone in printed Gurmukhi script

    A new technique for segmenting touching characters in the upper zone of printed Gurmukhi script is presented in this paper. The technique is based on the structural properties of the Gurmukhi script characters. The concavity and convexity of the characters have been studied and, using top profile projections, the touching characters in the upper zone have been segmented. A recognition rate of 91% has been achieved for segmenting the touching characters in the upper zone.
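
    The abstract does not spell out the algorithm, so the following is only a generic sketch of the idea of a top profile projection, not the authors' method. It assumes a binary image stored as a list of rows with 1 for ink and 0 for background; `top_profile` and `candidate_cuts` are hypothetical helper names. The top profile records, for each column, how far down the first ink pixel lies, and a column where that depth peaks marks a concavity between two shapes and is a plausible cut point.

```python
def top_profile(image):
    """For each column, return the row index of the first ink pixel (1).

    Columns with no ink get the image height.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    profile = []
    for x in range(width):
        y = 0
        while y < height and image[y][x] == 0:
            y += 1
        profile.append(y)
    return profile


def candidate_cuts(profile):
    """Columns where the top profile is a strict local maximum.

    A local maximum of the profile is the deepest point of a valley in the
    upper contour. Plateaus are ignored in this simplified sketch.
    """
    return [
        x
        for x in range(1, len(profile) - 1)
        if profile[x] > profile[x - 1] and profile[x] > profile[x + 1]
    ]


# Two peaked shapes joined at the base (illustrative data only).
touching = [
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cuts = candidate_cuts(top_profile(touching))
```

    Here the profile is `[1, 0, 1, 2, 1, 0, 1]` and the only candidate cut is column 3, the valley between the two peaks. A real segmenter would also need to validate each candidate against the script's structural rules.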

    A Study of Different Kinds of Degradation in Printed Gurmukhi Script

    Deep Learning Based Real Time Devanagari Character Recognition

    Advances in the technology behind optical character recognition (OCR) have helped it become a technology with many uses across industry. Today, OCR is available for several languages and can recognize characters in real time, but there are some languages for which this technology has not developed much. These advancements have been possible because of the introduction of concepts like artificial intelligence and deep learning. Deep Neural Networks have proven to be the best choice for tasks involving recognition. There are many algorithms and models that can be used for this purpose. This project tries to implement and optimize a deep learning-based model that can recognize Devanagari script characters in real time by analyzing hand movements.

    Sikh Patronage of Hindustani Music and Śabad Kīrtan in Colonial Punjab, 1857-1947

    Despite cohabiting overlapping social spheres, north India’s music traditions are too often studied in isolation from one another, negating their inherent interrelatedness. Adopting a more inclusive approach with regard to two major traditions of north India, in this study I explore how both Hindustani music and śabad kīrtan, the sacred music of the Sikhs, enjoyed patronage under the prolific network of Sikh patrons that comprised an important aspect of colonial Punjab’s sociocultural landscape. The distinct influence of aspects of Punjabi society and culture, the unique circumstances surrounding the rise of Sikh patronage, combined with the prominent place of rāg music in Sikh religious tradition, gave rise to an unparalleled environment of music patronage that challenges many modern assumptions about the nature of Hindustani music and its social context during the colonial period. Attending to the Sikh courtly sphere, my study highlights how the developments of Hindustani music in colonial Punjab relate to the broader geopolitics surrounding the 1857 rebellion, harbouring critical insights in relation to the emergence of modern Punjabiyat. Exploring the circulation of Gurmukhi manuscripts on musicology in the Sikh religious sphere up until the late nineteenth century, I highlight a localised tradition of Hindustani musicology, its multivalent character, and links to local music practice. In response to the radical political and discursive shifts wrought by colonialism, I show how in the early twentieth century, through the novel medium of print, the musicological literary output of the Sikhs was co-opted under the new label of gurmat saṅgīt, functioning as a form of symbolic capital in the process of Sikh identity formation.
Finally, drawing on ethnographic as well as archival research on both sides of the Indo-Pak border, I highlight the multidimensional role of the rabābīs within Sikh religious tradition historically, thus challenging modern musicology-centric understandings of the śabad kīrtan tradition in the process. Attempting to transcend postcolonial discourse and boundaries, this thesis offers a lens through which we might better understand the significant intersection between music traditions in a region like Punjab whilst also offering an alternative perspective on prevailing conceptions of Punjabiyat.

    Multi-script handwritten character recognition: Using feature descriptors and machine learning

    Learning to Read Bushman: Automatic Handwriting Recognition for Bushman Languages

    The Bleek and Lloyd Collection contains notebooks that document the tradition, language and culture of the Bushman people who lived in South Africa in the late 19th century. Transcriptions of these notebooks would allow for the provision of services such as text-based search and text-to-speech. However, these notebooks are currently only available in the form of digital scans and the manual creation of transcriptions is a costly and time-consuming process. Thus, automatic methods could serve as an alternative approach to creating transcriptions of the text in the notebooks. In order to evaluate the use of automatic methods, a corpus of Bushman texts and their associated transcriptions was created. The creation of this corpus involved: the development of a custom method for encoding the Bushman script, which contains complex diacritics; the creation of a tool for creating and transcribing the texts in the notebooks; and the running of a series of workshops in which the tool was used to create the corpus. The corpus was used to evaluate the use of various techniques for automatically transcribing the texts in the corpus in order to determine which approaches were best suited to the complex Bushman script. These techniques included the use of Support Vector Machines, Artificial Neural Networks and Hidden Markov Models as machine learning algorithms, which were coupled with different descriptive features. The effect of the texts used for training the machine learning algorithms was also investigated as well as the use of a statistical language model. It was found that, for Bushman word recognition, the use of a Support Vector Machine with Histograms of Oriented Gradient features resulted in the best performance and, for Bushman text line recognition, Marti & Bunke features resulted in the best performance when used with Hidden Markov Models. 
The automatic transcription of the Bushman texts proved to be difficult and the performance of the different recognition systems was largely affected by the complexities of the Bushman script. It was also found that, besides having an influence on determining which techniques may be the most appropriate for automatic handwriting recognition, the texts used in an automatic handwriting recognition system also play a large role in determining whether or not automatic recognition should be attempted at all.
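
    The abstract reports that Histograms of Oriented Gradient features worked best for word recognition but does not describe how they were computed. The sketch below is a heavily simplified illustration of the core idea only, assuming a grayscale image given as a list of rows of numbers. It builds a single magnitude-weighted histogram of unsigned gradient orientations over the whole image, whereas a full HOG descriptor splits the image into cells and normalizes over blocks. The function name and the choice of 9 bins are assumptions, and the classifier stage (a Support Vector Machine in the thesis) is not shown.

```python
import math


def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Gradients use central differences on interior pixels; orientations are
    folded into [0, 180) degrees. The result is L2-normalized. This is one
    "cell" of a HOG-style descriptor, not a complete HOG implementation.
    """
    hist = [0.0] * bins
    height = len(image)
    width = len(image[0]) if height else 0
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against an angle that rounds up to exactly 180.
            index = min(int(angle / bin_width), bins - 1)
            hist[index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A vertical edge: all gradient energy falls in the 0-degree bin.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
features = orientation_histogram(vertical_edge)
```

    Feature vectors of this kind, computed per cell and concatenated, are what would be passed to a classifier for training.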