3 research outputs found

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging unsolved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but can also be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how it can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters and diacritics) in an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. In addition, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed, and experiments were conducted to validate the framework more rigorously by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.
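As a rough illustration of how a transcript might be formed from spotted script elements and then scored, here is a minimal Python sketch. It is not the dissertation's implementation. The `Detection` record, the `min_score` threshold, the strict left-to-right ordering, and the edit-distance-based definition of character recognition accuracy are all assumptions made for this example; the abstract does not specify any of them, and real Bangla or Korean layouts would need more than a horizontal sort.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical output of a character spotting network for one element."""
    label: str       # predicted script element (character or diacritic)
    x_center: float  # horizontal position in the word image, in pixels
    score: float     # detector confidence in [0, 1]


def build_transcript(detections, min_score=0.5):
    """Drop low-confidence detections and read the rest left to right.

    Assumption: reading order equals horizontal order, which is a
    simplification for scripts with stacked or reordered elements.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def character_recognition_accuracy(predicted, truth):
    """Assumed definition: 1 - edit_distance / len(truth), floored at 0."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(predicted, truth) / len(truth))
```

Sorting by position keeps the sketch independent of any lexicon, which mirrors the lexicon-free framing in the abstract.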

    Satellite Workshop On Language, Artificial Intelligence and Computer Science for Natural Language Processing Applications (LAICS-NLP): Discovery of Meaning from Text

    This paper proposes a novel method to disambiguate important words from a collection of documents. The hypothesis that underlies this approach is that there is a minimal set of senses that are significant in characterizing a context. We extend Yarowsky’s one sense per discourse [13] to a collection of related documents rather than a single document. We perform distributed clustering on a set of features representing each of the top ten categories of documents in the Reuters-21578 dataset. Groups of terms that have a similar distributional pattern across documents were identified. A WordNet-based similarity measure was then computed for terms within each cluster. Aggregating the WordNet associations used to ascertain term similarity within clusters provided a means of identifying the clusters’ root senses.
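The idea of grouping terms that share a distributional pattern across documents can be sketched in a few lines of Python. This is an illustrative stand-in, not the paper's clustering method: it uses raw term counts, cosine similarity, and a greedy threshold rule, and the function names and the `threshold` value are assumptions chosen for the example. The WordNet similarity step is left out.

```python
import math
from collections import Counter


def term_document_vectors(docs):
    """Map each term to its count in every document (one entry per document)."""
    vocab = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    return {term: [c[term] for c in counts] for term in vocab}


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_terms(vectors, threshold=0.8):
    """Greedy grouping: join the first cluster whose seed term is similar enough.

    Assumption: comparing against the first member of each cluster is a
    simplification; it is not the clustering algorithm used in the paper.
    """
    clusters = []
    for term, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters
```

For example, with tokenized toy documents about oil and grain, terms such as "crude" and "oil" end up in one group because they occur in the same documents.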

    Confidence building measures in South Asia

    This dissertation develops a theoretical framework for the concept of Confidence Building Measures and applies it to the case study of India-Pakistan relations in the South Asia region. Part I examines Confidence Building Measures in a global and regional perspective. It outlines a theoretical framework of Confidence Building Measures by putting forward an appropriate definition of this concept and conceptualising the confidence building process in a model. It explores the empirical universe of Confidence Building Measures on a global scale in a conflict and crisis framework, in terms of its functional dimensions and at different levels of analysis. This provides a conceptual and empirical backdrop for an examination of Confidence Building Measures in the South Asian region. Part II studies the trends of conflict and cooperation in India-Pakistan relations in the first two decades after independence in 1947 and sets the stage for a more formal reconciliation process between the two countries in the post-Simla Agreement (1972) period. It also examines the operational variables in the Indian and Pakistani political milieu that shape their bilateral confidence building process. Part III presents a detailed analysis of the India-Pakistan confidence building process in its political, military, economic and socio-cultural dimensions in the last two decades. The core issues of the India-Pakistan conflict, namely the Kashmir conflict and Pakistan's alleged involvement in supporting terrorism in the Indian states of Punjab and Jammu and Kashmir, are also discussed. Part IV summarizes the major findings and conclusions of the study and puts forward some suggestions which may facilitate the confidence building process between India and Pakistan. The dissertation relies on information gathered from field research carried out in India and Pakistan in Winter 1991-1992.