80 research outputs found

    Towards Constructing Corpus of Punjabi N-grams Written in Gurmukhi Script

    The availability of a robust corpus is crucial for developing linguistic resources. For the Punjabi language, written in the Gurmukhi script, the scarcity of such a resource hinders the validation of various natural language processing (NLP) techniques. This paper addresses this gap by presenting the creation of a comprehensive corpus for Punjabi in Gurmukhi. The corpus, with approximately 23 million words drawn from diverse published materials, serves as a valuable foundation for NLP research. Additionally, the paper describes a dedicated corpus processing tool designed specifically for Punjabi. This tool employs a novel method for constructing word, bigram, and trigram levels of the corpus, applicable for building such resources for any script. As a demonstration, we showcase a generated dataset composed of approximately 15.5 million Punjabi words and 50 million character
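
The abstract does not spell out how the tool builds its word, bigram, and trigram levels. As a rough illustration only, the sketch below counts n-grams over a short Gurmukhi sample using whitespace tokenization. The function names, the sample sentence, and the tokenization rule are assumptions made for this example, not the authors' method.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_ngram_counts(text, max_n=3):
    """Count unigrams, bigrams, and trigrams in whitespace-tokenized text.

    Whitespace splitting is an assumption; a real Punjabi tokenizer would
    also need to handle punctuation such as the danda.
    """
    tokens = text.split()
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}


# Hypothetical sample text, not taken from the corpus described above.
sample = "ਮੈਂ ਪੰਜਾਬੀ ਪੜ੍ਹਦਾ ਹਾਂ ਮੈਂ ਪੰਜਾਬੀ ਲਿਖਦਾ ਹਾਂ"
counts = build_ngram_counts(sample)

for n, counter in counts.items():
    print(n, counter.most_common(2))
```

Because the counting works on token sequences rather than on script-specific rules, the same sketch applies to text in any script once a suitable tokenizer is supplied.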

    Creation of Digital Libraries in Indian Languages Using Unicode

    Unicode is a 16-bit code for character representation in computers, designed by the Unicode Consortium. It represents almost all of the world's scripts, including many extinct scripts such as Brahmi and Kharosthi. ISCII is another code developed to represent Indian characters in computers, but character representation using ISCII has problems. It is found that Unicode can solve these problems. This paper suggests measures for the creation of digital libraries in Indic languages and discusses the problems associated with Unicode
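
When reading the 16-bit description above, it may help to check how individual Indic characters are actually stored. The sketch below is an illustration that is not part of the paper; the sample characters and the helper name `utf16_units` are chosen for this example. It counts how many 16-bit units each character needs in UTF-16. Characters from living scripts such as Gurmukhi and Devanagari fit in one unit, while the code points in the Kharosthi and Brahmi blocks lie above U+FFFF and need two.

```python
# One sample code point per script; the choice of characters is arbitrary.
samples = {
    "Gurmukhi": "\u0A15",        # code point in the Gurmukhi block (U+0A00-U+0A7F)
    "Devanagari": "\u0915",      # code point in the Devanagari block (U+0900-U+097F)
    "Kharosthi": "\U00010A10",   # code point in the Kharosthi block (U+10A00-U+10A5F)
    "Brahmi": "\U00011013",      # code point in the Brahmi block (U+11000-U+1107F)
}


def utf16_units(ch):
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


for script, ch in samples.items():
    print(f"{script}: U+{ord(ch):04X}, UTF-16 units: {utf16_units(ch)}")
```

A digital library that stores or indexes text in these scripts would therefore need to handle surrogate pairs, or use an encoding such as UTF-8, rather than assume one 16-bit value per character.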

    Movement of the Organized Blind in India: From Passive Recipients of Services to Active Advocates of Their Rights

    In recent years, the subject of the newborn disability rights movement in India has been attracting the attention of researchers, but there has been very little effort to document the movement of blind people in India for their rights, which preceded the broader disability rights movement. I therefore conducted a qualitative study of this movement of blind people in India by using the methods of oral history and document analysis. For this purpose, I conducted 93 interviews (by interviewing 45 informants) and analyzed relevant documents. Borrowing terminology from the self-advocacy movement of the blind in the United States, I describe this movement as a movement of the Organized Blind, which was launched when blind activists began to organize themselves at the national level in India during the early 1970s. I have attempted to explain that since the launching of this movement, blind activists have been constantly engaged in a struggle for their rights, which encompasses a wide range of issues from the right to employment to the enactment and implementation of the comprehensive disability rights law. I describe the historical evolution of this movement as a process of transformation of the status of blind people in India from being passive recipients of services offered to them through the service delivery organizations to active advocates of their rights. I have classified the evolution of this movement into four stages from 1970 to 2005. I also reject the existing views about the time of origin of the disability rights movement in India and establish my argument that it began in late 1980s when blind activists began to focus on the demand for the enactment of a comprehensive disability rights law, which resulted in the enactment of such a law in 1995. Finally, I have analyzed the changing methods of advocacy as well as the shift in the approach of the service delivery organizations in the field of blindness in India from outright rejection of the advocacy approach to its acceptance in the post-1995 period

    An Online Numeral Recognition System Using Improved Structural Features – A Unified Method for Handwritten Arabic and Persian Numerals

    With the advances in machine learning techniques, handwriting recognition systems have also gained importance. Though digit recognition techniques have been established for online handwritten numerals, an optimized technique that is writer independent is still an open area of research. In this paper, we propose an enhanced unified method for the recognition of handwritten Arabic and Persian numerals using improved structural features. A total of 37 structure-based features are extracted, and a Random Forest classifier is used to classify the numerals based on the extracted features. The results of the proposed approach are compared with other classifiers including Support Vector Machine (SVM), Multilayer Perceptron (MLP) and K-Nearest Neighbors (KNN). Four different well-known Arabic and Persian databases are used to validate the proposed method. The average accuracy of 96.15% obtained in recognizing handwritten digits shows that the proposed method is more efficient and produces better results than other techniques

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model is a generator network that is implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes.
For text extraction from the bounding box, a novel data extraction framework consisting of various processes including XML processing in the case of an existing OCR engine, bounding box pre-processing, text clean up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automating future data extraction was designed. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks and later, a rule-based engine is used to extract relevant data. While the system’s methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing their clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation.
This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase the information extraction accuracy factor over that of existing methodologies. Finally, it evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information
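
The abstract names text clean up and pattern-based matching as steps that produce key-value output, but it gives no implementation details. The sketch below is one possible reading of those two steps only, using regular expressions over plain OCR text. The field names, the patterns, and the sample invoice are all hypothetical and are not taken from the thesis.

```python
import re

# Hypothetical field patterns; a real system would need many more variants.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.I
    ),
    "date": re.compile(r"date\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.I),
    "total": re.compile(r"total\s*[:\-]?\s*[$£€]?\s*(\d+(?:[.,]\d{2})?)", re.I),
}


def clean_text(raw):
    """Collapse runs of spaces and tabs, a minimal stand-in for text clean up."""
    return re.sub(r"[ \t]+", " ", raw).strip()


def extract_fields(ocr_text):
    """Return only the fields whose pattern matched, as a key-value dict."""
    text = clean_text(ocr_text)
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[field] = match.group(1)
    return result


# Hypothetical OCR output for a scanned invoice.
sample = """ACME Ltd
Invoice No: INV-2041
Date: 12/03/2021
Total:   £ 149.50
"""
fields = extract_fields(sample)
print(fields)
```

Fields that fail to match are simply left out of the dictionary, which mirrors the abstract's statement that only successfully extracted fields are reported in key-value format.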

    Investing in Ourselves: Giving and Fund Raising in India

    This is the India case study of Investing in Ourselves - Giving and Fund Raising in Asia, which had its origin in the International Conference on Supporting the Nonprofit Sector in Asia, sponsored by the Asia Pacific Philanthropy Consortium (APPC) in January 1998