13 research outputs found

    Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

    Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning Indo-European language family. However, due to poverty in both linguistic and economic capital, Sinhala remains, from the perspective of Natural Language Processing tools and research, a resource-poor language that has neither the economic drive of its cousin English nor the sheer push of the law of numbers that a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resulting dire need for proper tools and research for Sinhala natural language processing. However, for various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap with a comprehensive literature survey of the publicly available Sinhala natural language tools and research, so that researchers working in this field can better utilize the contributions of their peers. As such, we shall be uploading this paper to arXiv and updating it periodically to reflect the advances made in the field.

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model comprises a generator network that is implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding box, a novel data extraction framework was designed. It consists of various processes: XML processing where an existing OCR engine is used, bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks and later, a rule-based engine is used to extract relevant data. While the system’s methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing its clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation. This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase the information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
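The text clean-up, OCR error correction, type check and pattern-based matching steps named in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the regular expressions, the character substitutions and the function names are hypothetical and are not taken from the thesis, which does not publish its rules in this abstract.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical field patterns (assumed for illustration, not the thesis's rules).
PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no|number|#)\.?\s*[:\-]?\s*([A-Z0-9\-]+)", re.I
    ),
    "date": re.compile(r"date\s*[:\-]?\s*(\d{1,2}/\d{1,2}/\d{4})", re.I),
    "total": re.compile(r"total\s*[:\-]?\s*[$£€]?\s*([0-9OolI.,]+)", re.I),
}

# A small, illustrative subset of OCR confusions seen in numeric fields.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def clean_text(raw: str) -> str:
    """Text clean-up: collapse runs of whitespace left over from OCR output."""
    return re.sub(r"\s+", " ", raw).strip()


def parse_amount(value: str):
    """OCR error correction plus type check: Decimal, or None if not numeric."""
    fixed = value.translate(OCR_DIGIT_FIXES).replace(",", "")
    try:
        return Decimal(fixed)
    except InvalidOperation:
        return None


def parse_date(value: str):
    """Type check: ISO date string (assuming day/month/year), or None."""
    try:
        return datetime.strptime(value, "%d/%m/%Y").date().isoformat()
    except ValueError:
        return None


def extract_fields(raw: str) -> dict:
    """Pattern-based matching; only fields that pass their checks are returned."""
    text = clean_text(raw)
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if not match:
            continue
        value = match.group(1)
        if name == "total":
            value = parse_amount(value)
        elif name == "date":
            value = parse_date(value)
        if value is not None:
            fields[name] = value
    return fields
```

Calling `extract_fields` on OCR text returns a dictionary containing only the fields that matched and passed their checks, which mirrors the key-value output described in the abstract. The learning mechanism and spell check are left out of this sketch.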

    A segmentation-free approach to recognise printed Sinhala script using linear symmetry

    In this paper, a novel approach for printed character recognition using linear symmetry is proposed. When conventional character recognition methods, such as artificial neural network based techniques, are used to recognise Brahmi Sinhala script, segmentation of modified characters into modifier symbols and basic characters is a necessity but a complex issue. The large size of the character set makes the whole recognition process even more complex. In contrast, in the proposed method, the orientation features are used to recognise characters directly, using a standard alphabet as the basis, without the need for segmentation into basic components. The edge detection algorithm using linear symmetry recognises vertical modifiers. The linear symmetry principle is also used to determine the skew angle. Experiments aimed at an optical character recognition system for printed Sinhala script show favourable results.

    Beyond ABC : investigating current rationales and systems for the teaching of early reading to young learners of English

    The premise of this thesis is that the role of the first steps in reading in courses for Young Learners of English (YL) at the beginner stage is a neglected area, with anomalies centred around the fact that ‘words on the page’ are often treated as if they were facilitative from the outset for language work in areas such as speaking, while very little support is offered to children as to how to decode these words. Chapter 1 (Introduction) traces the rapid spread of YL teaching worldwide and considers the preparation of teachers for their roles. Materials are discussed as an important source of support and structure for teachers, and a case is made for a focus in the main study on systems and rationales for early reading found among teachers themselves or evidenced in published materials. Chapter 2 (Literature Review) discusses relevant issues for systematic support for YL in their first steps in reading English. Areas discussed are: Teacher Cognition, Sociocultural inductions to reading, Orthographic Depth, Phonology, research on reading development across languages, and influences in the YL world of established early reading methods for English native-speaking children. Chapter 3 (Research Methodology) justifies the decision to investigate the area via two main studies: (1) questionnaires and in-depth interviews with EYL professionals and (2) close analysis of course materials. It is argued that the qualitative stance of the former is not in conflict with the more objective and quantitative handling of course material data, since both are appropriate ways of focusing on the same issue. A third, small-scale study of the publishing experiences of curriculum experts and materials writers is justified and described. Chapter 4 (Findings) reports and integrates the findings of both main studies and summarizes the findings from the study with curriculum experts and materials writers. Main findings are that EYL professionals tend not to put linguistic considerations high in their priorities for decision-making, and that the materials analyzed had an underlay in the Alphabetic Principle but were dominated by ‘ABC’ ordering of Reading-Focal items and included activities which tended not to promote pattern-seeking or other behaviour likely to lead to ‘self-teaching’. Chapter 5 (Discussion) discusses the significance of the findings of the two main studies and uses the results of the third study to add balance to the materials analysis study. Limitations of, and reflections on, the research are discussed. Chapter 6 (Conclusions) draws implications for professional education, pedagogy and materials, illustrated by examples in the Appendices. Claims are made for the contributions of the study: (1) it opens up discussion on an area of YL teaching which has been neglected both in the research literature and in practical materials creation; (2) through the use of in-depth interviews it allows a voice for EYL professionals which has not been heard before; (3) the concepts of Reading-Focal versus Vehicular language in YL course materials are claimed as new and useful, leading directly (4) to procedures and analysis tools which can be used with any set of YL materials. Directions for further research building on this thesis are indicated.

    Reading Comprehension in L2 Italian: Connecting Psycholinguistic Research and Pedagogical Practice

    This dissertation explores reading comprehension abilities with a special focus on language minority bilingual children (LMBC). This population is often found to display lower scores than their monolingual peers and this gap in performance can negatively affect their future educational experience. Our goal was to shed light on the origins of these comprehension difficulties. To do so, we carried out an experimental study that involved 109 pupils attending 4th and 5th grade of primary school. The participants were 61 language minority bilingual and 48 monolingual students. We assessed their performance in reading comprehension and in a series of linguistic and non-linguistic abilities that are considered potential predictors of reading comprehension, i.e., general cognitive abilities, decoding skills, receptive vocabulary, and receptive grammar. We conducted an analysis to determine which ones were the best predictors for the two groups. The outcomes highlighted that while monolingual students relied primarily on their vocabulary knowledge during reading comprehension, for LMBC grammar knowledge, speed during decoding, and general cognitive abilities were also influencing their performance. Moreover, using three Self-Paced Reading Tasks (SPRT), we explored on-line language processing to verify whether there were qualitative or quantitative differences between groups. The analysis of reading times revealed that both groups followed similar processing patterns, but monolinguals obtained significantly higher scores in terms of accuracy. These results seem to suggest that processing complex structures in Italian is cognitively more demanding for the LMBC. 
The last part of the project was dedicated to the implementation of pedagogical practices that focused on teaching grammar and practising the ability to make inferences, using methods that aimed to stimulate the pupils’ metalinguistic awareness instead of using abstract rules.

    Rhyme and Rhyming in Verbal Art, Language, and Song

    This collection of thirteen chapters answers new questions about rhyme, with views from folklore, ethnopoetics, the history of literature, literary criticism and music criticism, psychology and linguistics. The book examines rhyme as practiced or as understood in English, Old English and Old Norse, German, Swedish, Norwegian, Finnish and Karelian, Estonian, Medieval Latin, Arabic, and the Central Australian language Kaytetye. Some authors examine written poetry, including modernist poetry, and others focus on various kinds of sung poetry, including rap, which now has a pioneering role in taking rhyme into new traditions. Some authors consider the relation of rhyme to other types of form, notably alliteration. An introductory chapter discusses approaches to rhyme, and ends with a list of languages whose literatures or song traditions are known to have rhyme

    Factors Influencing Customer Satisfaction towards E-shopping in Malaysia

    Online shopping, or e-shopping, has changed the world of business, and quite a few people have decided to work with its features. Their primary concerns, and their responses to globalisation, centre on how competently they incorporate it into their businesses. E-shopping has also increased substantially in Malaysia in recent years. The rapid growth of the e-commerce industry in Malaysia has created a need to emphasize how to increase customer satisfaction in the e-retailing environment. It is very important that customers are satisfied with the website; otherwise, they will not return. Companies must therefore ensure that customers are satisfied with their purchases, which is essential from an e-commerce point of view. With this in mind, this study aimed to investigate customer satisfaction towards e-shopping in Malaysia. A total of 400 questionnaires were distributed among students randomly selected from various public and private universities located within the Klang Valley area. A total of 369 questionnaires were returned, of which 341 were found usable for further analysis. Structural equation modelling (SEM) was then employed to test the hypotheses. This study found that customer satisfaction towards e-shopping in Malaysia is to a great extent influenced by ease of use, trust, design of the website, online security and e-service quality. Finally, recommendations and directions for future study are provided. Keywords: E-shopping, Customer satisfaction, Trust, Online security, E-service quality, Malaysia

    Rhyme and Rhyming in Verbal Art, Language, and Song

    This interdisciplinary collection explores the forms and aesthetics of rhyme in a variety of languages and from a variety of perspectives. A wide-ranging introduction that ends with a list and associated bibliography of rhyming traditions of the world is followed by thirteen chapters. These explore the history of rhyme, including Arabic and medieval Latin and the older Germanic languages, as well as literary and folk traditions in Northern Europe where rhyme plays a complex role alongside alliteration. Literary rhyme is explored from a psychological perspective, and oral composition with end rhyme is addressed. Discussions of modernist poetry, rap lyrics, and previously undiscussed traditions shed new light on the possibilities of rhyme. The book will be of interest to literary scholars, folklorists, and anyone interested in written, oral, and song traditions. Students, poets, and songwriters will find insights into the functions and aesthetics of rhyme