1,441 research outputs found

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time and cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM), and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews conducted with selected companies.

    The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in object detection, its performance is poor when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation: a generator network implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding boxes, a novel data extraction framework was designed, consisting of XML processing (in the case of an existing OCR engine), bounding-box pre-processing, text clean-up, OCR error correction, spell checking, type checking, pattern-based matching, and finally a learning mechanism for automating future data extraction. Fields that the system extracts successfully are provided in key-value format.

    The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is then used to extract the relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and thus sought new, optimised solutions. To confirm the results, the engines were used to return XML files containing the identified text and metadata. The output XML data was then fed into the new system for information extraction. This system uses both the existing OCR engine and a novel, self-adaptive, learning-based OCR engine based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company, based in London, with expertise in reducing its clients' procurement costs. This data was fed into the system to obtain a deeper level of spend classification and categorisation. This helped the company reduce its reliance on human effort and allowed for greater efficiency compared with performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools.

    The intention behind the development of this novel methodology was twofold: first, to develop and test a solution that does not depend on any specific OCR technology; second, to increase information extraction accuracy over that of existing methodologies. The thesis also evaluates the real-world need for the system and the impact it would have on SCM. The newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimising SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
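    As an illustration of the pattern-based matching stage described above, the minimal sketch below reduces cleaned OCR lines to key-value fields. The field names, regular expressions, and the `extract_fields` helper are hypothetical stand-ins for illustration only, not the thesis's actual rules, which are learned and template-aware.

```python
import re

# Hypothetical field patterns; a real deployment would learn or configure
# these per invoice template rather than hard-coding them.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*(\S+)", re.IGNORECASE),
    "invoice_date":   re.compile(r"date\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})", re.IGNORECASE),
    "total_amount":   re.compile(r"total\s*(?:due|amount)?\s*[:\-]?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}

def extract_fields(ocr_lines):
    """Scan cleaned OCR lines and return whichever fields match, in key-value form."""
    fields = {}
    for line in ocr_lines:
        for name, pattern in FIELD_PATTERNS.items():
            if name in fields:
                continue  # keep the first match per field
            match = pattern.search(line)
            if match:
                fields[name] = match.group(1)
    return fields

# Example: lines as they might come out of the OCR clean-up stage.
lines = ["ACME Ltd", "Invoice No: INV-0042", "Date: 12/03/2021", "Total due: $1,284.50"]
print(extract_fields(lines))
# {'invoice_number': 'INV-0042', 'invoice_date': '12/03/2021', 'total_amount': '1,284.50'}
```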

    Proceedings of the 2nd IUI Workshop on Interacting with Smart Objects

    These are the Proceedings of the 2nd IUI Workshop on Interacting with Smart Objects. Objects that we use in everyday life are expanding beyond their restricted interaction capabilities and now provide functionality that goes far beyond their original purpose. They feature computing capabilities and are thus able to capture, process, and store information and to interact with their environments, turning them into smart objects.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures within which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; and (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Published in the Journal of AI Research (JAIR), volume 61, pp. 75-170; 118 pages, 8 figures, 1 table.

    Detecting grammatical errors with treebank-induced, probabilistic parsers

    Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem unsuitable for grammar checking, as their high robustness means they massively over-generate and fail to reject ungrammatical input. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse, using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) exceeds the actual parse probability by more than a certain margin, the sentence is flagged as ungrammatical, as sketched below. The third approach extracts discriminative parse-tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences; the ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches: one that uses a hand-crafted, discriminative grammar (the XLE ParGram English LFG) and one based on part-of-speech n-grams. In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements.
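    A minimal sketch of the second approach's decision rule, assuming made-up parser scores and a crude length-based estimator in place of the trained one; `expected_logprob`, `flag_ungrammatical`, and the margin value are all illustrative, not the thesis's actual model.

```python
def expected_logprob(sentence, per_word_logprob=-8.0):
    """Crude stand-in for the estimator: the thesis trains it on parsed
    grammatical text; here we just scale a per-word average by length."""
    return per_word_logprob * len(sentence.split())

def flag_ungrammatical(sentence, actual_logprob, margin=5.0):
    """Flag the sentence if its best parse is less probable than the
    estimator predicts by more than `margin` (in log space)."""
    return actual_logprob < expected_logprob(sentence) - margin

# Example with made-up parser scores: the second sentence parses much
# worse than its length would predict, so it is flagged.
print(flag_ungrammatical("the cat sat on the mat", actual_logprob=-46.0))  # False
print(flag_ungrammatical("the cat sat on mat the", actual_logprob=-60.0))  # True
```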

    Towards intelligent, adaptive input devices for users with physical disabilities

    This thesis presents a novel application of user modelling, the domain of interest being the physical abilities of the user of a computer input device. Specifically, it describes a model which identifies aspects of keyboard use with which the user has difficulty. The model is based on data gathered in an empirical study of keyboard and mouse use by people with and without motor disabilities. In this study, many common input errors due to physical inaccuracies in using keyboards and mice were observed. For the majority of these errors, there exist keyboard or mouse configuration facilities intended to reduce or eliminate them. While such facilities are now integrated into the majority of modern operating systems, there is little published data describing their effect on keyboard or mouse usability. This thesis offers evidence that they can be extremely useful, even essential, but that further research and interface development are required.

    This thesis presents a user model which focuses on four of the most commonly observed keyboard difficulties. The model also makes recommendations for settings for three keyboard configuration facilities, each of which tackles one of these specific difficulties. As a user modelling task, this application presents a number of interesting challenges. Different users will have very different configuration requirements, and the requirements of individual users may also change over long or short periods of time. Some users will have cognitive impairments. Users may have very limited time and energy to devote to computer use. In response, this research has investigated the extent to which it is possible to model users without interrupting the task for which they are using a computer in the first place. This approach is appealing because it does not require users to spend time participating in model instantiation. The focus on inference rather than explicit testing or questioning also allows the model to dynamically track an individual user's changing requirements. This thesis shows that, within the context of the keyboard difficulties studied, such an approach is feasible. The implemented model records users' keyboard input unintrusively as they perform their own input tasks. This input is examined for evidence of certain types of input error or indications of difficulty in using the keyboard. In the model presented, conclusions are based on the assumption that the user is typing English text in a word-processing application; however, the design of the model allows any other textual language to be used.

    A second empirical study, evaluating the model, is described. The model is shown to be very accurate in identifying users having difficulties in each of the areas tackled, the only exception being those who find a given operation awkward but are able to perform it accurately. Where it is also possible to evaluate the configuration recommendations made by the model, the chosen settings are effective in reducing input errors and increasing user satisfaction with the keyboard. The model is also able to draw conclusions quickly for users with higher error rates, and shows good overall stability.

    In the light of this successful identification of keyboard difficulties, potential applications of the model are suggested. It could be used to help occupational therapists and assistive technologists assess the keyboard configuration requirements of a new user. It could also be made available to users themselves: many people are currently unaware of facilities they may find useful, and of how to activate them. The model could be extended to other areas of keyboard use, and to other input devices. This would allow systems to provide automatic, dynamic support for configuration, which would go some way towards improving the accessibility of computer systems for people with motor disabilities.
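    A toy sketch of the inference idea, assuming a simplified keystroke log and fixed thresholds: the `KeyEvent` record, the cut-off values, and the recommendation text are illustrative inventions, whereas the thesis derives its recommendations from empirical data across four difficulty types.

```python
from dataclasses import dataclass

@dataclass
class KeyEvent:
    key: str
    hold_ms: float  # how long the key was held down

# Illustrative thresholds only; not the thesis's empirically derived values.
LONG_PRESS_MS = 500.0    # holds longer than this tend to trigger auto-repeat
ERROR_RATE_CUTOFF = 0.05

def recommend_repeat_delay(events):
    """From unobtrusively logged keystrokes, infer whether the user would
    benefit from a longer key-repeat delay (a facility in the spirit of
    FilterKeys/RepeatKeys)."""
    if not events:
        return None
    long_presses = sum(1 for e in events if e.hold_ms > LONG_PRESS_MS)
    rate = long_presses / len(events)
    if rate > ERROR_RATE_CUTOFF:
        # Suggest a delay comfortably above the user's longest observed hold.
        longest = max(e.hold_ms for e in events)
        return f"raise key-repeat delay to ~{int(longest * 1.2)} ms (long-press rate {rate:.0%})"
    return None

events = [KeyEvent("t", 120), KeyEvent("h", 110), KeyEvent("e", 780), KeyEvent(" ", 130)]
print(recommend_repeat_delay(events))
```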

    Basics of man-machine communication for the design of educational systems : NATO Advanced Study Institute, August 16-26, 1993, Eindhoven, The Netherlands

    Writing Development in Struggling Learners

    In Writing Development in Struggling Learners, international researchers provide insights into the development of writing skills, from early writing and spelling development through to composition, the reasons individuals struggle to acquire proficient writing skills, and how to help these learners. Readership: academic libraries; graduate students; postgraduate researchers; literacy researchers; educated lay persons; literacy specialists; primary and secondary educators.

    Modeling second language learners' interlanguage and its variability: a computer-based dynamic assessment approach to distinguishing between errors and mistakes

    Despite a long history, interlanguage variability research remains a debated topic, as most paradigms do not distinguish between competence and performance. While interlanguage performance has been shown to be variable, determining whether interlanguage competence is subject to random and/or systematic variation is complex, given that a distinction between competence-dependent errors and performance-related mistakes must be established to best represent interlanguage competence. This thesis proposes a dynamic assessment model grounded in sociocultural theory to distinguish between errors and mistakes in texts written by learners of French, and then investigates the extent to which interlanguage competence varies across time, text types, and students. The key outcomes include: (1) an expanded model based on dynamic assessment principles to distinguish between errors and mistakes, which also provides the structure to create and observe learners' zone of proximal development; (2) a method to increase the accuracy of part-of-speech tagging, whose reliability correlates with the number of incorrect words contained in learners' texts (see the sketch below); (3) a sociocultural perspective on interlanguage variability research. Results demonstrate that interlanguage competence is as variable as performance. The main finding is that knowledge over time is subject not only to systematic but also to unsystematic variation.
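    A minimal sketch of the idea behind outcome (2), using an English stand-in for the French learner data: likely misspellings are normalised against a lexicon before tagging, so the tagger sees in-vocabulary forms. The tiny `LEXICON`, the `difflib` matcher, and NLTK's English tagger are illustrative choices, not the thesis's actual pipeline.

```python
import difflib
import nltk  # assumes nltk with the 'averaged_perceptron_tagger' data installed

# Tiny stand-in lexicon; the thesis works with French texts and a full
# dictionary, but the idea is the same: normalise misspellings first.
LEXICON = {"the", "student", "writes", "a", "letter", "to", "his", "friend"}

def normalise(token):
    """Map a likely misspelling to its closest in-lexicon form, if any."""
    if token.lower() in LEXICON:
        return token
    close = difflib.get_close_matches(token.lower(), LEXICON, n=1, cutoff=0.8)
    return close[0] if close else token

def tag_learner_text(text):
    tokens = text.split()
    normalised = [normalise(t) for t in tokens]
    return nltk.pos_tag(normalised)  # tags are now assigned to corrected forms

# 'studnet' and 'wriets' are normalised before tagging.
print(tag_learner_text("the studnet wriets a letter"))
```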

    Toward effective conversational messaging

    Thesis (M.S.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995, by Matthew Talin Marx. Includes bibliographical references (leaves 118-123).