
    CloudScan - A configuration-free invoice analysis system using recurrent neural networks

    We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained on data automatically extracted from end-user-provided feedback, which removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long-range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline models achieve average F1 scores of 0.891 and 0.887, respectively, on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with an average F1 of 0.840 compared to 0.788. Comment: Presented at ICDAR 201
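
    The abstract describes a sequence model that labels each invoice word with a target field. Below is a minimal sketch of that idea, assuming a bidirectional LSTM tagger over word ids; the field set, dimensions, and inputs are illustrative assumptions, not CloudScan's actual features or architecture.

        # Hedged sketch of an RNN invoice-field tagger (assumed architecture,
        # not CloudScan's actual model): every token gets a field label.
        import torch
        import torch.nn as nn

        FIELDS = ["none", "invoice_number", "date", "total", "currency"]  # assumed label set

        class InvoiceTagger(nn.Module):
            def __init__(self, vocab_size, emb_dim=64, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
                self.out = nn.Linear(2 * hidden, len(FIELDS))

            def forward(self, token_ids):                # token_ids: (batch, seq_len)
                h, _ = self.lstm(self.embed(token_ids))  # context from both directions
                return self.out(h)                       # per-token field logits

        # Example: logits for a batch of 4 invoices with 50 tokens each.
        logits = InvoiceTagger(vocab_size=20000)(torch.randint(0, 20000, (4, 50)))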

    Unified Enterprise Knowledge Representation with Conceptual Models - Capturing Corporate Language in Naming Conventions

    Conceptual modeling is an established instrument in the knowledge engineering process. However, a precondition for the usability of conceptual models is not only their syntactic correctness but also their semantic comparability. Assuring comparability is quite challenging, especially when models are developed by different people. Empirical studies show that such models can vary heavily, especially in the naming of model elements, even when they are meant to express the same issue. In contrast to most ontology-driven approaches, which propose resolving these differences ex post, we introduce an approach that avoids naming differences in conceptual models during modeling. To this end, we formalize naming conventions by combining domain thesauri and phrase structures based on a linguistic grammar. This allows modelers to be guided automatically during the modeling process towards standardized labels for model elements, thus assuring a unified enterprise knowledge representation. Our approach is generic, making it applicable to any modeling language.
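
    As an illustration of the idea, the following sketch checks a model element label against a hypothetical "<verb> <object>" phrase structure and a tiny domain thesaurus; the thesaurus entries and the convention itself are assumptions, not the paper's actual conventions.

        # Hedged sketch: enforcing a "<verb> <object>" naming convention for model
        # element labels with a toy domain thesaurus (contents are illustrative).
        THESAURUS = {
            "verbs": {"create", "check", "approve", "send"},
            "objects": {"invoice", "order", "customer"},
            "synonyms": {"bill": "invoice", "verify": "check"},  # map to preferred terms
        }

        def normalize(word):
            return THESAURUS["synonyms"].get(word.lower(), word.lower())

        def check_label(label):
            """Return (ok, standardized_label) for labels expected to follow '<verb> <object>'."""
            words = [normalize(w) for w in label.split()]
            if len(words) == 2 and words[0] in THESAURUS["verbs"] and words[1] in THESAURUS["objects"]:
                return True, " ".join(words)
            return False, None

        print(check_label("Verify bill"))  # (True, 'check invoice') - standardized during modeling
        print(check_label("Invoice"))      # (False, None) - violates the phrase structure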

    Using XML views to improve data-independence of distributed applications that share data

    The development and maintenance of distributed software applications that support and make efficient use of heterogeneous networked systems is very challenging. One aspect of the complexity is that these distributed applications often need to access shared data, and different applications sharing the data may have different needs and may access different parts of the data. Maintenance and modification are especially difficult when the underlying structure of the data is changed to meet new requirements. The eXtensible Markup Language, or XML, has emerged as the universal standard for exchanging and externalizing data. It is also widely used for information modeling in environments consisting of heterogeneous information sources. CORBA is a distributed object technology that allows applications on heterogeneous platforms to communicate through commonly defined services, providing a scalable infrastructure for today's distributed systems. To improve data independence, we propose an approach based on XML standards and the notion of views to develop and modify distributed applications that access shared data. In our approach, we model the shared data using XML and generate different XML views of the data for different applications according to the DTDs of the XML views and the application logic. When the underlying data structure changes, new views are generated systematically. We adopt CORBA as the distributed architecture in our approach. Our thesis is that views supporting the data independence of distributed computing applications can be generated systematically from the application logic, CORBA IDL, and XML DTDs. Thesis (M.Sc.), University of Windsor (Canada), 2002. Adviser: Richard Frost.
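
    A rough sketch of the view idea follows: an application-specific XML view is derived from the shared data by keeping only the elements that application needs. The element names are invented for illustration; in the thesis the views are generated from the view DTDs and the application logic.

        # Hedged sketch: projecting a shared XML document into an application-specific
        # view (element names are illustrative, not taken from the thesis).
        import xml.etree.ElementTree as ET

        SHARED = """<customers>
          <customer id="1"><name>Acme</name><credit>5000</credit><address>12 Main St</address></customer>
        </customers>"""

        def project_view(xml_text, keep_fields):
            root = ET.fromstring(xml_text)
            view = ET.Element("customersView")
            for c in root.findall("customer"):
                node = ET.SubElement(view, "customer", id=c.get("id"))
                for field in keep_fields:                 # copy only the fields this view exposes
                    src = c.find(field)
                    if src is not None:
                        ET.SubElement(node, field).text = src.text
            return ET.tostring(view, encoding="unicode")

        # A billing application's view exposes name and credit but hides the address.
        print(project_view(SHARED, ["name", "credit"]))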

    Automated Functional Testing based on the Navigation of Web Applications

    Web applications are becoming more and more complex. Testing such applications is an intricate, hard, and time-consuming activity. Therefore, testing is often poorly performed or skipped by practitioners. Test automation can help to avoid this situation. Hence, this paper presents a novel approach to automated software testing for web applications based on their navigation. On the one hand, web navigation is the process of traversing a web application using a browser. On the other hand, functional requirements are actions that an application must perform. Therefore, evaluating the correct navigation of a web application amounts to assessing its specified functional requirements. The proposed method automates testing at four levels: test case generation, test data derivation, test case execution, and test case reporting. This method is driven by three kinds of inputs: i) UML models; ii) Selenium scripts; iii) XML files. We have implemented our approach in an open-source testing framework named Automatic Testing Platform. The validation of this work has been carried out by means of a case study, in which the target is a real invoice management system developed using a model-driven approach. Comment: In Proceedings WWV 2011, arXiv:1108.208
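
    For illustration, one navigation-driven functional test of the kind such a framework might generate is sketched below using the Selenium WebDriver Python bindings; the URL, element ids, and expected page title are placeholders and are not taken from the paper's case study or its Automatic Testing Platform.

        # Hedged sketch: a single navigation test written against the Selenium
        # WebDriver API. Locators and the expected title are assumptions.
        from selenium import webdriver
        from selenium.webdriver.common.by import By

        def test_create_invoice_navigation():
            driver = webdriver.Firefox()
            try:
                driver.get("http://localhost:8080/invoices/new")          # assumed test deployment
                driver.find_element(By.ID, "customer").send_keys("Acme")  # assumed form field ids
                driver.find_element(By.ID, "amount").send_keys("100.00")
                driver.find_element(By.ID, "save").click()
                assert "Invoice list" in driver.title                     # expected target page
            finally:
                driver.quit()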

    AUTOMATING REVIEW OF FORMS FOR INTERNATIONAL TRADE TRANSACTIONS: A NATURAL LANGUAGE PROCESSING APPROACH

    A major challenge in Office Automation is automating routine jobs that involve large-scale processing of ill-formed natural language data. Such data are often present in documents such as forms where it is necessary and/or practical to allow latitude in how the forms may be filled in. In this paper, we describe a computational model designed to process free-form textual data in application forms for Letters of Credit (LC), which are a common vehicle for initiating international trade transactions. The model is based on a variation of the case-frame, or thematic-role frame, instantiation method. We describe the implementation of the model, report empirical results with real LC applications, and indicate directions we are currently pursuing to improve its performance. Information Systems Working Papers Series
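
    A toy sketch of frame instantiation follows: a frame with slots for an LC application is filled from free-form text using simple patterns. The slot names and patterns are invented for illustration and are far cruder than the paper's actual model.

        # Hedged sketch: filling the slots of a Letter of Credit case frame from
        # free-form text. Slot names and patterns are illustrative assumptions.
        import re

        LC_FRAME = {
            "amount":      re.compile(r"(?:USD|\$)\s?([\d,]+(?:\.\d{2})?)", re.I),
            "beneficiary": re.compile(r"in favou?r of\s+([A-Z][\w&., ]+?)(?:,|\.|$)", re.I),
            "expiry":      re.compile(r"expir\w*\s+(?:on\s+)?(\d{1,2}\s+\w+\s+\d{4})", re.I),
        }

        def instantiate(text):
            frame = {}
            for slot, pattern in LC_FRAME.items():
                match = pattern.search(text)
                frame[slot] = match.group(1).strip() if match else None  # unfilled slots stay None
            return frame

        print(instantiate("Please open an irrevocable credit for USD 25,000.00 "
                          "in favour of Pacific Trading Co., expiring on 30 June 1995."))
        # {'amount': '25,000.00', 'beneficiary': 'Pacific Trading Co', 'expiry': '30 June 1995'}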

    Extraction of Process Models from Business Process Descriptions

    The purpose of my work is to design a method to transform a textual process description (in English) into a business process model. This is of practical relevance, since process models are often designed by business analysts starting from textual documentation. The method aims to automate the text-to-diagram conversion as much as possible. Natural languages are known to be highly complex and ambiguous. Accordingly, this project takes a best-effort approach, meaning that the method is not expected to handle every input. Instead, the proposed approach will be able to detect certain sentence structures and extract actors, actions, and objects/artifacts from them. Coordinating and subordinating conjunctions, as well as punctuation and other markers, will be used to identify sequencing, parallelism, conditional branching, and repetition. The output of the method will be a block-structured process model. The method is being implemented in Java based on open-source Natural Language Processing (NLP) libraries. Specifically, Part-of-Speech (POS) tagging is performed using the Stanford parser, and, according to the POS tags, the corresponding process entities are identified using Tregex and Tsurgeon. The current implementation is already able to identify actors, actions/tasks, and artifacts in sentences that follow certain common structures. Additionally, the implementation correctly interprets passive-voice constructions and skips articles, parentheses, and other structures that are not essential for extracting information about the process.
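
    The extraction step can be illustrated with a short sketch that uses NLTK's POS tagger as a stand-in for the Stanford parser and Tregex/Tsurgeon pipeline described above; it only handles simple subject-verb-object sentences.

        # Hedged sketch of actor/action/object extraction from a simple sentence,
        # using NLTK's POS tagger instead of the Stanford parser + Tregex/Tsurgeon.
        # Requires the NLTK data packages 'punkt' and 'averaged_perceptron_tagger'.
        import nltk

        def extract_entities(sentence):
            tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
            nouns = [w for w, t in tagged if t.startswith("NN")]
            verbs = [w for w, t in tagged if t.startswith("VB")]
            return {
                "actor": nouns[0] if nouns else None,             # naive: first noun as actor
                "action": verbs[0] if verbs else None,
                "object": nouns[-1] if len(nouns) > 1 else None,  # last noun as object/artifact
            }

        print(extract_entities("The clerk approves the invoice."))
        # expected to yield roughly {'actor': 'clerk', 'action': 'approves', 'object': 'invoice'}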

    Service Discovery from Uniform Resource Locators of Monitored Web Applications

    This thesis addresses the problem of analyzing the Uniform Resource Locators (URLs) of incoming Hypertext Transfer Protocol (HTTP) requests in a web application server in order to discover the services provided by the hosted applications and to group incoming requests according to those services, so that service response times can be monitored. The thesis investigates this problem in the context of the Plumbr Java performance monitoring tool. When a monitored application is implemented using a web framework known to Plumbr (e.g. Spring), the service name and associated data, such as URL parameters, can be extracted directly from the controller. However, this controller-based service discovery approach, which is currently implemented in Plumbr, is not applicable when the hosted applications use an unknown framework; in that case the only description of an incoming request is its URL, and requests for the same service may differ because of dynamic URL parameters. This research addresses the problem in this latter, more general setting. The thesis proposes a purely URL-based approach, in which the observed URLs are parsed into sequences of tokens that are then analyzed using natural language processing techniques and graph transformations. The proposed service discovery technique has been implemented in Groovy and Java, integrated into the Plumbr monitoring tool, and evaluated on a dataset of over 400,000 URLs extracted from a production server.
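
    The URL-grouping idea can be sketched as follows: request paths are split into tokens, dynamic segments (numeric or hex-like ids) are collapsed into a placeholder, and requests sharing the resulting pattern are grouped into one service. The heuristics shown are illustrative and much simpler than the technique the thesis implements in Plumbr.

        # Hedged sketch: grouping monitored request URLs by collapsing dynamic path
        # segments into a placeholder. Heuristics are illustrative assumptions.
        import re
        from collections import Counter
        from urllib.parse import urlparse

        DYNAMIC = re.compile(r"^(\d+|[0-9a-f]{8,}|[0-9a-f-]{36})$", re.I)  # numeric ids, hex ids, UUIDs

        def service_key(url):
            segments = [s for s in urlparse(url).path.split("/") if s]
            return "/" + "/".join("{param}" if DYNAMIC.match(s) else s for s in segments)

        requests = [
            "http://shop.example/orders/1042/items",
            "http://shop.example/orders/77/items",
            "http://shop.example/customers/5f3a9c21",
        ]
        print(Counter(service_key(u) for u in requests))
        # Counter({'/orders/{param}/items': 2, '/customers/{param}': 1})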