CloudScan - A configuration-free invoice analysis system using recurrent neural networks
We present CloudScan, an invoice analysis system that requires zero
configuration or upfront annotation. In contrast to previous work, CloudScan
does not rely on templates of invoice layout; instead, it learns a single global
model of invoices that naturally generalizes to unseen invoice layouts. The
model is trained using data automatically extracted from end-user provided
feedback. This automatic training data extraction removes the requirement for
users to annotate the data precisely. We describe a recurrent neural network
model that can capture long range context and compare it to a baseline logistic
regression model corresponding to the current CloudScan production system. We
train and evaluate the system on 8 important fields using a dataset of 326,471
invoices. The recurrent neural network and baseline model achieve 0.891 and
0.887 average F1 scores respectively on seen invoice layouts. For the harder
task of unseen invoice layouts, the recurrent neural network model outperforms
the baseline with 0.840 average F1 compared to 0.788.
Comment: Presented at ICDAR 201
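The abstract frames invoice analysis as labeling each token of the document with one of the target fields, using surrounding context. The sketch below is not the CloudScan implementation; it only illustrates that framing with an invented label set and a simple context-window extractor of the kind such a token classifier would consume.

```python
# Hypothetical subset of field labels a token classifier could choose from
# (the paper's actual 8 fields are not enumerated in the abstract).
FIELDS = {"invoice_number", "date", "total", "other"}

def context_windows(tokens, size=2):
    """Yield (token, left-context, right-context) triples; a sequence model
    such as the RNN described above labels each token using such context."""
    for i, tok in enumerate(tokens):
        left = tokens[max(0, i - size):i]
        right = tokens[i + 1:i + 1 + size]
        yield tok, left, right

tokens = ["Invoice", "No.", "12345", "Total", "99.00"]
windows = list(context_windows(tokens))
```

A recurrent model differs from the logistic-regression baseline precisely in that its effective context is not limited to a fixed window like this one.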
Unified Enterprise Knowledge Representation with Conceptual Models - Capturing Corporate Language in Naming Conventions
Conceptual modeling is an established instrument in the knowledge engineering process. However, a precondition for the usability of conceptual models is not only their syntactic correctness but also their semantic comparability. Assuring comparability is quite challenging, especially when models are developed by different persons. Empirical studies show that such models can vary heavily, especially in model element naming, even if they are meant to express the same issue. In contrast to most ontology-driven approaches, which propose resolving these differences ex-post, we introduce an approach that avoids naming differences in conceptual models already during modeling. To this end, we formalize naming conventions combining domain thesauri and phrase structures based on a linguistic grammar. This allows for guiding modelers automatically during the modeling process using standardized labels for model elements, thus assuring a unified enterprise knowledge representation. Our approach is generic, making it applicable to any modeling language.
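A naming convention of the kind described above can be thought of as a phrase structure whose terms must come from a domain thesaurus. The following minimal sketch checks activity labels against an invented &lt;verb&gt; &lt;noun&gt; structure; both word lists and the rule are illustrative assumptions, far simpler than a grammar-based formalization.

```python
# Invented domain thesaurus entries (verbs and business objects).
VERBS = {"check", "create", "approve", "send"}
NOUNS = {"invoice", "order", "request"}

def valid_label(label):
    """Accept a model-element label only if it follows the hypothetical
    <verb> <noun> convention with terms from the thesaurus."""
    words = label.lower().split()
    return len(words) == 2 and words[0] in VERBS and words[1] in NOUNS

ok = valid_label("Approve order")
```

Checking labels at modeling time, as here, is what lets such an approach avoid ex-post resolution of naming conflicts.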
Toward the automation of business process ontology generation
Semantic Business Process Management (SBPM) utilises semantic technologies (e.g., ontology) to model and query process representations. There are times in which such models must be reconstructed from existing textual documentation. In this scenario the automated generation of ontological models would be preferable; however, current methods and technology are still not capable of automatically generating accurate semantic process models from textual descriptions. This research attempts to automate the process as much as possible by proposing a method that drives the transformation through the joint use of a foundational ontology and lexico-semantic analysis. The method is presented, demonstrated and evaluated. The original dataset represents 150 business activities related to the procurement processes of a case study company. As the evaluation shows, the proposed method can map the linguistic patterns of the process descriptions to semantic patterns of the foundational ontology with a high level of accuracy. However, further research is required to reduce the level of human intervention, to expand the method so as to recognise further patterns of the foundational ontology, and to develop a tool to assist the business process modeller in the semi-automated generation of process models.
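The core of the method is mapping linguistic patterns in activity descriptions onto semantic patterns of a foundational ontology. The toy function below illustrates only the shape of such a mapping for one invented noun-verb-noun pattern; the tag names, pattern, and example are assumptions, not the paper's actual patterns.

```python
def to_semantic_pattern(tagged):
    """tagged: list of (word, pos) pairs. Map a noun-verb-noun activity
    description onto a hypothetical agent/action/object semantic pattern;
    unrecognised patterns fall through to manual handling."""
    if [pos for _, pos in tagged] == ["noun", "verb", "noun"]:
        agent, action, obj = (w for w, _ in tagged)
        return {"agent": agent, "action": action, "object": obj}
    return None  # pattern not recognised; human intervention needed

result = to_semantic_pattern(
    [("clerk", "noun"), ("approves", "verb"), ("order", "noun")])
```

The `None` branch corresponds to the residual human intervention the abstract says further research should reduce.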
Using XML views to improve data-independence of distributed applications that share data
The development and maintenance of distributed software applications that support and make efficient use of heterogeneous networked systems is very challenging. One aspect of the complexity is that these distributed applications often need to access shared data, and different applications sharing the data may have different needs and may access different parts of the data. Maintenance and modification are especially difficult when the underlying structure of the data is changed to meet new requirements. The eXtensible Markup Language, or XML, has emerged as the universal standard for exchanging and externalizing data. It is also widely used for information modeling in an environment consisting of heterogeneous information sources. CORBA is a distributed object technology allowing applications on heterogeneous platforms to communicate through commonly defined services, providing a scalable infrastructure for today's distributed systems. To improve data independence, we propose an approach based on XML standards and the notion of views to develop and modify distributed applications which access shared data. In our approach, we model the shared data using XML, and generate different XML views of the data for different applications according to the DTDs of the XML views and the application logic. When the underlying data structure changes, new views are generated systematically. We adopt CORBA as the distributed architecture in our approach. Our thesis is that views to support data-independence of distributed computing applications can be generated systematically from application logic, CORBA IDL and XML DTD.
Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2002 .L86. Source: Masters Abstracts International, Volume: 41-04, page: 1113. Adviser: Richard Frost. Thesis (M.Sc.)--University of Windsor (Canada), 2002
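The view idea can be illustrated in miniature: a shared XML document is projected down to an application-specific view exposing only the elements that application needs, so structural changes are absorbed by regenerating the view. The element names and the billing example below are invented for illustration; the thesis derives views from DTDs and application logic rather than hand-written code like this.

```python
import xml.etree.ElementTree as ET

# Invented shared data: customer records with fields not every
# application should see.
SHARED = ("<customers><customer><name>Ada</name>"
          "<ssn>000</ssn></customer></customers>")

def billing_view(xml_text):
    """Build a hypothetical billing view exposing only <name> elements;
    if the shared structure changes, only this view definition must be
    regenerated, not the consuming application."""
    root = ET.fromstring(xml_text)
    view = ET.Element("billing-view")
    for cust in root.iter("customer"):
        view.append(cust.find("name"))
    return ET.tostring(view, encoding="unicode")

view_xml = billing_view(SHARED)
```

The consuming application programs against the view's structure only, which is the data-independence property the thesis argues for.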
Automated Functional Testing based on the Navigation of Web Applications
Web applications are becoming more and more complex. Testing such
applications is an intricate, hard, and time-consuming activity. Therefore,
testing is often poorly performed or skipped by practitioners. Test automation
can help to avoid this situation. Hence, this paper presents a novel approach
to performing automated software testing for web applications based on their
navigation. On the one hand, web navigation is the process of traversing a web
application using a browser. On the other hand, functional requirements are
actions that an application must do. Therefore, the evaluation of the correct
navigation of web applications results in the assessment of the specified
functional requirements. The proposed method performs the automation at four
levels: test case generation, test data derivation, test case execution, and
test case reporting. This method is driven by three kinds of inputs: i) UML
models; ii) Selenium scripts; iii) XML files. We have
implemented our approach in an open-source testing framework named Automatic
Testing Platform. The validation of this work has been carried out by means of
a case study, in which the target is a real invoice management system developed
using a model-driven approach.
Comment: In Proceedings WWV 2011, arXiv:1108.208
AUTOMATING REVIEW OF FORMS FOR INTERNATIONAL TRADE TRANSACTIONS: A NATURAL LANGUAGE PROCESSING APPROACH
A major challenge in Office Automation is one of automating routine jobs that involve
large-scale processing of ill-formed natural language data. Such data are often present in
documents such as forms where it is necessary and/or practical to allow latitude in how the
forms may be filled. In this paper, we describe a computational model designed to process free-form
textual data in application forms for Letters of Credit (LC), which represent a common
vehicle for initiating international trade transactions. The model is based on a variation of the
case-frame or thematic-role frame instantiation methods. We describe the implementation of
the model, report empirical results with real LC applications, and indicate directions we are
currently pursuing to improve its performance.
Information Systems Working Papers Series
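Case-frame (thematic-role frame) instantiation, as named above, fills the slots of a frame from free-form text. The sketch below fills an invented shipment-clause frame using a single cue pattern; the slot names, cue, and example text are all assumptions, and the actual system uses a much richer variation of this method on real LC applications.

```python
import re

def instantiate_frame(text):
    """Fill a hypothetical shipment frame (goods/origin/destination)
    from free-form text using one naive cue pattern; unmatched slots
    stay None, mirroring partially filled frames."""
    frame = {"goods": None, "origin": None, "destination": None}
    m = re.search(r"shipment of (.+?) from (\w+) to (\w+)", text, re.I)
    if m:
        frame["goods"], frame["origin"], frame["destination"] = m.groups()
    return frame

frame = instantiate_frame(
    "Shipment of 500 cartons of tea from Colombo to Rotterdam")
```

Ill-formed input is exactly what makes single-pattern matching like this insufficient in practice, hence the frame-instantiation machinery the paper describes.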
Extraction of Process Models from Business Process Descriptions
The purpose of my work is to design a method to transform a textual process description (in English) into a business process model. This is of practical relevance, since process models are often designed by business analysts starting from textual documentation. The method to be designed aims at automating the text-to-diagram conversion phase as much as possible.
Natural languages are known to be highly complex and ambiguous. Accordingly, for this project we will tackle the problem using a best-effort approach, meaning that the method is not intended to work in all cases. Instead, the proposed approach will be able to detect certain sentence structures and extract actors, actions and objects/artifacts from them. Coordinating and subordinating conjunctions, as well as punctuation and other markers, will be used to identify sequencing, parallelism, conditional branching and repetition. The output of the method will be a block-structured process model.
The method is being implemented in Java based on open-source Natural-Language Processing (NLP) libraries. Specifically, Part-of-Speech (POS) tagging is performed using the Stanford parser, and based on the POS tags, the corresponding process entities are identified using Tregex and Tsurgeon. The current implementation is already able to identify actors, actions/tasks and artifacts from sentences that conform to certain common structures. Additionally, the implementation is able to correctly interpret passive-voice constructions and to skip articles, parentheses and other complex structures for the purpose of extracting essential information about the process.
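The marker-based extraction described above can be illustrated in miniature: sequencing markers split a sentence into ordered activities, and a naive subject-verb-object pattern pulls actor, action, and object from each clause. The marker list and grammar below are invented toys, far simpler than the Stanford-parser/Tregex pipeline the work actually uses.

```python
import re

# Invented sequencing markers; the real method also handles coordinating
# and subordinating conjunctions, parallelism, branching and repetition.
SEQ_MARKERS = r"\b(?:then|afterwards|after that)\b"

def extract_sequence(sentence):
    """Split on sequencing markers, then match a naive
    'the <actor> <verb>s the <object>' pattern in each clause."""
    clauses = re.split(SEQ_MARKERS, sentence, flags=re.I)
    steps = []
    for clause in clauses:
        m = re.search(r"the (\w+) (\w+s) the (\w+)", clause, re.I)
        if m:
            steps.append({"actor": m.group(1),
                          "action": m.group(2),
                          "object": m.group(3)})
    return steps

steps = extract_sequence(
    "The clerk checks the invoice, then the manager approves the payment")
```

The ordered list of steps corresponds to a sequence block in the block-structured process model the method outputs.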
Service Discovery from Uniform Resource Locators of Monitored Web Applications
This thesis addresses the problem of analyzing Uniform Resource Locators (URLs) of incoming Hypertext Transfer Protocol (HTTP) requests in a Web application server in order to discover the services provided by the applications hosted by the application server, and to group these applications according to the services they provide. The thesis investigates this problem in the context of the Plumbr Java performance monitoring tool. When the hosted applications are implemented using a known web framework (e.g. Spring), the service name and associated data, such as URL parameters, can be extracted directly from the controller. However, this controller-based service discovery approach, which is currently implemented in Plumbr, is not applicable when the hosted applications use an unknown framework. This research addresses the problem in this latter, more general setting.
The thesis proposes a pure URL-based approach, where the observed URLs are parsed into sequences of tokens, which are then analyzed using natural language processing techniques and graph transformations. The proposed service discovery technique has been implemented in Groovy and Java, integrated into the Plumbr tool, and evaluated on data extracted from a production server covering over 400,000 URLs.
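The URL-based grouping idea can be sketched as follows: URLs are split into token sequences, and positions whose tokens vary across the observed URLs are treated as dynamic parameters, so requests to the same service collapse into one pattern. This toy assumes same-length URLs and an invented `{id}` placeholder; the actual technique additionally applies NLP and graph transformations.

```python
def url_pattern(urls):
    """Collapse a group of same-shaped URLs into one service pattern:
    positions with a single distinct token are kept literally, varying
    positions become a hypothetical {id} placeholder."""
    token_lists = [u.strip("/").split("/") for u in urls]
    pattern = []
    for position in zip(*token_lists):
        pattern.append(position[0] if len(set(position)) == 1 else "{id}")
    return "/" + "/".join(pattern)

pattern = url_pattern(["/account/42/invoices", "/account/7/invoices"])
```

Grouping requests under one such pattern is what makes per-service response-time monitoring possible when no framework controller is available.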