2,820 research outputs found

    A case-based reasoning approach for invoice structure extraction

    ISBN: 978-0-7695-2822-9
    This paper shows the use of case-based reasoning (CBR) for invoice structure extraction and analysis. The method, called CBR-DIA (CBR for Document Invoice Analysis), is adaptive and does not need any previous training. It analyses a document by retrieving and analysing similar documents or document elements (cases) stored in a database. The retrieval step is performed using graph comparison techniques such as graph probing and edit distance. The analysis step uses the information found in the nearest retrieved cases. Applied to 950 invoices, CBR-DIA reaches a recognition rate of 85.29% for documents of known classes and 76.33% for documents of unknown classes.
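    To make the retrieval step concrete, the sketch below shows a minimal, hypothetical graph-probing comparison between invoice layout graphs; the block labels, graph encoding, and case-base structure are illustrative assumptions, not the paper's actual implementation.

    from collections import Counter

    def probe(graph):
        # Probe signature of a layout graph: frequencies of node labels and of
        # (label, out-degree) pairs, in the spirit of graph probing.
        nodes, edges = graph          # nodes: {id: label}, edges: [(src, dst), ...]
        out_deg = Counter(src for src, _ in edges)
        label_freq = Counter(nodes.values())
        degree_freq = Counter((label, out_deg[nid]) for nid, label in nodes.items())
        return label_freq, degree_freq

    def probing_distance(g1, g2):
        # Approximate structural distance: size of the symmetric difference of the probes.
        (l1, d1), (l2, d2) = probe(g1), probe(g2)
        sym_diff = lambda a, b: sum(((a - b) + (b - a)).values())
        return sym_diff(l1, l2) + sym_diff(d1, d2)

    def retrieve_nearest_case(query_graph, case_base):
        # Retrieval step: return the stored case whose layout graph is closest to the query.
        return min(case_base, key=lambda case: probing_distance(query_graph, case["graph"]))

    # Hypothetical example: nodes are labelled layout blocks, edges are "is above" relations.
    invoice = ({1: "header", 2: "table", 3: "total"}, [(1, 2), (2, 3)])
    case_base = [
        {"id": "known-supplier-A", "graph": ({1: "header", 2: "table", 3: "total"}, [(1, 2), (2, 3)])},
        {"id": "other-layout", "graph": ({1: "logo", 2: "address", 3: "table"}, [(1, 2), (1, 3)])},
    ]
    print(retrieve_nearest_case(invoice, case_base)["id"])   # known-supplier-A

    The analysis step would then reuse the field positions recorded in the retrieved case, which is where the hedged sketch above ends.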

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time and cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed at selected companies.

    The expert system developed in this thesis focuses on two distinct areas of research: text/object detection and text extraction. For text/object detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model is a generator network implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding boxes, a novel data extraction framework was designed, consisting of XML processing (where an existing OCR engine is used), bounding box pre-processing, text clean-up, OCR error correction, spell checking, type checking, pattern-based matching, and a learning mechanism for automating future data extraction. Fields the system extracts successfully are provided in key-value format.

    The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is then used to extract relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and sought new, optimised solutions. To confirm the results, the engines were used to return XML-based files with the identified text and metadata. The output XML data was then fed into the new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine built on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London with expertise in reducing its clients' procurement costs. This data was fed into the system to obtain a deeper level of spend classification and categorisation. This helped the company reduce its reliance on human effort and allowed greater efficiency compared with performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools.

    The intention behind developing this methodology was twofold: first, to develop a solution that does not depend on any specific OCR technology; second, to increase information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. The newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimising SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
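    As a concrete illustration of the pattern-based matching step described above, the following sketch extracts a few fields from cleaned OCR text with regular expressions. The field names and patterns are assumptions made for illustration; the thesis' actual pipeline (type checks, spell correction, learning mechanism) is considerably richer.

    import re

    # Hypothetical field patterns; a production rule set would cover many more layouts.
    FIELD_PATTERNS = {
        "invoice_number": re.compile(r"invoice\s*(?:no\.?|number)\s*[:#]?\s*([A-Z0-9-]+)", re.I),
        "invoice_date":   re.compile(r"date\s*[:]?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.I),
        "total":          re.compile(r"(?:grand\s+)?total\s*[:]?\s*([\d.,]+)", re.I),
    }

    def extract_fields(ocr_text: str) -> dict:
        # Apply pattern-based matching to cleaned OCR text and return the
        # successfully extracted fields in key-value format.
        fields = {}
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(ocr_text)
            if match:
                fields[name] = match.group(1)
        return fields

    sample = "INVOICE NO: INV-2042\nDate: 12/03/2021\nGrand Total: 1,280.50"
    print(extract_fields(sample))
    # {'invoice_number': 'INV-2042', 'invoice_date': '12/03/2021', 'total': '1,280.50'}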

    Knowledge-based systems for knowledge management in enterprises: Workshop held at the 21st Annual German Conference on AI (KI-97)


    Automatic Data Interpretation in Accounting Information Systems Based On Ontology

    Financial transactions are recorded in accounting journals based on the evidence of the transaction. There are several kinds of transaction evidence, such as invoices, receipts, notes, and memos. An invoice, as one type of transaction evidence, takes many forms and contains a variety of information. The information contained in the invoice is identified based on rules. Identifiable information includes: invoice date, supplier name, invoice number, product ID, product name, product quantity, and total price. In this paper, we propose an accounting ontology and an Indonesian accounting dictionary that can be used in intelligent accounting systems. The accounting ontology provides an overview of account mapping within an organization. The accounting dictionary helps in determining the account names used in accounting journals. Accounting journal entries are created automatically based on the identification of the accounting evidence. We ran a simulation on 160 Indonesian accounting evidence documents, obtaining a precision of 86.67%, a recall of 92.86% and an f-measure of 89.67%.
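    A minimal sketch can clarify the dictionary-driven journal creation; the account names and dictionary entries below are illustrative assumptions, not the paper's actual Indonesian accounting dictionary or ontology. (For reference, the reported f-measure is consistent with the harmonic mean of precision and recall, 2PR/(P+R) ≈ 89.7% for the stated values.)

    # Hypothetical account dictionary mapping a transaction type to journal accounts.
    ACCOUNT_DICTIONARY = {
        "purchase_on_credit": {"debit": "Inventory", "credit": "Accounts Payable"},
        "cash_purchase":      {"debit": "Inventory", "credit": "Cash"},
    }

    def journal_entry(invoice_fields: dict, paid_in_cash: bool) -> dict:
        # Build an automatic journal entry from the identified invoice fields.
        accounts = ACCOUNT_DICTIONARY["cash_purchase" if paid_in_cash else "purchase_on_credit"]
        return {
            "date": invoice_fields["invoice_date"],
            "description": f"Invoice {invoice_fields['invoice_number']} from {invoice_fields['supplier_name']}",
            "debit": (accounts["debit"], invoice_fields["total_price"]),
            "credit": (accounts["credit"], invoice_fields["total_price"]),
        }

    entry = journal_entry(
        {"invoice_date": "2021-03-12", "invoice_number": "INV-2042",
         "supplier_name": "PT Sumber Makmur", "total_price": 1_280_000},
        paid_in_cash=False,
    )
    print(entry["debit"], entry["credit"])  # ('Inventory', 1280000) ('Accounts Payable', 1280000)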

    A Semantic Model for Enhancing Data-Driven Open Banking Services

    In current Open Banking services, the European Payment Services Directive (PSD2) allows the secure collection of bank customer information, on their behalf and with their consent, to analyze their financial status and needs. The PSD2 directive has led to a massive number of daily transactions between Fintech entities, which require the automatic management of the data involved, generally coming from multiple and heterogeneous sources and formats. In this context, one of the main challenges lies in defining and implementing common data integration schemes to easily merge these data into knowledge-base repositories, hence allowing data reconciliation and sophisticated analysis. In this sense, Semantic Web technologies constitute a suitable framework for the semantic integration of data, making it possible to link with external sources and enhancing systematic querying. With this motivation, an ontology approach is proposed in this work to operate as a semantic data mediator in real-world open banking operations. Using semantic reconciliation mechanisms, the underpinning knowledge graph is populated with data involved in PSD2 open banking transactions, which are aligned with information from invoices. A series of semantic rules is defined in this work to show how the financial solvency classification of client entities and transaction concept suggestions can be inferred from the proposed semantic model.

    This research has been partially funded by the Spanish Ministry of Science and Innovation via the Aether Project with grant number PID2020-112540RB-C41 (AEI/FEDER, UE), the Ministry of Industry, Commerce and Tourism via the Helix initiative with grant number AEI-010500-2020-34, and the Andalusian PAIDI program with grant number P18-RT-2799. Partial funding for open access charge: Universidad de Málaga.
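    The rule-based inference over the knowledge graph can be sketched as follows, assuming rdflib with a hypothetical namespace, property names, and solvency threshold; the actual ontology terms and criteria in the paper differ.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, XSD

    # Hypothetical namespace and properties, used only for this sketch.
    OB = Namespace("http://example.org/openbanking#")

    g = Graph()
    client = OB.ClientA
    g.add((client, RDF.type, OB.ClientEntity))
    g.add((client, OB.totalIncome, Literal(5200.0, datatype=XSD.decimal)))
    g.add((client, OB.totalInvoicedDebt, Literal(1300.0, datatype=XSD.decimal)))

    # A rule in the spirit of the paper: classify a client entity as solvent when its
    # income covers its invoiced debt by some margin (the threshold is an assumption).
    SOLVENCY_RULE = """
    PREFIX ob: <http://example.org/openbanking#>
    SELECT ?c WHERE {
      ?c a ob:ClientEntity ;
         ob:totalIncome ?income ;
         ob:totalInvoicedDebt ?debt .
      FILTER (?income > ?debt * 2)
    }
    """

    for row in g.query(SOLVENCY_RULE):
        g.add((row.c, OB.solvencyClass, OB.Solvent))   # materialise the inferred classification

    print((client, OB.solvencyClass, OB.Solvent) in g)  # True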

    Discovering mapping between artifact-centric business process models and execution logs

    Classically, workflows have been described in a process-oriented form, focusing on the whole workflow and the activities within it. Recently, a novel, artifact-centric modelling approach has emerged, in which business objects and the relationships between them are central. Artifact-based methods also require changes to process analysis techniques. One possible process analysis method is checking the conformance of execution logs against the process model, which makes it possible to determine whether the system behaves as planned. To check conformance between the model and the logs, one needs to know which events in the logs correspond to which activities in the model. The goal of this thesis is to automatically discover the mapping between activities in artifact-centric process models and events in workflow system logs. Discovering such a mapping is not trivial, because event names in the logs and activity names in the model may not match; for example, the same naming standards may not be followed. The mapping also needs to be derived automatically when it is known that there are mismatches between the logs and the model and not all events and activities can be matched. Automatic discovery helps simplify the user's work.

    The proposed method takes as input a Proclet-based model and an execution log from the system. To find the mapping between the model and the logs, both are converted into graph form. Mappings are found for each artifact separately, without using information about the interactions between artifacts. For each artifact, its Petri net is extracted and behavioural relations are computed that express how the activities of that artifact are related to each other. From this, a graph is built whose vertices are the activities and whose edges are the behavioural relations between them. Analogously, a graph is built for each entity occurring in the log. Using user-provided correspondences between entity types and artifact types, the mappings between the activities and events of each corresponding artifact and entity instance are found. Finding the mapping thus reduces to finding a mapping between the vertices of two graphs. To find it, similarities between activity and event names are first computed, and based on these a mapping is sought that minimises the edit distance between the graphs under that mapping; a greedy algorithm is used for the search. As a practical experiment, the method was tested on various combinations of models and logs. The results show that the method is able to find mappings, but the quality of the results depends largely on the similarity of activity and event names and less on structural similarity.
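    The label-similarity part of the mapping step can be sketched as below. This greedy matching uses only name similarity and omits the structural edit-distance minimisation over behavioural-relation graphs; the labels and the cut-off value are illustrative assumptions.

    from difflib import SequenceMatcher

    def name_similarity(a: str, b: str) -> float:
        # String similarity between a model activity label and a log event label.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def greedy_mapping(model_activities, log_events, cutoff=0.5):
        # Greedily pair activities with events by descending label similarity;
        # anything below the cut-off stays unmapped, so mismatches are allowed.
        pairs = sorted(
            ((name_similarity(a, e), a, e) for a in model_activities for e in log_events),
            reverse=True,
        )
        mapping, used_events = {}, set()
        for score, activity, event in pairs:
            if score < cutoff:
                break
            if activity not in mapping and event not in used_events:
                mapping[activity] = event
                used_events.add(event)
        return mapping

    activities = ["Create invoice", "Approve invoice", "Archive"]
    events = ["invoice_created", "invoice_approved", "payment_received"]
    print(greedy_mapping(activities, events))  # invoice activities map; weak matches stay unmapped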