21,061 research outputs found

    CloudScan - A configuration-free invoice analysis system using recurrent neural networks

    Get PDF
    We present CloudScan; an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout, instead it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788.Comment: Presented at ICDAR 201

    An Automatic Intelligent System for Document Processing and Fruition

    Get PDF
    With the increasing amount of documents available on-line, the need for intelligent digital libraries, that allow to automatize the document processing tasks and to suitably organize and make available the documents so as to provide personalized and focused access, becomes more and more pressing. This paper proposes an integrated system that merges intelligent modules covering all the phases involved in a document lifecycle, from acquisition, to processing, to information extraction, to personalized fruition for final users. The role and possible cooperation of Machine Learning and Data Mining techniques in the system is highlighted and discussed, along with their importance to provide effective support to both the building and the fruition of the Digital Library and the underlying knowledge base

    Automated user modeling for personalized digital libraries

    Get PDF
    Digital libraries (DL) have become one of the most typical ways of accessing any kind of digitalized information. Due to this key role, users welcome any improvements on the services they receive from digital libraries. One trend used to improve digital services is through personalization. Up to now, the most common approach for personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, in an automatic way. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to construct digital libraries that satisfy user’s necessity for information: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information

    Symbolic Computing with Incremental Mindmaps to Manage and Mine Data Streams - Some Applications

    Full text link
    In our understanding, a mind-map is an adaptive engine that basically works incrementally on the fundament of existing transactional streams. Generally, mind-maps consist of symbolic cells that are connected with each other and that become either stronger or weaker depending on the transactional stream. Based on the underlying biologic principle, these symbolic cells and their connections as well may adaptively survive or die, forming different cell agglomerates of arbitrary size. In this work, we intend to prove mind-maps' eligibility following diverse application scenarios, for example being an underlying management system to represent normal and abnormal traffic behaviour in computer networks, supporting the detection of the user behaviour within search engines, or being a hidden communication layer for natural language interaction.Comment: 4 pages; 4 figure
    • …
    corecore