CloudScan - A configuration-free invoice analysis system using recurrent neural networks
We present CloudScan, an invoice analysis system that requires zero
configuration or upfront annotation. In contrast to previous work, CloudScan
does not rely on templates of invoice layout; instead, it learns a single global
model of invoices that naturally generalizes to unseen invoice layouts. The
model is trained using data automatically extracted from end-user provided
feedback. This automatic training data extraction removes the requirement for
users to annotate the data precisely. We describe a recurrent neural network
model that can capture long-range context and compare it to a baseline logistic
regression model corresponding to the current CloudScan production system. We
train and evaluate the system on 8 important fields using a dataset of 326,471
invoices. The recurrent neural network and baseline model achieve 0.891 and
0.887 average F1 scores respectively on seen invoice layouts. For the harder
task of unseen invoice layouts, the recurrent neural network model outperforms
the baseline with 0.840 average F1 compared to 0.788.
Comment: Presented at ICDAR 201
An Automatic Intelligent System for Document Processing and Fruition
With the increasing number of documents available online, the need for intelligent
digital libraries that automate document processing tasks and suitably organize
the documents and make them available, so as to provide personalized and focused access,
becomes more and more pressing. This paper proposes an integrated system that merges
intelligent modules covering all the phases of a document's lifecycle, from acquisition,
to processing, to information extraction, to personalized fruition by end users. The role and
possible cooperation of Machine Learning and Data Mining techniques in the system are
highlighted and discussed, along with their importance in providing effective support to both the
building and the fruition of the Digital Library and the underlying knowledge base.
Learning Behavioural Context
The original publication is available at www.springerlink.co
Automated user modeling for personalized digital libraries
Digital libraries (DL) have become one of the most common ways of accessing digitized information. Given this key role, users welcome any improvement in the services they receive from digital libraries. One trend in improving digital services is personalization. Up to now, the most common approach to personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, automatically. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to constructing digital libraries that satisfy users' information needs: Adaptive Digital Libraries, which automatically learn user preferences and goals and personalize their interaction using this information.
Symbolic Computing with Incremental Mindmaps to Manage and Mine Data Streams - Some Applications
In our understanding, a mind-map is an adaptive engine that works
incrementally on the basis of existing transactional streams. Generally,
mind-maps consist of symbolic cells that are connected with each other and that
become either stronger or weaker depending on the transactional stream. Based
on the underlying biological principle, these symbolic cells and their
connections may adaptively survive or die, forming different cell
agglomerates of arbitrary size. In this work, we intend to demonstrate the
suitability of mind-maps in diverse application scenarios, for example as an
underlying management system to represent normal and abnormal traffic behaviour
in computer networks, as support for detecting user behaviour within
search engines, or as a hidden communication layer for natural language
interaction.
Comment: 4 pages; 4 figures