Advanced document data extraction techniques to improve supply chain performance
In this thesis, a novel machine learning technique for extracting text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time and cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and supply chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it performs poorly when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model consists of a generator network implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes.
For text extraction from the bounding boxes, a novel data extraction framework was designed. It consists of several processes: XML processing when an existing OCR engine is used, bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is later used to extract relevant data. While the system's methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing its clients' procurement costs. This data was fed into our system to obtain a deeper level of spend classification and categorisation.
This helped the company to reduce its reliance on human effort and allowed for greater efficiency than performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase information extraction accuracy over that of existing methodologies. The thesis also evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
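The clean-up, OCR error correction, type check and pattern-based matching steps named in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: the field names, the regular expressions, the character-confusion table and the `DD/MM/YYYY` date format are all assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Characters that OCR commonly confuses with digits (illustrative, assumed list).
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

# Hypothetical patterns: a label, an optional colon, then the captured value.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"invoice\s*(?:no|number)\.?\s*:?\s*([A-Z0-9-]+)", re.I),
    "date": re.compile(r"(?i:date)\s*:?\s*([0-9OolIS]{2}/[0-9OolIS]{2}/[0-9OolIS]{4})"),
    "total": re.compile(r"\b(?i:total)\s*:?\s*[$€£]?\s*([0-9OolIS][0-9OolIS.,]*)"),
}


def clean_text(raw: str) -> str:
    """Text clean-up: collapse whitespace runs left over from the page layout."""
    return re.sub(r"\s+", " ", raw).strip()


def type_check(field: str, value: str):
    """Correct digit confusions in numeric fields and validate the type.

    Returns the typed value, or None when validation fails.
    Invoice numbers are passed through unchanged because they may contain letters.
    """
    try:
        if field == "date":
            fixed = value.translate(OCR_DIGIT_FIXES)
            return datetime.strptime(fixed, "%d/%m/%Y").date().isoformat()
        if field == "total":
            fixed = value.translate(OCR_DIGIT_FIXES).rstrip(".,")
            return float(fixed.replace(",", ""))  # assumes ',' is a thousands separator
    except ValueError:
        return None
    return value


def extract_fields(ocr_text: str) -> dict:
    """Return only the fields that were matched and passed the type check."""
    text = clean_text(ocr_text)
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            continue
        value = type_check(field, match.group(1))
        if value is not None:
            result[field] = value
    return result


print(extract_fields("Invoice No: INV-1042\nDate:  12/O3/2021\nTotal: £1O4.50"))
```

Fields that fail matching or validation are simply left out of the result, which mirrors the statement that only successfully extracted fields are returned in key-value form.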
Drawing, Handwriting Processing Analysis: New Advances and Challenges
Drawing and handwriting are communication skills that have been fundamental to geopolitical, ideological and technological evolution throughout history. Drawing and handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems, such as how drawing and handwriting can become an efficient way to command various connected objects, or how to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline.
Comprehensive Survey: Biometric User Authentication Application, Evaluation, and Discussion
This paper conducts an extensive review of biometric user authentication
literature, addressing three primary research questions: (1) commonly used
biometric traits and their suitability for specific applications, (2)
performance factors such as security, convenience, and robustness, and
potential countermeasures against cyberattacks, and (3) factors affecting
biometric system accuracy and potential improvements. Our analysis delves into
physiological and behavioral traits, exploring their pros and cons. We discuss
factors influencing biometric system effectiveness and highlight areas for
enhancement. Our study differs from previous surveys by extensively examining
biometric traits, exploring various application domains, and analyzing measures
to mitigate cyberattacks. This paper aims to inform researchers and
practitioners about the biometric authentication landscape and guide future
advancements.
Deep Learning for Scene Text Detection, Recognition, and Understanding
Detecting and recognizing text in images is a long-standing task in computer vision. The goal of this task is to extract textual information from images and videos, such as recognizing license plates. Although great progress has been made in recent years, the task remains challenging due to the wide range of variations in text appearance. In this thesis, we aim to review the existing issues that hinder current Optical Character Recognition (OCR) development and explore potential solutions. Specifically, we first investigate the phenomenon of unfair comparisons between different OCR algorithms caused by the lack of a consistent evaluation framework. The absence of a unified evaluation protocol leads to inconsistent and unreliable results, making it difficult to compare and improve upon existing methods. To tackle this issue, we design a new evaluation framework covering datasets, metrics, and models, enabling consistent and fair comparisons between OCR systems. Another issue in the field is the imbalanced distribution of training samples. In particular, the sample distribution largely depends on where and how the data was collected, and the resulting data bias may lead to poor performance and low generalizability on under-represented classes. To address this problem, we took the license plate recognition task as an example and proposed a text-to-image model that is able to synthesize photo-realistic text samples. Using this model, we synthesized more than one million samples to augment the training dataset, significantly improving the generalization capability of OCR models. Additionally, this thesis explores the application of text visual question answering (text VQA), a new and emerging research topic in the OCR community. This task challenges OCR models to understand the relationships between the text and backgrounds and to answer the given questions.
In this thesis, we propose to investigate evidence-based text VQA, which involves designing models that can provide reasonable evidence for their predictions, thus improving the generalization ability. Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 202
On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator
Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
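The evaluation protocol described above, perturbing test images with different levels of Gaussian and uniform noise, can be sketched as follows. This is only an illustration of the perturbation step, not the authors' code and not the CORF operator itself. It assumes pixel intensities scaled to [0, 1], and the noise levels shown are placeholders, since the abstract does not state the levels used.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in image]


def add_uniform_noise(image, width, rng):
    """Add noise drawn uniformly from [-width, width], clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.uniform(-width, width))) for p in row] for row in image]


# A flat grey 4x4 "image" stands in for a test sample (assumption for illustration).
rng = random.Random(0)
clean = [[0.5] * 4 for _ in range(4)]

# Placeholder noise levels; one perturbed copy of the test image per level.
gaussian_sets = {s: add_gaussian_noise(clean, s, rng) for s in (0.05, 0.1, 0.2)}
uniform_sets = {w: add_uniform_noise(clean, w, rng) for w in (0.05, 0.1, 0.2)}
```

Each perturbed copy would then be passed through both pipelines (with and without the delineation-map step) so that accuracy can be compared per noise level.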
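Developing Web-Based English for Occupational Purposes Teaching Materials in Higher Education to Foster Entrepreneurial Spirit (Pengembangan Bahan Ajar English for Occupational Purposes di Perguruan Tinggi Berbasis Web untuk Menumbuhkan Semangat Kewirausahaan)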
This study aims to (a) identify English learning needs in higher education, oriented toward
the language competencies required by the workplace for the relevant field of study,
specifically English for Business and Economics, with a focus on building entrepreneurial
spirit; (b) determine the key characters that build entrepreneurial spirit appropriate to
that field of study; and (c) develop web-based teaching materials for higher education that
can be implemented across faculties and are oriented toward vocational/professional needs,
or English for Occupational Purposes, which is part of the English for Specific Purposes
approach, in order to foster entrepreneurial spirit.
This is a Research and Development (R&D) study whose steps comprise two stages carried out
over two years. This study covers the first of the two proposed years. The steps in the
first stage include needs analysis, developing a course grid, developing an initial draft
of the teaching materials, evaluation of the draft product by experts (expert judgment),
and product revision. The needs analysis was conducted using a questionnaire distributed
to 100 students from various departments in the Faculty of Economics (FE), Universitas
Negeri Yogyakarta, representing the English for Business and Economics field of study.
The study produced two kinds of product: a course grid and teaching materials entitled
English for Occupational Purposes (EOP). The course grid consists of the following
components: unit title, indicators, spoken language skills (spoken cycle), written language
skills (written cycle), and business tips. This course grid was then developed into
teaching materials. Six units were developed. Each unit presents the four major English
skills, categorized into spoken English skills (spoken cycle) and written English skills
(written cycle). Spoken skills cover listening and speaking, while written skills cover
reading and writing. The learning focus in the spoken cycle is language functions, spoken
text, and vocabulary, while in the written cycle it is written text, generic structure, and
grammar. The teaching materials are also accompanied by business tips and information
presented as very short texts. Several character values are embedded in the teaching
materials: perseverance, discipline, honesty, creativity/innovation, positive thinking,
communicativeness, and open-mindedness. These characters serve as one of the references for
selecting materials and organizing activities in the EOP teaching materials. Based on the
evaluation results, the final draft falls in the "very good" category and is therefore
suitable for use.
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these
tools deals with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We
implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon
and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript, and
we used the resulting corpus as a labeled dataset to train the model. The evaluation of the algorithm's performance
showed 97% accuracy and a weighted average of 99% for precision, recall and F1 score. In addition, we tested
the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that
the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English
input verses.
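The metrics reported above can be computed as in the following sketch. It assumes that "weighted average" means averaging per-class precision, recall and F1 with each class weighted by its number of true instances (its support), which is the usual convention in classification reports; the abstract does not spell this out. The labels `A` and `B` are hypothetical stand-ins for metrical pattern classes.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all true classes."""
    pairs = list(zip(y_true, y_pred))
    total = len(y_true)
    support = Counter(y_true)
    scores = {"accuracy": sum(t == p for t, p in pairs) / total,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = count / total  # class weight = share of true instances
        scores["precision"] += weight * precision
        scores["recall"] += weight * recall
        scores["f1"] += weight * f1
    return scores


# Hypothetical gold and predicted metrical pattern labels for four verses.
gold = ["A", "A", "A", "B"]
predicted = ["A", "A", "B", "B"]
print(weighted_scores(gold, predicted))
```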
INTERACT 2015 Adjunct Proceedings. 15th IFIP TC.13 International Conference on Human-Computer Interaction 14-18 September 2015, Bamberg, Germany
INTERACT is among the world’s top conferences in Human-Computer Interaction. Starting with the first INTERACT conference in 1990, this conference series has been organised under the aegis of the Technical Committee 13 on Human-Computer Interaction of the UNESCO International Federation for Information Processing (IFIP). This committee aims at developing the science and technology of the interaction between humans and computing devices.
The 15th IFIP TC.13 International Conference on Human-Computer Interaction - INTERACT 2015 took place from 14 to 18 September 2015 in Bamberg, Germany. The theme of INTERACT 2015 was "Connection.Tradition.Innovation". This volume presents the Adjunct Proceedings: it contains the position papers of the students of the Doctoral Consortium as well as the position papers of the participants of the various workshops.