
30 research outputs found

Advanced document data extraction techniques to improve supply chain performance

Author: Sharma Vikash
Publication date: 01/07/2021

In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies.
The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model comprises a generator network that is implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding box, a novel data extraction framework was designed. It consists of various processes, including XML processing in the case of an existing OCR engine, bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format.
The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks and later, a rule-based engine is used to extract relevant data. While the system’s methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing their clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation. This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools.
The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase the information extraction accuracy factor over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
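The text clean-up, type check, and pattern-based matching steps named in the abstract, ending in key-value output, might look roughly like the sketch below. This is an illustration only and not the thesis's implementation: the field names, the regular expressions, and the helper functions `clean_text`, `type_check`, and `extract_fields` are all hypothetical.

```python
import re

# Hypothetical field patterns (assumption): a real system would configure
# or learn these rather than hard-code them.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\binvoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(
        r"\bdate\b\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"\btotal\b\s*[:\-]?\s*[$€£]?\s*(\d+(?:[.,]\d{2})?)", re.IGNORECASE
    ),
}


def clean_text(text):
    """Text clean-up: collapse runs of whitespace left over from OCR output."""
    return re.sub(r"\s+", " ", text).strip()


def type_check(field, value):
    """Type check: accept a 'total' only if it parses as a number."""
    if field == "total":
        try:
            float(value.replace(",", "."))
        except ValueError:
            return False
    return True


def extract_fields(ocr_text):
    """Pattern-based matching: return the fields that were found, as key-value pairs."""
    cleaned = clean_text(ocr_text)
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(cleaned)
        if match and type_check(field, match.group(1)):
            result[field] = match.group(1)
    return result


sample = "INVOICE No: INV-2021-07\nDate: 01/07/2021\n  Total:  £ 1250.00 "
print(extract_fields(sample))
```

Fields that do not match are simply left out of the result, mirroring the abstract's statement that only successfully extracted fields are returned. The patterns are deliberately simple and would not handle, for example, amounts with thousands separators.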

Repository@Hull - Worktribe

Drawing, Handwriting Processing Analysis: New Advances and Challenges

Author: Anquetil Eric
Prevost Lionel
Rémi Céline
Publication venue: HAL CCSD
Publication date: 21/06/2015

Drawing and handwriting are communication skills that have been fundamental to geopolitical, ideological and technological evolution throughout history. Drawing and handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems, such as how drawing and handwriting can become an efficient way to command various connected objects, or how to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Comprehensive Survey: Biometric User Authentication Application, Evaluation, and Discussion

Author: AlQahtani Ali Abdullah S.
Alrawili Reem
Khan Muhammad Khurram
Publication date: 20/01/2024

This paper conducts an extensive review of biometric user authentication literature, addressing three primary research questions: (1) commonly used biometric traits and their suitability for specific applications, (2) performance factors such as security, convenience, and robustness, and potential countermeasures against cyberattacks, and (3) factors affecting biometric system accuracy and potential improvements. Our analysis delves into physiological and behavioral traits, exploring their pros and cons. We discuss factors influencing biometric system effectiveness and highlight areas for enhancement. Our study differs from previous surveys by extensively examining biometric traits, exploring various application domains, and analyzing measures to mitigate cyberattacks. This paper aims to inform researchers and practitioners about the biometric authentication landscape and guide future advancements.

arXiv.org e-Print Archive

Deep Learning for Scene Text Detection, Recognition, and Understanding

Author: Wang Xinyu
Publication date: 01/01/2023

Detecting and recognizing text in images is a long-standing task in computer vision. The goal of this task is to extract textual information from images and videos, such as recognizing license plates. Although great progress has been made in recent years, the task remains challenging due to the wide range of variations in text appearance. In this thesis, we aim to review the existing issues that hinder current Optical Character Recognition (OCR) development and explore potential solutions. Specifically, we first investigate the phenomenon of unfair comparisons between different OCR algorithms caused by the lack of a consistent evaluation framework. The absence of a unified evaluation protocol leads to inconsistent and unreliable results, making it difficult to compare and improve upon existing methods. To tackle this issue, we design a new evaluation framework covering datasets, metrics, and models, enabling consistent and fair comparisons between OCR systems. Another issue in the field is the imbalanced distribution of training samples. In particular, the sample distribution largely depends on where and how the data was collected, and the resulting data bias may lead to poor performance and low generalizability on under-represented classes. To address this problem, we took the driving license plate recognition task as an example and proposed a text-to-image model that is able to synthesize photo-realistic text samples. Using this model, we synthesized more than one million samples to augment the training dataset, significantly improving the generalization capability of OCR models. Additionally, this thesis explores the application of text visual question answering (text VQA), which is a new and emerging research topic in the OCR community. This task challenges OCR models to understand the relationships between the text and backgrounds and to answer the given questions. In this thesis, we propose to investigate evidence-based text VQA, which involves designing models that can provide reasonable evidence for their predictions, thus improving the generalization ability.
Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 202

Adelaide Research & Scholarship

On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

Author: Antonisse Joey
Azzopardi George
Bennabhaktula Swaroop
Publication venue: Springer Science and Business Media LLC
Publication date: 31/10/2021

Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

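Developing Web-Based English for Occupational Purposes Teaching Materials in Higher Education to Foster Entrepreneurial Spirit (Pengembangan Bahan Ajar English for Occupational Purposes di Perguruan Tinggi Berbasis Web untuk Menumbuhkan Semangat Kewirausahaan)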

Author: Dra. Jamilah M.Pd.
Ella Wulandari, M.A.
Lusi Nurhayati, M.Appl.Ling
Publication date: 01/01/2013

This study aims to (a) identify English learning needs in higher education that are oriented toward the language competencies required by the workplace for a given group of disciplines, specifically English for Business and Economics, with a view to building entrepreneurial spirit; (b) determine the key characters that build entrepreneurial spirit and are appropriate to that group of disciplines; and (c) develop web-based teaching materials for higher education that can be implemented across faculties and are oriented toward vocational and professional needs, that is, English for Occupational Purposes, which is part of the English for Specific Purposes approach, in order to foster entrepreneurial spirit. This is a research and development (R&D) study whose steps comprise two stages carried out over two years. This report covers the first of the two proposed years. The steps in the first stage were needs analysis, development of a course grid, development of an initial draft of the teaching materials, assessment of the draft product by experts (expert judgment), and product revision. The needs analysis used a questionnaire distributed to 100 students from various departments of the Faculty of Economics (FE), Universitas Negeri Yogyakarta, representing the English for Business and Economics discipline group. The study produced two products: a course grid and teaching materials titled English for Occupational Purposes (EOP). The course grid consists of the following components: unit title, indicators, spoken language skills (spoken cycle), written language skills (written cycle), and business tips. The course grid was then developed into teaching materials. Six units were developed.
Each unit presents the four major English skills, categorized into spoken English skills (spoken cycle) and written English skills (written cycle). The spoken skills cover listening and speaking, while the written skills cover reading and writing. Learning in the spoken cycle focuses on language functions, spoken texts, and vocabulary, while the written cycle focuses on written texts, generic structure, and grammar. The materials also include business tips and information presented as very short texts. Several character values are embedded in the materials: perseverance, discipline, honesty, creativity/innovation, positive thinking, communicativeness, and open-mindedness. These values served as one of the references for selecting content and organizing activities in the EOP materials. Based on the evaluation results, the final draft falls into the "very good" category and is therefore suitable for use.

Lumbung Pustaka UNY (UNY Repository)

Machine Learning Algorithm for the Scansion of Old Saxon Poetry

Author: Alessandro Torcinovich
Gianluca Lebani
Irene Miani
Marina Buzzoni
Publication venue: Siena
Publication date: 01/01/2023

Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools deals with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. In evaluation, the algorithm reached 97% accuracy and a 99% weighted average for precision, recall, and F1 score. In addition, we tested the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
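For readers unfamiliar with the "weighted average" reported above, the following sketch shows one common way such figures are computed: per-class precision, recall, and F1 are averaged with weights proportional to each class's support. The label names and the toy data are hypothetical, and the abstract does not say which tooling the authors used, so this is an assumption about the metric rather than a reproduction of their evaluation.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall and F1 over the labels in y_true."""
    support = Counter(y_true)  # number of true instances per label
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        # True positives: positions where both truth and prediction equal this label.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / n
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = n / total  # classes with more true instances count for more
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return precision, recall, f1


# Toy example with hypothetical metrical-pattern labels "A" and "B".
y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "B", "B"]
print(weighted_scores(y_true, y_pred))
```

With this weighting, the weighted recall always equals plain accuracy, so frequent classes dominate the averages.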

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Information security and assurance : Proceedings international conference, ISA 2012, Shanghai China, April 2012

Publication venue: Science & Engineering Research Support Centre, (SERSC)
Publication date: 01/01/2012

Deakin Research Online

INTERACT 2015 Adjunct Proceedings. 15th IFIP TC.13 International Conference on Human-Computer Interaction 14-18 September 2015, Bamberg, Germany

INTERACT is among the world’s top conferences in Human-Computer Interaction. Starting with the first INTERACT conference in 1990, this conference series has been organised under the aegis of the Technical Committee 13 on Human-Computer Interaction of the UNESCO International Federation for Information Processing (IFIP). This committee aims at developing the science and technology of the interaction between humans and computing devices. The 15th IFIP TC.13 International Conference on Human-Computer Interaction - INTERACT 2015 took place from 14 to 18 September 2015 in Bamberg, Germany. The theme of INTERACT 2015 was "Connection.Tradition.Innovation". This volume presents the Adjunct Proceedings. It contains the position papers for the students of the Doctoral Consortium as well as the position papers of the participants of the various workshops.