Finding Similarities between Structured Documents as a Crucial Stage for Generic Structured Document Classifier
One of the problems addressed in classifying structured documents is the definition of a similarity measure that is applicable in real situations, where query documents are allowed to differ from the database templates. Furthermore, existing approaches have used rotated [1], noise-corrupted [2], or manually edited forms and documents as test sets under different schemes, which makes direct comparison a crucial issue [3]. Another problem is that a huge number of forms could be written in different languages; for example, here in Malaysia forms could be written in Malay, Chinese, English, and other languages. In that case text recognition (such as OCR) cannot be applied to classify the requested documents, even though OCR is considered easier and more accurate than layout detection. Keywords: Feature Extraction, Document processing, Document Classification
Optimizing the AI Development Process by Providing the Best Support Environment
The purpose of this study is to investigate the development process for
Artificial intelligence (AI) and machine learning (ML) applications in order to
provide the best support environment. The main stages of ML are problem
understanding, data management, model building, model deployment and
maintenance. This project focuses on investigating the data management stage of
ML development and its obstacles, as it is the most important stage of machine
learning development: the accuracy of the end model relies on the
kind of data fed into the model. The biggest obstacle found in this stage was
the lack of sufficient data for model learning, especially in the fields where
data is confidential. This project aimed to build and develop a framework for
researchers and developers that can help solve the lack of sufficient data
during the data management stage. The framework utilizes several data augmentation
techniques that can be used to generate new data from the original dataset
which can improve the overall performance of the ML applications by increasing
the quantity and quality of available data to feed the model with the best
possible data. The framework was built in Python to perform data
augmentation using deep learning advancements.
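As a rough illustration of the idea of generating new samples from an original dataset, the sketch below applies two classical augmentations (horizontal flip and additive Gaussian noise) to small grayscale images stored as nested lists. This is not the framework described in the abstract, which relies on deep learning techniques; the function names `flip_horizontal`, `add_noise`, and `augment`, and the parameters `copies`, `sigma`, and `seed`, are hypothetical and chosen only for this example.

```python
import random


def flip_horizontal(image):
    """Mirror each row of a 2D grayscale image (a list of lists)."""
    return [list(reversed(row)) for row in image]


def add_noise(image, sigma, rng):
    """Add Gaussian noise to every pixel and clamp to the 0..255 range."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


def augment(dataset, copies, sigma=5.0, seed=0):
    """Return the original images plus `copies` noisy variants of each one.

    Assumption: each variant is randomly flipped with probability 0.5
    before noise is added. A real framework would use richer transforms.
    """
    rng = random.Random(seed)
    out = list(dataset)
    for image in dataset:
        for _ in range(copies):
            candidate = flip_horizontal(image) if rng.random() < 0.5 else image
            out.append(add_noise(candidate, sigma, rng))
    return out
```

Calling `augment(images, copies=3)` on a dataset of N images returns 4N images: the originals followed by three perturbed variants of each.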
WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models
Text-to-Image synthesis is the task of generating an image according to a
specific text description. Generative Adversarial Networks have been considered
the standard method for image synthesis virtually since their introduction;
today, Denoising Diffusion Probabilistic Models are setting a new
baseline, with remarkable results in Text-to-Image synthesis, among other
fields. Aside from its usefulness per se, it can also be particularly relevant as a
tool for data augmentation to aid training models for other document image
processing tasks. In this work, we present a latent diffusion-based method for
styled text-to-text-content-image generation at the word level. Our proposed method
manages to generate realistic word image samples from different writer styles,
by using class index styles and text content prompts without the need for
adversarial training, writer recognition, or text recognition. We gauge system
performance with Fréchet Inception Distance, writer recognition accuracy, and
writer retrieval. We show that the proposed model produces samples that are
aesthetically pleasing, help boost text recognition performance, and achieve a
writer retrieval score similar to that of real data.
Anomaly Detection in Natural Scene Images Based on Enhanced Fine-Grained Saliency and Fuzzy Logic
This paper proposes a simple yet effective method for anomaly detection in natural scene images to improve natural scene text detection and recognition. In the last decade, there has been significant progress towards text detection and recognition in natural scene images. However, in cases where there are logos, company symbols, or other decorative elements for text, existing methods do not perform well. This work considers such misclassified components, which are part of the text, as anomalies, and presents a new idea for detecting such anomalies in the text to improve text detection and recognition in natural scene images. The proposed method takes the result of an existing text detection method as input for segmenting characters or components based on a saliency map and rough set theory. For each segmented component, the proposed method extracts features from the saliency map based on density, pixel distribution, and phase congruency to classify text and non-text components using a fuzzy-based classifier. To verify the effectiveness of the method, we have performed experiments on several benchmark datasets of natural scene text detection, namely MSRA-TD500 and SVT. Experimental results show the efficacy of the proposed method over the existing ones for text detection and recognition in these datasets.
Signature verification system based on multiple classifiers and multi fusion decision approach
With the increase in identity fraud and the emphasis on security, there is a growing and urgent need to verify human identity efficiently. Signature and handwriting verification applications are used in many fields, such as banking and the public sector. Document and cheque verification has created a real need for a reliable, accurate and robust system. This work adopts different classification techniques for the local and global features of the signature, together with different fusion techniques applied to the outputs of the classifiers and the global features, to improve the error rate of the behavioral system. The main goal is to develop a more accurate and robust signature verification system than the previously developed system with a False Rejection Rate (FRR) of 5.3 and a False Acceptance Rate (FAR) of 0. To achieve this goal, first, multiple classification techniques are applied to the signature verification system, namely an artificial neural network, a support vector machine and Pearson correlation; these techniques are then fused by applying two complex fusion techniques, fuzzy logic and sequential fuzzy logic, and one simple fusion technique, max voting. Lastly, a rule-based decision is applied to specify whether the signature is genuine or not. Second, the improved signature verification system is extended with the high performance Hitachi system. This biometric-based system can be realized in many real-world and web-based applications where there is a need for higher security and robust identification.
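The max voting fusion and the two error rates mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system from the abstract: the function names `max_vote` and `error_rates` are hypothetical, each classifier is assumed to output a boolean decision (`True` for genuine), and the rates are returned as fractions rather than percentages.

```python
from collections import Counter


def max_vote(decisions):
    """Return the decision made by the largest number of classifiers.

    With three binary classifiers (as in the abstract) a tie cannot occur.
    """
    return Counter(decisions).most_common(1)[0][0]


def error_rates(predictions, labels):
    """Compute (FRR, FAR) as fractions.

    FRR = genuine signatures rejected / all genuine signatures
    FAR = forged signatures accepted / all forged signatures
    Assumes the test set contains at least one genuine and one forged sample.
    """
    genuine = [p for p, y in zip(predictions, labels) if y]
    forged = [p for p, y in zip(predictions, labels) if not y]
    frr = sum(1 for p in genuine if not p) / len(genuine)
    far = sum(1 for p in forged if p) / len(forged)
    return frr, far
```

For example, `max_vote([True, False, True])` accepts the signature because two of the three classifiers do.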
Fusion of multi-classifiers for online signature verification using fuzzy logic inference
Compared to physiologically based biometric systems such as fingerprint, face, palm-vein and retina, behavioral based biometric systems such as signature, voice, gait, etc. are less popular, and much of the research in these areas is still in its infancy. One reason is the inconsistency of human behavior, which requires more robust algorithms. In this paper, an online signature verification system is proposed based on fuzzy logic inference. To ensure higher accuracy, the signature verification system is designed to include the fusion of multiple classifiers, namely the backpropagation neural network algorithm and the Pearson correlation technique. A fuzzy logic inference engine is also designed to fuse two global features: the time taken to sign and the length of the signature. The fuzzy logic inference engine is used to overcome the boundary limitations of fixed thresholds, to handle the uncertainty of thresholds across users, and to produce a more human-like output. The system has been developed with a robust validation module based on Pearson's correlation algorithm, in which more consistent sets of signatures are enrolled. In this way, more consistent sets of training patterns are used for training. The results show that the multi-classifier fusion technique improves the false rejection rate and false acceptance rate of the system compared to the individual classifiers, and that the fuzzy logic inference module for the final decision helps to further improve system performance.
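One way to picture how fuzzy inference replaces fixed thresholds on the two global features is the sketch below. It is only an assumption-laden illustration: the abstract does not specify the membership functions or rules, so the triangular membership shape, the breakpoints 0.5 / 1.0 / 1.5, the use of ratios against the enrolled average, and the names `triangular` and `fuse_global_features` are all hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuse_global_features(time_ratio, length_ratio):
    """Degree (0..1) to which a signature looks genuine on global features.

    Inputs are ratios of the test signature's signing time and length to
    the enrolled averages, so 1.0 means "same as enrollment". The single
    rule "time is typical AND length is typical" uses min as the fuzzy AND.
    """
    time_ok = triangular(time_ratio, 0.5, 1.0, 1.5)
    length_ok = triangular(length_ratio, 0.5, 1.0, 1.5)
    return min(time_ok, length_ok)
```

Unlike a hard cutoff, a signature that takes 25% longer than usual is not rejected outright; it simply receives a lower degree of membership that a later decision stage can weigh.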
Online signature verification with neural networks classifier and fuzzy inference
Compared to physiologically based biometric systems such as fingerprint, face, palm-vein and retina, behavioral based biometric systems such as signature, voice, gait, etc. are less popular and many are still in their infancy. A major problem is the inconsistency of human behavior, which requires more robust algorithms. In this paper, an online signature verification system is proposed based on a neural network classifier and fuzzy inference. The software has been developed with a robust validation module based on Pearson's correlation algorithm, in which more consistent sets of a user's signatures are enrolled. In this way, more consistent sets of training patterns are used to train the neural network modules based on the popular backpropagation algorithm. To increase robustness, not only is the neural network threshold used for verification, but the time and length of the signature are also calculated. A fuzzy inference module is then set up to infer from the three thresholds and produce human-like decision outputs. The signature verification system shows better consistency and is more robust than previous designs.
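The enrollment validation step can be sketched as a filter that keeps only signature samples that correlate well with the rest of the enrolled set. The abstract does not give the exact procedure, so this is a hypothetical reading: signatures are assumed to be equal-length, non-constant numeric feature sequences, and the names `pearson` and `consistent_samples` as well as the default `threshold` are illustrative choices.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def consistent_samples(samples, threshold=0.9):
    """Keep samples whose mean correlation with the others meets the threshold.

    Assumption: at least two samples are provided and none is constant.
    """
    kept = []
    for i, sample in enumerate(samples):
        scores = [pearson(sample, other)
                  for j, other in enumerate(samples) if j != i]
        if sum(scores) / len(scores) >= threshold:
            kept.append(sample)
    return kept
```

Only the samples returned by `consistent_samples` would then be passed on as training patterns, which is the sense in which enrollment yields "more consistent sets".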
Study of AI-Driven Fashion Recommender Systems
The rising diversity, volume, and pace of fashion manufacturing pose a considerable challenge in the fashion industry, making it difficult for customers to pick which product to purchase. In addition, fashion is an inherently subjective, cultural notion and an ensemble of clothing items that maintains a coherent style. In most of the domains in which Recommender Systems are developed (e.g., movies, e-commerce, etc.), similarity evaluation is used for recommendation. In the Fashion domain, by contrast, compatibility is a critical factor. In addition, the raw visual features of product representations, which contribute most to algorithm performance in the Fashion domain, are distinct from the product metadata used in other domains. This literature review summarizes various Artificial Intelligence (AI) techniques that have lately been used in recommender systems for the fashion industry. AI enables higher-quality recommendations than earlier approaches. This has ushered in a new age for recommender systems, allowing for deeper insights into user-item relationships and representations and the discovery of patterns in demographical, textual, virtual, and contextual data. This work seeks to give a deeper understanding of the fashion recommender system domain by performing a comprehensive literature study of research on this topic in the past 10 years, focusing on image-based fashion recommender systems and taking AI improvements into account. The nuanced conceptions of this domain and their relevance have been developed to justify fashion domain-specific characteristics.
A survey of historical document image datasets
This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document analysis is a crucial prerequisite to facilitate research using different machine learning algorithms. However, because of the very large variety of the actual data (e.g., scripts, tasks, dates, support systems, and amount of deterioration), the different formats for data and label representation, and the different evaluation processes and benchmarks, finding appropriate datasets is a difficult task. This work fills this gap, presenting a meta-study on existing datasets. After a systematic selection process (according to PRISMA guidelines), we select 65 studies that are chosen based on different factors, such as the year of publication, number of methods implemented in the article, reliability of the chosen algorithms, dataset size, and journal outlet. We summarize each study by assigning it to one of three pre-defined tasks: document classification, layout structure, or content analysis. We present the statistics, document type, language, tasks, input visual aspects, and ground truth information for every dataset. In addition, we provide the benchmark tasks and results from these papers or recent competitions. We further discuss gaps and challenges in this domain. We advocate for providing conversion tools to common formats (e.g., COCO format for computer vision tasks) and always providing a set of evaluation metrics, instead of just one, to make results comparable across studies.
A Survey of Historical Document Image Datasets
This paper presents a systematic literature review of image datasets for
document image analysis, focusing on historical documents, such as handwritten
manuscripts and early prints. Finding appropriate datasets for historical
document analysis is a crucial prerequisite to facilitate research using
different machine learning algorithms. However, because of the very large
variety of the actual data (e.g., scripts, tasks, dates, support systems, and
amount of deterioration), the different formats for data and label
representation, and the different evaluation processes and benchmarks, finding
appropriate datasets is a difficult task. This work fills this gap, presenting
a meta-study on existing datasets. After a systematic selection process
(according to PRISMA guidelines), we select 56 studies that are chosen based on
different factors, such as the year of publication, number of methods
implemented in the article, reliability of the chosen algorithms, dataset size,
and journal outlet. We summarize each study by assigning it to one of three
pre-defined tasks: document classification, layout structure, or semantic
analysis. We present the statistics, document type, language, tasks, input
visual aspects, and ground truth information for every dataset. In addition, we
provide the benchmark tasks and results from these papers or recent
competitions. We further discuss gaps and challenges in this domain. We
advocate for providing conversion tools to common formats (e.g., COCO format
for computer vision tasks) and always providing a set of evaluation metrics,
instead of just one, to make results comparable across studies.
Comment: 37 pages, 2 figures