
    Handwriting styles: benchmarks and evaluation metrics

    Evaluating the style of generated handwriting is a challenging problem, since style is not well defined, yet it is a key component in developing systems that offer more personalized interactions with humans. In this paper, we propose baseline benchmarks that serve as anchors for estimating the relative quality of different handwriting style methods. We build on deep learning techniques, which have shown remarkable results across machine learning tasks including classification, regression, and, most relevant to our work, the generation of temporal sequences. We discuss the challenges of evaluating our methods, which relate to the evaluation of generative models in general. We then propose evaluation metrics that we find relevant to this problem and discuss how we assess the metrics themselves. In this study, we use the IRON-OFF dataset. To the best of our knowledge, no prior work has used this dataset for generating handwriting (either in terms of methodology or performance metrics) or for exploring styles. Comment: Submitted to the IEEE International Workshop on Deep and Transfer Learning (DTL 2018).
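
    The paper's concrete metrics are not spelled out in this abstract, so the following is only a minimal sketch of one common way to score a generative sequence model: a Fréchet-style distance between feature statistics of real and generated samples. The feature extractor and the arrays below are hypothetical placeholders, not the benchmarks proposed by the authors.

```python
# Illustrative sketch only: a Frechet-style distance between feature
# statistics of real and generated handwriting samples. The stand-in
# features below are NOT the metrics proposed in the paper.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Compare two sets of feature vectors of shape (n_samples, n_features)."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)      # matrix square root of the product
    if np.iscomplexobj(covmean):
        covmean = covmean.real          # drop tiny imaginary round-off
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Example with random stand-in features (replace with real descriptors).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 64))
fake = rng.normal(loc=0.1, size=(500, 64))
print(f"Frechet distance: {frechet_distance(real, fake):.3f}")
```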

    Obtaining n best alternatives for classifying Unicode symbols

    The Unicode character set has grown in recent years to more than 100,000 characters. We developed a classifier that can predict the n most probable solutions for a given handwritten character drawn from a smaller Unicode subset. Even with this size reduction, we still face a classification problem with a large number of classes (5,488 in total) and no training samples. Before tackling this problem, we performed experiments on the UJI PEN dataset. In these experiments we used two data generation techniques: distortions and variational autoencoders as generative models. We tried feature extraction methods with both offline and online data. The generation, together with the feature extraction, was tested on several neural network models such as convolutional networks and LSTMs. Vieco Pérez, J. (2017). Obtención de las n mejores alternativas para clasificación de símbolos unicode. http://hdl.handle.net/10251/86238 (TFG)
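
    The thesis's own CNN and LSTM models are not reproduced here; the sketch below only illustrates the n-best idea itself, i.e. reading the n most probable classes off a softmax output. The network, input size, and class count are assumptions chosen for the example, using PyTorch's `topk`.

```python
# Minimal sketch of n-best prediction from a softmax classifier.
# The architecture and input shape are illustrative assumptions only.
import torch
import torch.nn as nn

NUM_CLASSES = 5488  # size of the reduced Unicode target set

model = nn.Sequential(          # stand-in classifier over 64x64 offline images
    nn.Flatten(),
    nn.Linear(64 * 64, 512),
    nn.ReLU(),
    nn.Linear(512, NUM_CLASSES),
)

def n_best(image: torch.Tensor, n: int = 5):
    """Return the n most probable class indices and their probabilities."""
    with torch.no_grad():
        probs = torch.softmax(model(image.unsqueeze(0)), dim=-1)
        top = torch.topk(probs, k=n, dim=-1)
    return top.indices.squeeze(0).tolist(), top.values.squeeze(0).tolist()

classes, scores = n_best(torch.rand(1, 64, 64), n=5)
print(list(zip(classes, scores)))
```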

    Unsupervised feature learning for writer identification

    Our work presents research on unsupervised feature learning methods for writer identification and retrieval. We study the impact of deep learning alternatives in this field by proposing methodologies that explore different uses of autoencoder networks. Taking a patch extraction algorithm as a starting point, we aim to obtain characteristics from patches of handwritten documents in an unsupervised way, meaning no label information is used for the task. To verify whether the extracted features are valid for writer identification, the approaches we propose are evaluated and compared with state-of-the-art methods on the ICDAR2013 and ICDAR2017 writer identification datasets.
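
    As a rough illustration of the idea of learning patch descriptors without labels, here is a small convolutional autoencoder over handwriting patches. The patch size, layer sizes, and training setup are assumptions for the sketch and do not reproduce the networks evaluated in the thesis.

```python
# Illustrative convolutional autoencoder for 32x32 grayscale patches.
# Architecture and patch size are assumptions, not the thesis's networks.
import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, latent_dim),           # patch descriptor
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)            # unsupervised patch features
        return self.decoder(z), z

model = PatchAutoencoder()
patches = torch.rand(8, 1, 32, 32)     # stand-in handwriting patches
recon, features = model(patches)
loss = nn.functional.mse_loss(recon, patches)
print(recon.shape, features.shape, float(loss))
```

    After training with the reconstruction loss, the encoder output can be used as a patch descriptor and aggregated per document for writer identification or retrieval.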

    A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU

    Deep learning (DL) has emerged as a powerful subset of machine learning (ML) and artificial intelligence (AI), outperforming traditional ML methods, especially in handling unstructured and large datasets. Its impact spans various domains, including speech recognition, healthcare, autonomous vehicles, cybersecurity, predictive analytics, and more. However, the complexity and dynamic nature of real-world problems present challenges in designing effective deep learning models. Consequently, several deep learning models have been developed to address different problems and applications. In this article, we conduct a comprehensive survey of various deep learning models, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Models, Deep Reinforcement Learning (DRL), and Deep Transfer Learning. We examine the structure, applications, benefits, and limitations of each model. Furthermore, we perform an analysis using three publicly available datasets: IMDB, ARAS, and Fruit-360. We compare the performance of six renowned deep learning models: CNN, Simple RNN, Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional GRU. Comment: 16 pages, 29 figures.
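
    The recurrent variants compared in the survey can be treated as interchangeable layers over the same sequence input, which is all the sketch below shows. The hidden sizes and sequence shape are illustrative assumptions, not the survey's experimental setup.

```python
# Sketch of the compared recurrent variants as drop-in PyTorch layers.
# Sizes are illustrative assumptions only.
import torch
import torch.nn as nn

seq = torch.rand(4, 50, 128)   # (batch, time steps, input features)

recurrent_variants = {
    "SimpleRNN": nn.RNN(128, 64, batch_first=True),
    "LSTM": nn.LSTM(128, 64, batch_first=True),
    "BiLSTM": nn.LSTM(128, 64, batch_first=True, bidirectional=True),
    "GRU": nn.GRU(128, 64, batch_first=True),
    "BiGRU": nn.GRU(128, 64, batch_first=True, bidirectional=True),
}

for name, layer in recurrent_variants.items():
    out, _ = layer(seq)    # bidirectional layers double the feature size
    print(f"{name:10s} output shape: {tuple(out.shape)}")
```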

    Associate Latent Encodings in Learning from Demonstrations

    We contribute a learning-from-demonstration approach for robots to acquire skills from multi-modal, high-dimensional data. Both the latent representations and the associations between different modalities are jointly learned through an adapted variational auto-encoder. The implementation and results are demonstrated in a robotic handwriting scenario, where the visual sensory input and the arm-joint writing motion are learned and coupled. We show that the latent representations successfully construct a task manifold for the observed sensor modalities. Moreover, the learned associations can be exploited to directly synthesize arm-joint handwriting motion from an image input in an end-to-end manner. The advantages of learning associative latent encodings are further highlighted with examples of inference from incomplete input images. A comparison with alternative methods demonstrates the superiority of the present approach in these challenging tasks.
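
    To make the cross-modal idea concrete, here is a minimal sketch in which an image encoder and a joint-motion decoder share one latent space, so a trajectory can be decoded from a visual input. The layer sizes, joint count, and trajectory length are hypothetical; this is not the adapted variational auto-encoder described in the paper.

```python
# Minimal two-modality VAE-style sketch: image encoder + motion decoder
# sharing a latent space. Dimensions are illustrative assumptions only.
import torch
import torch.nn as nn

IMG_DIM, MOTION_DIM, LATENT = 28 * 28, 7 * 100, 16  # hypothetical sizes

img_encoder = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.ReLU(),
                            nn.Linear(256, 2 * LATENT))   # -> (mu, logvar)
motion_decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                               nn.Linear(256, MOTION_DIM))

def image_to_motion(image: torch.Tensor) -> torch.Tensor:
    """Encode an image, sample the shared latent, decode a joint trajectory."""
    stats = img_encoder(image.flatten(start_dim=1))
    mu, logvar = stats.chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
    return motion_decoder(z).view(-1, 100, 7)  # 100 time steps x 7 joints

motion = image_to_motion(torch.rand(1, 28, 28))
print(motion.shape)  # torch.Size([1, 100, 7])
```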

    Learning Density Models via Structured Latent Variables

    As one principal approach to machine learning and cognitive science, the probabilistic framework has been continuously developed both theoretically and practically. Learning a probabilistic model can be thought of as inferring plausible models to explain observed data. The learning process exploits random variables as building blocks which are held together by probabilistic relationships. The key idea behind latent variable models is to introduce latent variables as powerful attributes that reveal data structures and explore underlying features which can sensitively describe real-world data. The classical research approaches engage shallow architectures, including latent feature models and finite mixtures of latent variable models. Within the classical frameworks, we must make certain assumptions about the form, structure, and distribution of the data. Since the shallow form may not describe the data structures sufficiently, new types of latent structures have been developed within the probabilistic frameworks. Along this line, three main research interests have emerged: infinite latent feature models, mixtures of mixture models, and deep models. This dissertation summarises our work advancing the state of the art in both classical and emerging areas.

    In the first block, a finite latent variable model with parametric priors is presented for clustering and is further extended into a two-layer mixture model for discrimination. These models embed dimensionality reduction in their learning tasks by designing a latent structure called the common loading. Referred to as joint learning models, they attain a more appropriate low-dimensional space that better matches the learning task, while the parameters are optimised simultaneously for both the low-dimensional space and model learning. However, these joint learning models must assume a fixed number of features as well as mixtures, which are normally tuned and searched using a trial-and-error approach. In general, simpler inference can be performed by fixing more parameters, but fixed parameters limit the flexibility of the models, and false assumptions can even lead to incorrect inferences from the data. Thus, a richer model is allowed in order to reduce the number of assumptions.

    Therefore, an infinite tri-factorisation structure with non-parametric priors is proposed in the second block. This model can automatically determine an optimal number of features and leverage the interrelation between data and features.

    In the final block, we introduce how to promote shallow latent-structure models to deep structures that handle richer structured data. This part includes two tasks: one is a layer-wise-based model, the other a deep autoencoder-based model. In a deep density model, the knowledge of cognitive agents can be modelled using more complex probability distributions, while inference and parameter computation remain straightforward through a greedy layer-wise algorithm. The deep autoencoder-based joint learning model is trained in an end-to-end fashion which does not require pre-training of the autoencoder network; it can also be optimised by standard backpropagation without maximum a posteriori inference. Deep generative models are much more efficient than their shallow counterparts for unsupervised and supervised density learning tasks, and they can also be developed and used in various practical applications.
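
    As a point of reference for the shallow latent-variable models the dissertation starts from, the sketch below fits a finite Gaussian mixture with EM, where the mixture component is the latent variable. The synthetic data and the number of components are assumptions for illustration; this is not the dissertation's joint learning or tri-factorisation model.

```python
# Sketch of a classical shallow latent-variable density model: a finite
# Gaussian mixture fitted with EM on synthetic 2-D data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two latent clusters; the cluster assignment is the latent variable.
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.8, size=(200, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(data)

print("mixture weights:", gmm.weights_.round(3))
print("log-density of a point:", gmm.score_samples(np.array([[0.1, -0.2]])))
print("posterior over latent components:",
      gmm.predict_proba(np.array([[2.9, 3.1]])).round(3))
```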

    Network Intrusion Detection System:A systematic study of Machine Learning and Deep Learning approaches

    The rapid advances in the internet and communication fields have resulted in a huge increase in network size and the corresponding data. As a result, many novel attacks are being generated and have posed challenges for network security to accurately detect intrusions. Furthermore, the presence of intruders with the aim to launch various attacks within the network cannot be ignored. An intrusion detection system (IDS) is one such tool that protects the network from possible intrusions by inspecting the network traffic, to ensure its confidentiality, integrity, and availability. Despite enormous efforts by researchers, IDS still faces challenges in improving detection accuracy while reducing false alarm rates and in detecting novel intrusions. Recently, machine learning (ML) and deep learning (DL)-based IDS systems are being deployed as potential solutions to detect intrusions across the network in an efficient manner. This article first clarifies the concept of IDS and then provides a taxonomy based on the notable ML and DL techniques adopted in designing network-based IDS (NIDS) systems. A comprehensive review of recent NIDS-based articles is provided by discussing the strengths and limitations of the proposed solutions. Then, recent trends and advancements of ML and DL-based NIDS are provided in terms of the proposed methodology, evaluation metrics, and dataset selection. Using the shortcomings of the proposed methods, we highlight various research challenges and provide the future scope for research in improving ML and DL-based NIDS.
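
    For readers unfamiliar with the ML-based NIDS pipelines the survey reviews, here is a minimal sketch: train a classifier on labelled traffic-flow features and report detection metrics, where false alarms show up as benign flows predicted as attacks. The synthetic features and labels are placeholders; the survey reviews such methods rather than prescribing this particular pipeline.

```python
# Sketch of a classical ML-based NIDS pipeline on synthetic flow features.
# The data, feature count, and labels are stand-ins for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n_flows, n_features = 5000, 20                     # hypothetical flow records
X = rng.normal(size=(n_flows, n_features))
y = (X[:, 0] + 0.5 * X[:, 3] > 1.0).astype(int)    # stand-in "attack" label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Precision/recall per class; false alarms appear as low benign recall.
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["benign", "attack"]))
```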