180 research outputs found

Recommended from our members

A high level approach to Arabic sentence recognition

Author: Krayem AG
Publication date: 01/09/2013

The aim of this work is to develop a sentence recognition system inspired by the human reading process. Cognitive studies have observed that humans tend to read a word as a whole: the reader considers the global word shape and uses contextual knowledge to infer a word and discriminate it from other possible words. The sentence recognition system is fully integrated: a word-level recogniser (the baseline system) combined with a linguistic-knowledge post-processing module. The baseline system is a holistic, word-based recognition approach framed as a probabilistic ranking task. Its output is a set of multiple recognition hypotheses (an N-best word lattice). The basic unit is the word rather than the character; the system does not rely on any segmentation or require baseline detection. The linguistic knowledge used to re-rank the output of the baseline system is a standard n-gram Statistical Language Model (SLM); candidates are re-ranked by their phrase perplexity scores. The system is an OCR system based on HMMs built with the HTK Toolkit. The baseline system uses global transformation features extracted from binary word images. The feature extraction technique is the block-based Discrete Cosine Transform (DCT) applied to the whole word image, with feature vectors extracted from non-overlapping sub-blocks of 8x8 pixels. The HMMs applied to the task are mono-model discrete one-dimensional HMMs (Bakis model). A balanced database of actual scanned and synthetic word images has been constructed to ensure an even distribution of word samples. The Arabic words are typewritten in five fonts at a size of 14 points in a plain style. The statistical language models and lexicon words are extracted from the Holy Qur'an. The systems are applied to word images with no overlap between the training and testing datasets. The actual scanned database is used to evaluate the word recogniser. The synthetic database provides a large amount of data for reliable training of sentence recognition systems. The word recogniser is evaluated in mono-font and multi-font contexts. The two types of word recogniser achieve final recognition accuracies of 99.30% and 73.47% in mono-font and multi-font contexts, respectively. The average accuracy achieved by the sentence recogniser is 67.24%, improved to 78.35% on average when using 5-gram post-processing. An evaluation of the complexity and accuracy of the post-processing module found that the 4-gram model is more suitable than the 5-gram: it is much faster while improving the average accuracy to 76.89%.
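The perplexity-based re-ranking mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a bigram model with add-one smoothing (the abstract reports 4-gram and 5-gram models), a toy English corpus in place of the Qur'an text, and hypothetical helper names (`train_bigram`, `perplexity`, `rerank`).

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenised sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]  # sentence boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(words, unigrams, bigrams):
    """Perplexity of one candidate sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + words + ["</s>"]
    vocab = len(unigrams)
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing is an assumption made to keep the sketch short.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


def rerank(candidates, unigrams, bigrams):
    """Order N-best candidate sentences from lowest to highest perplexity."""
    return sorted(candidates, key=lambda c: perplexity(c, unigrams, bigrams))


# Toy data: two training sentences and two competing hypotheses.
corpus = [["a", "b", "c"], ["a", "b", "d"]]
uni, bi = train_bigram(corpus)
n_best = [["b", "a", "c"], ["a", "b", "c"]]
best = rerank(n_best, uni, bi)[0]
```

In this toy run the hypothesis whose word order matches the training corpus receives the lower perplexity and is ranked first.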

Nottingham Trent Institutional Repository (IRep)

A generic approach for designing on-line handwritten shapes recognizers

Author: ARTIERES T.
GALLINARI P.
MARUKATAT S.
Publication venue: GRETSI, Saint Martin d'Hères, France
Publication date: 01/01/2005

This paper presents a generic approach for designing on-line handwritten shape recognizers. Our approach allows the design of very different recognition engines that correspond to various needs in pen-based interfaces. In particular, it can deal with a wide class of symbols and characters. We present our system in detail and make the link between our models and more standard statistical models such as Hierarchical Hidden Markov Models and Dynamic Bayesian Networks. We then evaluate fundamental properties of our approach that give it great flexibility: learning any symbol from scratch and learning from very few training samples. We show experimentally that, using our approach, one can learn both a state-of-the-art writer-independent recognizer for alphanumeric characters and a writer-dependent recognizer that works with any two-dimensional shapes, learns a new symbol from a few training samples, and requires very few machine resources.

I-Revues

Deep Learning Techniques for Music Generation -- A Survey

Author: Briot Jean-Pierre
Hadjeres Gaëtan
Pachet François-David
Publication date: 23/03/2019

This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
Objective: What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file).
Representation: What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. What format is to be used? Examples are: MIDI, piano roll or text. How will the representation be encoded? Examples are: scalar, one-hot or many-hot.
Architecture: What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks.
Challenge: What are the limitations and open challenges? Examples are: variability, interactivity and creativity.
Strategy: How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation.
For each dimension, we conduct a comparative analysis of various models and techniques and we propose some tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
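The one-hot and many-hot encodings named under the Representation dimension can be illustrated with a short sketch. It is an illustration only, not taken from the survey: it assumes notes are given as MIDI pitch numbers in the range 0 to 127, and the function names `one_hot` and `many_hot` are hypothetical.

```python
def one_hot(pitch, size=128):
    """Encode a single note as a vector with exactly one active position."""
    vec = [0] * size
    vec[pitch] = 1
    return vec


def many_hot(pitches, size=128):
    """Encode simultaneous notes (a chord) as a vector with several active positions."""
    vec = [0] * size
    for p in pitches:
        vec[p] = 1
    return vec


# Middle C as a single note, and a C major triad (C4, E4, G4) as a chord.
note_vec = one_hot(60)
chord_vec = many_hot([60, 64, 67])
```

A one-hot vector suits a monophonic melody, where one note sounds per time step; a many-hot vector lets a single time step carry several simultaneous notes.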

arXiv.org e-Print Archive

Advances in statistical script learning

Author: Pichotta Karl
Publication date: 05/02/2018

When humans encode information into natural language, they do so with the clear assumption that the reader will be able to seamlessly make inferences based on world knowledge. For example, given the sentence "Mrs. Dalloway said she would buy the flowers herself," one can make a number of probable inferences based on event co-occurrences: she bought flowers, she went to a store, she took the flowers home, and so on. Observing this, it is clear that many different useful natural language end-tasks could benefit from models of events as they typically co-occur (so-called script models). Robust question-answering systems must be able to infer highly-probable implicit events from what is explicitly stated in a text, as must robust information-extraction systems that map from unstructured text to formal assertions about relations expressed in the text. Coreference resolution systems, semantic role labeling, and even syntactic parsing systems could, in principle, benefit from event co-occurrence models. To this end, we present a number of contributions related to statistical event co-occurrence models. First, we investigate a method of incorporating multiple entities into events in a count-based co-occurrence model. We find that modeling multiple entities interacting across events allows for improved empirical performance on the task of modeling sequences of events in documents. Second, we give a method of applying Recurrent Neural Network sequence models to the task of predicting held-out predicate-argument structures from documents. This model allows us to easily incorporate entity noun information, and can allow for more complex, higher-arity events than a count-based co-occurrence model. We find the neural model improves performance considerably over the count-based co-occurrence model. Third, we investigate the performance of a sequence-to-sequence encoder-decoder neural model on the task of predicting held-out predicate-argument events from text. This model does not explicitly model any external syntactic information, and does not require a parser. We find the text-level model to be competitive in predictive performance with an event-level model directly mediated by an external syntactic analysis. Finally, motivated by this result, we investigate incorporating features derived from these models into a baseline noun coreference resolution system. We find that, while our additional features do not appreciably improve top-level performance, we can nonetheless provide empirical improvement on a number of restricted classes of difficult coreference decisions.
Computer Science

Texas ScholarWorks

Recognition of mathematical handwriting on whiteboards

Author: Sabeghi Saroui Behrang
Publication date: 01/12/2015

Automatic recognition of handwritten mathematics has enjoyed significant improvements in the past decades. In particular, online recognition of mathematical formulae has seen a number of important advancements. However, in reality most mathematics is still taught and developed on regular whiteboards and offline recognition remains an open and challenging task in this area. In this thesis we develop methods to recognise mathematics from static images of handwritten expressions on whiteboards, while leveraging the strength of online recognition systems by transforming offline data into online information. Our approach is based on trajectory recovery techniques, that allow us to reconstruct the actual stroke information necessary for online recognition. To this end we develop a novel recognition process especially designed to deal with whiteboards by prudently extracting information from colour images. To evaluate our methods we use an online recogniser for the recognition task, which is specifically trained for recognition of maths symbols. We present our experiments with varying quality and sources of images. In particular, we have used our approach successfully in a set of experiments using Google Glass for capturing images from whiteboards, in which we achieve highest accuracies of 88.03% and 84.54% for segmentation and recognition of mathematical symbols respectively

University of Birmingham Research Archive, E-theses Repository

Archives, Access and Artificial Intelligence: Working with Born-Digital and Digitized Archival Collections

Publication venue: 'Transcript Verlag'
Publication date: 01/01/2022

Digital archives are transforming the Humanities and the Sciences. Digitized collections of newspapers and books have pushed scholars to develop new, data-rich methods. Born-digital archives are now better preserved and managed thanks to the development of open-access and commercial software. Digital Humanities have moved from the fringe to the center of academia. Yet, the path from the appraisal of records to their analysis is far from smooth. This book explores crossovers between various disciplines to improve the discoverability, accessibility, and use of born-digital archives and other cultural assets

SSOAR - Social Science Open Access Repository

Archives, Access and Artificial Intelligence

Publication venue: 'Walter de Gruyter GmbH'
Publication date: 05/05/2022

Directory of Open Access Books (DOAB)

Archives, Access and Artificial Intelligence

Publication venue: 'Walter de Gruyter GmbH'

OAPEN Library

Towards Lifelong Reasoning with Sparse and Compressive Memory Systems

Author: Rae Jack William
Publication venue: UCL (University College London)
Publication date: 28/04/2021

Humans have a remarkable ability to remember information over long time horizons. When reading a book, we build up a compressed representation of the past narrative, such as the characters and events that have built up the story so far. We can do this even if they are separated by thousands of words from the current text, or long stretches of time between readings. During our life, we build up and retain memories that tell us where we live, what we have experienced, and who we are. Adding memory to artificial neural networks has been transformative in machine learning, allowing models to extract structure from temporal data, and more accurately model the future. However the capacity for long-range reasoning in current memory-augmented neural networks is considerably limited, in comparison to humans, despite the access to powerful modern computers. This thesis explores two prominent approaches towards scaling artificial memories to lifelong capacity: sparse access and compressive memory structures. With sparse access, the inspection, retrieval, and updating of only a very small subset of pertinent memory is considered. It is found that sparse memory access is beneficial for learning, allowing for improved data-efficiency and improved generalisation. From a computational perspective - sparsity allows scaling to memories with millions of entities on a simple CPU-based machine. It is shown that memory systems that compress the past to a smaller set of representations reduce redundancy and can speed up the learning of rare classes and improve upon classical data-structures in database systems. Compressive memory architectures are also devised for sequence prediction tasks and are observed to significantly increase the state-of-the-art in modelling natural language

UCL Discovery

Flavor text generation for role-playing video games

Author: van Stegeren Judith
Publication venue: University of Twente
Publication date: 25/03/2022

University of Twente Research Information