1,352 research outputs found

    HAMEX - a Handwritten and Audio Dataset of Mathematical Expressions

    Get PDF
    International audienceIn this paper, we present HAMEX, a new public dataset that contains mathematical expressions available in their on-line handwritten form and in their audio spoken form. We have designed this dataset so that, given a mathematical expression, its handwritten signal and its audio signal can be used jointly to design multimodal recognition systems. Here, we describe the different steps that allowed us to acquire this dataset, from the creation of the mathematical expression corpora (including expressions from Wikipedia pages) to the segmentation and the transcription of the collected data, via the data collection process itself. Currently, the dataset contains 4 350 on-line handwritten mathematical expressions written by 58 writers, and the corresponding audio expressions (in French) spoken by 58 speakers. The ground truth is also provided both for the handwritten expressions (as INKML files with the digital ink, the symbol segmentation, and the MATHML structure) and for the audio expressions (as XML files with the transcriptions of the spoken expressions)

    Late multimodal fusion for image and audio music transcription

    Get PDF
    Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem for Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of the aforementioned input data has conditioned these fields to develop modality-specific frameworks. However, their recent definition in terms of sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by image and audio modalities. In this work, we explore this question at a late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses regarding end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios–in which the corresponding single-modality models yield different error rates–showed interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve the corresponding unimodal standard recognition frameworks.This paper is part of the I+D+i PID2020-118447RA-I00 (MultiScore) project, funded by MCIN/AEI/10.13039/501100011033. Some of the computing resources were provided by the Generalitat Valenciana and the European Union through the FEDER funding programme (IDIFEDER/2020/003). The first and second authors are respectively supported by grants FPU19/04957 from the Spanish Ministerio de Universidades and APOSTD/2020/256 from Generalitat Valenciana

    Understanding Optical Music Recognition

    Get PDF
    For over 50 years, researchers have been trying to teach computers to read music notation, referred to as Optical Music Recognition (OMR). However, this field is still difficult to access for new researchers, especially those without a significant musical background: Few introductory materials are available, and, furthermore, the field has struggled with defining itself and building a shared terminology. In this work, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, with most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, as opposed to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords

    Classification and Verification of Online Handwritten Signatures with Time Causal Information Theory Quantifiers

    Get PDF
    We present a new approach for online handwritten signature classification and verification based on descriptors stemming from Information Theory. The proposal uses the Shannon Entropy, the Statistical Complexity, and the Fisher Information evaluated over the Bandt and Pompe symbolization of the horizontal and vertical coordinates of signatures. These six features are easy and fast to compute, and they are the input to an One-Class Support Vector Machine classifier. The results produced surpass state-of-the-art techniques that employ higher-dimensional feature spaces which often require specialized software and hardware. We assess the consistency of our proposal with respect to the size of the training sample, and we also use it to classify the signatures into meaningful groups.Comment: Submitted to PLOS On

    Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation

    Full text link
    This paper presents a comprehensive evaluation of the Optical Character Recognition (OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model (LMM). We assess the model's performance across a range of OCR tasks, including scene text recognition, handwritten text recognition, handwritten mathematical expression recognition, table structure recognition, and information extraction from visually-rich document. The evaluation reveals that GPT-4V performs well in recognizing and understanding Latin contents, but struggles with multilingual scenarios and complex tasks. Specifically, it showed limitations when dealing with non-Latin languages and complex tasks such as handwriting mathematical expression recognition, table structure recognition, and end-to-end semantic entity recognition and pair extraction from document image. Based on these observations, we affirm the necessity and continued research value of specialized OCR models. In general, despite its versatility in handling diverse OCR tasks, GPT-4V does not outperform existing state-of-the-art OCR models. How to fully utilize pre-trained general-purpose LMMs such as GPT-4V for OCR downstream tasks remains an open problem. The study offers a critical reference for future research in OCR with LMMs. Evaluation pipeline and results are available at https://github.com/SCUT-DLVCLab/GPT-4V_OCR

    What is Computational Intelligence and where is it going?

    Get PDF
    What is Computational Intelligence (CI) and what are its relations with Artificial Intelligence (AI)? A brief survey of the scope of CI journals and books with ``computational intelligence'' in their title shows that at present it is an umbrella for three core technologies (neural, fuzzy and evolutionary), their applications, and selected fashionable pattern recognition methods. At present CI has no comprehensive foundations and is more a bag of tricks than a solid branch of science. The change of focus from methods to challenging problems is advocated, with CI defined as a part of computer and engineering sciences devoted to solution of non-algoritmizable problems. In this view AI is a part of CI focused on problems related to higher cognitive functions, while the rest of the CI community works on problems related to perception and control, or lower cognitive functions. Grand challenges on both sides of this spectrum are addressed

    A statistical multiresolution approach for face recognition using structural hidden Markov models

    Get PDF
    This paper introduces a novel methodology that combines the multiresolution feature of the discrete wavelet transform (DWT) with the local interactions of the facial structures expressed through the structural hidden Markov model (SHMM). A range of wavelet filters such as Haar, biorthogonal 9/7, and Coiflet, as well as Gabor, have been implemented in order to search for the best performance. SHMMs perform a thorough probabilistic analysis of any sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, the SHMMs do not perform the state conditional independence of the visible observation sequence assumption. This is achieved via the concept of local structures introduced by the SHMMs. Therefore, the long-range dependency problem inherent to traditional HMMs has been drastically reduced. SHMMs have not previously been applied to the problem of face identification. The results reported in this application have shown that SHMM outperforms the traditional hidden Markov model with a 73% increase in accuracy

    Off-line handwritten signature recognition by wavelet entropy and neural network

    Get PDF
    Handwritten signatures are widely utilized as a form of personal recognition. However, they have the unfortunate shortcoming of being easily abused by those who would fake the identification or intent of an individual which might be very harmful. Therefore, the need for an automatic signature recognition system is crucial. In this paper, a signature recognition approach based on a probabilistic neural network (PNN) and wavelet transform average framing entropy (AFE) is proposed. The system was tested with a wavelet packet (WP) entropy denoted as a WP entropy neural network system (WPENN) and with a discrete wavelet transform (DWT) entropy denoted as a DWT entropy neural network system (DWENN). Our investigation was conducted over several wavelet families and different entropy types. Identification tasks, as well as verification tasks, were investigated for a comprehensive signature system study. Several other methods used in the literature were considered for comparison. Two databases were used for algorithm testing. The best recognition rate result was achieved by WPENN whereby the threshold entropy reached 92%

    Drawing, Handwriting Processing Analysis: New Advances and Challenges

    No full text
    International audienceDrawing and handwriting are communicational skills that are fundamental in geopolitical, ideological and technological evolutions of all time. drawingand handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems like those related to the manner in which drawing and handwriting become an efficient way to command various connected objects; or to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline
    corecore