10 research outputs found

    Named Entity Recognition by Neural Prediction

    Get PDF
    Named entity recognition (NER) remains a very challenging problem, especially when the documents to be processed are handwritten and historical. Traditional methods based on regular expressions or syntactic rules work, but they are not generic: each new dataset requires additional adaptation work. We propose a recognition method based on context exploitation and tag prediction, using a pipeline of two consecutive BLSTMs (Bidirectional Long Short-Term Memory networks). The first is a BLSTM-CTC that recognizes the words in a text line using a sliding window and HOG features. The second BLSTM serves as a language model: it exploits the gates of the BLSTM memory cell, deploying syntactic rules in order to store the content surrounding proper nouns. This allows it to predict the tag of the next word from its context, followed step by step until the named entity (NE) is found; all the words of the context contribute to the prediction. We tested this system on a private dataset from the Philharmonie de Paris, for the extraction of proper nouns from music sale transactions, as well as on the public IAM dataset. The results are satisfactory compared to the existing literature.
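    To make the first stage concrete, here is a minimal PyTorch sketch of a BLSTM trained with CTC loss over sliding-window HOG descriptors, in the spirit of the pipeline described above. All names, dimensions, and the vocabulary size are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of the first stage described above: a BLSTM
# trained with CTC loss over sliding-window HOG features.
# Hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class BLSTMCTCRecognizer(nn.Module):
    """Maps a sequence of HOG descriptors (one per sliding-window
    position) to per-frame character posteriors for CTC decoding."""
    def __init__(self, hog_dim=144, hidden=128, n_chars=80):
        super().__init__()
        self.blstm = nn.LSTM(hog_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        # +1 output for the CTC blank symbol
        self.proj = nn.Linear(2 * hidden, n_chars + 1)

    def forward(self, x):                     # x: (batch, frames, hog_dim)
        h, _ = self.blstm(x)
        return self.proj(h).log_softmax(-1)   # (batch, frames, n_chars+1)

# Training step with CTC loss (targets are character index sequences).
model = BLSTMCTCRecognizer()
ctc = nn.CTCLoss(blank=80, zero_infinity=True)
feats = torch.randn(4, 200, 144)              # 4 lines, 200 window positions
logp = model(feats).transpose(0, 1)           # CTCLoss expects (T, N, C)
targets = torch.randint(0, 80, (4, 30))
loss = ctc(logp, targets,
           input_lengths=torch.full((4,), 200),
           target_lengths=torch.full((4,), 30))
loss.backward()
```

    The second-stage tag-predicting BLSTM would consume the recognized word sequence; it is omitted here since the paper's gating scheme is specific to its architecture.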

    Keyword and regular expression detection for named entity recognition in handwritten documents

    Get PDF
    This thesis presents a study of keyword and regular expression detection in unconstrained handwritten documents, intended as the basis of a subsequent named entity detection stage. Named entities such as names, surnames, company names, or numerical amounts often constitute the main informative part of a document, so detecting and recognizing them would allow a deep understanding of the document being processed. Named entities are highly variable, and their definition depends strongly on the problem considered: the entities relevant to mail sorting (person names, street types and names, city names, postal codes), for example, differ from those relevant to document categorization (a lexicon of domain-specific keywords). This variability makes named entity detection difficult. On images of documents, detection and recognition also face the text recognition problem itself, hampered by the intrinsic variability of handwriting (especially in handwritten documents) and by digitization noise. The first contribution of this thesis is an isolated word recognition engine based on Conditional Random Fields (CRF), which, to the best of our bibliographic knowledge, had not been proposed before. The second contribution is a generic keyword and regular expression spotting system able to detect any sequence in a text line. A benchmark of discriminative models shows that one architecture, the BLSTM-CTC, clearly outperforms the other hybrid methods, both in raw performance and in its ability to handle very difficult queries, and appears to be the key to solving the initial problem.
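    As an illustration of how a BLSTM-CTC can score a query, the sketch below computes the CTC forward (alignment) log-probability of a character sequence given the network's per-frame posteriors. This is a simplified, assumption-laden rendition: it scores a line whose transcription is exactly the query, whereas a real spotting system would surround the query with filler models; the function name and interface are hypothetical.

```python
import numpy as np

def ctc_keyword_score(logp, keyword, blank=0):
    """Log-probability that the frame-wise posteriors `logp`
    (T x C log-softmax outputs of a BLSTM-CTC) emit exactly the
    non-empty label sequence `keyword` (list of char indices),
    via the standard CTC forward recursion over blank-extended paths."""
    T = logp.shape[0]
    ext = [blank]
    for c in keyword:
        ext += [c, blank]                  # b c1 b c2 b ... cn b
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = logp[0, blank]
    alpha[0, 1] = logp[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            prev = [alpha[t - 1, s]]       # stay in the same state
            if s > 0:
                prev.append(alpha[t - 1, s - 1])      # advance one state
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(alpha[t - 1, s - 2])      # skip a blank
            alpha[t, s] = np.logaddexp.reduce(prev) + logp[t, ext[s]]
    # paths may end on the last label or the trailing blank
    return np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```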

    Handwriting recognition with recurrent neural networks

    Get PDF
    Mass digitization of paper documents requires highly efficient optical character recognition systems. Digital versions of paper documents enable the use of search engines through keyword detection or the extraction of high-level information (e.g., titles, authors, dates). Unfortunately, writing recognition systems, and especially handwriting recognition systems, are still far from human-level performance on the most difficult documents. This industrial PhD (CIFRE) between Airbus DS and the LITIS, carried out within the MAURDOR project, aims to identify and improve the state-of-the-art systems for handwriting recognition.

    We compare different systems for handwriting recognition. Our comparisons cover various feature sets as well as various dynamic classifiers: i) Hidden Markov Models (HMM), ii) a hybrid neural network/HMM, iii) a hybrid recurrent Bidirectional Long Short-Term Memory - Connectionist Temporal Classification (BLSTM-CTC)/HMM, and iv) a hybrid Conditional Random Fields (CRF)/HMM. We compared these systems within the framework of the WR2 task of the ICDAR 2009 competition, namely a word recognition task with a 1600-word lexicon. Our results rank the BLSTM-CTC/HMM system as the best performing, and clearly show that BLSTM-CTCs trained on different features are complementary.

    Our second contribution aims at exploiting this complementarity. We explore combination strategies operating at different levels of the BLSTM-CTC architecture: low level (early fusion of input features), mid level (within the network), and high level (late integration of outputs). Here again we measure performance on the WR2 task of the ICDAR 2009 competition. Overall, our combination strategies improve on the single-feature systems, and our best results come close to those of the best system of the competition on the same task. We also observed that some combinations are better suited to systems that use a lexicon to correct mistakes, while others are better suited to lexicon-free systems.

    Our third contribution concerns two applications related to handwriting recognition. We present a language identification system and a keyword spotting system that accepts either a text query or an image query. For both tasks our systems stand out from the literature because they include a handwriting recognition step. Indeed, most systems in the literature extract image features for classification or comparison, which is not necessarily the most suitable approach for these tasks. Our systems apply a handwriting recognition step followed by either a language identification step or a word detection step, depending on the application.
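    The low- and high-level combination strategies mentioned above can be sketched roughly as follows. This is an editor's illustration under assumed interfaces, not the thesis's implementation; the mid-level (within-network) variant depends on the network internals and is not shown.

```python
import numpy as np

# High level (late integration): combine the word posteriors that two
# recognizers, trained on different feature sets, assign over the lexicon.
def late_fusion(post_a, post_b, w=0.5):
    return w * post_a + (1.0 - w) * post_b

# Low level (early fusion): concatenate the per-frame feature vectors
# of the two feature sets before feeding a single BLSTM-CTC.
def early_fusion(feat_a, feat_b):
    return np.concatenate([feat_a, feat_b], axis=-1)

# Toy usage over a 1600-word lexicon, as in the WR2 task setting.
post_a = np.random.dirichlet(np.ones(1600))
post_b = np.random.dirichlet(np.ones(1600))
best_word = int(np.argmax(late_fusion(post_a, post_b)))
```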

    Out of vocabulary queries for word graph-based keyword spotting

    Full text link
    In this master's thesis several approaches are presented to support out-of-vocabulary queries in a Word Graph (WG)-based Keyword Spotting (KWS) application for handwritten text lines. Generally, KWS assigns a score that estimates how likely it is that a given keyword is present in a certain line image. WG-based KWS offers very fast search times but assumes a closed vocabulary and assigns null scores to any word not included in that vocabulary. This work aims to give WG-based KWS the flexibility of searches unrestricted to the training vocabulary, together with the speed achieved by the use of WGs.

    Puigcerver I Pérez, J. (2014). Out of vocabulary queries for word graph-based keyword spotting. http://hdl.handle.net/10251/53360
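    A toy sketch of the closed-vocabulary behaviour described above: with precomputed word-graph edge posteriors, an in-vocabulary query receives the mass of its best edge, while an out-of-vocabulary query necessarily scores zero. Data and function names are invented for illustration.

```python
from collections import defaultdict

def wg_keyword_scores(edges):
    """Collapse a word graph into per-word line scores.
    `edges` is a list of (word, posterior) pairs, where each posterior
    is the probability mass of that edge given the line image (assumed
    precomputed by forward-backward over the graph). The score of a
    word is the max posterior of any edge carrying it."""
    scores = defaultdict(float)
    for word, post in edges:
        scores[word] = max(scores[word], post)
    return scores

line_scores = wg_keyword_scores([("amount", 0.91), ("among", 0.07)])
print(line_scores.get("amount", 0.0))   # 0.91
print(line_scores.get("amnt", 0.0))     # 0.0 -> OOV gets a null score,
                                        # the limitation this work targets
```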

    Feature design and lexicon reduction for efficient offline handwriting recognition

    Get PDF
    This thesis establishes a pattern recognition framework for offline word recognition systems. It focuses on image-level features because they greatly influence recognition performance. In particular, we consider two complementary aspects of the impact of prominent features: lexicon reduction and the actual recognition. The first aspect, lexicon reduction, consists in the design of a weak classifier which outputs a set of candidate word hypotheses given a word image. Its main purpose is to reduce the recognition computation time while maintaining (or even improving) the recognition rate. The second aspect is the recognition system itself. Several features exist in the literature, rooted in different fields of research, but no consensus exists concerning the most promising ones. The goal of the proposed framework is to improve our understanding of relevant features in order to build better recognition systems. For this purpose, we address two specific problems: 1) feature design for lexicon reduction (applied to Arabic script), and 2) feature evaluation for cursive handwriting recognition (applied to Latin and Arabic scripts). Few methods exist for lexicon reduction in Arabic script, unlike Latin script. Existing methods use salient features of Arabic words, such as the number of subwords and diacritics, but totally ignore the shape of the subwords. Our first goal is therefore to perform lexicon reduction based on subword shape. Our approach is based on shape indexing, where the shape of a query subword is compared to a labeled database of sample subwords. For efficient comparison with a low computational overhead, we propose the weighted topological signature vector (W-TSV) framework, where the subword shape is modeled as a weighted directed acyclic graph (DAG) from which the W-TSV vector is extracted for efficient indexing. The main contributions of this work are to extend the existing TSV framework to weighted DAGs and to propose a shape indexing approach for lexicon reduction. Good lexicon reduction performance is achieved for Arabic subwords; nevertheless, performance remains modest for Arabic words. Building on these results, we propose a new index for better performance at the word level. The subword shape and the number of subwords and diacritics are all important components of Arabic word shape. We therefore propose the Arabic word descriptor (AWD), which integrates all of these components. It is built in two steps. First, a structural descriptor (SD) is computed for each connected component (CC) of the word image. It describes the CC shape using the bag-of-words model, where each visual word represents a different local shape structure. Then, the AWD is formed by concatenating the SDs using an efficient heuristic that implicitly discriminates between subwords and diacritics. In the context of lexicon reduction, the AWD is used to index a reference database. The main contribution of this work is the design of the AWD, which integrates low-level cues (subword shape structure) and symbolic information (subword counts and diacritics) into a single descriptor. The proposed method has a low computational overhead, is simple to implement, and provides state-of-the-art lexicon reduction performance on two Arabic databases, namely the Ibn Sina database of subwords and the IFN/ENIT database of words. The last part of this thesis focuses on features for word recognition.
A large body of features exists in the literature, each motivated by a different field, such as pattern recognition, computer vision, or machine learning. Identifying the most promising approaches would improve the design of the next generation of features. Nevertheless, because they are based on different concepts, it is difficult to compare them on theoretical grounds, and efficient empirical tools are needed. The last objective of the thesis is therefore to provide a feature evaluation method that assesses the strength and complementarity of existing features. A combination scheme has been designed for this purpose, in which each feature is evaluated through a reference recognition system based on recurrent neural networks. More precisely, each feature is represented by an agent, which is an instance of the recognition system trained with that feature. The decisions of all the agents are combined using a weighted vote. The weights are jointly optimized during a training phase in order to increase the weighted vote of the true word label. They therefore reflect the strength and complementarity of the agents and their features for the given task. Finally, they are converted into a numerical score assigned to each feature, which is easy to interpret under this combination model. To the best of our knowledge, this is the first feature evaluation method able to quantify the importance of each feature, rather than providing a ranking based on recognition rate. Five state-of-the-art features have been tested, and our results provide useful insights for future feature design.
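The weighted-vote evaluation scheme lends itself to a short sketch: one weight per feature agent, jointly trained so that the combined vote favours the true word label, with the learned weights read off as feature importances. This is a hedged approximation with synthetic data, not the thesis's exact optimization.

```python
import torch

# N agents (one per feature), each giving a posterior over the lexicon
# for every training word image: votes has shape (samples, agents, words).
votes = torch.rand(1000, 5, 1600)
votes = votes / votes.sum(-1, keepdim=True)
labels = torch.randint(0, 1600, (1000,))

w = torch.zeros(5, requires_grad=True)        # one weight per agent
opt = torch.optim.Adam([w], lr=0.1)
for _ in range(200):
    weights = torch.softmax(w, 0)             # positive weights, sum to 1
    combined = torch.einsum('a,saw->sw', weights, votes)
    loss = torch.nn.functional.nll_loss(torch.log(combined + 1e-9), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(w, 0))   # final weights read as per-feature importance
```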

    Image and Video Forensics

    Get PDF
    Nowadays, images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on social platforms, generating a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with deep learning techniques. In response to these threats, the multimedia forensics community has mounted major research efforts on source identification and manipulation detection. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book collects a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics, tackling new and serious challenges to ensure media authenticity.