Using Confusion Networks for On-Line Handwritten Sentence Recognition
National audience. In this article, we study how to integrate a confusion-network representation of sentence hypotheses into an on-line handwritten sentence recognition system. The word posterior probabilities obtained from the confusion network are used as confidence scores to detect potential errors in the sentence produced by Maximum A Posteriori decoding on a word graph. Dedicated classifiers (here, SVMs) are then trained to correct these errors by combining the word posterior probabilities with other knowledge sources. A rejection phase is also introduced into the detection process. Experiments on a set of 320 handwritten sentences show a 31.3% relative reduction of the word error rate when words are extracted manually, and a 60% relative reduction when they are extracted automatically.
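The improvements reported in these abstracts are relative reductions of the word error rate (WER), which differ from absolute drops in percentage points. A minimal sketch of the arithmetic; the function name and the example rates are hypothetical and not taken from the papers.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline.

    Both arguments must use the same unit (e.g. WER in percent).
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical rates: a WER falling from 20.0% to 15.0% is a 5-point
# absolute drop but a 25% relative reduction.
print(f"{relative_reduction(20.0, 15.0):.0%}")
```

A relative figure therefore depends on the baseline: the same absolute gain looks larger when the baseline error rate is already low.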
Use of a Confusion Network to Detect and Correct Errors in an On-Line Handwritten Sentence Recognition System
International audience. In this paper, we investigate the integration of a confusion network into an on-line handwritten sentence recognition system. The word posterior probabilities from the confusion network are used as confidence scores to detect potential errors in the sentence output by Maximum A Posteriori decoding on a word graph. Dedicated classifiers (here, SVMs) are then trained to correct these errors by combining the word posterior probabilities with other sources of knowledge. A rejection phase is also introduced into the detection process. Experiments on handwritten sentences show a 28.5% relative reduction of the word error rate.
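The detection step can be pictured as thresholding each word's posterior in its confusion-network slot. The sketch below is a hedged illustration, not the authors' method: it assumes the MAP words are already aligned one-to-one with the slots, uses two hypothetical thresholds (`detect_threshold`, `reject_threshold`), and replaces the paper's rejection mechanism with a simple heuristic (reject a flagged word when no competitor in its slot is confident either). The SVM correction stage is not shown.

```python
from dataclasses import dataclass

# One confusion-network slot: candidate word -> posterior probability.
# The empty string stands for the "no word" (epsilon) arc.
Slot = dict[str, float]


@dataclass
class Decision:
    word: str          # MAP word aligned to this slot
    posterior: float   # its posterior in the confusion network
    status: str        # "accepted", "error", or "rejected"


def detect_errors(map_words: list[str], network: list[Slot],
                  detect_threshold: float = 0.5,
                  reject_threshold: float = 0.2) -> list[Decision]:
    """Flag MAP words whose posterior falls below detect_threshold.

    Hypothetical rejection rule: a flagged word is "rejected" (left for
    no correction) when the best competing candidate in its slot is
    itself below reject_threshold; otherwise it is an "error" candidate.
    """
    if len(map_words) != len(network):
        raise ValueError("MAP words must be aligned one-to-one with slots")
    decisions = []
    for word, slot in zip(map_words, network):
        p = slot.get(word, 0.0)
        if p >= detect_threshold:
            decisions.append(Decision(word, p, "accepted"))
            continue
        best_alt = max((q for w, q in slot.items() if w != word), default=0.0)
        status = "error" if best_alt >= reject_threshold else "rejected"
        decisions.append(Decision(word, p, status))
    return decisions


# Toy network with made-up posteriors.
network = [
    {"the": 0.9, "he": 0.1},
    {"cat": 0.4, "cap": 0.35, "": 0.25},
    {"sat": 0.4, "sit": 0.15, "set": 0.15, "sad": 0.15, "": 0.15},
]
for d in detect_errors(["the", "cat", "sat"], network):
    print(d.word, d.status)
```

Here "cat" is flagged with a plausible competitor ("cap"), while "sat" is flagged but has no confident alternative, so it is rejected rather than passed to a correction stage.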
Error Detection and Correction Using Posterior Probabilities in an On-Line Handwritten Sentence Recognition System
National audience. In this article, we present a complete on-line handwritten sentence recognition system. We focus in particular on detecting potential errors in the sentences produced by a Maximum A Posteriori recognition approach. Word posterior probabilities, obtained from a confusion-network representation, are used as confidence indices. Dedicated classifiers (here, SVMs) are then trained to correct these errors by combining these posterior probabilities with other knowledge sources. A rejection mechanism is also introduced to single out the error hypotheses that the proposed approach cannot correct. Experiments were carried out on a set of 425 handwritten sentences written by 17 writers. They showed a relative reduction of the word error rate of 14.6
A Priori and A Posteriori Integration and Combination of Language Models in an On-line Handwritten Sentence Recognition System
International audience. This paper investigates the integration of different language models into an on-line sentence recognition system. The impact of n-gram and n-class models (with classes built either statistically or morpho-syntactically), trained on the Brown corpus, is compared in terms of word recognition rate. Furthermore, integrating these models at different steps of the recognition process (during decoding, or to rescore the N-best list of proposed sentences) is considered, showing better performance the earlier they are used. Combinations of these models are also studied, in addition to their integration at the aforementioned recognition steps. All experiments are carried out on sentences from the Brown corpus written by several writers.
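The second integration point mentioned above, rescoring the N-best list, can be sketched as re-ranking each sentence by a combined score. This assumes a log-linear combination with a hypothetical language-model weight `lm_weight` and word insertion penalty `word_penalty`, a common practice in speech and handwriting recognition that the abstract does not specify; the toy unigram model stands in for the paper's n-gram and n-class models.

```python
import math


def rescore_nbest(nbest, lm_logprob, lm_weight=1.0, word_penalty=0.0):
    """Re-rank N-best sentences by a log-linear score combination.

    nbest: list of (words, recognizer_log_score) pairs.
    lm_logprob: callable giving the language-model log-probability of a word list.
    Returns the list sorted from best to worst combined score.
    """
    def combined(entry):
        words, rec_score = entry
        return rec_score + lm_weight * lm_logprob(words) + word_penalty * len(words)

    return sorted(nbest, key=combined, reverse=True)


# Toy unigram probabilities (made up) standing in for a real language model.
probs = {"the": 0.1, "cat": 0.01, "cot": 0.0001, "sat": 0.01}


def unigram_logprob(words):
    return sum(math.log(probs.get(w, 1e-6)) for w in words)


nbest = [(["the", "cot", "sat"], -10.0), (["the", "cat", "sat"], -10.5)]
best_words, _ = rescore_nbest(nbest, unigram_logprob)[0]
print(" ".join(best_words))
```

With `lm_weight=0` the recognizer's own ranking is kept; with the language model included, the slightly worse recognizer hypothesis "the cat sat" moves to the top because "cot" is unlikely.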
Statistical Language Models for On-line Handwritten Sentence Recognition
International audience. This paper investigates the integration of a statistical language model into an on-line recognition system in order to improve word recognition in handwritten sentences. Two kinds of models are considered: n-gram and n-class models (with word classes created statistically). All models are trained on the Susanne corpus, and experiments are carried out on sentences from this corpus written by several writers. Using a statistical language model is shown to improve the word recognition rate, and the relative impact of the different language models is compared. Furthermore, we illustrate the benefit of defining an optimal cooperation between the language model and the recognition system to reinforce the accuracy of the system.
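An n-class model predicts a word through its class. A minimal sketch of the standard class-based bigram, P(w_i | w_{i-1}) = P(w_i | c_i) · P(c_i | c_{i-1}), assuming a fixed, hand-written word-to-class mapping (the paper derives its classes statistically) and unsmoothed maximum-likelihood estimates; the class name and toy corpus are hypothetical.

```python
from collections import Counter


class ClassBigramModel:
    """Class-based bigram: P(w | prev) = P(w | c(w)) * P(c(w) | c(prev)).

    Unsmoothed maximum-likelihood estimates; "<s>" marks sentence start.
    """

    def __init__(self, sentences, word_class):
        self.word_class = word_class
        self.word_count = Counter()     # occurrences of each word
        self.class_count = Counter()    # occurrences of each class
        self.class_bigram = Counter()   # (previous class, class) pairs
        self.class_history = Counter()  # times a class precedes another class
        for sentence in sentences:
            classes = ["<s>"] + [word_class[w] for w in sentence]
            self.word_count.update(sentence)
            self.class_count.update(classes[1:])
            for prev, cur in zip(classes, classes[1:]):
                self.class_bigram[(prev, cur)] += 1
                self.class_history[prev] += 1

    def prob(self, word, prev_word=None):
        cls = self.word_class[word]
        prev_cls = "<s>" if prev_word is None else self.word_class[prev_word]
        if self.class_count[cls] == 0 or self.class_history[prev_cls] == 0:
            return 0.0
        p_word_given_class = self.word_count[word] / self.class_count[cls]
        p_class_given_prev = self.class_bigram[(prev_cls, cls)] / self.class_history[prev_cls]
        return p_word_given_class * p_class_given_prev


sentences = [["the", "cat", "sat"], ["a", "dog", "sat"]]
word_class = {"the": "DET", "a": "DET", "cat": "N", "dog": "N", "sat": "V"}
model = ClassBigramModel(sentences, word_class)
print(model.prob("dog", "the"))
```

The pair "the dog" never occurs in the toy corpus, yet it receives a non-zero probability through the DET → N class transition, which is the generalization n-class models offer over word n-grams.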
Word Extraction Associated with a Confidence Index for On-Line Handwritten Sentence Recognition
International audience. This paper presents an extension of our on-line sentence recognition system that integrates an automatic word extraction mechanism. Word extraction is based on characterizing inter-stroke gaps, combined with a rejection strategy that evaluates the reliability of the gap classification results. A reconsideration mechanism then uses this confidence index to create additional word extraction hypotheses while controlling the complexity of the recognition task. Different metrics are used to evaluate the impact of the whole word extraction task on recognition performance, on a set of 395 English sentences.
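The gap-based extraction with reconsideration can be sketched as follows. Everything here is an assumption for illustration: each gap is reduced to one hypothetical scalar measurement, a gap is labeled inter-word when it exceeds a threshold, the distance to the threshold serves as the confidence index, and low-confidence gaps are reconsidered (least confident first) only as far as a hypothetical `max_hypotheses` budget allows. The papers' actual gap features and classifier are not reproduced.

```python
from itertools import product


def segmentation_hypotheses(gaps, threshold, margin, max_hypotheses=8):
    """Enumerate word segmentations of a stroke sequence.

    gaps: one measurement per pair of consecutive strokes (hypothetical
    feature, e.g. a normalized horizontal distance).
    Returns tuples of booleans, True meaning "word boundary here".
    """
    base = [g > threshold for g in gaps]
    # Confidence index: distance between the gap and the decision threshold.
    uncertain = sorted(
        (i for i, g in enumerate(gaps) if abs(g - threshold) < margin),
        key=lambda i: abs(gaps[i] - threshold),
    )
    # Each reconsidered gap doubles the hypothesis count, so keep at most
    # floor(log2(max_hypotheses)) of them (always at least one hypothesis).
    budget = max(0, max_hypotheses.bit_length() - 1)
    reconsidered = uncertain[:budget]
    hypotheses = []
    for labels in product([False, True], repeat=len(reconsidered)):
        seg = list(base)
        for i, label in zip(reconsidered, labels):
            seg[i] = label
        hypotheses.append(tuple(seg))
    return hypotheses


gaps = [0.2, 0.9, 0.55, 0.48]  # made-up gap measurements
for h in segmentation_hypotheses(gaps, threshold=0.5, margin=0.1, max_hypotheses=2):
    print(h)
```

Capping the number of reconsidered gaps is what keeps the recognition cost bounded: the two uncertain gaps alone would yield four segmentations, but a budget of two hypotheses only reconsiders the least confident one.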
Handling out-of-vocabulary words and recognition errors based on word linguistic context for handwritten sentence recognition
International audience. In this paper, we investigate the use of linguistic information from language models to handle word recognition errors in handwritten sentences, focusing especially on errors due to out-of-vocabulary (OOV) words. First, word posterior probabilities are computed and used to detect error hypotheses in output sentences. An SVM classifier then categorizes these errors into predefined types. Next, a post-processing step uses a language model based on Part-of-Speech (POS) tags, combined with the previously used n-gram model. Error hypotheses can thus be recognized further, and POS tags can be assigned to the OOV words. Experiments on on-line handwritten sentences show that the proposed approach significantly reduces the word error rate.
Data Mining to Associate Session Names with Scientific Articles
National audience. In this paper, we describe our participation in the 2014 edition of the DEFT challenge. We focus on task 4, which consists of identifying the right conference session for scientific papers. We propose an original, symbolic, and unsupervised knowledge discovery approach that combines two data mining techniques. Sequence mining extracts frequent phrases from scientific papers in order to build descriptions of papers and sessions. These descriptions are then used to create a graph representing shared descriptions. A graph mining technique is applied to this graph to extract collections of homogeneous sub-graphs, each corresponding to a set of papers associated with a session.
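The first stage, extracting frequent phrases to describe papers, can be approximated by counting contiguous word n-grams and keeping those that appear in enough documents. This is a simplified stand-in, not the sequential pattern mining algorithm used in the paper (which may allow gaps between words); `n_max` and `min_support` are hypothetical parameters, and the graph mining stage is not shown.

```python
from collections import Counter


def frequent_phrases(documents, n_max=3, min_support=2):
    """Contiguous word n-grams (1 <= n <= n_max) found in at least
    min_support documents. Support counts each document once."""
    support = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        seen = {
            tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)
        }
        support.update(seen)
    return {phrase: count for phrase, count in support.items() if count >= min_support}


docs = [
    "confusion network decoding",
    "a confusion network for words",
    "language model",
]
for phrase, count in sorted(frequent_phrases(docs).items()):
    print(" ".join(phrase), count)
```

Phrases shared by several papers, such as "confusion network" here, are the kind of descriptors that can then link papers to one another and to sessions in a graph.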
Design of a framework using InkML for pen-based interaction in a collaborative environment
International audience. We present a framework based on the standard InkML format to represent digital ink in a collaborative environment with pen-based interaction. The framework covers the capture, rendering, and interpretation of digital ink. We focus in particular on representing the contextual environment of the ink and using it for interpretation (for example, as a drawing), as well as on representing the semantic information attached to the ink after interpretation.
Word Extraction Associated with a Confidence Index for On-Line Handwritten Sentence Recognition
International audience. This paper presents a word extraction approach that uses a confidence index to limit the total number of segmentation hypotheses, with the aim of extending our on-line sentence recognition system to on-the-fly recognition. Our initial word extraction is based on characterizing the gap between each pair of consecutive strokes in the on-line signal of the handwritten sentence. A confidence index is associated with each gap classification result to evaluate its reliability. A reconsideration process then creates additional segmentation hypotheses to ensure that the correct segmentation is among them. In this process, we control the total number of segmentation hypotheses to limit the complexity of the recognition process, and thus the execution time. The approach is evaluated on a test set of 425 English sentences written by 17 writers, using different metrics to analyze the impact of word extraction on the performance of the whole sentence recognition system. With the best reconsideration strategy, word extraction achieves a 97.94% word extraction rate and an 84.85% word recognition rate, which represents a 33.1% relative decrease in word error rate compared with the initial word extraction (with no reconsideration of segmentation hypotheses).