6 research outputs found

    Répertoire des Notaires parisiens Segmentation automatique et reconnaissance d'écriture: Rapport exploratoire

    The registers (répertoires) of Paris notaries held at the Archives nationales are among the collections most consulted by the public. Although they have been digitized and are available in the Salle des Inventaires Virtuelle, readers still have to work through them methodically, because the registers are not transcribed and therefore cannot be searched in full text. Applying automatic handwriting recognition to this voluminous corpus therefore seems particularly timely, both to make the registers easier to use as inventories of the notaries' minutes and to enable new uses of them. The regular structure of the documents and a certain predictability of their contents are assets, whereas the multiplicity of handwriting styles found in the registers is a difficulty that cannot be ignored. An experimental phase produced encouraging results on the performance of automatic handwriting recognition on these documents, and suggested ways of improving it in a longer and more ambitious project.

    Probabilistic multi-word spotting in handwritten text images

    Keyword spotting techniques are becoming cost-effective solutions for information retrieval in handwritten documents. We explore the extension of the single-word, line-level probabilistic indexing approach described in our previous works to allow for page-level search of queries consisting of Boolean combinations of several single keywords. We propose heuristic rules to combine the single-word relevance probabilities into probabilistically consistent confidence scores of the multi-word Boolean combinations. An empirical study, also presented in this paper, evaluates the search performance of word-pair queries involving AND and OR Boolean operations. Results of this study support the proposed approach and clearly show its effectiveness. Finally, a web-based demonstration system based on the proposed methods is presented.
    This work was partially supported by the Generalitat Valenciana under the Prometeo/2009/014 Project Grant ALMAMATER, Spanish MEC under Grant FPU13/06281, and through the EU projects: HIMANIS (JPICH programme, Spanish grant Ref. PCIN-2015-068) and READ (Horizon-2020 programme, Grant Ref. 674943).
    Toselli, AH.; Vidal, E.; Puigcerver, J.; Noya-García, E. (2019). Probabilistic multi-word spotting in handwritten text images. Pattern Analysis and Applications. 22(1):23-32. https://doi.org/10.1007/s10044-018-0742-z
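
The abstract above does not spell out its heuristic rules, so the following is only a minimal sketch of how two single-word relevance probabilities could be combined into scores for AND and OR queries. It assumes either independence between the two keyword events or, with no assumption at all, the Fréchet bounds that any consistent joint probability must satisfy. All function names are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def and_independent(p_a: float, p_b: float) -> float:
    """Score for 'A AND B', ASSUMING the two keyword events are independent."""
    return p_a * p_b


def or_independent(p_a: float, p_b: float) -> float:
    """Score for 'A OR B' under the same independence ASSUMPTION."""
    return p_a + p_b - p_a * p_b


def frechet_and_bounds(p_a: float, p_b: float) -> Tuple[float, float]:
    """Lower and upper Frechet bounds on P(A AND B); no independence needed."""
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)


def frechet_or_bounds(p_a: float, p_b: float) -> Tuple[float, float]:
    """Lower and upper Frechet bounds on P(A OR B); no independence needed."""
    return max(p_a, p_b), min(1.0, p_a + p_b)


if __name__ == "__main__":
    # Hypothetical relevance probabilities of two keywords on one page.
    p_first, p_second = 0.9, 0.8
    print("AND (independent):", and_independent(p_first, p_second))
    print("AND bounds:", frechet_and_bounds(p_first, p_second))
    print("OR (independent):", or_independent(p_first, p_second))
    print("OR bounds:", frechet_or_bounds(p_first, p_second))
```

Any combination rule that yields a score outside the Fréchet bounds cannot correspond to a valid joint probability, which is one way to read the abstract's requirement that the scores be "probabilistically consistent".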

    Schauplatz Archiv: Objekt - Narrativ - Performanz

    While archives have traditionally attracted little publicity, this situation is in flux: things that were hidden away - from precious objects to curiosities - are now being made available not only to scholars but to a broader public audience as well. This volume addresses questions related to the accessibility, representation, and dissemination of institutionally preserved cultural heritage.