14,209 research outputs found

    Using word graphs as intermediate representation of uttered sentences

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-33275-3_35We present an algorithm for building graphs of words as an intermediate representation of uttered sentences. No language model is used. The input data for the algorithm are the pronunciation lexicon organized as a tree and the sequence of acoustic frames. The transition between consecutive units are considered as additional units. Nodes represent discrete instants of time, arcs are labelled with words, and a confidence measure is assigned to each detected word, which is computed by using the phonetic probabilities of the subsequence of acoustic frames used for completing the word. We evaluated the obtained word graphs by searching the path that best matches with the correct sentence and then measuring the word accuracy, i.e. the oracle word accuracy. © 2012 Springer-Verlag.This work was supported by the Spanish MICINN under contract TIN2011-28169-C05-01 and the Vic. d’Investigació of the UPV under contract 20110897.Gómez Adrian, JA.; Sanchís Arnal, E. (2012). Using word graphs as intermediate representation of uttered sentences. En Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer Verlag (Germany). 284-291. doi:10.1007/978-3-642-33275-3_35S284291Ortmanns, S., Ney, H., Aubert, X.: A word graph algorithm for large vocabulary continuous speech recognition. Computer Speech and Language 11, 43–72 (1997)Ney, H., Ortmanns, S., Lindam, I.: Extensions to the word graph method for large vocabulary continuous speech recognition. In: Proceedings of IEEE ICASSP 1997, Munich, Germany, vol. 3, pp. 1791–1794 (1997)Wessel, F., Schlüter, R., Macherey, K., Ney, H.: Confidence Measures for Large Vocabulary Continuous Speech Recognition. IEEE Transactions on Speech and Audio Processing 9(3), 288–298 (2001)Ferreiros, J., San-Segundo, R., Fernández, F., D’Haro, L.-F., Sama, V., Barra, R., Mellén, P.: New word-level and sentence-level confidence scoring using graph theory calculus and its evaluation on speech understanding. In: Proceedings of INTERSPEECH 2005, Lisbon, Portugal, pp. 3377–3380 (2005)Raymond, C., Béchet, F., De Mori, R., Damnati, G.: On the use of finite state transducers for semantic interpretation. Speech Communication 48, 288–304 (2006)Hakkani-Tür, D., Béchet, F., Riccardi, G., Tur, G.: Beyond ASR 1-best: Using word confusion networks in spoken language understanding. Computer Speech and Language 20, 495–514 (2006)Justo, R., Pérez, A., Torres, M.I.: Impact of the Approaches Involved on Word-Graph Derivation from the ASR System. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds.) IbPRIA 2011. LNCS, vol. 6669, pp. 668–675. Springer, Heidelberg (2011)Gómez, J.A., Calvo, M.: Improvements on Automatic Speech Segmentation at the Phonetic Level. In: San Martin, C., Kim, S.-W. (eds.) CIARP 2011. LNCS, vol. 7042, pp. 557–564. Springer, Heidelberg (2011)Calvo, M., Gómez, J.A., Sanchis, E., Hurtado, L.F.: An algorithm for automatic speech understanding over word graphs. Procesamiento del Lenguaje Natural (48) (accepted, pending of publication, 2012)Moreno, A., Poch, D., Bonafonte, A., Lleida, E., Llisterri, J., Mariño, J.B., Nadeu, C.: Albayzin Speech Database: Design of the Phonetic Corpus. In: Proceedings of Eurospeech, Berlin, Germany, vol. 1, pp. 653–656 (September 1993)Benedí, J.M., Lleida, E., Varona, A., Castro, M., Galiano, I., Justo, R., López, I., Miguel, A.: Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA. In: Proc. of LREC 2006, Genova, Italy (2006

    Word-Graph Based Applications for Handwriting Documents: Impact of Word-Graph Size on Their Performances

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-19390-8 29Computer Assisted Transcription of Text Images (CATTI) and Key-Word Spotting (KWS) applications aim at transcribing and indexing handwritten documents respectively. They both are approached by means of Word Graphs (WG) obtained using segmentation-free handwritten text recognition technology based on N-gram Language Models and Hidden Markov Models. A large WG contains most of the relevant information of the original text (line) image needed for CATTI and KWS but, if it is too large, the computational cost of generating and using it can become unaffordable. Conversely, if it is too small, relevant information may be lost, leading to a reduction of CATTI/KWS in performance accuracy. We study the trade-off between WG size and CATTI &KWS performance in terms of effectiveness and efficiency. Results show that small, computationally cheap WGs can be used without loosing the excellent CATTI/KWS performance achieved with huge WGs.Work partially supported by the Spanish MICINN projects STraDA (TIN2012-37475-C02-01) and by the EU 7th FP tranScriptorium project (Ref:600707).Toselli, AH.; Romero Gómez, V.; Vidal Ruiz, E. (2015). Word-Graph Based Applications for Handwriting Documents: Impact of Word-Graph Size on Their Performances. En Pattern Recognition and Image Analysis. Springer. 253-261. https://doi.org/10.1007/978-3-319-19390-8_29S253261Romero, V., Toselli, A.H., Vidal, E.: Multimodal Interactive Handwritten Text Transcription. Series in Machine Perception and Artificial Intelligence (MPAI). World Scientific Publishing, Singapore (2012)Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: Word-graph based keyword spotting and indexing of handwritten document images. Technical report, Universitat Politècnica de València (2013)Oerder, M., Ney, H.: Word graphs: an efficient interface between continuous-speech recognition and language understanding. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 119–122, April 1993Bazzi, I., Schwartz, R., Makhoul, J.: An omnifont open-vocabulary OCR system for English and Arabic. IEEE Trans. Pattern Anal. Mach. Intell. 21(6), 495–504 (1999)Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1998)Ström, N.: Generation and minimization of word graphs in continuous speech recognition. In: Proceedings of IEEE Workshop on ASR 1995, Snowbird, Utah, pp. 125–126 (1995)Ortmanns, S., Ney, H., Aubert, X.: A word graph algorithm for large vocabulary continuous speech recognition. Comput. Speech Lang. 11(1), 43–72 (1997)Wessel, F., Schluter, R., Macherey, K., Ney, H.: Confidence measures for large vocabulary continuous speech recognition. IEEE Trans. Speech Audio Process. 9(3), 288–298 (2001)Robertson, S.: A new interpretation of average precision. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 689–690. ACM, USA (2008)Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, USA (2008)Romero, V., Toselli, A.H., Rodríguez, L., Vidal, E.: Computer assisted transcription for ancient text images. In: Kamel, M.S., Campilho, A. (eds.) ICIAR 2007. LNCS, vol. 4633, pp. 1182–1193. Springer, Heidelberg (2007)Fischer, A., Wuthrich, M., Liwicki, M., Frinken, V., Bunke, H., Viehhauser, G., Stolz, M.: Automatic transcription of handwritten medieval documents. In: 15th International Conference on Virtual Systems and Multimedia, VSMM 2009, pp. 137–142 (2009)Pesch, H., Hamdani, M., Forster, J., Ney, H.: Analysis of preprocessing techniques for latin handwriting recognition. In: ICFHR, pp. 280–284 (2012)Evermann, G.: Minimum Word Error Rate Decoding. Ph.D. thesis, Churchill College, University of Cambridge (1999

    Word graphs size impact on the performance of handwriting document applications

    Full text link
    [EN] Two document processing applications are con- sidered: computer-assisted transcription of text images (CATTI) and Keyword Spotting (KWS), for transcribing and indexing handwritten documents, respectively. Instead of working directly on the handwriting images, both of them employ meta-data structures called word graphs (WG), which are obtained using segmentation-free hand- written text recognition technology based on N-gram lan- guage models and hidden Markov models. A WG contains most of the relevant information of the original text (line) image required by CATTI and KWS but, if it is too large, the computational cost of generating and using it can become unafordable. Conversely, if it is too small, relevant information may be lost, leading to a reduction of CATTI or KWS performance. We study the trade-off between WG size and performance in terms of effectiveness and effi- ciency of CATTI and KWS. Results show that small, computationally cheap WGs can be used without loosing the excellent CATTI and KWS performance achieved with huge WGs.Work partially supported by the Generalitat Valenciana under the Prometeo/2009/014 Project Grant ALMAMATER, by the Spanish MECD as part of the Valorization and I+D+I Resources program of VLC/CAMPUS in the International Excellence Campus program, and through the EU projects: HIMANIS (JPICH programme, Spanish Grant Ref. PCIN-2015-068) and READ (Horizon-2020 programme, Grant Ref. 674943).Toselli ., AH.; Romero Gómez, V.; Vidal, E. (2017). Word graphs size impact on the performance of handwriting document applications. Neural Computing and Applications. 28(9):2477-2487. https://doi.org/10.1007/s00521-016-2336-2S24772487289Amengual JC, Vidal E (1998) Efficient error-correcting Viterbi parsing. IEEE Trans Pattern Anal Mach Intell 20(10):1109–1116Bazzi I, Schwartz R, Makhoul J (1999) An omnifont open-vocabulary OCR system for English and Arabic. IEEE Trans Pattern Anal Mach Intell 21(6):495–504Erman L, Lesser V (1990) The HEARSAY-II speech understanding system: a tutorial. Readings in Speech Reasoning, pp 235–245Evermann G (1999) Minimum word error rate decoding. Ph.D. thesis, Churchill College, University of CambridgeFischer A, Wuthrich M, Liwicki M, Frinken V, Bunke H, Viehhauser G, Stolz M (2009) Automatic transcription of handwritten medieval documents. In: 15th international conference on virtual systems and multimedia, 2009. VSMM ’09, pp 137–142Frinken V, Fischer A, Manmatha R, Bunke H (2012) A novel word spotting method based on recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 34(2):211–224Furcy D, Koenig S (2005) Limited discrepancy beam search. In: Proceedings of the 19th international joint conference on artificial intelligence, IJCAI’05, pp 125–131Granell E, Martínez-Hinarejos CD (2015) Multimodal output combination for transcribing historical handwritten documents. In: 16th international conference on computer analysis of images and patterns, CAIP 2015, chap, pp 246–260. Springer International PublishingHakkani-Tr D, Bchet F, Riccardi G, Tur G (2006) Beyond ASR 1-best: using word confusion networks in spoken language understanding. Comput Speech Lang 20(4):495–514Jelinek F (1998) Statistical methods for speech recognition. MIT Press, CambridgeJurafsky D, Martin JH (2009) Speech and language processing: an introduction to natural language processing, speech recognition, and computational linguistics, 2nd edn. Prentice-Hall, Englewood CliffsKneser R, Ney H (1995) Improved backing-off for N-gram language modeling. In: International conference on acoustics, speech and signal processing (ICASSP ’95), vol 1, pp 181–184. IEEE Computer SocietyLiu P, Soong FK (2006) Word graph based speech recognition error correction by handwriting input. In: Proceedings of the 8th international conference on multimodal interfaces, ICMI ’06, pp 339–346. ACMLowerre BT (1976) The harpy speech recognition system. Ph.D. thesis, Pittsburgh, PALuján-Mares M, Tamarit V, Alabau V, Martínez-Hinarejos CD, Pastor M, Sanchis A, Toselli A (2008) iATROS: a speech and handwritting recognition system. In: V Jornadas en Tecnologías del Habla (VJTH’2008), pp 75–78Mangu L, Brill E, Stolcke A (2000) Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Comput Speech Lang 14(4):373–400Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval. Cambridge University Press, New YorkMohri M, Pereira F, Riley M (2002) Weighted finite-state transducers in speech recognition. Comput Speech Lang 16(1):69–88Odell JJ, Valtchev V, Woodland PC, Young SJ (1994) A one pass decoder design for large vocabulary recognition. In: Proceedings of the workshop on human language technology, HLT ’94, pp 405–410. Association for Computational LinguisticsOerder M, Ney H (1993) Word graphs: an efficient interface between continuous-speech recognition and language understanding. IEEE Int Conf Acoust Speech Signal Process 2:119–122Olivie J, Christianson C, McCarry J (eds) (2011) Handbook of natural language processing and machine translation. Springer, BerlinOrtmanns S, Ney H, Aubert X (1997) A word graph algorithm for large vocabulary continuous speech recognition. Comput Speech Lang 11(1):43–72Padmanabhan M, Saon G, Zweig G (2000) Lattice-based unsupervised MLLR for speaker adaptation. In: ASR2000-automatic speech recognition: challenges for the New Millenium ISCA Tutorial and Research Workshop (ITRW)Pesch H, Hamdani M, Forster J, Ney H (2012) Analysis of preprocessing techniques for latin handwriting recognition. In: International conference on frontiers in handwriting recognition, ICFHR’12, pp 280–284Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlicek P, Qian Y, Schwarz P, Silovsky J, Stemmer G, Vesely K (2011) The Kaldi speech recognition toolkit. In: IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing SocietyPovey D, Hannemann M, Boulianne G, Burget L, Ghoshal A, Janda M, Karafiat M, Kombrink S, Motlcek P, Qian Y, Riedhammer K, Vesely K, Vu NT (2012) Generating Exact Lattices in the WFST Framework. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP)Rabiner L (1989) A tutorial of hidden Markov models and selected application in speech recognition. Proc IEEE 77:257–286Robertson S (2008) A new interpretation of average precision. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval (SIGIR ’08), pp 689–690. ACMRomero V, Toselli AH, Rodríguez L, Vidal E (2007) Computer assisted transcription for ancient text images. Proc Int Conf Image Anal Recogn LNCS 4633:1182–1193Romero V, Toselli AH, Vidal E (2012) Multimodal interactive handwritten text transcription. Series in machine perception and artificial intelligence (MPAI). World Scientific Publishing, SingaporeRybach D, Gollan C, Heigold G, Hoffmeister B, Lööf J, Schlüter R, Ney H (2009) The RWTH aachen university open source speech recognition system. In: Interspeech, pp 2111–2114Sánchez J, Mühlberger G, Gatos B, Schofield P, Depuydt K, Davis R, Vidal E, de Does J (2013) tranScriptorium: an European project on handwritten text recognition. In: DocEng, pp 227–228Saon G, Povey D, Zweig G (2005) Anatomy of an extremely fast LVCSR decoder. In: INTERSPEECH, pp 549–552Strom N (1995) Generation and minimization of word graphs in continuous speech recognition. In: Proceedings of IEEE workshop on ASR’95, pp 125–126. Snowbird, UtahTanha J, de Does J, Depuydt K (2015) Combining higher-order N-grams and intelligent sample selection to improve language modeling for Handwritten Text Recognition. In: ESANN 2015 proceedings, European symposium on artificial neural networks, computational intelligence and machine learning, pp 361–366Toselli A, Romero V, i Gadea MP, Vidal E (2010) Multimodal interactive transcription of text images. Pattern Recogn 43(5):1814–1825Toselli A, Romero V, Vidal E (2015) Word-graph based applications for handwriting documents: impact of word-graph size on their performances. In: Paredes R, Cardoso JS, Pardo XM (eds) Pattern recognition and image analysis. Lecture Notes in Computer Science, vol 9117, pp 253–261. Springer International PublishingToselli AH, Juan A, Keysers D, Gonzlez J, Salvador I, Ney H, Vidal E, Casacuberta F (2004) Integrated handwriting recognition and interpretation using finite-state models. Int J Pattern Recogn Artif Intell 18(4):519–539Toselli AH, Vidal E (2013) Fast HMM-Filler approach for key word spotting in handwritten documents. In: Proceedings of the 12th international conference on document analysis and recognition (ICDAR’13). IEEE Computer SocietyToselli AH, Vidal E, Romero V, Frinken V (2013) Word-graph based keyword spotting and indexing of handwritten document images. Technical report, Universitat Politècnica de ValènciaUeffing N, Ney H (2007) Word-level confidence estimation for machine translation. Comput Linguist 33(1):9–40. doi: 10.1162/coli.2007.33.1.9Vinciarelli A, Bengio S, Bunke H (2004) Off-line recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Trans Pattern Anal Mach Intell 26(6):709–720Weng F, Stolcke A, Sankar A (1998) Efficient lattice representation and generation. In: Proceedings of ICSLP, pp 2531–2534Wessel F, Schluter R, Macherey K, Ney H (2001) Confidence measures for large vocabulary continuous speech recognition. IEEE Trans Speech Audio Process 9(3):288–298Wolf J, Woods W (1977) The HWIM speech understanding system. In: IEEE international conference on acoustics, speech, and signal processing, ICASSP ’77, vol 2, pp 784–787Woodland P, Leggetter C, Odell J, Valtchev V, Young S (1995) The 1994 HTK large vocabulary speech recognition system. In: International conference on acoustics, speech, and signal processing (ICASSP ’95), vol 1, pp 73 –76Young S, Odell J, Ollason D, Valtchev V, Woodland P (1997) The HTK book: hidden Markov models toolkit V2.1. Cambridge Research Laboratory Ltd, CambridgeYoung S, Russell N, Thornton J (1989) Token passing: a simple conceptual model for connected speech recognition systems. Technical reportZhu M (2004) Recall, precision and average precision. Working Paper 2004–09 Department of Statistics and Actuarial Science, University of WaterlooZimmermann M, Bunke H (2004) Optimizing the integration of a statistical language model in hmm based offline handwritten text recognition. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 2, pp 541–54

    New Method for Optimization of License Plate Recognition system with Use of Edge Detection and Connected Component

    Full text link
    License Plate recognition plays an important role on the traffic monitoring and parking management systems. In this paper, a fast and real time method has been proposed which has an appropriate application to find tilt and poor quality plates. In the proposed method, at the beginning, the image is converted into binary mode using adaptive threshold. Then, by using some edge detection and morphology operations, plate number location has been specified. Finally, if the plat has tilt, its tilt is removed away. This method has been tested on another paper data set that has different images of the background, considering distance, and angel of view so that the correct extraction rate of plate reached at 98.66%.Comment: 3rd IEEE International Conference on Computer and Knowledge Engineering (ICCKE 2013), October 31 & November 1, 2013, Ferdowsi Universit Mashha

    Optimizing expected word error rate via sampling for speech recognition

    Full text link
    State-level minimum Bayes risk (sMBR) training has become the de facto standard for sequence-level training of speech recognition acoustic models. It has an elegant formulation using the expectation semiring, and gives large improvements in word error rate (WER) over models trained solely using cross-entropy (CE) or connectionist temporal classification (CTC). sMBR training optimizes the expected number of frames at which the reference and hypothesized acoustic states differ. It may be preferable to optimize the expected WER, but WER does not interact well with the expectation semiring, and previous approaches based on computing expected WER exactly involve expanding the lattices used during training. In this paper we show how to perform optimization of the expected WER by sampling paths from the lattices used during conventional sMBR training. The gradient of the expected WER is itself an expectation, and so may be approximated using Monte Carlo sampling. We show experimentally that optimizing WER during acoustic model training gives 5% relative improvement in WER over a well-tuned sMBR baseline on a 2-channel query recognition task (Google Home)
    • …
    corecore