
4,634 research outputs found

CHARACTER-LEVEL INTERACTIONS IN MULTIMODAL COMPUTER-ASSISTED TRANSCRIPTION OF TEXT IMAGES

Author: Martín-Albo Simón Daniel
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 26/07/2011

HTR systems do not achieve acceptable results in unconstrained applications. It is therefore convenient to use a system that lets the user cooperate with it, in the most comfortable way, to generate a correct transcription. In this paper, multimodal interaction at character level is studied.
Martín-Albo Simón, D. (2011). CHARACTER-LEVEL INTERACTIONS IN MULTIMODAL COMPUTER-ASSISTED TRANSCRIPTION OF TEXT IMAGES. http://hdl.handle.net/10251/11313

RiuNet

Character-level interaction in multimodal computer-assisted transcription of text images

Author: A. Diplaros
J. Kittler
P. Faber
P.F. Felzenszwalb
R. Nock
R.A. Hummel
R.M. Haralick
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-21257-4_85
To date, automatic handwritten text recognition systems are far from perfect, and heavy human intervention is often required to check and correct their results. As an alternative, an interactive framework that integrates human knowledge into the transcription process has been presented in previous works. In this work, multimodal interaction at character level is studied. Until now, multimodal interaction had been studied only at whole-word level. However, character-level pen-stroke interactions may lead to more ergonomic and friendly interfaces. Empirical tests show that this approach can save significant amounts of user effort with respect to both fully manual transcription and non-interactive post-editing correction.
Work supported by the Spanish Government (MICINN and "Plan E") under the MITTRAL (TIN2009-14633-C03-01) research project and under the research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018), and by the Generalitat Valenciana under grant Prometeo/2009/014.
Martín-Albo Simón, D.; Romero Gómez, V.; Toselli, AH.; Vidal, E. (2011). Character-level interaction in multimodal computer-assisted transcription of text images. In Pattern Recognition and Image Analysis. Springer Verlag (Germany). 684-691. https://doi.org/10.1007/978-3-642-21257-4_85

Crossref

Repositori Institucional de la Universitat Jaume I

RiuNet

Publikationsserver der RWTH Aachen University

Joint Institute for Nuclear Research (JINR)

MPG.PuRe

Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition

Author: Ocampo Sepúlveda Jorge Carlos
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 11/01/2012

A human-computer interface is developed to provide services of computer assisted translation (CAT) and computer assisted transcription of handwritten text images (CATTI). The back-end machine translation (MT) and handwritten text recognition (HTR) systems are provided by the Pattern Recognition and Human Language Technology (PRHLT) research group. The idea is to provide users with easy-to-use tools that make interactive translation and transcription feasible tasks. The assisted service is provided by remote servers with CAT or CATTI capabilities. The interface supplies the user with tools for efficient local editing: deletion, insertion and substitution.
Ocampo Sepúlveda, JC. (2009). Implementation of a Human-Computer Interface for Computer Assisted Translation and Handwritten Text Recognition. http://hdl.handle.net/10251/14318

RiuNet

Multimodal Interactive Transcription of Handwritten Text Images

Author: Romero Gómez Verónica
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 20/09/2010

This thesis presents a new interactive, multimodal framework for the transcription of handwritten documents. Rather than delivering a complete transcription, this approach aims to assist the expert in the hard task of transcribing. To date, the available handwritten text recognition systems do not produce transcriptions that users find acceptable, and human intervention is generally required to correct the output. These systems have proven really useful in restricted applications with limited vocabularies (such as the recognition of postal addresses or of numerical amounts on bank cheques), where they achieve acceptable results. However, on unconstrained handwritten documents (such as old manuscripts or spontaneous text), current technology only achieves unacceptable results. The interactive scenario studied in this thesis allows a more effective solution. In this scenario, the recognition system and the user cooperate to generate the final transcription of the text image. The system uses the text image and a previously validated part of the transcription (the prefix) to propose a possible continuation. The user then finds and corrects the next error made by the system, producing a new, longer prefix, which the system uses to suggest a new hypothesis. The technology is based on hidden Markov models and n-grams, which are used here in the same way as in automatic speech recognition. Some modifications to the conventional definition of n-grams were needed to take the user's feedback into account.
Romero Gómez, V. (2010). Multimodal Interactive Transcription of Handwritten Text Images [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8541
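The prefix-based interaction described in this abstract (the system proposes a continuation of the validated prefix, the user corrects the next error, and the longer prefix drives a new hypothesis) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in: `toy_proposer`, `DRAFT` and `CONTEXT_RULES` are invented for illustration and replace the hidden Markov models and n-grams used in the thesis with a fixed draft plus one context rule. The sketch only shows why a corrected prefix can repair later words without further user effort.

```python
from typing import Callable, List, Optional, Sequence

Proposer = Callable[[Sequence[str]], List[str]]


def first_error(hypothesis: Sequence[str], reference: Sequence[str]) -> Optional[int]:
    """Index of the first word where the hypothesis departs from the reference."""
    for i, ref_word in enumerate(reference):
        if i >= len(hypothesis) or hypothesis[i] != ref_word:
            return i
    return None  # assumes the hypothesis is never longer than the reference


def interactive_corrections(reference: Sequence[str], propose_suffix: Proposer) -> int:
    """Simulate the user: count corrections until the hypothesis is correct."""
    prefix: List[str] = []
    corrections = 0
    while True:
        hypothesis = prefix + propose_suffix(prefix)
        err = first_error(hypothesis, reference)
        if err is None:
            return corrections
        # The user fixes the next error, which validates a longer prefix.
        prefix = list(reference[: err + 1])
        corrections += 1


# Hypothetical recognizer output and a single two-word context rule
# (a toy stand-in for a language model, not the thesis's models).
DRAFT = "the quack brown box jumps".split()
CONTEXT_RULES = {("quick", "brown"): "fox"}


def toy_proposer(prefix: Sequence[str]) -> List[str]:
    """Continue the prefix, preferring a context rule over the draft word."""
    words = list(prefix)
    for position in range(len(prefix), len(DRAFT)):
        context = tuple(words[-2:])
        words.append(CONTEXT_RULES.get(context, DRAFT[position]))
    return words[len(prefix):]


REFERENCE = "the quick brown fox jumps".split()
post_editing = sum(d != r for d, r in zip(DRAFT, REFERENCE))
interactive = interactive_corrections(REFERENCE, toy_proposer)
print(post_editing, interactive)
```

In this toy setting, post-editing the draft needs two word corrections, while the interactive loop needs one, because correcting "quack" gives the proposer the context it needs to replace "box" on its own. The counts say nothing about the effort reductions reported in the thesis.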

Crossref

RiuNet

Image speech combination for interactive computer assisted transcription of handwritten documents

Author: Granell Emilio
Martínez-Hinarejos Carlos-D.
Romero Verónica
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Handwritten document transcription aims to obtain the contents of a document in order to provide efficient information access to, among others, digitised historical documents. The increasing number of historical documents published by libraries and archives makes this an important task. In this context, the use of image processing and understanding techniques in conjunction with assistive technologies reduces the time and human effort required to obtain the final perfect transcription. The assistive transcription system proposes a hypothesis, usually derived from a recognition process applied to the handwritten text image. The professional transcriber's feedback can then be used to obtain an improved hypothesis and speed up the final transcription. In this framework, a speech signal corresponding to the dictation of the handwritten text can be used as an additional source of information. This multimodal approach, which combines the image of the handwritten text with the speech of the dictation of its contents, could improve the hypotheses (initial and improved) offered to the transcriber. In this paper we study the feasibility of a multimodal interactive transcription system for an assistive paradigm known as Computer Assisted Transcription of Text Images. Different techniques are tested for obtaining the multimodal combination in this framework. With some multimodal combination techniques, the proposed approach significantly reduces transcription effort, allowing for a faster transcription process.
Work partially supported by projects READ-674943 (European Union's H2020), SmartWays-RTC-2014-1466-4 (MINECO, Spain), and CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER), and by Generalitat Valenciana (GVA), Spain under reference PROMETEOII/2014/030.
Granell, E.; Romero, V.; Martínez-Hinarejos, C. (2019). Image speech combination for interactive computer assisted transcription of handwritten documents. Computer Vision and Image Understanding. 180:74-83. https://doi.org/10.1016/j.cviu.2019.01.009

RiuNet

Escritoire: A Multi-touch Desk with e-Pen Input for Capture, Management and Multimodal Interactive Transcription of Handwritten Documents

Author: Martín-Albo Simón Daniel
Romero Gómez Verónica
Vidal Ruiz Enrique
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2015

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-19390-8_53
A large quantity of the documents used every day are still handwritten. However, it is useful to transform each of these documents into a digital version for managing, archiving and sharing. Here we present Escritoire, a multi-touch desk that allows the user to capture, transcribe and work with handwritten documents. The desktop is continuously monitored using two cameras. Whenever the user makes a specific hand gesture over a paper, Escritoire takes an image. The capture is then automatically preprocessed, yielding an improved representation. Finally, the text image is transcribed using automatic techniques and the transcription is displayed on Escritoire.
This work was partially supported by the Spanish MEC under FPU scholarship (AP2010-0575), STraDA research project (TIN2012-37475-C02-01) and MITTRAL research project (TIN2009-14633-C03-01); the EU’s 7th Framework Programme under tranScriptorium grant agreement (FP7/2007-2013/600707).
Martín-Albo Simón, D.; Romero Gómez, V.; Vidal Ruiz, E. (2015). Escritoire: A Multi-touch Desk with e-Pen Input for Capture, Management and Multimodal Interactive Transcription of Handwritten Documents. In Pattern Recognition and Image Analysis. Springer. 471-478. https://doi.org/10.1007/978-3-319-19390-8_53

RiuNet

Advances on the Transcription of Historical Manuscripts based on Multimodality, Interactivity and Crowdsourcing

Author: Granell Romero Emilio
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 01/09/2017

Natural Language Processing (NLP) is an interdisciplinary research field of Computer Science, Linguistics, and Pattern Recognition that studies, among other things, the use of human natural languages in Human-Computer Interaction (HCI). Most NLP research tasks can be applied to solving real-world problems. This is the case for natural language recognition and natural language translation, which can be used to build automatic systems for document transcription and document translation. For digitalised handwritten text documents, transcription provides easy digital access to the contents, since simple image digitalisation only provides, in most cases, search by image and not by linguistic contents (keywords, expressions, syntactic or semantic categories). Transcription is even more important for historical manuscripts, since most of these documents are unique and the preservation of their contents is crucial for cultural and historical reasons. The transcription of historical manuscripts is usually done by paleographers, who are experts in ancient script and vocabulary. Recently, Handwritten Text Recognition (HTR) has become a common tool for assisting paleographers in their task, by providing a draft transcription that they may amend with more or less sophisticated methods. This draft transcription is useful when its error rate is low enough to make the amending process more comfortable than a complete transcription from scratch. Thus, obtaining a draft transcription with an acceptably low error rate is crucial for incorporating this NLP technology into the transcription process. The work described in this thesis focuses on improving the draft transcription offered by an HTR system, with the aim of reducing the effort paleographers spend to obtain the actual transcription of digitalised historical manuscripts. This problem is addressed from three different, but complementary, scenarios:
· Multimodality: The use of HTR systems allows paleographers to speed up the manual transcription process, since they can correct a draft transcription. Another alternative is to obtain the draft transcription by dictating the contents to an Automatic Speech Recognition (ASR) system. When both sources (image and speech) are available, a multimodal combination is possible and an iterative process can be used to refine the final hypothesis.
· Interactivity: The use of assistive technologies in the transcription process reduces the time and human effort required to obtain the actual transcription, given that the assistive system and the paleographer cooperate to generate a perfect transcription. Multimodal feedback can be used to provide the assistive system with additional sources of information, using signals that represent the same whole sequence of words to transcribe (e.g. a text image, and the speech of the dictation of the contents of this text image), or that represent just a word or character to correct (e.g. an on-line handwritten word).
· Crowdsourcing: Open distributed collaboration emerges as a powerful tool for massive transcription at a relatively low cost, since the paleographers' supervision effort may be dramatically reduced. Multimodal combination makes it possible to use the speech dictation of handwritten text lines in a multimodal crowdsourcing platform, where collaborators may provide their speech using their own mobile device instead of desktop or laptop computers, which makes it possible to recruit more collaborators.
Granell Romero, E. (2017). Advances on the Transcription of Historical Manuscripts based on Multimodality, Interactivity and Crowdsourcing [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86137

Crossref

RiuNet

Multimodality, interactivity, and crowdsourcing for document transcription

Author: Granell Emilio
Martínez-Hinarejos Carlos-D.
Romero Verónica
Publication venue: 'Wiley'
Publication date: 01/01/2018

This is the peer reviewed version of the following article: Granell, Emilio, Romero, Verónica, Martínez-Hinarejos, Carlos-D. (2018). Multimodality, interactivity, and crowdsourcing for document transcription. Computational Intelligence, 34, 2, 398-419. DOI: 10.1111/coin.12169, which has been published in final form at http://doi.org/10.1111/coin.12169. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
Knowledge mining from documents usually uses document engineering techniques that allow the user to access the information contained in documents of interest. In this framework, transcription may provide efficient access to the contents of handwritten documents. Manual transcription is a time-consuming task that can be sped up by using different mechanisms. A first possibility is employing state-of-the-art handwritten text recognition systems to obtain an initial draft transcription that can be manually amended. A second option is employing crowdsourcing to obtain a massive but not error-free draft transcription. In this case, when collaborators employ mobile devices, speech dictation can be used as a transcription source, and speech and handwritten text recognition can be fused to provide a better draft transcription, which can be amended with even less effort. A final option is using interactive assistive frameworks, where the automatic system that provides the draft transcription and the transcriber cooperate to generate the final transcription. The novel contributions presented in this work include the study of data fusion in a multimodal crowdsourcing framework and its integration with an interactive system. The use of the proposed solutions reduces the required transcription effort and optimizes the overall performance and usability, allowing for a better transcription process.
Funding: projects READ, Grant/Award Number: 674943 (European Union's H2020); Smart Ways, Grant/Award Number: RTC-2014-1466-4 (MINECO); CoMUN-HaT, Grant/Award Number: TIN2015-70924-C2-1-R (MINECO / FEDER).
Granell, E.; Romero, V.; Martínez-Hinarejos, C. (2018). Multimodality, interactivity, and crowdsourcing for document transcription. Computational Intelligence. 34(2):398-419. https://doi.org/10.1111/coin.12169

RiuNet

Contex-aware gestures for mixed-initiative text editings UIs

Author: Alabau Vicent
Leiva Luis A.
Romero Gómez Verónica
Toselli Alejandro Héctor
Vidal Enrique
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015

This is a pre-copyedited, author-produced PDF of an article accepted for publication in Interacting with Computers following peer review. The version of record is available online at: http://dx.doi.org/10.1093/iwc/iwu019
This work is focused on enhancing highly interactive text-editing applications with gestures. Concretely, we study Computer Assisted Transcription of Text Images (CATTI), a handwriting transcription system that follows a corrective feedback paradigm, where the user and the system collaborate efficiently to produce a high-quality text transcription. CATTI-like applications demand fast and accurate gesture recognition, for which we observed that current gesture recognizers are not adequate. In response to this need we developed MinGestures, a parametric context-aware gesture recognizer. Our contributions include a number of stroke features for disambiguating copy-mark gestures from handwritten text, plus the integration of these gestures in a CATTI application. It finally becomes possible to create highly interactive stroke-based text-editing interfaces without having to verify the user's intent on-screen. We performed a formal evaluation with 22 e-pen users and 32 mouse users using a gesture vocabulary of 10 symbols. MinGestures achieved an outstanding accuracy (<1% error rate) with very high performance (<1 ms of recognition time). We then integrated MinGestures in a CATTI prototype and tested the performance of the interactive handwriting system when it is driven by gestures. Our results show that using gestures in interactive handwriting applications is both advantageous and convenient when gestures are simple but context-aware. Taken together, this work suggests that text-editing interfaces not only can be easily augmented with simple gestures, but also may substantially improve user productivity.
This work has been supported by the European Commission through the 7th Framework Program (tranScriptorium: FP7-ICT-2011-9, project 600707 and CasMaCat: FP7-ICT-2011-7, project 287576). It has also been supported by the Spanish MINECO under grant TIN2012-37475-C02-01 (STraDa), and the Generalitat Valenciana under grant ISIC/2012/004 (AMIIS).
Leiva, LA.; Alabau, V.; Romero Gómez, V.; Toselli, AH.; Vidal, E. (2015). Contex-aware gestures for mixed-initiative text editings UIs. Interacting with Computers. 27(6):675-696. https://doi.org/10.1093/iwc/iwu019
Hum. Factors Comput. Syst. (CHI EA) 1994.Kurtenbach, G., Sellen, A., & Buxton, W. (1993). An Empirical Evaluation of Some Articulatory and Cognitive Aspects of Marking Menus. Human-Computer Interaction, 8(1), 1-23. doi:10.1207/s15327051hci0801_1LaLomia M. User Acceptance of Handwritten Recognition Accuracy. Proc. Extended Abstr. Hum. Factors Comput. Syst. (CHI EA). 1994.Leiva L. A. Romero V. Toselli A. H. Vidal E. Evaluating an Interactive–Predictive Paradigm on Handwriting Transcription: A Case Study and Lessons Learned. Proc. 35th Annu. IEEE Comput. Softw. Appl. Conf. (COMPSAC) 2011.Leiva L. A. Alabau V. Vidal E. Error-Proof, High-Performance, and Context-Aware Gestures for Interactive Text Edition. Proc. Extended Abstr. Hum. Factors Comput. Syst. (CHI EA). 2013.Li Y. Protractor: A Fast and Accurate Gesture Recognizer. Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI) 2010.Li W. Hammond T. Using Scribble Gestures to Enhance Editing Behaviors of Sketch Recognition Systems. Proc. Extended Abstr. Hum. Factors Comput. Syst. (CHI EA). 2012.Liao C. Guimbretière F. Hinckley K. Hollan J. Papiercraft: a gesture-based command system for interactive paper. ACM Trans. Comput.–Hum. Interaction (TOCHI) 2008;14:18:1-18:27.Liu P. Soong F. K. Word Graph Based Speech Rcognition Error Correction by Handwriting Input. Proc. Int. Conf. Multimodal Interfaces (ICMI). 2006.Long A. Landay J. Rowe L. Implications for a Gesture Design Tool. Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI) 1999.Long A. C. Jr. Landay J. A. Rowe L. A. Michiels J. Visual Similarity of Pen Gestures. Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI). 2000.MacKenzie, I. S., & Chang, L. (1999). A performance comparison of two handwriting recognizers. Interacting with Computers, 11(3), 283-297. doi:10.1016/s0953-5438(98)00030-7MacKenzie I. S. Tanaka-Ishii K. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2007. Text Entry Systems: Mobility, Accessibility, Universality.MARTI, U.-V., & BUNKE, H. 
(2001). USING A STATISTICAL LANGUAGE MODEL TO IMPROVE THE PERFORMANCE OF AN HMM-BASED CURSIVE HANDWRITING RECOGNITION SYSTEM. International Journal of Pattern Recognition and Artificial Intelligence, 15(01), 65-90. doi:10.1142/s0218001401000848Marti, U.-V., & Bunke, H. (2002). The IAM-database: an English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 5(1), 39-46. doi:10.1007/s100320200071Martín-Albo D. Romero V. Toselli A. H. Vidal E. Multimodal computer-assisted transcription of text images at character-level interaction. Int. J. Pattern Recogn. Artif. Intell. 2012;26:1-19.Marzinkewitsch R. Operating Computer Algebra Systems by Hand-Printed Input. Proc. Int. Symp. Symbolic Algebr. Comput. (ISSAC). 1991.Mas, J., Llados, J., Sanchez, G., & Jorge, J. A. P. (2010). A syntactic approach based on distortion-tolerant Adjacency Grammars and a spatial-directed parser to interpret sketched diagrams. Pattern Recognition, 43(12), 4148-4164. doi:10.1016/j.patcog.2010.07.003Moyle M. Cockburn A. Analysing Mouse and Pen Flick Gestures. Proc. SIGCHI-NZ Symp. Comput.–Hum. Interact. (CHINZ). 2002.Nakayama Y. A Prototype Pen-Input Mathematical Formula Editor. Proc. AACE EdMedia 1993.Ogata J. Goto M. Speech Repair: Quick Error Correction Just by Using Selection Operation for Speech Input Interface. Proc. Eurospeech. 2005.Ortiz-Martínez D. Leiva L. A. Alabau V. Casacuberta F. Interactive Machine Translation using a Web-Based Architecture. Proc. Int. Conf. Intell. User Interfaces (IUI) 2010.Ortiz-Martínez D. Leiva L. A. Alabau V. García-Varea I. Casacuberta F. An Interactive Machine Translation System with Online Learning. Proc. Assoc. Comput. Linguist. (ACL). 2011.Michael Powers, V. (1973). Pen direction sequences in character recognition. Pattern Recognition, 5(4), 291-302. doi:10.1016/0031-3203(73)90022-8Raab F. Extremely efficient menu selection: Marking menus for the Flash platform. 2009. 
Available at http://www.betriebsraum.de/blog/2009/07/21/efficient-gesture-recognition-and-corner-finding-in-as3/ (retrieved on May 2012).Revuelta-Martínez A. Rodríguez L. García-Varea I. A Computer Assisted Speech Transcription System. Proc. Eur. Chap. Assoc. Comput. Linguist. (EACL). 2012.Revuelta-Martínez, A., Rodríguez, L., García-Varea, I., & Montero, F. (2013). Multimodal interaction for information retrieval using natural language. Computer Standards & Interfaces, 35(5), 428-441. doi:10.1016/j.csi.2012.11.002Rodríguez L. García-Varea I. Revuelta-Martínez A. Vidal E. A Multimodal Interactive Text Generation System. Proc. Int. Conf. Multimodal Interfaces Workshop Mach. Learn. Multimodal Interact. (ICMI-MLMI). 2010a.Rodríguez L. García-Varea I. Vidal E. Multi-Modal Computer Assisted Speech Transcription. Proc. Int. Conf. Multimodal Interfaces Workshop Mach. Learn. Multimodal Interact. (ICMI-MLMI) 2010b.Romero V. Leiva L. A. Toselli A. H. Vidal E. Interactive Multimodal Transcription of Text Images using a Web-Based Demo System. Proc. Int. Conf. Intell. User Interfaces (IUI). 2009a.Romero V. Toselli A. H. Vidal E. Using Mouse Feedback in Computer Assisted Transcription of Handwritten Text Images. Proc. Int. Conf. Doc. Anal. Recogn. (ICDAR) 2009b.Romero V. Toselli A. H. Vidal E. Study of Different Interactive Editing Operations in an Assisted Transcription System. Proc. Int. Conf. Multimodal Interfaces (ICMI). 2011.Romero V. Toselli A. H. Vidal E. Vol. 80. Singapore: World Scientific Publishing Company; 2012. Multimodal Interactive Handwritten Text Transcription.Rubine, D. (1991). Specifying gestures by example. ACM SIGGRAPH Computer Graphics, 25(4), 329-337. doi:10.1145/127719.122753Rubine D. H. 1991b. The automatic recognition of gestures. Ph.D. Thesis, Carnegie Mellon University.Sánchez-Sáez R. Leiva L. A. Sánchez J. A. Benedí J. M. Interactive Predictive Parsing using a Web-Based Architecture. Proc. North Am. Chap. Assoc. Comput. Linguist. 2010.Saund E. 
Fleet D. Larner D. Mahoney J. Perceptually-Supported Image Editing of Text and Graphics. Proc. ACM Symp. User Interface Softw. Technol. (UIST) 2003.Shilman M. Tan D. S. Simard P. CueTIP: a Mixed-Initiative Interface for Correcting Handwriting Errors. Proc. ACM Symp. User Interface Softw. Technol. (UIST). 2006.Signer B. Kurmann U. Norrie M. C. igesture: A General Gesture Recognition Framework. Proc. Int. Conf. Doc. Anal. Recogn. (ICDAR) 2007.Smithies S. Novins K. Arvo J. A handwriting-based equation editor. Proc. Conf. Graph. Interface (GI). 1999.Suhm, B., Myers, B., & Waibel, A. (2001). Multimodal error correction for speech user interfaces. ACM Transactions on Computer-Human Interaction, 8(1), 60-98. doi:10.1145/371127.371166Tappert C. C. Mosley P. H. Recent advances in pen computing. 2001. Technical Report 166, Pace University, available: http://support.csis.pace.edu.Toselli, A. H., Romero, V., Pastor, M., & Vidal, E. (2010). Multimodal interactive transcription of text images. Pattern Recognition, 43(5), 1814-1825. doi:10.1016/j.patcog.2009.11.019Toselli A. H. Vidal E. Casacuberta F. , editors. Berlin, Heidelberg, New York: Springer; 2011. Multimodal-Interactive Pattern Recognition and Applications.Tseng S. Fogg B. Credibility and computing technology. Commun. ACM 1999;42:39-44.Vatavu R.-D. Anthony L. Wobbrock J. O. Gestures as Point Clouds: A P Recognizer for User Interface Prototypes. Proc. Int. Conf. Multimodal Interfaces (ICMI). 2012.Vertanen K. Kristensson P. O. Parakeet: A Continuous Speech Recognition System for Mobile Touch-Screen Devices. Proc. Int. Conf. Intell. User Interfaces (IUI) 2009.Vidal E. Rodríguez L. Casacuberta F. García-Varea I. Mach. Learn. Multimodal Interact., Lect. Notes Comput. Sci. Vol. 4892. Berlin, Heidelberg: Springer; 2008. Interactive Pattern Recognition.Wang X. Li J. Ao X. Wang G. Dai G. Multimodal Error Correction for Continuous Handwriting Recognition in Pen-Based User Interfaces. Proc. Int. Conf. Intell. 
User Interfaces (IUI). 2006.Wang L. Hu T. Liu P. Soong F. K. Efficient Handwriting Correction of Speech Recognition Errors with Template Constrained Posterior (TCP). Proc. INTERSPEECH 2008.Wobbrock J. O. Wilson A. D. Li Y. Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes. Proc. ACM Symp. User Interface Softw. Technol. (UIST). 2007.Wolf C. G. Morrel-Samuels P. The use of hand-drawn gestures for text editing. Int. J. Man–Mach. Stud. 1987;27:91-102.Zeleznik R. Miller T. Fluid Inking: Augmenting the Medium of Free-Form Inking with Gestures. Proc. Conf. Graph. Interface (GI). 2006.Yong Zhang, McCullough, C., Sullins, J. R., & Ross, C. R. (2010). Hand-Drawn Face Sketch Recognition by Humans and a PCA-Based Algorithm for Forensic Applications. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(3), 475-485. doi:10.1109/tsmca.2010.2041654Zhao S. Balakrishnan R. Simple vs. Compound Mark Hierarchical Marking Menus. Proc. ACM Symp. User Interface Softw. Technol. (UIST) 2004

RiuNet