3,925 research outputs found

    Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor

    Full text link
    Using supporting backchannel (BC) cues can make human-computer interaction more social. BCs provide a feedback from the listener to the speaker indicating to the speaker that he is still listened to. BCs can be expressed in different ways, depending on the modality of the interaction, for example as gestures or acoustic cues. In this work, we only considered acoustic cues. We are proposing an approach towards detecting BC opportunities based on acoustic input features like power and pitch. While other works in the field rely on the use of a hand-written rule set or specialized features, we made use of artificial neural networks. They are capable of deriving higher order features from input features themselves. In our setup, we first used a fully connected feed-forward network to establish an updated baseline in comparison to our previously proposed setup. We also extended this setup by the use of Long Short-Term Memory (LSTM) networks which have shown to outperform feed-forward based setups on various tasks. Our best system achieved an F1-Score of 0.37 using power and pitch features. Adding linguistic information using word2vec, the score increased to 0.39

    Investigating the Effects of Word Substitution Errors on Sentence Embeddings

    Full text link
    A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text to vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those coming from automatic speech recognition errors (ASR), on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained neural sentence encoders are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages of word vectors perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.Comment: 4 Pages, 2 figures. Copyright IEEE 2019. Accepted and to appear in the Proceedings of the 44th International Conference on Acoustics, Speech, and Signal Processing 2019 (IEEE-ICASSP-2019), May 12-17 in Brighton, U.K. Personal use of this material is permitted. However, permission to reprint/republish this material must be obtained from the IEE

    Interactive Transcription of Old Text Documents

    Full text link
    Nowadays, there are huge collections of handwritten text documents in libraries all over the world. The high demand for these resources has led to the creation of digital libraries in order to facilitate the preservation and provide electronic access to these documents. However text transcription of these documents im- ages are not always available to allow users to quickly search information, or computers to process the information, search patterns or draw out statistics. The problem is that manual transcription of these documents is an expensive task from both economical and time viewpoints. This thesis presents a novel ap- proach for e cient Computer Assisted Transcription (CAT) of handwritten text documents using state-of-the-art Handwriting Text Recognition (HTR) systems. The objective of CAT approaches is to e ciently complete a transcription task through human-machine collaboration, as the e ort required to generate a manual transcription is high, and automatically generated transcriptions from state-of-the-art systems still do not reach the accuracy required. This thesis is centered on a special application of CAT, that is, the transcription of old text document when the quantity of user e ort available is limited, and thus, the entire document cannot be revised. In this approach, the objective is to generate the best possible transcription by means of the user e ort available. This thesis provides a comprehensive view of the CAT process from feature extraction to user interaction. First, a statistical approach to generalise interactive transcription is pro- posed. As its direct application is unfeasible, some assumptions are made to apply it to two di erent tasks. First, on the interactive transcription of hand- written text documents, and next, on the interactive detection of the document layout. Next, the digitisation and annotation process of two real old text documents is described. This process was carried out because of the scarcity of similar resources and the need of annotated data to thoroughly test all the developed tools and techniques in this thesis. These two documents were carefully selected to represent the general di culties that are encountered when dealing with HTR. Baseline results are presented on these two documents to settle down a benchmark with a standard HTR system. Finally, these annotated documents were made freely available to the community. It must be noted that, all the techniques and methods developed in this thesis have been assessed on these two real old text documents. Then, a CAT approach for HTR when user e ort is limited is studied and extensively tested. The ultimate goal of applying CAT is achieved by putting together three processes. Given a recognised transcription from an HTR system. The rst process consists in locating (possibly) incorrect words and employs the user e ort available to supervise them (if necessary). As most words are not expected to be supervised due to the limited user e ort available, only a few are selected to be revised. The system presents to the user a small subset of these words according to an estimation of their correctness, or to be more precise, according to their con dence level. Next, the second process starts once these low con dence words have been supervised. This process updates the recogni- tion of the document taking user corrections into consideration, which improves the quality of those words that were not revised by the user. Finally, the last process adapts the system from the partially revised (and possibly not perfect) transcription obtained so far. In this adaptation, the system intelligently selects the correct words of the transcription. As results, the adapted system will bet- ter recognise future transcriptions. Transcription experiments using this CAT approach show that this approach is mostly e ective when user e ort is low. The last contribution of this thesis is a method for balancing the nal tran- scription quality and the supervision e ort applied using our previously de- scribed CAT approach. In other words, this method allows the user to control the amount of errors in the transcriptions obtained from a CAT approach. The motivation of this method is to let users decide on the nal quality of the desired documents, as partially erroneous transcriptions can be su cient to convey the meaning, and the user e ort required to transcribe them might be signi cantly lower when compared to obtaining a totally manual transcription. Consequently, the system estimates the minimum user e ort required to reach the amount of error de ned by the user. Error estimation is performed by computing sepa- rately the error produced by each recognised word, and thus, asking the user to only revise the ones in which most errors occur. Additionally, an interactive prototype is presented, which integrates most of the interactive techniques presented in this thesis. This prototype has been developed to be used by palaeographic expert, who do not have any background in HTR technologies. After a slight ne tuning by a HTR expert, the prototype lets the transcribers to manually annotate the document or employ the CAT ap- proach presented. All automatic operations, such as recognition, are performed in background, detaching the transcriber from the details of the system. The prototype was assessed by an expert transcriber and showed to be adequate and e cient for its purpose. The prototype is freely available under a GNU Public Licence (GPL).Serrano Martínez-Santos, N. (2014). Interactive Transcription of Old Text Documents [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37979TESI
    corecore