
    The automatic analysis of classroom talk

    The SMART SPEECH Project is a joint venture between three Finnish universities and a Chilean university. The aim is to develop a mobile application that can be used to record classroom talk and enable observations to be made of classroom interactions. We recorded Finnish and Chilean physics teachers’ speech using both a conventional microphone/dictaphone setup and a microphone/mobile-application setup. The recordings were analysed via automatic speech recognition (ASR). The average word error rate achieved for the Finnish teachers’ speech was under 40%. The ASR approach also enabled us to determine the key topics discussed within the Finnish physics lessons under scrutiny. The results here were promising, as the topic recognition accuracy was about 85% on average.
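The word error rate quoted above is the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the reference length. A minimal sketch of the standard computation (not the project's actual scoring tooling; the example sentences are illustrative):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the force equals mass times acceleration",
          "the force equal mass time acceleration"))  # 2 errors / 6 words = 0.333...
```

A WER "under 40%" thus means that roughly two out of five reference words were substituted, deleted, or inserted.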

    Adaptering av akustiska modeller och språkmodeller för en mobil dikteringstjänst (Adaptation of acoustic models and language models for a mobile dictation service)

    Automatic speech recognition is the machine-based method of converting speech to text. MobiDic is a mobile dictation service that uses a server-side speech recognition system to convert speech recorded on a mobile phone into readable and editable text notes. In this work, the performance of the TKK speech recognition system was evaluated on law-related speech recorded on a mobile phone with the MobiDic client application. There was a mismatch between the testing and training data in terms of both acoustics and language: the background acoustic models were trained on speech recorded with PC microphones, and the background language models were trained on texts from journals and newswire services. Because of the special nature of the testing data, the main focus has been on using acoustic model and language model adaptation methods to enhance speech recognition performance. Acoustic model adaptation gives the largest and most reliable performance increase. Using the global cMLLR method, word error rate reductions of 15–22% can be reached with only 2 minutes of adaptation data, and regression-class cMLLR can give even larger gains if more audio adaptation data (> 10 min) is available. Language model adaptation did not significantly improve performance in this task; the main problem was the difference between the language adaptation data and the language of the law-related speech recordings.
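For illustration, global cMLLR (also called fMLLR) adapts to a speaker or channel by applying one affine transform to every acoustic feature vector. A toy NumPy sketch of applying such a transform; the substantial part, the maximum-likelihood estimation of A and b from adaptation data, is omitted, and the matrices here are illustrative:

```python
import numpy as np

def apply_cmllr(features: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply one global cMLLR transform x_hat = A @ x + b to a (T, D) feature matrix."""
    return features @ A.T + b

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 13))   # e.g. 100 frames of 13-dim MFCC features
A = np.eye(13) * 0.9                     # toy transform; in practice estimated by ML
b = np.full(13, 0.1)
adapted = apply_cmllr(feats, A, b)
print(adapted.shape)                     # (100, 13)
```

Because a single transform is shared by all frames, very little adaptation data (here, 2 minutes) suffices; regression-class cMLLR estimates separate transforms for groups of model states, which needs more data but fits the speaker better.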

    Continuous Unsupervised Topic Adaptation for Morph-based Speech Recognition

    Modern automatic speech recognition (ASR) systems are speaker-independent and designed to recognize continuous large-vocabulary speech. The key components of an ASR system are the acoustic model, language model, lexicon and decoder. A constant challenge for an ASR system over time is adapting to changing topics and to the introduction of new names and words. Enabling continuous topic adaptation for ASR systems requires finding new relevant text sources for adapting the language model and identifying words which need new or modified pronunciation rules. In this thesis, unsupervised methods that enable continuous topic adaptation for a Finnish morph-based ASR system are studied. Based on first-pass ASR output, topic- and time-relevant text data is retrieved from a collection of pre-indexed Web texts. Adapting the background language model with the best-matching texts improves recognition accuracy. The recognition accuracy of foreign names and acronyms, one of the focus areas of this thesis, is also improved. Further improvement is achieved by identifying foreign names and acronyms in the retrieved texts and generating adapted pronunciation rules for them. In statistical morph-based ASR, words are sometimes oversegmented. To enable a more reliable and easier mapping of adapted pronunciation rules, oversegmented foreign names and acronyms are restored to their base forms. Morpheme restoration also improves recognition accuracy slightly. User feedback is also explored in this thesis for enabling ongoing lexicon adaptation of ASR systems. Based on user corrections of ASR output, optimal pronunciation rules for misrecognized words are recovered using forced alignment and Viterbi decoding. A collection of recovered pronunciation rules can then be used for the recognition of new speech data. Experiments showed minor improvements in the recognition of foreign names using feedback-based lexicon adaptation.
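The retrieval step, ranking pre-indexed Web texts by their similarity to the first-pass ASR output and taking the best matches as language-model adaptation data, can be sketched with plain TF-IDF cosine similarity. This is a hypothetical minimal version, not the thesis's actual indexing pipeline; the document collection is illustrative:

```python
import math
from collections import Counter

def tfidf_rank(first_pass: str, documents: list[str]) -> list[int]:
    """Rank documents by TF-IDF cosine similarity to the first-pass ASR output."""
    docs = [d.lower().split() for d in documents]
    query = first_pass.lower().split()
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))          # document frequencies
    idf = {w: math.log(n / df[w]) for w in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {w: tf[w] * idf.get(w, 0.0) for w in tf}

    def cosine(u, v):
        dot = sum(u[w] * v.get(w, 0.0) for w in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(query)
    scores = [cosine(q, vec(d)) for d in docs]
    return sorted(range(n), key=lambda i: -scores[i])

docs = ["ice hockey world championship results",
        "quantum mechanics and wave functions",
        "parliamentary election news"]
ranking = tfidf_rank("the wave function in quantum mechanics", docs)
print(ranking[0])  # index of the best-matching adaptation text
```

The top-ranked texts would then be mixed into (or interpolated with) the background language model before a second recognition pass.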

    Modeling under-resourced languages for speech recognition

    One particular problem in large-vocabulary continuous speech recognition for low-resourced languages is finding relevant training data for the statistical language models. A large amount of data is required, because the models should estimate the probability of all possible word sequences. For Finnish, Estonian and the other Finno-Ugric languages, a special problem with the data is the huge number of different word forms that are common in normal speech. The same problem also exists in other language technology applications such as machine translation and information retrieval, and to some extent in other morphologically rich languages. In this paper we present methods and evaluations for four recent language modeling topics: selecting conversational data from the Internet, adapting models for foreign words, multi-domain and adapted neural network language modeling, and decoding with subword units. Our evaluations show that the same methods work in more than one language and that they scale down to smaller data resources.
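Decoding with subword units addresses the word-form explosion: many inflected forms can be covered by a small inventory of morph-like units. A toy sketch of greedy longest-match segmentation; the morph inventory here is illustrative, not the output of a real unsupervised segmenter such as Morfessor:

```python
def segment(word: str, lexicon: set[str]) -> list[str]:
    """Greedily segment a word into the longest subword units found in the lexicon."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest remaining prefix first
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            parts.append(word[i])           # back off to a single character
            i += 1
    return parts

# Toy morph inventory for Finnish noun forms (illustrative).
morphs = {"talo", "ssa", "sta", "lle"}
print(segment("talossa", morphs))  # ['talo', 'ssa']
```

With units like these, the language model sees a closed, compact vocabulary of morphs instead of an open-ended set of full word forms.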

    ASR in Classroom Today: Automatic Visualization of Conceptual Network in Science Classrooms

    The field of Automatic Speech Recognition (ASR) has improved substantially in recent years. We are at a point, never reached before, where such algorithms can be applied under non-ideal conditions such as real classrooms. In these scenarios it is still not possible to reach perfect recognition rates, but we can already take advantage of the improvements. This paper shows preliminary results of using ASR in Chilean and Finnish middle and high schools to automatically provide teachers with a visualization of the structure of the concepts present in their discourse in science classrooms. These visualizations are conceptual networks that relate the key concepts used by the teacher. This is an interesting tool that gives teachers feedback on their pedagogical practice in class. Initial comparisons show great similarity between conceptual networks generated manually and those generated automatically.
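One simple way to build such a conceptual network from an ASR transcript is to link key concepts that co-occur within the same utterance and weight edges by co-occurrence count. A hypothetical sketch of the idea (the concept list, transcript, and function names are illustrative, not the paper's implementation):

```python
from collections import Counter
from itertools import combinations

# Illustrative set of target physics concepts.
CONCEPTS = {"force", "mass", "acceleration", "energy"}

def concept_network(utterances: list[str]) -> Counter:
    """Count co-occurrences of concept pairs within each utterance; pairs are edges."""
    edges = Counter()
    for utt in utterances:
        found = sorted(CONCEPTS & set(utt.lower().split()))
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges

transcript = ["force equals mass times acceleration",
              "kinetic energy depends on mass"]
print(concept_network(transcript))
```

The resulting weighted edge list can be drawn as a graph, giving the teacher an at-a-glance view of which concepts were discussed together.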

    Digitala: An augmented test and review process prototype for high-stakes spoken foreign language examination

    This paper introduces the first prototype of a computerised examination procedure for spoken foreign languages in Finland, intended for national-scale upper secondary school final examinations. Speech technology and profiling of reviewers are used to minimise the otherwise massive reviewing effort.