
    Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories

    [EN] Video lectures are widely used in education to support and complement face-to-face lectures. However, the utility of these audiovisual assets could be further improved by adding subtitles, which can be exploited to incorporate added-value functionalities such as searchability, accessibility, translatability, note-taking, and discovery of content-related videos, among others. Today, automatic subtitles are prone to error and need to be reviewed and post-edited to ensure that what students see on screen is of acceptable quality. This work investigates different user interface design strategies for this post-editing task in order to discover the best way to incorporate automatic transcription technologies into large educational video repositories. Our three-phase study involved lecturers from the Universitat Politècnica de València (UPV) with videos available on the poliMedia video lecture repository, which currently holds over 10,000 video objects. Simply by post-editing automatic transcriptions in the conventional way, users almost halved the time that generating the transcription from scratch would require. As expected, the study revealed that the time lecturers spent reviewing automatic transcriptions correlated directly with the accuracy of those transcriptions. It also showed that the average time required to perform each individual editing operation can be precisely derived and applied in the definition of a user model. In addition, the second phase of the study presents a transcription review strategy based on confidence measures (CM) and compares it to the conventional post-editing strategy. Finally, a third strategy, combining the CM-based approach with massive adaptation techniques for automatic speech recognition (ASR), improved transcription review efficiency over the two aforementioned strategies. © 2015 Elsevier B.V. All rights reserved.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant agreement no. 287755 (transLectures) and the ICT Policy Support Programme (ICT PSP/2007-2013), part of the Competitiveness and Innovation Framework Programme (CIP), under Grant agreement no. 621030 (EMMA), and the Spanish MINECO Active2Trans (TIN2012-31723) research project.
    Valor Miró, J. D.; Silvestre Cerdà, J. A.; Civera Saiz, J.; Turró Ribalta, C.; Juan Císcar, A. (2015). Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories. Speech Communication. 74:65-75. https://doi.org/10.1016/j.specom.2015.09.006
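The abstract above notes that the average time per editing operation can be precisely derived and used to define a user model of review effort. As a hedged illustration of that idea (the per-operation costs and function names below are assumptions for the sketch, not values from the paper), review time can be predicted from a word-level edit-distance alignment between the ASR output and the reference:

```python
# Sketch: predict transcription review time from counts of word-level edit
# operations, assuming a fixed average cost per operation type.

def edit_ops(ref, hyp):
    """Count substitutions, insertions and deletions in a word-level
    Levenshtein alignment that turns `ref` into `hyp`."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = minimal edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # ref word dropped
                           dp[i][j - 1] + 1,          # hyp word inserted
                           dp[i - 1][j - 1] + cost)   # match / substitution
    # Backtrack through the table to classify each operation.
    ops = {"sub": 0, "ins": 0, "del": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                ops["sub"] += 1
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops["ins"] += 1
            j -= 1
        else:
            ops["del"] += 1
            i -= 1
    return ops

# Hypothetical per-operation costs in seconds (illustrative values only).
SECONDS_PER_OP = {"sub": 4.0, "ins": 5.0, "del": 2.0}

def predicted_review_time(ref_words, hyp_words):
    """Estimated seconds to post-edit `hyp_words` into `ref_words`."""
    ops = edit_ops(ref_words, hyp_words)
    return sum(SECONDS_PER_OP[k] * v for k, v in ops.items())
```

For example, aligning "the cat sat on the mat" against the ASR output "the cat sit on mat" yields one substitution and one missing word, so the model predicts 4.0 + 2.0 = 6.0 seconds of editing effort. Such a linear per-operation model also makes explicit why review time tracks transcription accuracy.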

    The impact of technology on children’s attainment in English: a review of the literature


    The Digital Revolution in Qualitative Research: Working with Digital Audio Data Through Atlas.Ti

    Modern versions of Computer Assisted Qualitative Data Analysis Software (CAQDAS) are enabling the analysis of audio sound files instead of relying solely on text-based analysis. Along with other developments in computer technologies, such as the proliferation of digital recording devices and the potential for using streamed media in online academic publication, this innovation is increasing the possibilities of systematically using media-rich, naturalistic data in place of transcribed 'de-naturalised' forms. This paper reports on a project assessing online learning materials that used Atlas.ti software to analyse sound files, and it describes the problems faced in gathering, analysing and using these data for report writing. It concludes that there are still serious barriers to the full and effective integration of audio data into qualitative research: the absence of 'industry standard' recording technology, the underdevelopment of audio interfaces in Atlas.ti (as a key CAQDAS package), and the conventional approach to data use in many online publication formats all place serious restrictions on the integrated use of such data. Nonetheless, it is argued here that there are clear benefits in pushing for resolutions to these problems, as the use of naturalistic data in digital formats may help qualitative researchers to overcome some long-standing methodological issues: in particular, the ability to overcome the reliance on data transcription rather than 'natural' data, and the possibility of implementing research reports that facilitate a more transparent use of 'reusable' data, are both real possibilities when using these digital technologies, which could substantially change the shape of qualitative research practice.
    Keywords: CAQDAS, Recording Technology, Online Publication

    Ubiquitous computing: Anytime, anyplace, anywhere?

    Computers are ubiquitous, in the sense that they are everywhere, but does this mean the same as ubiquitous computing? Views are divided. The convergent device (one-does-all) view posits the computer as a tool through which anything, and indeed everything, can be done (Licklider & Taylor, 1968). The divergent device (many-do-all) view, by contrast, offers a world where microprocessors are embedded in everything and communicate with one another (Weiser, 1991). This debate is implicitly present in this issue, with examples of the convergent device in Crook & Barrowcliff's paper and in Gay et al.'s paper, and examples of divergent devices in Thomas & Gellersen's paper and in Baber's paper. I suspect both streams of technology are likely to co-exist.

    Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces

    Computer-assisted transcription promises high-quality speech transcription at reduced cost. This is achieved by limiting human effort to transcribing the parts for which automatic transcription quality is insufficient. Our goal is to improve human transcription quality via appropriate user interface design. We focus on iterative interfaces that allow humans to solve tasks based on an initially given suggestion, in this case an automatic transcription. We conduct a user study that reveals considerable quality gains for three variations of iterative interfaces over a non-iterative from-scratch transcription interface. Our iterative interfaces included post-editing, confidence-enhanced post-editing, and a novel retyping interface. All three yielded similar quality on average, but we found that the proposed retyping interface was less sensitive to the difficulty of the segment, and superior when the automatic transcription of the segment contained relatively many errors. An analysis using mixed-effects models allows us to quantify these and other factors and draw conclusions about which interface design should be chosen in which circumstances.
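The confidence-enhanced post-editing interface mentioned in the abstract above can be illustrated with a minimal sketch: only words whose ASR confidence falls below a threshold are flagged for human review. The threshold value, function name, and data shapes here are assumptions for illustration, not the study's actual setup.

```python
# Sketch of confidence-enhanced post-editing: route only low-confidence
# words to the human reviewer, so effort concentrates where the automatic
# transcription is most likely wrong.

def words_to_review(words, confidences, threshold=0.85):
    """Return (index, word) pairs whose confidence is below the threshold."""
    if len(words) != len(confidences):
        raise ValueError("one confidence score per word is required")
    return [(i, w) for i, (w, c) in enumerate(zip(words, confidences)) if c < threshold]

hyp = ["the", "lecture", "covers", "markup", "chains"]   # ASR output
conf = [0.99, 0.97, 0.95, 0.41, 0.62]                    # per-word confidences
flagged = words_to_review(hyp, conf)
# Only the low-confidence words ("markup", "chains") are highlighted
# for the reviewer; the rest of the segment is accepted as-is.
```

In a real interface the flagged indices would drive the highlighting, while the threshold trades review effort against the risk of missed errors.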

    Challenges in Transcribing Multimodal Data: A Case Study

    Computer-mediated communication (CMC) once meant principally text-based communication mediated by computers, but rapid technological advances in recent years have heralded an era of multimodal communication, with a growing emphasis on audio and video synchronous interaction. As CMC, in all its variants (text chats, video chats, forums, blogs, SMS, etc.), has become normalized practice in personal and professional lives, educational initiatives, particularly language teaching and learning, are following suit. For researchers interested in exploring learner interactions in complex technology-supported learning environments, new challenges inevitably emerge. This article looks at the challenges of transcribing and representing multimodal data (visual, oral, and textual) when engaging in computer-assisted language learning research. When transcribing and representing such data, the choices made depend very much on the specific research questions addressed; hence, in this paper we explore these challenges through discussion of a specific case study in which the researchers sought to explore the emergence of identity through interaction in an online, multimodal situated space. Given the limited amount of literature addressing the transcription of online multimodal communication, it is felt that this article is a timely contribution for researchers interested in exploring interaction in CMC language and intercultural learning environments.
    Helm, Francesca; Dooly, Melinda

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investment from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues a coherent research agenda is proposed.

    Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training

    Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted and transplanted onto the learner's own speech input, then given back to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigate applying GANs to generate self-imitating feedback, exploiting the generator's mapping ability through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator successfully transforms a non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and correcting abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
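The cycle consistency loss mentioned in the abstract above penalizes a forward mapping G (non-native to native-like) and a backward mapping F for failing to invert each other, which constrains the otherwise under-determined spectrogram-to-spectrogram translation. A minimal numpy sketch of the loss term follows; the toy invertible "generators" stand in for trained networks and are not part of the paper's model.

```python
# Sketch of the cycle-consistency term L_cyc = ||F(G(x)) - x||_1 used to
# regularize unpaired spectrogram translation. G and F here are toy
# invertible functions, not trained GAN generators.
import numpy as np

def cycle_consistency_loss(G, F, x):
    """Mean L1 reconstruction error of the round trip x -> G(x) -> F(G(x))."""
    return float(np.mean(np.abs(F(G(x)) - x)))

rng = np.random.default_rng(0)
spec = rng.random((64, 64))        # stand-in for a spectrogram image

# A perfectly invertible pair of mappings incurs zero cycle loss.
G = lambda x: 2.0 * x + 1.0        # toy "non-native -> native-like" map
F = lambda y: (y - 1.0) / 2.0      # its exact inverse
loss = cycle_consistency_loss(G, F, spec)
```

In CycleGAN-style training this term is added to the adversarial losses of both generators, so the model is rewarded for outputs that look native-like while still reconstructing the original non-native utterance's global structure.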