Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories
Video lectures are widely used in education to support and complement face-to-face lectures. However, the utility of these audiovisual assets could be further improved by adding subtitles, which can be exploited to incorporate added-value functionalities such as searchability, accessibility, translatability, note-taking, and discovery of content-related videos, among others. Today, automatic subtitles are prone to error and need to be reviewed and post-edited to ensure that what students see on screen is of acceptable quality. This work investigates different user interface design strategies for this post-editing task to discover the best way to incorporate automatic transcription technologies into large educational video repositories. Our three-phase study involved lecturers from the Universitat Politècnica de València (UPV) with videos available on the poliMedia video lecture repository, which currently contains over 10,000 video objects. Simply by post-editing automatic transcriptions in the conventional way, users almost halved the time that would be required to generate the transcription from scratch. As expected, this study revealed that the time spent by lecturers reviewing automatic transcriptions correlated directly with the accuracy of those transcriptions. However, it is also shown that the average time required to perform each individual editing operation can be precisely derived and applied in the definition of a user model. In addition, the second phase of this study presents a transcription review strategy based on confidence measures (CM) and compares it to the conventional post-editing strategy. Finally, a third strategy, combining the CM-based approach with massive adaptation techniques for automatic speech recognition (ASR), improved transcription review efficiency relative to the two aforementioned strategies.
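The CM-based review strategy described above can be sketched roughly as follows — a hypothetical illustration, not the authors' implementation, assuming each ASR word carries a posterior confidence score and words below a threshold are flagged for the lecturer to review:

```python
def words_to_review(transcription, threshold=0.6):
    """Select low-confidence words from an ASR transcription.

    `transcription` is a list of (word, confidence) pairs, where the
    confidence is the word posterior probability from the recognizer.
    Only words whose confidence falls below `threshold` are flagged
    for human review; high-confidence words are accepted as-is.
    """
    return [(i, word) for i, (word, conf) in enumerate(transcription)
            if conf < threshold]

# Toy ASR output: one likely misrecognition with low confidence.
asr_output = [("video", 0.97), ("lectures", 0.95), ("are", 0.99),
              ("widelly", 0.41), ("used", 0.92)]
flagged = words_to_review(asr_output)
```

The threshold and the list-of-pairs representation are assumptions for the sketch; the key idea is that human effort concentrates on the least reliable words rather than the full transcription.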
© 2015 Elsevier B.V. All rights reserved.
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant agreement no. 287755 (transLectures), from the ICT Policy Support Programme (ICT PSP/2007-2013) as part of the Competitiveness and Innovation Framework Programme (CIP) under Grant agreement no. 621030 (EMMA), and from the Spanish MINECO Active2Trans (TIN2012-31723) research project.
Valor Miró, J. D.; Silvestre Cerdà, J. A.; Civera Saiz, J.; Turró Ribalta, C.; Juan Císcar, A. (2015). Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories. Speech Communication, 74:65-75. https://doi.org/10.1016/j.specom.2015.09.006
The Digital Revolution in Qualitative Research: Working with Digital Audio Data Through Atlas.Ti
Modern versions of Computer Assisted Qualitative Data Analysis Software (CAQDAS) are enabling the analysis of audio sound files instead of relying solely on text-based analysis. Along with other developments in computer technologies such as the proliferation of digital recording devices and the potential for using streamed media in online academic publication, this innovation is increasing the possibilities of systematically using media-rich, naturalistic data in place of transcribed 'de-naturalised' forms. This paper reports on a project assessing online learning materials that used Atlas.ti software to analyse sound files, and it describes the problems faced in gathering, analysing and using this data for report writing. It concludes that there are still serious barriers to the full and effective integration of audio data into qualitative research: the absence of 'industry standard' recording technology, the underdevelopment of audio interfaces in Atlas.ti (as a key CAQDAS package), and the conventional approach to data use in many online publication formats all place serious restrictions on the integrated use of this data. Nonetheless, it is argued here that there are clear benefits in pushing for resolutions to these problems, as the use of this naturalistic data through digital formats may help qualitative researchers to overcome some long-standing methodological issues: in particular, the ability to overcome the reliance on data transcription rather than 'natural' data, and the possibility of implementing research reports that facilitate a more transparent use of 'reusable' data, are both real possibilities when using these digital technologies, which could substantially change the shape of qualitative research practice.
Keywords: CAQDAS, Recording Technology, Online Publication
Ubiquitous computing: Anytime, anyplace, anywhere?
Computers are ubiquitous, in the sense that they are everywhere, but does this mean the same as ubiquitous computing? Views are divided. The convergent device (one-does-all) view posits the computer as a tool through which anything, and indeed everything, can be done (Licklider & Taylor, 1968). The divergent device (many-do-all) view, by contrast, offers a world where microprocessors are embedded in everything and communicating with one another (Weiser, 1991). This debate is implicitly present in this issue, with examples of the convergent device in Crook & Barrowcliff's paper and in Gay et al.'s paper, and examples of divergent devices in Thomas & Gellersen's paper and Baber's paper. I suspect both streams of technology are likely to co-exist.
Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
Computer-assisted transcription promises high-quality speech transcription at reduced costs. This is achieved by limiting human effort to transcribing parts for which automatic transcription quality is insufficient. Our goal is to improve the human transcription quality via appropriate user interface design. We focus on iterative interfaces that allow humans to solve tasks based on an initially given suggestion, in this case an automatic transcription. We conduct a user study that reveals considerable quality gains for three variations of iterative interfaces over a non-iterative from-scratch transcription interface. Our iterative interfaces included post-editing, confidence-enhanced post-editing, and a novel retyping interface. All three yielded similar quality on average, but we found that the proposed retyping interface was less sensitive to the difficulty of the segment, and superior when the automatic transcription of the segment contained relatively many errors. An analysis using mixed-effects models allows us to quantify these and other factors and draw conclusions over which interface design should be chosen in which circumstance.
Challenges in Transcribing Multimodal Data: A Case Study
Computer-mediated communication (CMC) once meant principally text-based communication mediated by computers, but rapid technological advances in recent years have heralded an era of multimodal communication with a growing emphasis on audio and video synchronous interaction. As CMC, in all its variants (text chats, video chats, forums, blogs, SMS, etc.), has become normalized practice in personal and professional lives, educational initiatives, particularly language teaching and learning, are following suit. For researchers interested in exploring learner interactions in complex technology-supported learning environments, new challenges inevitably emerge. This article looks at the challenges of transcribing and representing multimodal data (visual, oral, and textual) when engaging in computer-assisted language learning research. When transcribing and representing such data, the choices made depend very much on the specific research questions addressed; hence, in this paper we explore these challenges through discussion of a specific case study where the researchers were seeking to explore the emergence of identity through interaction in an online, multimodal situated space. Given the limited amount of literature addressing the transcription of online multimodal communication, it is felt that this article is a timely contribution to researchers interested in exploring interaction in CMC language and intercultural learning environments.
Language Learning Sans Frontiers: A Translanguaging View
L Wei, WYJ Ho - Annual Review of Applied Linguistics, 2018 - cambridge.org
In this article, we present an analytical approach that focuses on how transnational and
translingual learners mobilize their multilingual, multimodal, and multisemiotic repertoires,
as well as their learning and work experiences, as resources in language learning. The …
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training
Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted and transplanted onto the learner's own speech input, then given back to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigated applying a GAN to generate self-imitating feedback, exploiting the generator's mapping ability through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle-consistency loss to encourage the output to preserve the global structure shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator is able to successfully transform a non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and corrective abilities of our method with the baseline PSOLA method shows that the generative approach with cycle-consistency loss is promising.
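The cycle-consistency idea in the abstract above can be illustrated with a minimal numpy sketch — not the authors' implementation; the generator functions here are placeholders for the trained forward and backward GAN generators:

```python
import numpy as np

def cycle_consistency_loss(x_nonnative, g_forward, g_backward):
    """L1 cycle-consistency loss between a spectrogram and its
    round-trip reconstruction, in the CycleGAN style.

    `g_forward` maps non-native spectrograms toward native-like ones;
    `g_backward` maps back. Penalizing the difference between the
    original and its round-trip reconstruction encourages the mapping
    to preserve global structure (e.g. the linguistic content) while
    still allowing the adversarial loss to change pronunciation-related
    detail.
    """
    reconstructed = g_backward(g_forward(x_nonnative))
    return float(np.mean(np.abs(reconstructed - x_nonnative)))

# Toy check: with identity "generators" the round trip is exact,
# so the cycle loss is zero.
spec = np.random.rand(128, 64)  # (frequency bins, time frames)
loss = cycle_consistency_loss(spec, lambda s: s, lambda s: s)
```

In actual training this term would be weighted and added to the adversarial losses of both generators; the spectrogram shape and the L1 form are assumptions for illustration.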