The advent of transformer-based models has revolutionized natural language processing, bringing remarkable improvements in tasks like automatic speech recognition (ASR). Inspired by these advancements, this thesis explores the optimization of a transformer-based ASR model to improve transcription accuracy in educational settings, particularly for lecture content. The goal of this research is to provide real-time, high-accuracy captions that enhance accessibility for all students, while offering a cost-effective solution for educators. To assess the potential of domain-specific fine-tuning, Whisper-small underwent two phases of fine-tuning. In the first phase, it was finetuned on care- fully selected, publicly available datasets: SpeechColab’s Gigaspeech-XS [39], AMI Meeting corpus [14]. In the second phase, fine-tuned model was optimized on a self-curated dataset [16] consisting of roughly 10 hours of live lecture recordings collected and assembled by me. Finally, a real-time captioning assistant application was developed to leverage the finetuned model and transcribe speech in real time with live editing capabilities. The optimized Whisper-small model was evaluated against Whisper’s retrained small, medium and large(version 2) counterparts. The evaluation was performed on a clean unseen data [15] prepared by me. The fine-tuned model achieved lower Word Error Rates (WER) of 4.53%, compared to 5.51% and 5.78% for Whisper-Medium and Whisper-Large-V2 respectively. These results demonstrate that fine-tuning a transformer-based ASR model on domain- specific data can significantly enhance its performance in a targeted context, such as live lecture transcription. The findings of this experiment highlight the promise of transformer-based models for improving educational accessibility. From thereon, building an application tailored to live lecture settings, this research contributes to the development of adaptable, low-cost technologies that support inclusive learning environments. The success of this experiment lays the groundwork for future breakthroughs in speech recognition, aiming to make education more accessible for everyone
Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.