Scalability Analysis for Storing and Processing Audio Files Using Transformers

Abstract

In a continental-sized country like Brazil, collecting feedback on governmental services such as education, healthcare, and security is challenging and impractical to perform manually, except through sampling techniques. Advances in machine learning, particularly transformer-based models, now make it possible to automate this process at large scale. This enables, for instance, the dissemination of health campaign information or the collection of citizen opinions on recently used services. This paper focuses on speech-to-text transcription, a crucial step for enabling large-scale voice-based responses. We explored scalability challenges and evaluated combinations of transcription models and audio formats (WAV, FLAC, and MP3), aiming to balance computational cost and transcription quality. Our results showed that MP3 files sampled at 14 kHz provide transcription quality comparable to WAV files sampled at 16 kHz while requiring only 11% of the storage. Furthermore, we demonstrated that smaller models, such as Wav2Vec2-XLSR-53 with 3.17 × 10⁸ parameters, can achieve results similar to larger models, such as Seamless M4T, which has approximately an order of magnitude more parameters.
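The following sketch puts the 11% storage figure in concrete terms. It is a back-of-the-envelope estimate, not the authors' method. The abstract does not state bit depth or channel count, so the sketch assumes mono, 16-bit PCM WAV with a canonical 44-byte header. It applies the reported ratio directly rather than modelling an MP3 encoder. The function and constant names are hypothetical.

```python
# Back-of-the-envelope storage estimate for the WAV vs. MP3 comparison.
# ASSUMPTIONS (not stated in the abstract): WAV files are mono, 16-bit PCM
# with a 44-byte canonical header, and the reported 11% ratio applies uniformly.

WAV_HEADER_BYTES = 44  # size of a canonical RIFF/WAVE header


def wav_size_bytes(duration_s: float, sample_rate_hz: int = 16_000,
                   bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of an uncompressed PCM WAV file of the given duration."""
    n_samples = round(duration_s * sample_rate_hz)
    return WAV_HEADER_BYTES + n_samples * channels * bits_per_sample // 8


def wav_bitrate_kbps(sample_rate_hz: int = 16_000, bits_per_sample: int = 16,
                     channels: int = 1) -> float:
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Reported result: MP3 at 14 kHz needs about 11% of the storage of WAV at 16 kHz.
MP3_STORAGE_RATIO = 0.11

if __name__ == "__main__":
    wav = wav_size_bytes(60)  # one minute of speech
    mp3_estimate = wav * MP3_STORAGE_RATIO
    print(f"WAV, 1 min: {wav / 1e6:.2f} MB")
    print(f"MP3 estimate, 1 min: {mp3_estimate / 1e6:.2f} MB")
    print(f"Implied MP3 bit rate: {wav_bitrate_kbps() * MP3_STORAGE_RATIO:.1f} kbps")
```

Under these assumptions, one minute of 16 kHz WAV takes about 1.92 MB (a 256 kbps stream). The reported ratio therefore corresponds to roughly 0.21 MB per minute, or an effective MP3 bit rate near 28 kbps. Across millions of citizen responses, that difference determines whether the audio archive is measured in tens or hundreds of terabytes.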

Portal de Periódicos da Univali (Universidade do Vale do Itajaí)

Last time updated on 22/06/2025