8 research outputs found

    Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

    Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, and text-to-speech synthesis, followed by isochronous lip-syncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean-opinion-score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation, with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.

    Punctuation Restoration as Post-processing Step for Swedish Language Automatic Speech Recognition

    This thesis focuses on the Swedish language, where punctuation restoration, especially as a post-processing step for the output of Automatic Speech Recognition (ASR) applications, needs further research. I have collaborated with NewsMachine AB, a company that provides large-scale media monitoring services for its clients, for which it employs ASR technology to convert spoken content into text. This thesis follows an approach initially designed for high-resource languages such as English. The method is based on KB-BERT, a pre-trained Swedish neural network language model developed by the National Library of Sweden. The project uses KB-BERT with a Bidirectional Long Short-Term Memory (BiLSTM) layer on top for the task of punctuation restoration. The model is fine-tuned using the TED Talk 2020 dataset in Swedish, which is acquired from OPUS (an open-source parallel corpus). The punctuation marks comma, period, question mark, and colon are considered for this project. A comparative analysis is conducted between two KB-BERT models: bert-base-swedish-cased and albert-base-swedish-cased-alpha. The fine-tuned Swedish BERT-BiLSTM model, trained on 5 classes, achieved an overall F1-score of 81.6%, surpassing the performance of the ALBERT-BiLSTM model, which was also trained on 5 classes and obtained an overall F1-score of 66.6%. Additionally, the BERT-BiLSTM model, trained on 4 classes (excluding colon), outperformed prestoBERT, an existing model designed for the same task in Swedish, with an overall F1-score of 82.8%. In contrast, prestoBERT achieved an overall F1-score of 78.9%. As a further evaluation of the model's performance on ASR transcribed text, noise was injected based on four probabilities (0.05, 0.1, 0.15, 0.2) into a copy of the test data in the form of three word-level errors (deletion, substitution, and insertion). The performance of the BERT-BiLSTM model substantially decreased for all the errors as the probability of noise injected increased. In contrast, the model still performed comparatively better when dealing with deletion errors as compared to substitution and insertion errors. Lastly, the data resources received from NewsMachine AB were used to perform a qualitative assessment of how the model performs in punctuating real transcribed data as compared to human judgment.
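
The noise-injection evaluation described in this abstract can be illustrated with a minimal sketch. The abstract does not say how errors were sampled, so everything here is an assumption: each word is corrupted independently with the given probability, replacement and inserted words are drawn uniformly from a hypothetical vocabulary list, and the function name `inject_noise` is invented for illustration.

```python
import random

ERROR_TYPES = ("deletion", "substitution", "insertion")

def inject_noise(words, error_type, prob, vocab, rng):
    """Return a noisy copy of `words` containing one kind of word-level error.

    Assumption: each word is corrupted independently with probability `prob`.
    `vocab` is a hypothetical list of words to draw substitutes/insertions from.
    """
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unknown error type: {error_type}")
    noisy = []
    for word in words:
        if rng.random() >= prob:
            noisy.append(word)               # leave the word untouched
        elif error_type == "deletion":
            continue                         # drop the word entirely
        elif error_type == "substitution":
            noisy.append(rng.choice(vocab))  # replace with a random word
        else:  # insertion
            noisy.append(word)
            noisy.append(rng.choice(vocab))  # add a spurious word after it
    return noisy

# Build one noisy copy per (error type, probability) pair, as in the abstract.
sentence = "det här är ett litet exempel på en mening".split()
vocabulary = ["och", "att", "som", "inte"]   # placeholder vocabulary
generator = random.Random(0)
for kind in ERROR_TYPES:
    for p in (0.05, 0.1, 0.15, 0.2):
        print(kind, p, " ".join(inject_noise(sentence, kind, p, vocabulary, generator)))
```

Each noisy copy would then be passed through the punctuation model and scored against the clean reference; that step depends on the thesis's own model and data and is not shown here.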

    Caste, social networks and variety adoption

    Social networks influence technology diffusion, but targeting formal leaders (institutional central nodes) may lead to distributional consequences. This paper analyzes the role of informal social networks in technology diffusion in a socially hierarchical caste-based society. Often, information flow and technology diffusion are constrained by social and economic boundaries where informal nodes such as caste play a very decisive role in everyday life. Proper targeting and dissemination of technology to the marginalized sections of society are very important for their development. We observed that only one-fourth of farmers cultivate newer varieties, which include hybrids and recently released high-yielding varieties. The results showed that individuals belonging to marginal groups are influential and act as informal leaders when they are the dominant caste in the village. Progressive farmers are found to fail in disseminating new varieties, and targeting influential informal leaders who belong to the dominant caste of the village appears to be a better strategy. Among non-dominant caste members, influential leaders belonging to Other Backward Classes (OBCs) or Scheduled Tribes (STs) are more desirable targets than other caste groups. The more concentrated a network is in terms of its caste composition, the faster will be the spread of any technology.

    COVID-19, Mucormycosis and Cancer: The Triple Threat—Hypothesis or Reality?

    COVID-19 has been responsible for widespread morbidity and mortality worldwide. Invasive mucormycosis has death rates reaching 80%. India, one of the countries worst hit by the pandemic, is also a hotbed with the highest death rates for mucormycosis. Cancer, a ubiquitously present menace, also contributes to higher case fatality rates. All three entities studied here are individual, massive healthcare threats. The danger of one disease predisposing to the other, the poor performance status of patients with all three diseases, and the impact of therapeutics for one disease on the pathology and therapy of the others all warrant physicians having a better understanding of the interplay. This is imperative so as to effectively establish control over individual patient and population health. It is important to understand the interactions to effectively manage all three entities together to reduce overall morbidity. In this review article, we search for an inter-relationship between the COVID-19 pandemic, emerging mucormycosis, and the global giant, cancer.

    Synergy of biofuel production with waste remediation along with value-added co-products recovery through microalgae cultivation: A review of membrane-integrated green approach
