16 research outputs found

    Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

    Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks. Existing multi-modal pre-training methods for the ASR task have primarily focused on single-stage pre-training where a single unsupervised task is used for pre-training followed by fine-tuning on the downstream task. In this work, we introduce a novel method combining multi-modal and multi-task unsupervised pre-training with a translation-based supervised mid-training approach. We empirically demonstrate that such a multi-stage approach leads to relative word error rate (WER) improvements of up to 38.45% over baselines on both Librispeech and SUPERB. Additionally, we share several important findings for choosing pre-training methods and datasets. Comment: Accepted in LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

    Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

    We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning strategy to further benefit from LLM-encoded knowledge for understanding the tasks and conversational contexts, leading to additional improvements. Our approach demonstrates the potential of combined LLMs and acoustic models for a more natural and conversational interaction between humans and speech-enabled AI agents. Comment: To appear in IEEE ICASSP 202

    Assessment of the quality, content, and reliability of YouTube® videos on diabetes mellitus and polycystic ovary syndrome: a systematic review with cross-sectional analysis comparing peer-reviewed videos

    YouTube® is one of the leading platforms for health information. However, the lack of regulation of content and quality raises concerns about accuracy and reliability. CoMICs (Concise Medical Information Cines) are evidence-based short videos created by medical students and junior doctors and reviewed by experts to ensure clinical accuracy. We performed a systematic review to understand the impact of videos on knowledge and awareness about diabetes and PCOS. We then evaluated the quality of YouTube® videos about diabetes and PCOS using various validated quality assessment tools and compared these with CoMICs videos on the same topics. Quality assessment tools like DISCERN, JAMA benchmark criteria, and global quality scale (GQS) score were employed. Some of the authors of this study also co-authored the creation of some of the CoMICs evaluated. Our study revealed that while videos effectively improve understanding of diabetes and PCOS, there are notable differences in quality and reliability of the videos on YouTube®. For diabetes, CoMICs videos had higher DISCERN scores (CoMICs vs YouTube®: 2.4 vs 1.6), superior reliability (P < 0.01), and treatment quality (P < 0.01) and met JAMA criteria for authorship (100% vs 30.6%) and currency (100% vs 53.1%). For PCOS, CoMICs had higher DISCERN scores (2.9 vs 1.9), reliability (P < 0.01), and treatment quality (P < 0.01); met JAMA criteria for authorship (100% vs 34.0%) and currency (100% vs 54.0%); and had higher GQS scores (4.0 vs 3.0). In conclusion, CoMICs outperformed other similar sources on YouTube® in providing reliable evidence-based medical information which may be used for patient education.

    A Private cloud with live Performance Monitoring Analysis

    No full text
    Prometheus is available as an open-source monitoring program. OpenStack is becoming the most popular open-source cloud platform owing to ongoing developments in cloud computing. Thorough system monitoring is an essential link and a critical method for guaranteeing the reliability and stability of cloud platforms used as production systems. Building on the capabilities of the OpenStack cloud platform, this article uses Prometheus to collect monitoring data, makes use of the real-time monitoring data visualization application Grafana, and develops a comprehensive, intelligent, and efficient monitoring system. Testing indicates that this is a viable method for improving the dependability and stability of the OpenStack cloud platform.