    Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

    The lack of code-switching training data is one of the major obstacles in developing end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence helps the ASR model code-switch between languages more easily. Specifically, we propose to use constraints based on Jensen-Shannon divergence and cosine distance. The former enforces similar distributions for the output embeddings of the monolingual languages, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, which yields up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task. Comment: 5 pages, 3 figures, accepted to INTERSPEECH 201
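
    The two constraint types named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the embedding distributions are modeled, so the example assumes plain discrete distributions for the Jensen-Shannon term, and the toy embeddings and all function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and 0 only when p equals q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def centroid(embeddings):
    """Mean vector of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

def cosine_distance(u, v):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical toy output embeddings for two monolingual token sets.
mandarin_embeddings = [[1.0, 0.2], [0.8, 0.4]]
english_embeddings = [[0.1, 1.0], [0.3, 0.9]]

# A centroid-based penalty of this kind would shrink as the two sets move together.
centroid_gap = cosine_distance(centroid(mandarin_embeddings),
                               centroid(english_embeddings))
```

    In a training setup, terms like these would presumably be added to the ASR loss with a weighting factor; the abstract does not give that detail.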

    Needs of family members of critically ill patients in a Critical Care Unit at Universiti Kebangsaan Malaysia Medical Centre

    Healthcare providers should meet the needs of the families of critically ill patients in a Critical Care Unit in order to improve the patient's quality of life. The purpose of this study was to identify the needs of family members of critically ill patients in a Critical Care Unit. A cross-sectional study was conducted on 109 family members of patients hospitalized in the Intensive Care Unit and Coronary Care Units of Universiti Kebangsaan Malaysia Medical Centre (UKMMC). The modified Critical Care Family Needs Inventory (CCFNI) comprised 5 domains of family members' needs: Information, Proximity, Assurance, Comfort and Support. The findings showed that assurance (3.77 ± 0.306) and information (3.62 ± 0.379) needs scored highest, followed by proximity (3.60 ± 0.415), support (3.57 ± 0.477) and comfort (3.55 ± 0.586) needs. There was a significant association between the respondent's relationship to the patient and the need for proximity (p = 0.013). There were also significant associations between the respondent's monthly income and the needs for comfort (p = 0.033) and support (p = 0.004), and between gender and the need for comfort (p = 0.013). Family members regarded information, proximity, assurance, comfort and support as their requirements during hospitalization. Meeting these needs helps them cope while the patient is admitted to the Intensive Care Unit or Coronary Care Unit of UKMMC. An educational package and regular updates on the patient's condition should be emphasized to better meet the needs of the families of critically ill patients in critical care settings.

    Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures

    Despite recent strides made in speech separation, most models are trained on datasets with neutral emotions. Emotional speech has been known to degrade the performance of models in a variety of speech tasks, which reduces the effectiveness of these models when deployed in real-world scenarios. In this paper we perform an analysis to differentiate the performance degradation arising from the emotions in speech from the impact of out-of-domain inference. This is measured using a carefully designed test dataset, Emo2Mix, consisting of balanced data across all emotional combinations. We show that even models with strong out-of-domain performance such as Sepformer can still suffer significant degradation of up to 5.1 dB SI-SDRi on mixtures with strong emotions. This demonstrates the importance of accounting for emotions in real-world speech separation applications. Comment: Accepted by APSIPA ASC 202
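
    For readers unfamiliar with the metric quoted above, the following is a minimal sketch of scale-invariant signal-to-distortion ratio improvement (SI-SDRi) in its commonly used form. It is not taken from the paper: the function names are illustrative, signals are plain lists of samples, and the zero-mean normalization that is often applied first is omitted for brevity.

```python
import math

def si_sdr(estimate, target):
    """Scale-invariant SDR in dB. Undefined (division by zero) for a perfect estimate."""
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    # Project the estimate onto the target to find the optimally scaled target.
    alpha = dot / target_energy
    scaled_target = [alpha * t for t in target]
    signal_energy = sum(s * s for s in scaled_target)
    noise_energy = sum((e - s) ** 2 for e, s in zip(estimate, scaled_target))
    return 10.0 * math.log10(signal_energy / noise_energy)

def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: how much the separated estimate improves on the unprocessed mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

    Because the target is rescaled before the comparison, multiplying the estimate by any non-zero gain leaves the score unchanged, which is the property the "scale-invariant" name refers to.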

    Amino Acid Classification in 2D NMR Spectra via Acoustic Signal Embeddings

    Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development. Unfortunately, NMR data can cost thousands of dollars per sample to collect and it can take a specialist weeks to assign the observed resonances to specific chemical groups. There has thus been growing interest in the NMR community in using deep learning to automate NMR data annotation. Due to similarities between NMR and audio data, we propose that methods used in acoustic signal processing can be applied to NMR as well. Using a simulated amino acid dataset, we show that by swapping out filter banks with a trainable convolutional encoder, acoustic signal embeddings from speaker verification models can be used for amino acid classification in 2D NMR spectra by treating each amino acid as a unique speaker. On an NMR dataset comparable in size to 46 hours of audio, we achieve a classification performance of 97.7% on a 20-class problem. We also achieve a 23% relative improvement by using an acoustic embedding model compared to an existing NMR-based model.

    Contrastive Speech Mixup for Low-resource Keyword Spotting

    Most of the existing neural-based models for keyword spotting (KWS) in smart devices require thousands of training samples to learn a decent audio representation. However, with the rising demand for smart devices to become more personalized, KWS models need to adapt quickly to smaller user samples. To tackle this challenge, we propose a contrastive speech mixup (CosMix) learning algorithm for low-resource KWS. CosMix introduces an auxiliary contrastive loss to the existing mixup augmentation technique to maximize the relative similarity between the original pre-mixed samples and the augmented samples. The goal is to inject enhancing constraints to guide the model towards simpler but richer content-based speech representations from two augmented views (i.e. noisy mixed and clean pre-mixed utterances). We conduct our experiments on the Google Speech Command dataset, where we trim the size of the training set to as small as 2.5 mins per keyword to simulate a low-resource condition. Our experimental results show a consistent improvement in the performance of multiple models, which demonstrates the effectiveness of our method. Comment: Accepted by ICASSP 202

    Are Soft Prompts Good Zero-shot Learners for Speech Recognition?

    Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, how and why this works is still not well understood. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners in improving ASR performance, a role that also makes them vulnerable to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, which enhances robustness against background noise. Additionally, we propose an effective modification on noise prompts to show that they are capable of zero-shot learning when adapting to out-of-distribution noise environments.

    Companding techniques for high dynamic range audio CODEC receiver path

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 71-72). In this thesis, an audio CODEC receiver path has been modified by the addition of companding techniques. Companding compresses the input signal and expands the output signal according to the input power strength such that additional noise from the system is suppressed while the signal content is maintained. As the compression level is varied according to the input signal strength, transients occur at the output of the system. Sudden changes in the compression level cannot be processed instantaneously and generate transients in the output. The thesis first statically demonstrates that companding can increase the signal-to-noise ratio and improve the dynamic range of the audio CODEC Rx path by up to 18 dB. Further, two compensation techniques are analyzed in an attempt to reduce the companding transients. The results show that transients in the output are reduced on average by 60% when using either compensation technique. by Yunjie Ma. M.Eng
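
    The general compress-then-expand idea can be sketched with the classic mu-law curve. This is only an illustration of companding in general: the thesis describes a scheme whose compression level follows input power, which is not reproduced here, and the value of `MU`, the noise level, and the helper names are all assumptions.

```python
import math

MU = 255.0  # assumption: the common mu-law constant, not a value from the thesis

def compress(x, mu=MU):
    """Mu-law compression of a sample in [-1, 1]; boosts quiet samples the most."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y, mu=MU):
    """Inverse of compress(): restores the original amplitude."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def through_noisy_path(x, noise, companded):
    """Pass one sample through a path that adds a fixed noise offset."""
    if companded:
        return expand(compress(x) + noise)
    return x + noise

# Hypothetical quiet sample and additive noise from the signal path.
quiet_sample = 0.01
path_noise = 0.001
error_plain = abs(through_noisy_path(quiet_sample, path_noise, False) - quiet_sample)
error_companded = abs(through_noisy_path(quiet_sample, path_noise, True) - quiet_sample)
```

    For the quiet sample the companded path leaves a smaller residual error than the plain path. With a fixed curve like this one, loud samples see the opposite effect, which is one reason a level-dependent scheme such as the one the abstract describes is attractive.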

    Uganda Manafwa River early flood warning system development hydrologic watershed modeling using HEC-HMS, HEC-RAS, ArcGIS

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 135-136). The Manafwa River basin spans several districts in Eastern Uganda. Over the years, frequent floods have posed a great threat to the local communities in these districts. The Uganda Red Cross Society (URCS) intends to design a precipitation-based flood forecasting system for the Manafwa River Basin. Towards this end, the URCS initiated a collaboration with MIT's Department of Civil and Environmental Engineering in January 2013, in an attempt to establish a hydrologic modeling system that relates upstream precipitation to downstream stream discharge using ArcGIS, HEC-HMS and HEC-RAS. This work presents the progress of the modeling endeavor, provides technical guidance to the extent possible, and aims to facilitate hydrologic modeling efforts of a similar nature. The main focus is on the loss methods used in HEC-HMS: the Curve Number loss method and the Initial and Constant loss method. Neither the Curve Number nor the Initial and Constant loss method is found to be perfectly suitable for modeling both short-term and long-term simulations. The Curve Number method is better able to model the precipitation-runoff processes in short-term simulations. The Initial and Constant loss method tends to underestimate runoff volume in short-term simulations relative to what is observed. The Curve Number loss method produced results that are on average closer to observed values in short-term simulations; however, the curve number values resulting from calibration are considerably lower than the estimated values. by Yan Ma. M.Eng
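
    The two loss methods compared in this abstract can be sketched as follows. This is a simplified illustration, not the thesis's model or the HEC-HMS implementation: it uses the standard SCS Curve Number equations in millimetres with the conventional initial abstraction ratio of 0.2, applies them to a storm total rather than incrementally, and the function names and sample values are hypothetical.

```python
def curve_number_runoff(precip_mm, curve_number, ia_ratio=0.2):
    """SCS Curve Number runoff depth (mm) for a storm total.

    ia_ratio=0.2 is the conventional initial abstraction ratio (an assumption here).
    """
    storage = 25400.0 / curve_number - 254.0   # potential maximum retention S (mm)
    initial_abstraction = ia_ratio * storage   # rainfall lost before runoff starts
    if precip_mm <= initial_abstraction:
        return 0.0
    effective = precip_mm - initial_abstraction
    return effective ** 2 / (effective + storage)

def initial_constant_excess(precip_series_mm, initial_loss_mm, constant_loss_mm_per_step):
    """Excess precipitation per time step under a simplified Initial and Constant loss.

    Rain first fills the initial loss; afterwards a fixed amount per step is lost.
    """
    remaining_initial = initial_loss_mm
    excess = []
    for precip in precip_series_mm:
        absorbed = min(precip, remaining_initial)
        remaining_initial -= absorbed
        after_initial = precip - absorbed
        loss = min(after_initial, constant_loss_mm_per_step)
        excess.append(after_initial - loss)
    return excess
```

    A lower curve number means a larger storage term and therefore less runoff for the same rainfall, which is the direction of the calibration result reported in the abstract.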