6,680 research outputs found

    Perceptually Motivated Wavelet Packet Transform for Bioacoustic Signal Enhancement

    Get PDF
    A significant and often unavoidable problem in bioacoustic signal processing is the presence of background noise due to an adverse recording environment. This paper proposes a new bioacoustic signal enhancement technique which can be used on a wide range of species. The technique is based on a perceptually scaled wavelet packet decomposition using a species-specific Greenwood scale function. Spectral estimation techniques, similar to those used for human speech enhancement, are used for estimation of clean signal wavelet coefficients under an additive noise model. The new approach is compared to several other techniques, including basic bandpass filtering as well as classical speech enhancement methods such as spectral subtraction, Wiener filtering, and Ephraim–Malah filtering. Vocalizations recorded from several species are used for evaluation, including the ortolan bunting (Emberiza hortulana), rhesus monkey (Macaca mulatta), and humpback whale (Megaptera novaeanglia), with both additive white Gaussian noise and environment recording noise added across a range of signal-to-noise ratios (SNRs). Results, measured by both SNR and segmental SNR of the enhanced wave forms, indicate that the proposed method outperforms other approaches for a wide range of noise conditions

    BigEAR: Inferring the Ambient and Emotional Correlates from Smartphone-based Acoustic Big Data

    Get PDF
    This paper presents a novel BigEAR big data framework that employs psychological audio processing chain (PAPC) to process smartphone-based acoustic big data collected when the user performs social conversations in naturalistic scenarios. The overarching goal of BigEAR is to identify moods of the wearer from various activities such as laughing, singing, crying, arguing, and sighing. These annotations are based on ground truth relevant for psychologists who intend to monitor/infer the social context of individuals coping with breast cancer. We pursued a case study on couples coping with breast cancer to know how the conversations affect emotional and social well being. In the state-of-the-art methods, psychologists and their team have to hear the audio recordings for making these inferences by subjective evaluations that not only are time-consuming and costly, but also demand manual data coding for thousands of audio files. The BigEAR framework automates the audio analysis. We computed the accuracy of BigEAR with respect to the ground truth obtained from a human rater. Our approach yielded overall average accuracy of 88.76% on real-world data from couples coping with breast cancer.Comment: 6 pages, 10 equations, 1 Table, 5 Figures, IEEE International Workshop on Big Data Analytics for Smart and Connected Health 2016, June 27, 2016, Washington DC, US

    Speech Recognition Technology: Improving Speed and Accuracy of Emergency Medical Services Documentation to Protect Patients

    Get PDF
    Because hospital errors, such as mistakes in documentation, cause one in six deaths each year in the United States, the accuracy of health records in the emergency medical services (EMS) must be improved. One possible solution is to incorporate speech recognition (SR) software into current tools used by EMS first responders. The purpose of this research was to determine if SR software could increase the efficiency and accuracy of EMS documentation to improve the safety of patients of EMS. An initial review of the literature on the performance of current SR software demonstrated that this software was not 99% accurate, and therefore, errors in the medical documentation produced by the software could harm patients. The literature review also identified weaknesses of SR software that could be overcome so that the software would be accurate enough for use in EMS settings. These weaknesses included the inability to differentiate between similar phrases and the inability to filter out background noise. To find a solution, an analysis of natural language processing algorithms showed that the bag-of-words post processing algorithm has the ability to differentiate between similar phrases. This algorithm is best suited for SR applications because it is simple yet effective compared to machine learning algorithms that required a large amount of training data. The findings suggested that if these weaknesses of current SR software are solved, then the software would potentially increase the efficiency and accuracy of EMS documentation. Further studies should integrate the bag-of-words post processing method into SR software and field test its accuracy in EMS settings

    Conditional Teacher-Student Learning

    Full text link
    The teacher-student (T/S) learning has been shown to be effective for a variety of problems such as domain adaptation and model compression. One shortcoming of the T/S learning is that a teacher model, not always perfect, sporadically produces wrong guidance in form of posterior probabilities that misleads the student model towards a suboptimal performance. To overcome this problem, we propose a conditional T/S learning scheme, in which a "smart" student model selectively chooses to learn from either the teacher model or the ground truth labels conditioned on whether the teacher can correctly predict the ground truth. Unlike a naive linear combination of the two knowledge sources, the conditional learning is exclusively engaged with the teacher model when the teacher model's prediction is correct, and otherwise backs off to the ground truth. Thus, the student model is able to learn effectively from the teacher and even potentially surpass the teacher. We examine the proposed learning scheme on two tasks: domain adaptation on CHiME-3 dataset and speaker adaptation on Microsoft short message dictation dataset. The proposed method achieves 9.8% and 12.8% relative word error rate reductions, respectively, over T/S learning for environment adaptation and speaker-independent model for speaker adaptation.Comment: 5 pages, 1 figure, ICASSP 201

    A robust sequential hypothesis testing method for brake squeal localisation

    Get PDF
    This contribution deals with the in situ detection and localisation of brake squeal in an automobile. As brake squeal is emitted from regions known a priori, i.e., near the wheels, the localisation is treated as a hypothesis testing problem. Distributed microphone arrays, situated under the automobile, are used to capture the directional properties of the sound field generated by a squealing brake. The spatial characteristics of the sampled sound field is then used to formulate the hypothesis tests. However, in contrast to standard hypothesis testing approaches of this kind, the propagation environment is complex and time-varying. Coupled with inaccuracies in the knowledge of the sensor and source positions as well as sensor gain mismatches, modelling the sound field is difficult and standard approaches fail in this case. A previously proposed approach implicitly tried to account for such incomplete system knowledge and was based on ad hoc likelihood formulations. The current paper builds upon this approach and proposes a second approach, based on more solid theoretical foundations, that can systematically account for the model uncertainties. Results from tests in a real setting show that the proposed approach is more consistent than the prior state-of-the-art. In both approaches, the tasks of detection and localisation are decoupled for complexity reasons. The localisation (hypothesis testing) is subject to a prior detection of brake squeal and identification of the squeal frequencies. The approaches used for the detection and identification of squeal frequencies are also presented. The paper, further, briefly addresses some practical issues related to array design and placement. (C) 2019 Author(s)
    • …
    corecore