
    Automatic Quality Estimation for ASR System Combination

    Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, depends heavily on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) to rank the transcriptions at "segment level", instead of i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and from multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.
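
    As a concrete illustration of ranking before fusion, the Python sketch below scores each hypothesis by agreement with its peers (an assumed, decoder-free QE proxy, not the paper's feature set) and orders them best-first before a naive position-wise vote that stands in for full ROVER alignment; with real ROVER the order matters because alignment is built incrementally.

```python
from collections import Counter

def qe_score(hypothesis, all_hypotheses):
    """Toy QE score: agreement of a hypothesis with its peers
    (an assumed decoder-free proxy for the paper's feature set)."""
    words = set(hypothesis.split())
    peers = [h for h in all_hypotheses if h is not hypothesis]
    return sum(len(words & set(p.split())) / max(len(words), 1)
               for p in peers) / max(len(peers), 1)

def rank_hypotheses(hypotheses):
    """Order hypotheses best-first before fusion."""
    return sorted(hypotheses, key=lambda h: qe_score(h, hypotheses),
                  reverse=True)

def naive_vote(hypotheses):
    """Position-wise majority vote; a stand-in for ROVER's alignment."""
    length = max(len(h.split()) for h in hypotheses)
    padded = [h.split() + [""] * (length - len(h.split()))
              for h in hypotheses]
    return " ".join(Counter(col).most_common(1)[0][0]
                    for col in zip(*padded)).strip()

hyps = ["the cat sat on the mat",
        "the cat sat on a mat",
        "a cat sat on the hat"]
print(naive_vote(rank_hypotheses(hyps)))   # -> "the cat sat on the mat"
```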

    Evaluation of PPG Biometrics for Authentication in different states

    Amongst all medical biometric traits, the Photoplethysmograph (PPG) is the easiest to acquire. PPG records blood volume changes with just a combination of a light-emitting diode and a photodiode, from any part of the body. With the penetration of IoT and smart homes, PPG recording can easily be integrated with other vital wearable devices. PPG reflects peculiarities of the hemodynamics and cardiovascular system of each individual. This paper presents a non-fiducial method for PPG-based biometric authentication. Being a physiological signal, PPG alters with physical/mental stress and over time. For robustness, these variations cannot be ignored. While most previous works focused only on a single session, this paper demonstrates an extensive performance evaluation of PPG biometrics against single-session data, different emotions, physical exercise and time-lapse, using the Continuous Wavelet Transform (CWT) and Direct Linear Discriminant Analysis (DLDA). When evaluated on different states and datasets, an equal error rate (EER) of 0.5%-6% was achieved for 45-60 s of average training time. Our CWT/DLDA-based technique outperformed all other dimensionality reduction techniques and previous work. Comment: Accepted at the 11th IAPR/IEEE International Conference on Biometrics, 2018. 6 pages, 6 figures.
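
    A minimal sketch of the CWT-plus-discriminant pipeline on synthetic data: PyWavelets' Morlet CWT and scikit-learn's standard LDA stand in for the paper's DLDA, and the sampling rate, scales and window length are illustrative assumptions.

```python
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
fs, seconds = 125, 2                        # assumed sampling rate / window

def synthetic_ppg(pulse_hz, n=fs * seconds):
    """Crude stand-in for a PPG window: a pulse tone plus noise."""
    t = np.arange(n) / fs
    return np.sin(2 * np.pi * pulse_hz * t) + 0.1 * rng.standard_normal(n)

def cwt_features(signal, scales=np.arange(1, 31)):
    """Flatten the CWT scalogram into one feature vector."""
    coefs, _ = pywt.cwt(signal, scales, "morl")
    return np.abs(coefs).ravel()

# Two "subjects" with slightly different pulse dynamics, 20 windows each.
X = np.array([cwt_features(synthetic_ppg(f))
              for f in [1.0] * 20 + [1.3] * 20])
y = np.array([0] * 20 + [1] * 20)

clf = LinearDiscriminantAnalysis().fit(X[::2], y[::2])   # train on half
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```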

    Computer Aided ECG Analysis - State of the Art and Upcoming Challenges

    In this paper we present current achievements in computer-aided ECG analysis and their applicability in the real-world medical diagnosis process. Most of the current work covers the problems of removing noise, detecting heartbeats, and rhythm-based analysis. There are some advancements in the detection of particular ECG segments and in beat classification, but with limited evaluations and without clinical approval. This paper presents the state-of-the-art advancements in those areas to date. Besides this short computer science and signal processing literature review, the paper covers future challenges regarding ECG signal morphology analysis derived from the medical literature review. The paper concludes with identified gaps in current advancements and testing, upcoming challenges for future research, and a suggested bullseye test for morphology analysis evaluation. Comment: 7 pages, 3 figures, IEEE EUROCON 2013 International Conference on Computer as a Tool, 1-4 July 2013, Zagreb, Croatia.
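
    As one example of the heartbeat-detection building block the survey covers, below is a compact Pan-Tompkins-style QRS detector sketch; the filter band, window lengths and thresholds are illustrative defaults, not clinically validated values.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_qrs(ecg, fs=360):
    # Band-pass roughly isolating QRS energy (5-15 Hz).
    b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)
    # Derivative -> squaring -> moving-window integration.
    window = int(0.15 * fs)
    energy = np.convolve(np.diff(filtered) ** 2,
                         np.ones(window) / window, mode="same")
    # Peak picking with a ~0.25 s refractory period.
    peaks, _ = find_peaks(energy, height=0.3 * energy.max(),
                          distance=int(0.25 * fs))
    return peaks

fs = 360
ecg = np.zeros(10 * fs)
ecg[fs // 2::int(fs / 1.2)] = 1.0              # impulse "R peaks" at ~72 bpm
ecg = np.convolve(ecg, np.hanning(int(0.08 * fs)), mode="same")
beats = detect_qrs(ecg, fs)
print("beats detected:", len(beats), "-> ~%.0f bpm" % (len(beats) * 6))
```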

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. Max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, a max-pooling loss trained LSTM with a randomly initialized network performs better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, yielding a 67.6% relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
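
    The max-pooling loss itself is simple to state: for a keyword-bearing utterance, cross-entropy is applied only at the frame where the keyword posterior peaks, while background utterances are penalised at every frame. Below is a minimal PyTorch sketch; the framework, feature and model sizes are assumptions, not the paper's setup.

```python
import torch
import torch.nn.functional as F

def max_pooling_loss(frame_logits, label):
    """frame_logits: (T, num_classes) per-frame outputs for one utterance.
    label: 0 for background, >0 for a keyword class."""
    if label == 0:
        # Negative utterance: every frame should predict background.
        targets = torch.zeros(frame_logits.size(0), dtype=torch.long)
        return F.cross_entropy(frame_logits, targets)
    # Positive utterance: apply cross-entropy only at the frame with the
    # highest keyword posterior (the "max-pooled" frame).
    best = int(frame_logits[:, label].argmax())
    return F.cross_entropy(frame_logits[best:best + 1],
                           torch.tensor([label]))

lstm = torch.nn.LSTM(input_size=40, hidden_size=64)
head = torch.nn.Linear(64, 2)                  # background vs. one keyword
feats = torch.randn(100, 1, 40)                # 100 frames of filterbank feats
out, _ = lstm(feats)
loss = max_pooling_loss(head(out.squeeze(1)), label=1)
loss.backward()
print(float(loss))
```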

    The 2005 AMI system for the transcription of speech in meetings

    In this paper we describe the 2005 AMI system for the transcription of speech in meetings used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression and minimum word error rate decoding. In this paper we describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve very competitive performance.
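
    The multi-pass structure can be caricatured as decode, adapt on the first-pass output, then re-decode. The toy below uses a nearest-template "recogniser" and a single global feature shift as a crude stand-in for feature-space speaker adaptation; none of the actual AMI components (VTLN, HLDA, MLLR) are implemented.

```python
import numpy as np

templates = {"yes": np.array([1.0, 0.0]), "no": np.array([0.0, 1.0])}

def recognise(feat):
    """Toy recogniser: nearest word template."""
    return min(templates, key=lambda w: np.linalg.norm(feat - templates[w]))

def estimate_bias(feats, hyps):
    """Crude stand-in for feature-space adaptation: one global shift
    estimated from the first-pass hypotheses."""
    return np.mean([templates[h] - f for f, h in zip(feats, hyps)], axis=0)

truth = ["yes", "no", "no"]
clean = [np.array([0.9, 0.1]), np.array([0.1, 0.9]), np.array([0.2, 0.8])]
feats = [f + np.array([0.5, -0.2]) for f in clean]   # channel/speaker shift

first_pass = [recognise(f) for f in feats]           # pass 1: unadapted
bias = estimate_bias(feats, first_pass)              # adapt on pass-1 output
second_pass = [recognise(f + bias) for f in feats]   # pass 2: re-decode
print(first_pass, "->", second_pass, "| truth:", truth)
```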

    A robust sequential hypothesis testing method for brake squeal localisation

    This contribution deals with the in situ detection and localisation of brake squeal in an automobile. As brake squeal is emitted from regions known a priori, i.e., near the wheels, the localisation is treated as a hypothesis testing problem. Distributed microphone arrays, situated under the automobile, are used to capture the directional properties of the sound field generated by a squealing brake. The spatial characteristics of the sampled sound field are then used to formulate the hypothesis tests. However, in contrast to standard hypothesis testing approaches of this kind, the propagation environment is complex and time-varying. Coupled with inaccuracies in the knowledge of the sensor and source positions, as well as sensor gain mismatches, modelling the sound field is difficult and standard approaches fail in this case. A previously proposed approach implicitly tried to account for such incomplete system knowledge and was based on ad hoc likelihood formulations. The current paper builds upon this approach and proposes a second approach, based on more solid theoretical foundations, that can systematically account for the model uncertainties. Results from tests in a real setting show that the proposed approach is more consistent than the prior state of the art. In both approaches, the tasks of detection and localisation are decoupled for complexity reasons. The localisation (hypothesis testing) is subject to a prior detection of brake squeal and identification of the squeal frequencies. The approaches used for the detection and identification of squeal frequencies are also presented. The paper further briefly addresses some practical issues related to array design and placement.
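
    A stripped-down version of localisation as hypothesis selection: steer a small array towards each candidate wheel position and pick the hypothesis with the highest delay-and-sum power. The geometry, narrowband signal and simple power statistic are simplifying assumptions; the paper's robust sequential test is not reproduced.

```python
import numpy as np

c, fs = 343.0, 48000
mics = np.array([[0.0, 0.0], [0.3, 0.0], [0.6, 0.0]])    # 3-mic sub-array
wheels = {"front-left": np.array([1.0, 2.0]),
          "rear-left": np.array([-1.0, 2.0])}             # candidate sources

def simulate(source, n=4096, f0=2000.0):
    """Narrowband 'squeal' received with per-mic propagation delays."""
    t = np.arange(n) / fs
    delays = np.linalg.norm(mics - source, axis=1) / c
    return np.stack([np.sin(2 * np.pi * f0 * (t - d)) for d in delays])

def steered_power(x, hypothesis):
    """Time-align channels for the hypothesised position, sum, and score."""
    delays = np.linalg.norm(mics - hypothesis, axis=1) / c
    t = np.arange(x.shape[1]) / fs
    aligned = [np.interp(t, t - d, ch) for d, ch in zip(delays, x)]
    return np.mean(np.sum(aligned, axis=0) ** 2)

x = simulate(wheels["front-left"]) + 0.1 * np.random.randn(3, 4096)
scores = {name: steered_power(x, pos) for name, pos in wheels.items()}
print("selected hypothesis:", max(scores, key=scores.get), scores)
```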

    Algorithm for heart rate extraction in a novel wearable acoustic sensor.

    Phonocardiography is a widely used method of listening to the heart sounds and indicating the presence of cardiac abnormalities. Each heart cycle consists of two major sounds, S1 and S2, that can be used to determine the heart rate. The conventional method of acoustic signal acquisition involves placing the sound sensor at the chest, where these sounds are most audible. Presented is a novel algorithm for the detection of S1 and S2 heart sounds and their use to extract the heart rate from signals acquired by a small sensor placed at the neck. This algorithm achieves an accuracy of 90.73% and 90.69%, with respect to the heart rate values provided by two commercial devices, evaluated on more than 38 h of data acquired from ten different subjects during sleep in a pilot clinical study. This is the largest dataset for acoustic heart sound classification and heart rate extraction in the literature to date. The algorithm in this study used signals from a sensor designed to monitor breathing. This shows that the same sensor and signal can be used to monitor both breathing and heart rate, making it highly useful for long-term wearable vital signs monitoring.
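
    The envelope-and-peak-picking idea behind S1/S2-based heart rate extraction can be sketched as follows on a synthetic phonocardiogram; all signal parameters and thresholds here are illustrative, not the paper's algorithm.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

fs = 2000
t = np.arange(8 * fs) / fs
pcg = np.zeros_like(t)
for beat in np.arange(0.5, 7.5, 0.8):          # cardiac cycles at ~75 bpm
    for onset in (beat, beat + 0.3):           # S1, then S2, in each cycle
        idx = int(onset * fs)
        pcg[idx:idx + 100] += np.sin(2 * np.pi * 50 * t[:100])
pcg += 0.05 * np.random.randn(len(pcg))

envelope = np.abs(hilbert(pcg))                # amplitude envelope
smooth = np.convolve(envelope, np.ones(200) / 200, mode="same")
peaks, _ = find_peaks(smooth, height=0.3 * smooth.max(),
                      distance=int(0.2 * fs))

gaps = np.diff(peaks) / fs
# Each cycle contributes one short (S1->S2) and one long (S2->S1) gap,
# so the cycle length is the sum of the two medians.
cycle = np.median(gaps[gaps < 0.4]) + np.median(gaps[gaps >= 0.4])
print("estimated heart rate: %.0f bpm" % (60.0 / cycle))
```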