19 research outputs found

    DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score

    We propose a training method for deep neural network (DNN)-based source enhancement that increases objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores are widely used for sound-quality evaluation, constructing DNNs to increase OSQA scores would be better than minimizing the MSE for creating high-quality output signals. However, since most OSQA scores are not analytically tractable, i.e., they are black boxes, the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of the OSQA-based objective function, we formulated a DNN optimization scheme on the basis of black-box optimization, which is also used for training computers to play games. As the black-box optimization scheme, we adopt the policy gradient method, which calculates the gradient on the basis of a sampling algorithm. To simulate output signals using the sampling algorithm, DNNs are used to estimate the probability density function of the output signals that maximize OSQA scores. The OSQA scores are calculated from the simulated output signals, and the DNNs are trained to increase the probability of generating the simulated output signals that achieve high OSQA scores. Through several experiments, we found that OSQA scores increased significantly when the proposed method was applied, even though the MSE was not minimized.
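The sampling-based gradient described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `black_box_score` function is a hypothetical stand-in for an OSQA score such as PESQ, the "DNN" is reduced to the mean of a Gaussian over three mask values, and the names `train`, `sigma`, and `lr` are assumptions made for illustration. It only shows how a score that returns a number but no gradient can still drive an update through the score-function (REINFORCE) estimator.

```python
import random


def black_box_score(mask, target=(0.8, 0.2, 0.5)):
    """Hypothetical stand-in for an OSQA score: returns a number, exposes no gradient."""
    return -sum(abs(m - t) for m, t in zip(mask, target))


def train(steps=2000, samples=16, sigma=0.1, lr=0.01, seed=0):
    """Policy-gradient update of the mean of a Gaussian sampling distribution."""
    rng = random.Random(seed)
    mean = [0.5, 0.5, 0.5]  # toy "policy" parameters (a DNN output in the real setting)
    for _ in range(steps):
        # 1. Simulate candidate outputs by sampling from the current distribution.
        draws = [[rng.gauss(m, sigma) for m in mean] for _ in range(samples)]
        # 2. Score each candidate with the black box.
        scores = [black_box_score(d) for d in draws]
        baseline = sum(scores) / samples  # reduces the variance of the estimate
        # 3. Score-function gradient: (score - baseline) * d/d_mean log N(d | mean, sigma^2).
        grad = [0.0] * len(mean)
        for d, s in zip(draws, scores):
            for i in range(len(mean)):
                grad[i] += (s - baseline) * (d[i] - mean[i]) / sigma ** 2
        # 4. Move the distribution toward samples that scored above average.
        mean = [m + lr * g / samples for m, g in zip(mean, grad)]
    return mean
```

Calling `train()` moves the mean from its starting point toward the values the toy score prefers, although the score function is never differentiated.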

    Head-related Impulse Response Cues for Spatial Auditory Brain-computer Interface

    This study provides a comprehensive test of head-related impulse response (HRIR) cues for a spatial auditory brain-computer interface (saBCI) speller paradigm. We present a comparison with the conventional virtual sound headphone-based spatial auditory modality. We propose and optimize three types of sound spatialization settings using a variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participated in the three experimental setups, which were based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEP) showed encouragingly good and stable P300 responses in online BCI experiments. Our case study indicated that users could perceive elevation in the saBCI experiments generated using the HRIR measured from a general head model. The saBCI accuracy and information transfer rate (ITR) scores improved compared with the classical horizontal plane-based virtual spatial sound reproduction modality, as far as the healthy users in the current pilot study are concerned. Comment: 4 pages, 4 figures, accepted for EMBC 2015, IEEE copyright
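The abstract reports ITR scores without stating how they are computed. A common definition in the BCI literature is the Wolpaw formula, sketched below on the assumption (not confirmed by the abstract) that this is the definition intended. The function name `wolpaw_itr` and its parameters are illustrative.

```python
import math


def wolpaw_itr(n_classes, accuracy, selections_per_min):
    """Information transfer rate in bits per minute (Wolpaw definition).

    bits per selection = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    p = accuracy
    bits = math.log2(n_classes)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_classes - 1))
    return bits * selections_per_min
```

With ten classes, matching the ten syllables mentioned above, chance-level accuracy (0.1) yields zero bits per selection and perfect accuracy yields log2(10) bits per selection.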

    A Study of Endogenous Spatial Attention in Hearing (聴覚の内因性空間的注意に関する研究)

    Tohoku University; 坂本修一; 課

    Objective assessment of speech intelligibility.

    This thesis addresses the topic of objective speech intelligibility assessment. Speech intelligibility is becoming an important issue, most likely because of the rapid growth in digital communication systems in recent decades, as well as the increasing demand for security-based applications where intelligibility, rather than overall quality, is the priority. After all, the loss of intelligibility means that communication does not exist. This research sets out to investigate the potential of automatic speech recognition (ASR) in intelligibility assessment, the motivation being the obvious link between word recognition and intelligibility. As a precursor, quality measures are first considered, since intelligibility is an attribute encompassed in overall quality. Here, 9 prominent quality measures, including the state-of-the-art Perceptual Evaluation of Speech Quality (PESQ), are assessed. A wide range of degradations is considered, including additive noise and those introduced by coding and enhancement schemes. Experimental results show that, apart from Weighted Spectral Slope (WSS), the quality scores from the measures considered here generally correlate poorly with intelligibility. Poor correlations are observed especially when dealing with speech-like noises and degradations introduced by enhancement processes. ASR is then considered, where various word recognition statistics, namely word accuracy, percentage correct, deletion, substitution and insertion, are assessed as potential intelligibility measures. One critical contribution is the observation that there are links between different ASR statistics and different forms of degradation. Such links enable suitable statistics to be chosen for intelligibility assessment in different applications. Overall, word accuracy from an ASR system trained on clean signals has the highest correlation with intelligibility.
However, as is the case with quality measures, none of the ASR scores correlate well in the context of enhancement schemes, since such processes are known to improve machine-based scores without necessarily improving intelligibility. This demonstrates the limitation of ASR in intelligibility assessment. As an extension to word modelling in ASR, one major contribution of this work relates to the novel use of a data-driven (DD) classifier in this context. The classifier is trained on intelligibility information, and its output scores relate directly to intelligibility rather than indirectly through quality or ASR scores as in earlier attempts. A critical obstacle in the development of such a DD classifier is establishing the large amount of ground truth necessary for training. This leads to the next significant contribution, namely the proposal of a convenient strategy to generate potentially unlimited amounts of synthetic ground truth, based on a well-supported hypothesis that speech processing rarely improves intelligibility. Subsequent contributions include the search for good features that could enhance classification accuracy. Scores given by quality measures and ASR are indicative of intelligibility and hence could serve as potential features for the data-driven intelligibility classifier. Both are investigated in this research, and results show ASR-based features to be superior. A final contribution is a novel feature set based on the concept of anchor models, where each anchor represents a chosen degradation. Signal intelligibility is characterised by the similarity between the degradation under test and a cohort of degradation anchors. The anchoring feature set leads to an average classification accuracy of 88% with synthetic ground truth and 82% with human ground truth evaluation sets.
The latter compares favourably with the 69% achieved by WSS (the best quality measure) and the 68% achieved by word accuracy from a clean-trained ASR (the best ASR-based measure), both assessed on identical test sets.
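The word recognition statistics named in this abstract (word accuracy, percentage correct, deletion, substitution, insertion) are conventionally derived from a minimum-edit alignment between the reference transcript and the recogniser output. The sketch below uses the widely used definitions percentage correct = (N - D - S) / N and word accuracy = (N - D - S - I) / N, where N is the number of reference words. The thesis may define or compute them differently, and the function names are illustrative.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cell[i][j] = (edits, S, D, I) for aligning ref[:i] with hyp[:j]
    cell = [[None] * (m + 1) for _ in range(n + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins)          # match, no cost
            else:
                diag = (e + 1, s + 1, d, ins)  # substitution
            e, s, d, ins = cell[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)
            e, s, d, ins = cell[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)
            # Fewest edits wins; ties are broken by the tuple ordering.
            cell[i][j] = min(diag, deletion, insertion)
    return cell[n][m][1:]


def recognition_stats(ref, hyp):
    """Percentage correct and word accuracy for non-empty word lists."""
    s, d, ins = align_counts(ref, hyp)
    n = len(ref)
    percent_correct = 100.0 * (n - d - s) / n
    word_accuracy = 100.0 * (n - d - s - ins) / n
    return percent_correct, word_accuracy
```

For the reference `"a b c d".split()` and the hypothesis `"a x c d e".split()`, the alignment contains one substitution and one insertion, so `recognition_stats` returns `(75.0, 50.0)`. Insertions lower word accuracy but not percentage correct, which is one reason the two statistics can respond differently to different degradations.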

    Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing
