13 research outputs found

    Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?

    A practical speech audiometry tool is the digits-in-noise (DIN) test for hearing screening of populations of varying ages and hearing status. The test is usually conducted by a human supervisor (e.g., clinician), who scores the responses spoken by the listener, or online, where software scores the responses entered by the listener. The test has 24 digit triplets presented in an adaptive staircase procedure, resulting in a speech reception threshold (SRT). We propose an alternative automated DIN test setup that can evaluate spoken responses whilst conducted without a human supervisor, using the open-source automatic speech recognition toolkit, Kaldi-NL. Thirty self-reported normal-hearing Dutch adults (19–64 years) completed one DIN + Kaldi-NL test. Their spoken responses were recorded and used for evaluating the transcript of decoded responses by Kaldi-NL. Study 1 evaluated the Kaldi-NL performance through its word error rate (WER), defined as the summed decoding errors, counting only digits found in the transcript, as a percentage of the total number of digits present in the spoken responses. Average WER across participants was 5.0% (range 0–48%, SD = 8.8%), with average decoding errors in three triplets per participant. Study 2 analysed the effect that triplets with decoding errors from Kaldi-NL had on the DIN test output (SRT), using bootstrapping simulations. Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that our proposed setup could be feasible for clinical applications
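
    The digit-level WER described in this abstract can be illustrated with a minimal sketch. It assumes that errors are counted with a standard Levenshtein alignment between spoken and decoded digit sequences; the abstract does not state the exact scoring procedure, and the names `edit_distance` and `digit_wer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def digit_wer(spoken, decoded):
    """Summed digit errors as a percentage of all spoken digits.

    `spoken` and `decoded` are parallel lists of digit sequences,
    one pair per response (assumed layout, not from the paper).
    """
    errors = sum(edit_distance(r, h) for r, h in zip(spoken, decoded))
    total = sum(len(r) for r in spoken)
    return 100.0 * errors / total


# Two responses; the recogniser drops one digit in the second triplet,
# so there is 1 error in 6 spoken digits.
spoken = [["2", "5", "8"], ["1", "4", "9"]]
decoded = [["2", "5", "8"], ["1", "9"]]
print(f"WER = {digit_wer(spoken, decoded):.1f}%")
```

    Summing errors before dividing, rather than averaging per-response rates, follows the "summed decoding errors" wording of the abstract.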

    A simulation environment for studying transcutaneous electrotactile stimulation

    Transcutaneous electrical nerve stimulation (TENS) allows the artificial excitation of nerve fibres by applying electric-current pulses through electrodes on the skin's surface. This work involves the development of a simulation environment that can be used for studying transcutaneous electrotactile stimulation and its dependence on electrode layout and excitation patterns. Using an eight-electrode array implementation, it is shown how nerves located at different depths and with different orientations respond to specific injected currents, allowing the replication of already reported experimental findings and the creation of new hypotheses about the tactile sensations associated with certain stimulation patterns. The simulation consists of a finite element model of a human finger used to calculate the distribution of the electric potential in the finger tissues neglecting capacitive effects, and a cable model to calculate the excitation/inhibition of action potentials in each nerve
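
    As a rough illustration of how a cable-type analysis separates excitation from inhibition along a nerve, the sketch below evaluates the activating function (the second spatial difference of the extracellular potential along the fibre). This is a swapped-in simplification, not the model from the abstract: the potential comes from a point current source in a homogeneous medium instead of a finite element model, and the conductivity, depth, and current values are illustrative assumptions.

```python
import math


def point_source_potential(current_a, sigma_s_per_m, distance_m):
    """Quasi-static potential of a point current source in an
    infinite homogeneous medium (capacitive effects neglected)."""
    return current_a / (4.0 * math.pi * sigma_s_per_m * distance_m)


def activating_function(potentials, dx):
    """Second spatial difference of the extracellular potential.

    Positive values indicate depolarisation (tendency to excite),
    negative values hyperpolarisation. Returns one value per
    interior sample, so the result is two entries shorter.
    """
    return [
        (potentials[i - 1] - 2.0 * potentials[i] + potentials[i + 1]) / dx**2
        for i in range(1, len(potentials) - 1)
    ]


# Illustrative values (assumptions): cathodic 1 mA pulse, fibre 2 mm
# below the source, conductivity 0.2 S/m, samples every 1 mm.
current = -1e-3
sigma = 0.2
depth = 2e-3
dx = 1e-3
xs = [k * dx for k in range(-10, 11)]
ve = [point_source_potential(current, sigma, math.hypot(x, depth)) for x in xs]
af = activating_function(ve, dx)

# af[i] corresponds to position xs[i + 1].
for x, f in zip(xs[1:-1], af):
    label = "depolarised" if f > 0 else "hyperpolarised"
    print(f"x = {x * 1e3:+5.1f} mm: {label}")
```

    With a cathodic source, the fibre is depolarised directly under the electrode and hyperpolarised on the flanks, which is the kind of depth- and orientation-dependent pattern such a simulation is used to explore.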

    Automated Speech Audiometry: Can It Work Using Open-Source Pre-Trained Kaldi-NL Automatic Speech Recognition?

    A practical speech audiometry tool is the digits-in-noise (DIN) test for hearing screening of populations of varying ages and hearing status. The test is usually conducted by a human supervisor (e.g., clinician), who scores the responses spoken by the listener, or online, where software scores the responses entered by the listener. The test has 24 digit triplets presented in an adaptive staircase procedure, resulting in a speech reception threshold (SRT). We propose an alternative automated DIN test setup that can evaluate spoken responses whilst conducted without a human supervisor, using the open-source automatic speech recognition toolkit, Kaldi-NL. Thirty self-reported normal-hearing Dutch adults (19–64 years) completed one DIN + Kaldi-NL test. Their spoken responses were recorded and used for evaluating the transcript of decoded responses by Kaldi-NL. Study 1 evaluated the Kaldi-NL performance through its word error rate (WER), defined as the summed decoding errors, counting only digits found in the transcript, as a percentage of the total number of digits present in the spoken responses. Average WER across participants was 5.0% (range 0–48%, SD = 8.8%), with average decoding errors in three triplets per participant. Study 2 analyzed the effect that triplets with decoding errors from Kaldi-NL had on the DIN test output (SRT), using bootstrapping simulations. Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that our proposed setup could be feasible for clinical applications
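
    The adaptive staircase mentioned in this abstract can be sketched as follows. The rule (one-up one-down), the 2 dB step, the 0 dB starting SNR, and the SRT computed as the mean SNR after discarding the first four trials are all assumptions chosen for illustration; the abstract only states that 24 digit triplets are presented adaptively and yield an SRT. The function name `run_staircase` is hypothetical.

```python
def run_staircase(correct, start_snr=0.0, step=2.0, skip=4):
    """Simulate a one-up one-down SNR track.

    `correct` holds one boolean per triplet: True if the whole
    triplet was scored as correct. Returns the SNR presented on
    each trial and the SRT estimate (mean SNR after `skip` trials).
    """
    if len(correct) <= skip:
        raise ValueError("need more trials than the number skipped")
    snrs = []
    snr = start_snr
    for ok in correct:
        snrs.append(snr)
        # Harder (lower SNR) after a correct triplet, easier otherwise.
        snr += -step if ok else step
    kept = snrs[skip:]
    return snrs, sum(kept) / len(kept)


# 24 triplets with strictly alternating correct/incorrect scores.
responses = [i % 2 == 0 for i in range(24)]
track, srt = run_staircase(responses)
print(f"SRT estimate: {srt:.1f} dB SNR")
```

    Because each score determines the next SNR, a single triplet mis-scored by the recogniser shifts every later trial in the track, which is why the effect of decoding errors on the SRT needs to be checked.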

    Evaluating speech-in-speech perception via a humanoid robot

    Introduction: Underlying mechanisms of speech perception masked by background speakers, a common daily listening condition, are often investigated using various and lengthy psychophysical tests. The presence of a social agent, such as an interactive humanoid NAO robot, may help maintain engagement and attention. However, such robots potentially have limited sound quality or processing speed. Methods: As a first step toward the use of NAO in psychophysical testing of speech-in-speech perception, we compared normal-hearing young adults’ performance when using the standard computer interface to that when using a NAO robot to introduce the test and present all corresponding stimuli. Target sentences were presented with colour and number keywords in the presence of competing masker speech at varying target-to-masker ratios. Sentences were produced by the same speaker, but voice differences between the target and masker were introduced using speech synthesis methods. To assess test performance, speech intelligibility and data collection duration were compared between the computer and NAO setups. Human-robot interaction was assessed using the Negative Attitude Toward Robot Scale (NARS) and quantification of behavioural cues (backchannels). Results: Speech intelligibility results showed functional similarity between the computer and NAO setups. Data collection durations were longer when using NAO. NARS results showed participants had a relatively positive attitude toward “situations of interactions” with robots prior to the experiment, but otherwise showed neutral attitudes toward the “social influence” of and “emotions in interaction” with robots. The presence of more positive backchannels when using NAO suggests higher engagement with the robot in comparison to the computer. Discussion: Overall, the study presents the potential of the NAO for presenting speech materials and collecting psychophysical measurements for speech-in-speech perception

    Perception of a Humanoid Robot as an Interface for Auditory Testing

    Perception tasks that require long and frequent testing can result in a loss of attention and focus, contributing to erroneous and inconsistent data. To maintain engagement and improve reliability of data, a more interactive interface could be used. This study aims to investigate if a humanoid NAO robot could provide such an interface to improve or maintain engagement during testing. More specifically, human-robot interaction (HRI) will be explored when performing various auditory perception tasks, played in the form of games, on the NAO robot. Evaluation of the HRI will be performed using questionnaires derived from the system usability, Godspeed, and similarity-attraction scales, as well as through video analysis. Future research will extend the evaluation of the HRI to children and aim to further improve the engagement and social acceptability of the NAO robot as a game interface

    Use of a humanoid robot for auditory psychophysical testing

    46 pages (double spaced), 7 figures, 55 references, submitted to PLOS ONE. Tasks in psychophysical tests can at times be repetitive and cause individuals to lose engagement during the test. To facilitate engagement, we propose the use of a humanoid NAO robot as an alternative interface for conducting psychophysical tests. Specifically, we aim to evaluate the performance of the NAO as an auditory testing interface, given its potential limitations and technical differences, in comparison to the current laptop interface. We examine the results and durations of two voice perception tests, voice cue sensitivity and voice gender categorisation, obtained from both the conventionally used laptop interface and the new NAO interface. Both tests investigate the perception and use of two speaker-specific voice cues, fundamental frequency (F0) and vocal tract length (VTL), important for characterising voice gender. Responses are logged on the laptop using a connected mouse, and on the NAO using the tactile sensors. Comparison of test results from both interfaces shows functional similarity between the interfaces and replicates findings from previous studies with similar tests. Comparison of test durations shows longer testing times with NAO, primarily due to longer processing times in comparison to the laptop, as well as other design limitations due to the implementation of the test on the robot. Despite the inherent constraints of the NAO robot, such as in sound quality, relatively long processing and testing times, and different methods of response logging, the NAO interface appears to facilitate collecting similar data to the current laptop interface, confirming its potential as an alternative psychophysical test interface for auditory perception tests