Unfamiliar voice identification: effect of post-event information on accuracy and voice ratings

Author: Baguley
Belin
Belli
Bricker
Campos
Clifford
Deffenbacher
Frost
Hammersley
Handkins
Hollien
Howard
Johnson
Kerstholt
Kirk
Legge
Linville
Loftus
Luna
McAllister
McCloskey
Meissner
Mullennix
Neath
Olsson
Orchard
Pezdek
Philippon
Pigott
Pinheiro
Pisoni
Posner
Read
Saslove
Schooler
Searcy
Stevenage
Tomes
Van Wallendael
Weingardt
Wilding
Yarmey
Zaragoza
Publication venue: Ubiquity Press, Ltd.
Publication date: 01/01/2014

This study addressed the effect of misleading post-event information (PEI) on voice ratings, identification accuracy, and confidence, as well as the link between verbal recall and accuracy. Participants listened to a dialogue between male and female targets, then read misleading information about voice pitch. Participants engaged in verbal recall, rated voices on a feature checklist, and made a lineup decision. Accuracy rates were low, especially on target-absent lineups. Confidence and accuracy were unrelated, but the number of facts recalled about the voice predicted later lineup accuracy. There was a main effect of misinformation on ratings of target voice pitch, but there was no effect on identification accuracy or confidence ratings. As voice lineup evidence from earwitnesses is used in courts, the findings have potential applied relevance.

Crossref

Nottingham Trent Institutional Repository (IRep)

Directory of Open Access Journals

Bio-inspired broad-class phonetic labelling

Author: Fernández L.M.
Ferrández Vicente José Manuel
Gómez Vilda Pedro
Martínez Olalla Rafael
Muñoz Cristina
Rodellar Biarge M. Victoria
Álvarez Marquina Agustin
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2008

Recent studies have shown that the correct labeling of phonetic classes may help current Automatic Speech Recognition (ASR) when combined with classical parsing automata based on Hidden Markov Models (HMM). This paper describes a method for Phonetic Class Labeling (PCL) based on bio-inspired speech processing. The methodology is based on the automatic detection of formants and formant trajectories after a careful separation of the vocal and glottal components of speech, and on the operation of CF (Characteristic Frequency) neurons in the cochlear nucleus and cortical complex of the human auditory apparatus. Examples of phonetic class labeling are given and the applicability of the method to Speech Processing is discussed.

Archivo Digital UPM

BSA Practice guidance: an overview of current management of auditory processing disorder (APD)

Author: Alles R.
Bamiou D.
Batchelor L.
Campbell N.G. (Lead Author)
Canning D.
Grant P.
Luxon L.
Moore D.
Murray P.
Nairn S.
Rosen S.
Sirimanna T.
Treharne D.
Wakeham K.
Publication venue: British Society of Audiology
Publication date: 17/10/2011

Southampton (e-Prints Soton)

An application of an auditory periphery model in speaker identification

Author: Islam Md. Atiqul
Publication venue: American Psychological Association (APA)
Publication date: 01/01/2021

The number of applications of automatic Speaker Identification (SID) is growing due to the advanced technologies for secure access and authentication in services and devices. In a 2016 study, the Cascade of Asymmetric Resonators with Fast Acting Compression (CAR-FAC) cochlear model achieved the best performance among seven recent cochlear models in fitting a set of human auditory physiological data. Motivated by the performance of the CAR-FAC, I apply this cochlear model in an SID task for the first time, with the aim of producing performance similar to that of the human auditory system. This thesis investigates the potential of the CAR-FAC model in text-dependent and text-independent SID tasks. It also investigates the contributions of different parameters, nonlinearities, and stages of the CAR-FAC that enhance SID accuracy. The performance of the CAR-FAC is compared with another recent cochlear model called the Auditory Nerve (AN) model. In addition, three FFT-based auditory features, Mel Frequency Cepstral Coefficient (MFCC), Frequency Domain Linear Prediction (FDLP), and Gammatone Frequency Cepstral Coefficient (GFCC), are included to compare their performance with cochlear features. This comparison allows me to investigate a better front-end for a noise-robust SID system. Three different statistical classifiers (a Gaussian Mixture Model with Universal Background Model (GMM-UBM), a Support Vector Machine (SVM), and an I-vector) were used to evaluate the performance. These statistical classifiers allow me to investigate nonlinearities in the cochlear front-ends. The performance is evaluated under clean and noisy conditions for a wide range of noise levels. Techniques to improve the performance of a cochlear algorithm are also investigated in this thesis. It was found that the application of a cube root and DCT on cochlear output enhances the SID accuracy substantially.
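
The final finding above (a cube root followed by a DCT applied to cochlear output) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the use of an unnormalised DCT-II, and the input format (one list of non-negative per-channel energies for a single frame) are all assumptions made for illustration.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II of a sequence of floats (assumed DCT variant)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def cube_root_dct(channel_energies, num_coeffs):
    """Hypothetical helper: cube-root compression, then the first DCT coefficients.

    channel_energies is assumed to hold non-negative values for one frame,
    one value per cochlear channel.
    """
    compressed = [e ** (1.0 / 3.0) for e in channel_energies]
    return dct_ii(compressed)[:num_coeffs]


# Toy frame with four channels of equal energy.
features = cube_root_dct([8.0, 8.0, 8.0, 8.0], num_coeffs=2)
print(features)
```

With equal energy in every channel, all of the output sits in the first coefficient, which is the decorrelating behaviour that makes a DCT a common last step in cepstral-style features.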

Western Sydney ResearchDirect

Language attitudes revisited: auditory affective priming

Author: Geeraerts Dirk
Impe Leen
Speelman Dirk
Spruyt Adriaan
Publication venue: Elsevier BV
Publication date: 01/01/2013

Ghent University Academic Bibliography

A binaural grouping model for predicting speech intelligibility in multitalker environments

Author: Colburn H. Steven
Mi Jing
Publication venue: SAGE Publications
Publication date: 30/05/2018

Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires few computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.

Funding: R01 DC000100 - NIDCD NIH HH
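
The binary decision rule described in this abstract can be sketched roughly as follows. The sketch covers only the thresholding step: it assumes the per-unit energies before and after EC cancellation have already been computed, and the function name, the decibel formulation, and the 3 dB threshold are hypothetical choices, not values taken from the article.

```python
import math


def binary_mask(input_energy, output_energy, threshold_db=3.0, eps=1e-12):
    """Hypothetical decision rule: mark a time-frequency unit as target-dominated
    when cancelling the target removes more than threshold_db of its energy.

    input_energy and output_energy are lists of rows (time frames), each row a
    list of per-frequency-band energies before and after EC cancellation.
    """
    mask = []
    for row_in, row_out in zip(input_energy, output_energy):
        mask_row = []
        for e_in, e_out in zip(row_in, row_out):
            # Energy drop caused by cancelling the target, in decibels.
            drop_db = 10.0 * math.log10((e_in + eps) / (e_out + eps))
            mask_row.append(1 if drop_db > threshold_db else 0)
        mask.append(mask_row)
    return mask


# One frame, two bands: cancellation removes most energy in the first band only.
mask = binary_mask([[1.0, 1.0]], [[0.1, 0.9]])
print(mask)  # [[1, 0]]
```

A large drop means most of the unit's energy came from the target direction, so the unit is kept; a small drop means the unit was dominated by maskers.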

Boston University Institutional Repository (OpenBU)

Learning sound representations using trainable COPE feature extractors

Author: Petkov Nicolai
Strisciuglio Nicola
Vento Mario
Publication venue: Elsevier BV
Publication date: 22/03/2019

Sound analysis research has mainly been focused on speech and music processing. The deployed methodologies are not suitable for analysis of sounds with varying background noise, in many cases with very low signal-to-noise ratio (SNR). In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined using a single prototype sound pattern in an automatic configuration process, which is a type of representation learning. We construct a set of COPE feature extractors, configured on a number of training patterns. Then we take their responses to build feature vectors that we use in combination with a classifier to detect and classify patterns of interest in audio signals. We carried out experiments on four public data sets: MIVIA audio events, MIVIA road events, ESC-10 and TU Dortmund data sets. The results that we achieved (recognition rate equal to 91.71% on the MIVIA audio events, 94% on the MIVIA road events, 81.25% on the ESC-10 and 94.27% on the TU Dortmund) demonstrate the effectiveness of the proposed method and are higher than the ones obtained by other existing approaches. The COPE feature extractors have high robustness to variations of SNR. Real-time performance is achieved even when the value of a large number of features is computed. Comment: Accepted for publication in Pattern Recognitio

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Communications Biophysics

Author: Boduch Raymond
Braida Louis D.
Bristol Norman, Jr.
Bustamante Diane K.
Coker Jackie
Delhorne Lorraine A.
DeRosier David J.
Dowdy Leonard C.
Durlach Nathaniel I.
Farrar Catherine L.
Florentine Mary S.
Foss Kristin K.
Freeman Dennis M.
Frishkopf Lawrence S.
Hildebrandt Eric M.
Ito Yoshiko
Kiang Nelson Y-S.
MacMillan Neil A.
Milner Paul
Oman Charles M.
Peake William T.
Peterson Patrick M.
Posen Miles P.
Rabinowitz William M.
Reed Charlotte M.
Rohlicek Robin
Russell Roy P., Jr.
Schultz Martin C.
Siebert William M.
Skarda Gregory M.
Uchanski Rosalie M.
Villchur Edgar
Weiss Thomas F.
Zue Victor W.
Zurek Patrick M.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)
Publication date: 01/01/1983

Contains research objectives and reports on six research projects split into three sections.

National Institutes of Health (Grant 5 P01 NS13126-07)
National Institutes of Health (Training Grant 5 T32 NS07047-05)
National Institutes of Health (Training Grant 2 T32 NS07047-06)
National Science Foundation (Grant BNS 77-16861)
National Institutes of Health (Grant 5 R01 NS1284606)
National Institutes of Health (Grant 5 T32 NS07099)
National Science Foundation (Grant BNS77-21751)
National Institutes of Health (Grant 5 R01 NS14092-04)
Gallaudet College Subcontract
Karmazin Foundation through the Council for the Arts at M.I.T.
National Institutes of Health (Grant 1 R01 NS1691701A1)
National Institutes of Health (Grant 5 R01 NS11080-06)
National Institutes of Health (Grant GM-21189

DSpace@MIT

Spectral discontinuity in concatenative speech synthesis – perception, join costs and feature transformations

Author: Kirkpatrick Barry
Publication venue: Dublin City University. School of Computing
Publication date: 22/02/2010

This thesis explores the problem of determining an objective measure to represent human perception of spectral discontinuity in concatenative speech synthesis. Such measures are used as join costs to quantify the compatibility of speech units for concatenation in unit selection synthesis. No previous study has reported a spectral measure that satisfactorily correlates with human perception of discontinuity. An analysis of the limitations of existing measures and our understanding of the human auditory system were used to guide the strategies adopted to advance a solution to this problem. A listening experiment was conducted using a database of concatenated speech with results indicating the perceived continuity of each concatenation. The results of this experiment were used to correlate proposed measures of spectral continuity with the perceptual results. A number of standard speech parametrisations and distance measures were tested as measures of spectral continuity and analysed to identify their limitations. Time-frequency resolution was found to limit the performance of standard speech parametrisations. As a solution to this problem, measures of continuity based on the wavelet transform were proposed and tested, as wavelets offer superior time-frequency resolution to standard spectral measures. A further limitation of standard speech parametrisations is that they are typically computed from the magnitude spectrum. However, the auditory system combines information relating to the magnitude spectrum, phase spectrum and spectral dynamics. The potential of phase and spectral dynamics as measures of spectral continuity was investigated. One widely adopted approach to detecting discontinuities is to compute the Euclidean distance between feature vectors about the join in concatenated speech. The detection of an auditory event, such as the detection of a discontinuity, involves processing high up the auditory pathway in the central auditory system. The basic Euclidean distance cannot model such behaviour. A study was conducted to investigate feature transformations with sufficient processing complexity to mimic high-level auditory processing. Neural networks and principal component analysis were investigated as feature transformations. Wavelet-based measures were found to outperform all measures of continuity based on standard speech parametrisations. Phase and spectral dynamics based measures were found to correlate with human perception of discontinuity in the test database, although neither measure was found to contribute a significant increase in performance when combined with standard measures of continuity. Neural network feature transformations were found to significantly outperform all other measures tested in this study, producing correlations with perceptual results in excess of 90%.
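
The baseline join cost mentioned in this abstract, the Euclidean distance between the feature vectors on either side of a join, can be sketched as follows. The function name and the toy feature vectors are assumptions for illustration; the thesis's actual parametrisations are not reproduced here.

```python
import math


def join_cost(left_features, right_features):
    """Hypothetical baseline join cost: Euclidean distance between the feature
    vector of the last frame before the join and the first frame after it."""
    if len(left_features) != len(right_features):
        raise ValueError("feature vectors must have the same length")
    return math.dist(left_features, right_features)


# Toy two-dimensional feature vectors on either side of a join.
print(join_cost([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

A distance of this kind weights every feature dimension equally and has no notion of perceptual salience, which is the limitation the abstract attributes to it.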

Irish Universities

DCU Online Research Access Service