
    Dataset of British English speech recordings for psychoacoustics and speech processing research: The clarity speech corpus

    This paper presents the Clarity Speech Corpus, a publicly available, forty-speaker British English speech dataset. The corpus was created for running listening tests to gauge speech intelligibility and quality in the Clarity Project, which aims to advance speech signal processing in hearing aids through a series of challenges. The dataset is suitable for machine learning and other uses in speech and hearing technology, acoustics and psychoacoustics. The data comprise recordings of approximately 10,000 sentences drawn from the British National Corpus (BNC), selected for length, vocabulary and grammatical construction suitable for speech intelligibility testing. The collection process involved selecting a subset of BNC sentences, recording them as spoken by 40 British English speakers, and processing the recordings into individual sentence recordings with associated transcripts and metadata.
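    As a rough illustration of how such a sentence-level corpus might be consumed, the Python sketch below iterates over recordings and their transcripts. The directory layout, file naming and metadata fields are assumptions for illustration, not the corpus's documented structure.

```python
# Minimal sketch of iterating over a sentence-level corpus such as the
# Clarity Speech Corpus. The layout (<root>/<speaker>/<utterance>.wav with a
# sibling .json holding the transcript) is a hypothetical assumption.
import json
from pathlib import Path

import soundfile as sf  # pip install soundfile

CORPUS_ROOT = Path("clarity_speech_corpus")  # hypothetical location

def iter_sentences(root: Path):
    """Yield (speaker_id, audio, sample_rate, transcript) per recording."""
    for wav_path in sorted(root.glob("*/*.wav")):
        audio, sr = sf.read(wav_path)
        meta = json.loads(wav_path.with_suffix(".json").read_text())
        yield wav_path.parent.name, audio, sr, meta["transcript"]

for speaker, audio, sr, text in iter_sentences(CORPUS_ROOT):
    print(speaker, sr, text)
    break
```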

    Clarity-2021 challenges: machine learning challenges for advancing hearing aid processing

    In recent years, rapid advances in speech technology have been made possible by machine learning challenges such as CHiME, REVERB, Blizzard, and Hurricane. In the Clarity project, the machine learning approach is applied to the problem of hearing aid processing of speech-in-noise, where current technology for enhancing the speech signal for the hearing aid wearer is often ineffective. The scenario is a (simulated) cuboid-shaped living room in which there is a single listener, a single target speaker and a single interferer, which is either a competing talker or domestic noise. All sources are static, the target is always within ±30° azimuth of the listener and at the same elevation, and the interferer is an omnidirectional point source at the same elevation. The target speech comes from an open-source 40-speaker British English speech database collected for this purpose. This paper provides a baseline description of the round one Clarity challenges for both enhancement (CEC1) and prediction (CPC1). To the authors’ knowledge, these are the first machine learning challenges to consider the problem of hearing aid speech signal processing.
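    The stated geometric constraints translate directly into a scene specification. The sketch below samples one such configuration; the field names, and any ranges beyond the stated ±30° target limit, are illustrative assumptions rather than the challenge's actual scene generator.

```python
# Hypothetical sampler for a round-one Clarity scene: static sources, target
# within +/-30 degrees azimuth at the listener's elevation, one interferer.
import random
from dataclasses import dataclass

@dataclass
class Scene:
    target_azimuth_deg: float      # constrained to [-30, +30]
    interferer_azimuth_deg: float  # omnidirectional point source, same elevation
    interferer_type: str           # "competing_talker" or "domestic_noise"

def sample_scene(rng: random.Random) -> Scene:
    """Draw one static scene under the constraints stated above."""
    return Scene(
        target_azimuth_deg=rng.uniform(-30.0, 30.0),
        interferer_azimuth_deg=rng.uniform(-180.0, 180.0),
        interferer_type=rng.choice(["competing_talker", "domestic_noise"]),
    )

print(sample_scene(random.Random(0)))
```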

    Combination of Spectral and Binaurally Created Harmonics in a Common Central Pitch Processor

    A fundamental attribute of human hearing is the ability to extract a residue pitch from harmonic complex sounds such as those produced by musical instruments and the human voice. However, the neural mechanisms that underlie this processing are unclear, as are the locations of these mechanisms in the auditory pathway. The ability to extract a residue pitch corresponding to the fundamental frequency from individual harmonics, even when the fundamental component is absent, has been demonstrated separately for conventional pitches and for Huggins pitch (HP), a stimulus without monaural pitch information. HP is created by presenting the same wideband noise to both ears, except for a narrowband frequency region where the noise is decorrelated across the two ears. The present study investigated whether residue pitch can be derived by combining a component derived solely from binaural interaction (HP) with a spectral component for which no binaural processing is required. Fifteen listeners indicated which of two sequentially presented sounds was higher in pitch. Each sound consisted of two “harmonics,” which independently could be either a spectral or an HP component. Component frequencies were chosen such that the relative pitch judgement revealed whether or not a residue pitch was heard. The results showed that listeners were equally likely to perceive a residue pitch when one component was dichotic and the other was spectral as when the components were both spectral or both dichotic. This suggests that there exists a single mechanism for the derivation of residue pitch from binaurally created components and from spectral components, and that this mechanism operates at or after the level of the dorsal nucleus of the lateral lemniscus (brainstem) or the inferior colliculus (midbrain), which receive inputs from the medial superior olive where temporal information from the two ears is first combined.
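    As a concrete illustration of the two-component stimuli described above, the Python sketch below builds one dichotic “harmonic” (a Huggins-pitch band at the 3rd harmonic of 200 Hz, made by decorrelating a narrow band between the ears) and one spectral “harmonic” (a diotic pure tone at the 4th harmonic). If both are heard as harmonics, the residue pitch lies near the missing 200 Hz fundamental. All parameter values are illustrative assumptions, not the study's exact stimuli.

```python
# Sketch of a mixed spectral + dichotic two-"harmonic" stimulus (assumed
# parameters): an HP band at 600 Hz plus a diotic pure tone at 800 Hz.
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 44100, 0.5
n = int(fs * dur)
t = np.arange(n) / fs

noise = rng.standard_normal(n)               # wideband noise, both ears
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(n, 1 / fs)

# Dichotic "harmonic" at 600 Hz: randomise bin phases inside a narrow band
# in the right ear only; bin magnitudes (monaural spectra) are unchanged.
band = np.abs(freqs - 600.0) < 30.0
spec_right = spec.copy()
spec_right[band] *= np.exp(1j * rng.uniform(0, 2 * np.pi, band.sum()))
left, right = noise, np.fft.irfft(spec_right, n)

# Spectral "harmonic" at 800 Hz: the same pure tone added to both ears.
tone = 0.1 * np.sin(2 * np.pi * 800.0 * t)
left, right = left + tone, right + tone      # left/right stimulus pair
```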

    Spike-Timing-Based Computation in Sound Localization

    Spike timing is precise in the auditory system and it has been argued that it conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model that exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing for extracting spatial information about sources independently of the source signal.
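    The core idea, that binaural coincidences across a bank of internal delays signal location rather than source content, can be reduced to a Jeffress-style toy model. The Python sketch below is a deliberate simplification of the paper's spectro-temporal model: a threshold-crossing spike encoder and a coincidence count per candidate internal delay, whose peak estimates the interaural time difference (ITD) regardless of the source waveform. The encoder, delay grid and coincidence window are illustrative assumptions.

```python
# Simplified spike-timing ITD estimator (assumed parameters throughout).
import numpy as np

def spike_times(signal, fs, thresh=1.0):
    """Toy encoder: one spike per positive-going threshold crossing."""
    above = signal > thresh
    return np.flatnonzero(~above[:-1] & above[1:]) / fs

def estimate_itd(left, right, fs, max_itd=1e-3, step=20e-6, window=10e-6):
    """Return the internal delay with the most binaural spike coincidences."""
    sl, sr = spike_times(left, fs), spike_times(right, fs)
    delays = np.arange(-max_itd, max_itd + step, step)
    # For each candidate internal delay d, count left/right spike pairs that
    # coincide within `window` after shifting the right ear's spikes by d.
    counts = [np.sum(np.abs(sl[:, None] - (sr[None, :] + d)) < window)
              for d in delays]
    return delays[int(np.argmax(counts))]

fs = 50000
src = np.random.default_rng(1).standard_normal(int(0.1 * fs))  # unknown source
shift = 15                                  # 15 samples = 300 microseconds
left, right = src[:-shift], src[shift:]     # right ear leads: source on right
print(estimate_itd(left, right, fs))        # ~3e-4, recovering the imposed ITD
```

    Because the winning delay depends only on the interaural offset, the same coincidence pattern emerges for any source waveform, which is the signal-independence property the abstract highlights.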

    Central auditory masking by an illusory tone

    Many natural sounds fluctuate over time. The detectability of sounds in a sequence can be reduced by prior stimulation in a process known as forward masking. Forward masking is thought to reflect neural adaptation or neural persistence in the auditory nervous system, but it has been unclear where in the auditory pathway this processing occurs. To address this issue, the present study used a "Huggins pitch" stimulus, the perceptual effects of which depend on central auditory processing. Huggins pitch is an illusory tonal sensation produced when the same noise is presented to the two ears except for a narrow frequency band that is different (decorrelated) between the ears. The pitch sensation depends on the combination of the inputs to the two ears, a process that first occurs at the level of the superior olivary complex in the brainstem. Here it is shown that a Huggins pitch stimulus produces more forward masking in the frequency region of the decorrelation than a noise stimulus that is identical to the Huggins-pitch stimulus except with perfect correlation between the ears. This control stimulus has a peripheral neural representation identical to that of the Huggins-pitch stimulus. The results show that processing in, or central to, the superior olivary complex can contribute to forward masking in human listeners.
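    The control logic can be made concrete in a few lines: both maskers give each ear the same magnitude spectrum, a proxy for the identical peripheral representation, and only the interaural relation inside the narrow band differs. The Python sketch below uses the same narrowband phase-randomisation construction as the earlier Huggins-pitch example; the centre frequency, bandwidth and duration are illustrative assumptions.

```python
# Huggins-pitch masker vs. correlated control (assumed parameters).
import numpy as np

rng = np.random.default_rng(0)
fs, dur, f0, bw = 44100, 0.5, 600.0, 96.0
n = int(fs * dur)

noise = rng.standard_normal(n)
freqs = np.fft.rfftfreq(n, 1 / fs)
band = np.abs(freqs - f0) < bw / 2

# Correlated control masker: the identical noise at both ears.
ctl_left = ctl_right = noise

# Huggins-pitch masker: the same noise, but with bin phases inside the
# narrow band randomised in the right ear; bin magnitudes are unchanged.
spec_right = np.fft.rfft(noise)
spec_right[band] *= np.exp(1j * rng.uniform(0, 2 * np.pi, band.sum()))
hp_left, hp_right = noise, np.fft.irfft(spec_right, n)

# Each ear's magnitude spectrum is the same under both maskers, so any
# difference in forward masking must arise after binaural combination.
assert np.allclose(np.abs(np.fft.rfft(hp_right)),
                   np.abs(np.fft.rfft(ctl_right)))
```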