Encoding of phonology in a recurrent neural model of grounded speech
We study the representation and encoding of phonemes in a recurrent neural
network model of grounded speech. We use a model which processes images and
their spoken descriptions, and projects the visual and auditory representations
into the same semantic space. We perform a number of analyses on how
information about individual phonemes is encoded in the MFCC features extracted
from the speech signal, and the activations of the layers of the model. Via
experiments with phoneme decoding and phoneme discrimination we show that
phoneme representations are most salient in the lower layers of the model,
where low-level signals are processed at a fine-grained level, although a large
amount of phonological information is retained at the top recurrent layer. We
further find that the attention mechanism following the top recurrent layer
significantly attenuates encoding of phonology and makes the utterance
embeddings much more invariant to synonymy. Moreover, a hierarchical clustering
of phoneme representations learned by the network shows an organizational
structure of phonemes similar to that proposed in linguistics.
Comment: Accepted at CoNLL 201
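The phoneme-decoding experiments described above amount to training a supervised probe on feature vectors (MFCC frames or layer activations) to predict phoneme labels; higher probe accuracy means more linearly accessible phoneme information at that layer. Below is a minimal sketch of such a probe using a nearest-centroid classifier on synthetic data; the data, dimensions, and classifier choice are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_phonemes, dim, per_class = 5, 13, 200  # assumed sizes; 13 mimics MFCC dims

# Synthetic "activations": one Gaussian cluster per phoneme class.
X = np.vstack([rng.normal(loc=2.0 * i, scale=1.0, size=(per_class, dim))
               for i in range(n_phonemes)])
y = np.repeat(np.arange(n_phonemes), per_class)

# Shuffle and split into train/test portions.
idx = rng.permutation(len(y))
split = int(0.75 * len(y))
tr, te = idx[:split], idx[split:]

# Nearest-centroid decoder: one mean vector per phoneme class.
centroids = np.stack([X[tr][y[tr] == k].mean(axis=0)
                      for k in range(n_phonemes)])
dists = np.linalg.norm(X[te][:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == y[te]).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Running the same probe on features from different layers, and comparing accuracies, is the basic logic of the layer-wise analysis.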
The effect of literacy in the speech temporal modulation structure
The temporal modulation structure of adult-directed speech is conceptualised as a modulation hierarchy comprising four temporal bands: delta (1-3 Hz), theta (4-8 Hz), beta (15-30 Hz), and low gamma (30-50 Hz). Neuronal oscillatory entrainment to amplitude modulations (AMs) in these four bands may provide a basis for speech encoding and for parsing the continuous signal into linguistic units (delta - syllable stress patterns, theta - syllables, beta - onset-rime units, low gamma - phonetic information). While adult-directed speech is theta-dominant and shows tighter theta-beta/low gamma phase alignment, infant-directed speech is delta-dominant and shows tighter delta-theta phase alignment. Although this change in the speech representations could be maturational, it was hypothesized that literacy may also influence the structure of speech. In fact, literacy and schooling are known to change auditory speech entrainment, enhancing phonemic specification and augmenting the phonological detail of the lexicon's representations. Thus, we hypothesized that a corresponding difference in speech production could also emerge. In this work, spontaneous speech samples were recorded from literate (with lower and higher literacy) and illiterate subjects, and their energy modulation spectrum across delta, theta and beta/low gamma AMs, as well as the phase synchronization between nested AMs, were analysed. Measures of the participants' phonology skills and vocabulary were also retrieved, and a specific task was conducted to confirm the sensitivity to speech rhythm of the analysis method used (S-AMPH). Results showed no differences in the energy of delta, theta and beta/low gamma AMs in spontaneous speech. However, phase alignment between slower and faster speech AMs was significantly enhanced by literacy, showing moderately strong correlations with the phonology measures and literacy.
Our data suggest that literacy affects not only cortical entrainment and speech perception but also the physical/rhythmic properties of speech production.
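The phase-alignment measure described above can be illustrated with a minimal sketch: band-limit a signal into delta and theta AM bands, extract instantaneous phases via the analytic signal, and compute an n:m phase-locking value (here 1:2 for delta-theta nesting, since 4 Hz is twice 2 Hz). This is not the S-AMPH implementation; the FFT-domain filter, the synthetic signal, and the sampling parameters are illustrative assumptions.

```python
import numpy as np

fs, dur = 200, 20.0                  # assumed sampling rate (Hz) and duration (s)
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(1)

def analytic(x):
    """Analytic signal via FFT (same idea as scipy.signal.hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def bandpass(x, lo, hi):
    """Crude FFT-domain band-pass filter (zero out-of-band bins)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    X[(f < lo) | (f > hi)] = 0
    return np.fft.irfft(X, n=len(x))

# Synthetic envelope: a 2 Hz delta AM phase-locked to a 4 Hz theta AM, plus noise.
delta_phase = 2 * np.pi * 2 * t
signal = (np.cos(delta_phase) + np.cos(2 * delta_phase + 0.3)
          + 0.5 * rng.standard_normal(len(t)))

phi_d = np.angle(analytic(bandpass(signal, 1.0, 3.0)))
phi_t = np.angle(analytic(bandpass(signal, 4.0, 8.0)))

# n:m phase-locking value for 1:2 delta-theta nesting; near 1 = tight alignment.
plv = np.abs(np.mean(np.exp(1j * (2 * phi_d - phi_t))))
print(f"delta-theta phase-locking value: {plv:.2f}")
```

For an unlocked pair of bands the same index falls toward zero, which is the contrast the literacy comparison rests on.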
A Deep Generative Model of Vowel Formant Typology
What makes some types of languages more probable than others? For instance,
we know that almost all spoken languages contain the vowel phoneme /i/; why
should that be? The field of linguistic typology seeks to answer these
questions and, thereby, divine the mechanisms that underlie human language. In
our work, we tackle the problem of vowel system typology, i.e., we propose a
generative probability model of which vowels a language contains. In contrast
to previous work, we work directly with the acoustic information -- the first
two formant values -- rather than modeling discrete sets of phonemic symbols
(IPA). We develop a novel generative probability model and report results based
on a corpus of 233 languages.
Comment: NAACL 201
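Working directly with the first two formants means a vowel inventory is a set of points in (F1, F2) space, and a generative model assigns each language's inventory a probability. The paper's model is a deep generative one; the sketch below is only a toy Gaussian stand-in that makes the idea concrete: prototype formant means, per-language dispersion, and a log-likelihood for an observed inventory. The prototype values and the dispersion parameter are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prototype (F1, F2) means in Hz for three corner vowels.
prototypes = {"i": (280.0, 2250.0), "a": (700.0, 1200.0), "u": (310.0, 870.0)}
sigma = 60.0  # assumed per-language dispersion around each prototype, in Hz

def sample_inventory(rng):
    """Generate a language's inventory as noisy copies of the prototypes."""
    return {v: np.array(m) + rng.normal(0.0, sigma, size=2)
            for v, m in prototypes.items()}

def log_likelihood(inventory):
    """Log-probability of observed formant pairs under isotropic Gaussians."""
    ll = 0.0
    for v, x in inventory.items():
        mu = np.array(prototypes[v])
        ll += (-np.sum((x - mu) ** 2) / (2 * sigma ** 2)
               - np.log(2 * np.pi * sigma ** 2))
    return ll

lang = sample_inventory(rng)
ll = log_likelihood(lang)
print(f"sampled /i/ (F1, F2): {lang['i'].round(0)}, log-likelihood: {ll:.1f}")
```

Comparing such likelihoods across candidate inventories is what lets a generative model say which vowel systems are more probable than others.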
Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation
We investigate whether infant-directed speech (IDS) could facilitate word
form learning when compared to adult-directed speech (ADS). To study this, we
examine the distribution of word forms at two levels, acoustic and
phonological, using a large database of spontaneous speech in Japanese. At the
acoustic level we show that, as has been documented before for phonemes, the
realizations of words are more variable and less discriminable in IDS than in
ADS. At the phonological level, we find an effect in the opposite direction:
the IDS lexicon contains more distinctive words (such as onomatopoeias) than
the ADS counterpart. Combining the acoustic and phonological metrics together
in a global discriminability score reveals that the bigger separation of
lexical categories in the phonological space does not compensate for the
opposite effect observed at the acoustic level. As a result, IDS word forms are
still globally less discriminable than ADS word forms, even though the effect
is numerically small. We discuss the implication of these findings for the view
that the functional role of IDS is to improve language learnability.
Comment: Draft
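The acoustic discriminability comparison above can be sketched with an ABX-style score: given token clouds for two words, how often is a held-out token of word A closer to another A token than to a B token (0.5 = indistinguishable, 1.0 = perfectly separable)? The synthetic token clouds below, with IDS tokens more variable than ADS tokens, are an assumption for illustration, not the paper's data or exact metric.

```python
import numpy as np

rng = np.random.default_rng(2)

def abx_score(a_tokens, b_tokens):
    """Fraction of (x, a, b) triples, x and a distinct tokens of word A,
    where x is closer to a than to the word-B token b."""
    correct, total = 0, 0
    for i, x in enumerate(a_tokens):
        for j, a in enumerate(a_tokens):
            if i == j:
                continue
            for b in b_tokens:
                correct += np.linalg.norm(x - a) < np.linalg.norm(x - b)
                total += 1
    return correct / total

# Synthetic word-token clouds: IDS realizations more variable than ADS.
center_a, center_b = np.zeros(12), np.full(12, 1.0)
ads_a = center_a + 0.3 * rng.standard_normal((20, 12))
ads_b = center_b + 0.3 * rng.standard_normal((20, 12))
ids_a = center_a + 0.9 * rng.standard_normal((20, 12))
ids_b = center_b + 0.9 * rng.standard_normal((20, 12))

ads_score = abx_score(ads_a, ads_b)
ids_score = abx_score(ids_a, ids_b)
print(f"ADS discriminability: {ads_score:.2f}")
print(f"IDS discriminability: {ids_score:.2f}")
```

With greater token variability at fixed category separation, the IDS score drops, which is the direction of the acoustic-level effect the abstract reports.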
Brain Network Connectivity During Language Comprehension: Interacting Linguistic and Perceptual Subsystems.
The dynamic neural processes underlying spoken language comprehension require the real-time integration of general perceptual and specialized linguistic information. We recorded combined electro- and magnetoencephalographic measurements of participants listening to spoken words varying in perceptual and linguistic complexity. Combinatorial linguistic complexity processing was consistently localized to left perisylvian cortices, whereas competition-based perceptual complexity triggered distributed activity over both hemispheres. Functional connectivity showed that linguistically complex words engaged a distributed network of oscillations in the gamma band (20-60 Hz), which only partially overlapped with the network supporting perceptual analysis. Both processes enhanced cross-talk between left temporal regions and bilateral pars orbitalis (BA47). The left-lateralized synchrony between temporal regions and pars opercularis (BA44) was specific to the linguistically complex words, suggesting a specific role of left frontotemporal cross-cortical interactions in morphosyntactic computations. Synchronizations in oscillatory dynamics reveal the transient coupling of functional networks that support specific computational processes in language comprehension.
This work was supported by an EPSRC grant to W.M.-W. (EP/F030061/1), an ERC Advanced Grant (Neurolex) to W.M.-W., and by MRC Cognition and Brain Sciences Unit (CBU) funding to W.M.-W. (U.1055.04.002.00001.01). Computing resources were provided by the MRC-CBU. Funding to pay the Open Access publication charges for this article was provided by the Advanced Investigator Grant (Neurolex) to W.M.-W.
This is the final published version, which appears at http://dx.doi.org/10.1093/cercor/bhu28
A neural oscillations perspective on phonological development and phonological processing in developmental dyslexia
Children's ability to reflect upon and manipulate the sounds in words ("phonological awareness") develops as part of natural language acquisition, supports reading acquisition, and develops further as reading and spelling are learned. Children with developmental dyslexia typically have impairments in phonological awareness. Many developmental factors contribute to individual differences in phonological development. One important source of individual differences may be the child's sensory/neural processing of the speech signal from an amplitude modulation (~ energy or intensity variation) perspective, which may affect the quality of the sensory/neural representations ("phonological representations") that support phonological awareness. During speech encoding, brain electrical rhythms (oscillations, rhythmic variations in neural excitability) re-calibrate their temporal activity to be in time with rhythmic energy variations in the speech signal. The accuracy of this neural alignment or "entrainment" process is related to speech intelligibility. Recent neural studies demonstrate atypical oscillatory function at slower rates in children with developmental dyslexia. Potential relations with the development of phonological awareness by children with dyslexia are discussed.
Medical Research Council, G0400574 and G090237
Automatic recognition of schwa variants in spontaneous Hungarian speech
This paper analyzes the nature of the process involved in optional vowel reduction in Hungarian, and the acoustic structure of schwa variants in spontaneous speech. The study focuses on the acoustic patterns of both the basic realizations of Hungarian vowels and their realizations as neutral vowels (schwas), as well as on the design, implementation, and evaluation of a set of algorithms for the recognition of both types of realizations from the speech waveform. The authors address the question of whether schwas form a unified group of vowels or whether they show some dependence on the originally intended articulation of the vowel they stand for. The acoustic study uses a database consisting of over 4,000 utterances extracted from continuous speech, and recorded from 19 speakers. The authors propose methods for the recognition of neutral vowels depending on the various vowels they replace in spontaneous speech. Mel-Frequency Cepstral Coefficients are calculated and used for the training of Hidden Markov Models. The recognition system was trained on 2,500 utterances and then tested on 1,500 utterances. The results show that a neutral vowel can be detected in 72% of all occurrences. Stressed and unstressed syllables can be distinguished in 92% of all cases. Neutralized vowels do not form a unified group of phoneme realizations. The pronunciation of schwa heavily depends on the original articulation configuration of the intended vowel.
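The MFCC-plus-HMM recognition pipeline described above boils down to scoring a sequence of feature frames under competing models and picking the one with the higher likelihood. A minimal sketch of that decision, using the forward algorithm in the log domain with Gaussian emissions, is shown below; the two-state models, their parameters, and the hand-made "MFCC-like" frames are all illustrative assumptions, not the paper's trained system.

```python
import numpy as np

def log_forward(obs, pi, A, means, var):
    """Log-likelihood of an observation sequence under a Gaussian-emission HMM
    (forward algorithm in the log domain). The Gaussian normalizing constant is
    dropped; it is identical across models with equal var, so comparisons hold."""
    def log_emis(x):
        return -0.5 * np.sum((x - means) ** 2, axis=1) / var
    alpha = np.log(pi) + log_emis(obs[0])
    for x in obs[1:]:
        alpha = log_emis(x) + np.logaddexp.reduce(
            alpha[:, None] + np.log(A), axis=0)
    return np.logaddexp.reduce(alpha)

# Two toy 2-state HMMs: a "full vowel" model and a "schwa" model.
pi = np.array([0.9, 0.1])
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
vowel_means = np.array([[0.0, 0.0], [1.0, 1.0]])  # hypothetical MFCC-like means
schwa_means = np.array([[3.0, 3.0], [4.0, 4.0]])
var = 1.0

# A synthetic frame sequence near the schwa means should score higher there.
obs = np.array([[3.1, 2.9], [3.0, 3.2], [3.9, 4.1], [4.0, 3.8]])
ll_vowel = log_forward(obs, pi, A, vowel_means, var)
ll_schwa = log_forward(obs, pi, A, schwa_means, var)
print("classified as schwa:", ll_schwa > ll_vowel)
```

In the actual system each vowel and each schwa variant gets its own trained HMM over real MFCC frames, but the likelihood comparison at decision time has this shape.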