Analyzing analytical methods: The case of phonology in neural models of spoken language
Despite the fast development of analysis techniques for NLP and speech
processing systems, few systematic studies have compared the
strengths and weaknesses of each method. As a step in this direction we study
the case of representations of phonology in neural network models of spoken
language. We use two commonly applied analytical techniques, diagnostic
classifiers and representational similarity analysis, to quantify to what
extent neural activation patterns encode phonemes and phoneme sequences. We
manipulate two factors that can affect the outcome of analysis. First, we
investigate the role of learning by comparing neural activations extracted from
trained versus randomly-initialized models. Second, we examine the temporal
scope of the activations by probing both local activations corresponding to a
few milliseconds of the speech signal, and global activations pooled over the
whole utterance. We conclude that reporting analysis results with randomly
initialized models is crucial, and that global-scope methods tend to yield more
consistent results; we recommend their use as a complement to local-scope
diagnostic methods.
Comment: ACL 202
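The two analytical techniques named in this abstract can be sketched in a few lines. The following is a minimal, hedged illustration only: it uses synthetic random "activations" in place of a real spoken-language model, a scikit-learn logistic regression as the diagnostic (probing) classifier, and a Spearman correlation of pairwise-distance vectors as a simple form of representational similarity analysis. The data shapes and feature dimensions are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-in for neural activations: 200 frames, 64 dimensions,
# each frame labelled with one of 5 hypothetical phoneme classes.
# (A real study would extract these from a trained vs. a randomly
# initialized speech model, as the paper describes.)
labels = rng.integers(0, 5, size=200)
activations = rng.normal(size=(200, 64)) + 0.5 * labels[:, None]

# Diagnostic classifier: a linear probe trained to predict the phoneme
# class from the activation vector; held-out accuracy measures how
# linearly decodable the phoneme information is.
probe = LogisticRegression(max_iter=1000)
probe.fit(activations[:100], labels[:100])
probe_acc = probe.score(activations[100:], labels[100:])

# Representational similarity analysis: correlate pairwise distances in
# activation space with pairwise distances in a (here: one-hot) phoneme
# label space. A higher correlation means the activation geometry
# mirrors the phoneme structure.
one_hot = np.eye(5)[labels]
rsa_score, _ = spearmanr(pdist(activations), pdist(one_hot))

print(f"probe accuracy: {probe_acc:.2f}, RSA correlation: {rsa_score:.2f}")
```

Running the probe on activations from a randomly initialized model, as the abstract recommends, would give the baseline against which the trained model's scores should be compared.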
Lexical and sub-lexical knowledge influences the encoding, storage, and articulation of nonwords
Nonword repetition (NWR) has been used extensively in the study of child language. Although lexical and sub-lexical knowledge is known to influence NWR performance, there has been little examination of the NWR processes (e.g., encoding, storage, articulation) that may be affected by lexical and sub-lexical knowledge. We administered 2- and 3-syllable spoken nonword recognition and nonword repetition tests to two independent groups of 31 children (M=5;07). Spoken nonword recognition primarily involves encoding and storage, whereas NWR involves an additional articulation process. The influence of lexical and sub-lexical knowledge was determined by examining the number of lexical errors produced. There was clear involvement of long-term lexical and sub-lexical knowledge in both spoken nonword recognition and NWR. In spoken nonword recognition, twice as many errors involved selecting a foil that contained a lexical item (e.g., yashukup) over a foil that contained only nonsense syllables (e.g., yashunup). In repetition, over 30% of errors changed a nonsense syllable to a lexical item. Our results show that long-term lexical and sub-lexical knowledge is pervasive in NWR: any explanation of NWR performance must therefore consider the influence of lexical and sub-lexical knowledge throughout the whole repetition process, from the encoding of nonwords to their articulation.
Segmentation ART: A Neural Network for Word Recognition from Continuous Speech
The Segmentation ART (Adaptive Resonance Theory) network for word recognition from a continuous speech stream is introduced. An input sequence represents phonemes detected at a preprocessing stage. Segmentation ART is trained rapidly, and uses fast-learning fuzzy ART modules, top-down expectation, and a spatial representation of temporal order. The network performs on-line identification of word boundaries, correcting an initial hypothesis if subsequent phonemes are incompatible with a previous partition. Simulations show that the system's segmentation performance is comparable to that of TRACE, and the ability to segment a number of difficult phrases is also demonstrated.
National Science Foundation (NSF-IRI-94-01659); Office of Naval Research (N00014-95-1-0409, N00014-95-1-0G57)
Speech vocoding for laboratory phonology
Using phonological speech vocoding, we propose a platform for exploring
relations between phonology and speech processing, and in broader terms, for
exploring relations between the abstract and physical structures of a speech
signal. Our goal is to make a step towards bridging phonology and speech
processing and to contribute to the program of Laboratory Phonology. We show
three application examples for laboratory phonology: compositional phonological
speech modelling, a comparison of phonological systems and an experimental
phonological parametric text-to-speech (TTS) system. The featural
representations of the following three phonological systems are considered in
this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English
(SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded
speech, we conclude that the latter achieves slightly better results than the
former. However, GP - the most compact phonological speech representation -
performs comparably to the systems with a higher number of phonological
features. The parametric TTS based on the phonological speech representation,
trained from an unlabelled audiobook in an unsupervised manner, achieves 85% of
the intelligibility of state-of-the-art parametric speech synthesis. We
envision that the presented approach paves the way for researchers in both
fields to form meaningful hypotheses that are explicitly testable using the
concepts developed and exemplified in this paper. On the one hand, laboratory
phonologists might test the applied concepts of their theoretical models, and
on the other hand, the speech processing community may use the concepts
developed for the theoretical phonological models to improve
current state-of-the-art applications.
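To make the idea of a featural phonological representation concrete, here is a toy sketch. The phonemes, feature names, and values below are hypothetical simplifications for illustration only, not the actual GP, SPE, or eSPE inventories used in the paper; real systems differ in both the feature set and its size, which is what the paper's compactness comparison is about.

```python
# Toy featural phoneme representations (hypothetical SPE-like subset).
# Each phoneme maps to a small set of binary phonological features.
spe_like = {
    "p": {"voice": 0, "nasal": 0, "labial": 1, "continuant": 0},
    "b": {"voice": 1, "nasal": 0, "labial": 1, "continuant": 0},
    "m": {"voice": 1, "nasal": 1, "labial": 1, "continuant": 0},
}

def feature_distance(a, b):
    """Number of feature values on which two phonemes differ."""
    return sum(a[k] != b[k] for k in a)

# /p/ and /b/ differ only in voicing; /p/ and /m/ also differ in nasality.
print(feature_distance(spe_like["p"], spe_like["b"]))  # 1
print(feature_distance(spe_like["p"], spe_like["m"]))  # 2
```

A phonological vocoder in this spirit maps such feature vectors to and from the acoustic signal; comparing systems then amounts to comparing how well each feature inventory supports that mapping.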