Analyzing analytical methods: The case of phonology in neural models of spoken language
Despite the fast development of analysis techniques for NLP and speech
processing systems, few systematic studies have compared the
strengths and weaknesses of each method. As a step in this direction we study
the case of representations of phonology in neural network models of spoken
language. We use two commonly applied analytical techniques, diagnostic
classifiers and representational similarity analysis, to quantify to what
extent neural activation patterns encode phonemes and phoneme sequences. We
manipulate two factors that can affect the outcome of analysis. First, we
investigate the role of learning by comparing neural activations extracted from
trained versus randomly-initialized models. Second, we examine the temporal
scope of the activations by probing both local activations corresponding to a
few milliseconds of the speech signal, and global activations pooled over the
whole utterance. We conclude that reporting analysis results with randomly
initialized models is crucial, and that global-scope methods tend to yield more
consistent results; we recommend their use as a complement to local-scope
diagnostic methods.
Comment: ACL 202
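The two analytical techniques named in this abstract can be sketched in a few lines. The following is a minimal, hedged illustration only: it uses synthetic random "activations" in place of a real spoken-language model, a scikit-learn logistic regression as the diagnostic (probing) classifier, and a Spearman correlation of pairwise-distance vectors as a simple form of representational similarity analysis. The data shapes and feature dimensions are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-in for neural activations: 200 frames, 64 dimensions,
# each frame labelled with one of 5 hypothetical phoneme classes.
# (A real study would extract these from a trained vs. a randomly
# initialized speech model, as the paper describes.)
labels = rng.integers(0, 5, size=200)
activations = rng.normal(size=(200, 64)) + 0.5 * labels[:, None]

# Diagnostic classifier: a linear probe trained to predict the phoneme
# class from the activation vector; held-out accuracy measures how
# linearly decodable the phoneme information is.
probe = LogisticRegression(max_iter=1000)
probe.fit(activations[:100], labels[:100])
probe_acc = probe.score(activations[100:], labels[100:])

# Representational similarity analysis: correlate pairwise distances in
# activation space with pairwise distances in a (here: one-hot) phoneme
# label space. A higher correlation means the activation geometry
# mirrors the phoneme structure.
one_hot = np.eye(5)[labels]
rsa_score, _ = spearmanr(pdist(activations), pdist(one_hot))

print(f"probe accuracy: {probe_acc:.2f}, RSA correlation: {rsa_score:.2f}")
```

Running the probe on activations from a randomly initialized model, as the abstract recommends, would give the baseline against which the trained model's scores should be compared.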
Lexical and sub-lexical knowledge influences the encoding, storage, and articulation of nonwords
Nonword repetition (NWR) has been used extensively in the study of child language. Although lexical and sub-lexical knowledge is known to influence NWR performance, there has been little examination of the NWR processes (e.g., encoding, storage, articulation) that may be affected by lexical and sub-lexical knowledge. We administered 2- and 3-syllable spoken nonword recognition and nonword repetition tests to two independent groups of 31 children (M=5;07). Spoken nonword recognition primarily involves encoding and storage, whereas NWR involves an additional articulation process. The influence of lexical and sub-lexical knowledge was determined by examining the number of lexical errors produced. There was clear involvement of long-term lexical and sub-lexical knowledge in both spoken nonword recognition and NWR. In spoken nonword recognition, twice as many errors involved selecting a foil that contained a lexical item (e.g., yashukup) over a foil that contained only nonsense syllables (e.g., yashunup). In repetition, over 30% of errors changed a nonsense syllable to a lexical item. Our results show that long-term lexical and sub-lexical knowledge is pervasive in NWR: any explanation of NWR performance must therefore consider the influence of lexical and sub-lexical knowledge throughout the whole repetition process, from the encoding of nonwords to their articulation.
Segmentation ART: A Neural Network for Word Recognition from Continuous Speech
The Segmentation ART (Adaptive Resonance Theory) network for word recognition from a continuous speech stream is introduced. An input sequence represents phonemes detected at a preprocessing stage. Segmentation ART is trained rapidly, and uses fast-learning fuzzy ART modules, top-down expectation, and a spatial representation of temporal order. The network performs on-line identification of word boundaries, correcting an initial hypothesis if subsequent phonemes are incompatible with a previous partition. Simulations show that the system's segmentation performance is comparable to that of TRACE, and the ability to segment a number of difficult phrases is also demonstrated.
National Science Foundation (NSF-IRI-94-01659); Office of Naval Research (N00014-95-1-0409, N00014-95-1-0G57)
Speech vocoding for laboratory phonology
Using phonological speech vocoding, we propose a platform for exploring
relations between phonology and speech processing, and in broader terms, for
exploring relations between the abstract and physical structures of a speech
signal. Our goal is to make a step towards bridging phonology and speech
processing and to contribute to the program of Laboratory Phonology. We show
three application examples for laboratory phonology: compositional phonological
speech modelling, a comparison of phonological systems and an experimental
phonological parametric text-to-speech (TTS) system. The featural
representations of the following three phonological systems are considered in
this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English
(SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded
speech, we conclude that the latter achieves slightly better results than the
former. However, GP - the most compact phonological speech representation -
performs comparably to the systems with a higher number of phonological
features. The parametric TTS based on the phonological speech representation,
trained from an unlabelled audiobook in an unsupervised manner, achieves 85% of
the intelligibility of state-of-the-art parametric speech synthesis. We
envision that the presented approach paves the way for researchers in both
fields to form meaningful hypotheses that are explicitly testable using the
concepts developed and exemplified in this paper. On the one hand, laboratory
phonologists might test the applied concepts of their theoretical models, and
on the other hand, the speech processing community may use the concepts
developed for the theoretical phonological models to improve
current state-of-the-art applications.
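To make the idea of a featural phonological representation concrete, here is a toy sketch. The phonemes, feature names, and values below are hypothetical simplifications for illustration only, not the actual GP, SPE, or eSPE inventories used in the paper; real systems differ in both the feature set and its size, which is what the paper's compactness comparison is about.

```python
# Toy featural phoneme representations (hypothetical SPE-like subset).
# Each phoneme maps to a small set of binary phonological features.
spe_like = {
    "p": {"voice": 0, "nasal": 0, "labial": 1, "continuant": 0},
    "b": {"voice": 1, "nasal": 0, "labial": 1, "continuant": 0},
    "m": {"voice": 1, "nasal": 1, "labial": 1, "continuant": 0},
}

def feature_distance(a, b):
    """Number of feature values on which two phonemes differ."""
    return sum(a[k] != b[k] for k in a)

# /p/ and /b/ differ only in voicing; /p/ and /m/ also differ in nasality.
print(feature_distance(spe_like["p"], spe_like["b"]))  # 1
print(feature_distance(spe_like["p"], spe_like["m"]))  # 2
```

A phonological vocoder in this spirit maps such feature vectors to and from the acoustic signal; comparing systems then amounts to comparing how well each feature inventory supports that mapping.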