
    Phonetic variability and grammatical knowledge: an articulatory study of Korean place assimilation.

    The study reported here uses articulatory data to investigate Korean place assimilation of coronal stops followed by labial or velar stops, both within words and across words. The results show that this place-assimilation process is highly variable, both within and across speakers, and is also sensitive to factors such as the place of articulation of the following consonant, the presence of a word boundary and, to some extent, speech rate. Gestures affected by the process are generally reduced categorically (deleted), while sporadic gradient reduction of gestures is also observed. We further compare the results for coronals to our previous findings on the assimilation of labials, discussing implications of the results for grammatical models of phonological/phonetic competence. The results suggest that speakers’ language-particular knowledge of place assimilation has to be relatively detailed and context-sensitive, and has to encode systematic regularities about its obligatory/variable application as well as categorical/gradient realisation

    The phonetics of second language learning and bilingualism

    This chapter provides an overview of major theories and findings in the field of second language (L2) phonetics and phonology. Four main conceptual frameworks are discussed and compared: the Perceptual Assimilation Model-L2, the Native Language Magnet Theory, the Automatic Selection Perception Model, and the Speech Learning Model. These frameworks differ in terms of their empirical focus, including the type of learner (e.g., beginner vs. advanced) and target modality (e.g., perception vs. production), and in terms of their theoretical assumptions, such as the basic unit or window of analysis that is relevant (e.g., articulatory gestures, position-specific allophones). Despite the divergences among these theories, three recurring themes emerge from the literature reviewed. First, the learning of a target L2 structure (segment, prosodic pattern, etc.) is influenced by phonetic and/or phonological similarity to structures in the native language (L1). In particular, L1-L2 similarity exists at multiple levels and does not necessarily benefit L2 outcomes. Second, the role played by certain factors, such as acoustic phonetic similarity between close L1 and L2 sounds, changes over the course of learning, such that advanced learners may differ from novice learners with respect to the effect of a specific variable on observed L2 behavior. Third, the connection between L2 perception and production (insofar as the two are hypothesized to be linked) differs significantly from the perception-production links observed in L1 acquisition. In service of elucidating the predictive differences among these theories, this contribution discusses studies that have investigated L2 perception and/or production primarily at a segmental level. 
In addition to summarizing the areas in which there is broad consensus, the chapter points out a number of questions which remain a source of debate in the field today.

    Laryngeal stop systems in contact: connecting present-day acquisition findings and historical contact hypotheses

    This article examines the linguistic forces at work in present-day second language and bilingual acquisition of laryngeal contrasts, and to what extent these can give us insight into the origin of laryngeal systems of Germanic voicing languages like Dutch, with its contrast between prevoiced and unaspirated stops. The results of present-day child and adult second language acquisition studies reveal that both imposition and borrowing may occur when the laryngeal systems of a voicing and an aspirating language come into contact with each other. A scenario is explored in which socially dominant Germanic-speaking people came into contact with a Romance-speaking population, and borrowed the Romance stop system

    Correlates of linguistic rhythm in the speech signal

    Spoken languages have been classified by linguists according to their rhythmic properties, and psycholinguists have relied on this classification to account for infants’ capacity to discriminate languages. Although researchers have measured many speech signal properties, they have failed to identify reliable acoustic characteristics for language classes. This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the notion of rhythm classes and also allow the simulation of infant language discrimination, consistent with the hypothesis that newborns rely on a coarse segmentation of speech. A hypothesis is proposed regarding the role of rhythm perception in language acquisition
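The abstract above does not name its measurements, so the following is only a minimal sketch of what duration measures over a consonant/vowel segmentation could look like. It assumes three measures often associated with this line of work: the percentage of vocalic duration (`percent_V`) and the standard deviations of consonantal and vocalic interval durations (`delta_C`, `delta_V`). The function name, the interval format, and the example durations are all hypothetical.

```python
from statistics import pstdev


def rhythm_measures(intervals):
    """Compute simple duration measures from a consonant/vowel segmentation.

    `intervals` is a list of (kind, duration_in_seconds) pairs, where kind
    is "C" for a consonantal interval and "V" for a vocalic interval.
    The choice of measures is an assumption, not taken from the abstract.
    """
    vocalic = [dur for kind, dur in intervals if kind == "V"]
    consonantal = [dur for kind, dur in intervals if kind == "C"]
    total = sum(vocalic) + sum(consonantal)
    return {
        # Share of the utterance occupied by vocalic intervals, in percent.
        "percent_V": 100.0 * sum(vocalic) / total,
        # Variability (population standard deviation) of interval durations.
        "delta_C": pstdev(consonantal),
        "delta_V": pstdev(vocalic),
    }


# Hypothetical segmentation of one short utterance (durations in seconds).
example = [("C", 0.10), ("V", 0.10), ("C", 0.20),
           ("V", 0.10), ("C", 0.10), ("V", 0.10)]
measures = rhythm_measures(example)
print(measures)
```

Averaging such measures over many utterances per language would give one point per language, which is the kind of representation on which rhythm classes could be compared.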

    Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

    Neural models have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices. Comment: NIPS 201
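The layer-wise procedure described in this abstract (take frame-level features from one layer, train a classifier to predict phone labels, compare accuracy across layers) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' setup: the probe is a simple nearest-centroid classifier rather than whatever classifier the paper uses, and the "layers" are synthetic two-dimensional features from a hypothetical `synthetic_layer` helper, not activations of a real speech model.

```python
import random
from collections import defaultdict


def train_centroids(features, labels):
    """Average the feature vectors belonging to each phone label."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to `vec`."""
    def sq_dist(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, vec))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def probe_accuracy(train_x, train_y, test_x, test_y):
    """Train the probe on one layer's features and score it on held-out frames."""
    centroids = train_centroids(train_x, train_y)
    hits = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def synthetic_layer(rng, labels, noise):
    """Hypothetical stand-in for one layer's frame-level activations."""
    prototypes = {"a": (1.0, 0.0), "s": (0.0, 1.0), "t": (-1.0, -1.0)}
    return [[p + rng.gauss(0.0, noise) for p in prototypes[lab]] for lab in labels]


rng = random.Random(0)
train_y = [rng.choice("ast") for _ in range(300)]
test_y = [rng.choice("ast") for _ in range(300)]

# Assumption: a noisier layer encodes phone identity less cleanly.
layer_noise = {"layer_1": 2.0, "layer_2": 0.3}
accuracies = {}
for name, noise in layer_noise.items():
    accuracies[name] = probe_accuracy(
        synthetic_layer(rng, train_y, noise), train_y,
        synthetic_layer(rng, test_y, noise), test_y,
    )
    print(name, round(accuracies[name], 3))
```

In a real analysis the synthetic features would be replaced by activations extracted from each layer of the pre-trained model, with frame-level phone labels taken from an aligned corpus; the probe's accuracy is then read as a measure of how accessible phone identity is at that layer.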

    The weight of phonetic substance in the structure of sound inventories

    In the research field initiated by Lindblom & Liljencrants in 1972, we illustrate the possibility of giving substance to phonology, predicting the structure of phonological systems with nonphonological principles, be they listener-oriented (perceptual contrast and stability) or speaker-oriented (articulatory contrast and economy). We proposed for vowel systems the Dispersion-Focalisation Theory (Schwartz et al., 1997b). With the DFT, we can predict vowel systems using two competing perceptual constraints weighted with two parameters, respectively λ and α. The first one aims at increasing auditory distances between vowel spectra (dispersion), the second one aims at increasing the perceptual salience of each spectrum through formant proximities (focalisation). We also introduced new variants based on research in physics - namely, phase space (λ,α) and polymorphism of a given phase, or superstructures in phonological organisations (Vallée et al., 1999) which allow us to generate 85.6% of 342 UPSID systems from 3- to 7-vowel qualities. No similar theory for consonants seems to exist yet. Therefore we present in detail a typology of consonants, and then suggest ways to explain plosive vs. fricative and voiceless vs. voiced consonants predominances by i) comparing them with language acquisition data at the babbling stage and looking at the capacity to acquire relatively different linguistic systems in relation with the main degrees of freedom of the articulators; ii) showing that the places “preferred” for each manner are at least partly conditioned by the morphological constraints that facilitate or complicate, make possible or impossible the needed articulatory gestures, e.g. the complexity of the articulatory control for voicing and the aerodynamics of fricatives. A rather strict coordination between the glottis and the oral constriction is needed to produce acceptable voiced fricatives (Mawass et al., 2000). 
We determine that the region where the combinations of Ag (glottal area) and Ac (constriction area) values result in a balance between the voice and noise components is indeed very narrow. We thus demonstrate that some of the main tendencies in the phonological vowel and consonant structures of the world’s languages can be explained partly by sensorimotor constraints, and argue that phonology can in fact take part in a theory of Perception-for-Action-Control.