69 research outputs found
Entropy coding for training deep belief networks with imbalanced and unlabeled data
Session 1aSCb - Speech Communication: Speech Processing Potpourri (Poster Session): no. 1aSCb1. Training deep belief networks (DBNs) is normally done with large data sets. In this work, the goal is to predict traces of the surface of the tongue in ultrasound images of the mouth during speech. Performance on this task can be dramatically enhanced by pre-training a DBN jointly on human-supplied traces and ultrasound images, then training a modified version of the network to predict traces from ultrasound only. However, hand-tracing the entire dataset of ultrasound images is extremely labor intensive. Moreover, the dataset is highly imbalanced since many images are extremely similar. This work presents a bootstrapping method which takes advantage of this imbalance, iteratively selecting a small subset of images to be hand-traced, then (re)training the DBN, making use of an entropy-based diversity measure for the initial selection. With this approach, a three-fold reduction in human time required to trace an entire dataset with human-level accuracy was achieved.
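The abstract above does not specify its entropy-based diversity measure. As a minimal sketch only, assuming each image is a flat list of pixel intensities and that "diversity" is approximated by the Shannon entropy of an image's intensity distribution, an initial selection step might look like the following. The names `shannon_entropy` and `select_for_tracing` are hypothetical and are not taken from the work itself.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the intensity distribution of one image."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_for_tracing(images, k):
    """Return the indices of the k images with the highest entropy scores.

    Assumption: a higher-entropy image is treated as more informative to
    hand-trace first. The published method may score diversity differently.
    """
    ranked = sorted(
        range(len(images)),
        key=lambda i: shannon_entropy(images[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy "images" as flat lists of pixel intensities (illustrative data only).
images = [
    [0, 0, 0, 0],  # a single intensity value
    [0, 0, 1, 1],  # two equally frequent values
    [0, 1, 2, 3],  # four equally frequent values
]
chosen = select_for_tracing(images, 2)
```

In a bootstrapping loop of the kind the abstract describes, a step like this would only seed the first round of hand-tracing; later rounds would presumably be guided by the retrained network.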
Emergent phonological representations: no need for autosegmental architecture
This paper examines implications for autosegmental representations of a model that minimizes the role of an innate linguistic endowment in grammar formation. If the innate linguistic endowment is minimized, language learning is from the bottom up and cannot rely on universal structures. Bottom-up grammars share common goals with top-down grammars, among them to identify and characterize phonological patterns. In this paper, we examine vowel distribution in Tiv, a Niger-Congo language of Nigeria. The six Tiv vowels occur in restricted positions in verbs: only 10 of the 36 possible V1(C)V2 sequences occur with any frequency. Tiv vowel distribution has been explained in terms of feature geometry, association rules, and spreading rules. We show that while the vowel distribution can be expressed using such an architecture, it can also be expressed in simpler terms, relying only on nonlinguistic capabilities such as the ability to evaluate input based on similarity and frequency, and the ability to construct symbolic representations of such data. In addition to conceptual arguments in favor of a bottom-up, emergent phonology, the paper provides an example of the analysis of a phonological system under the Emergence hypothesis.
The articulation of lexical palatalization in Scottish Gaelic
Session 4aSC - Speech Communication: Cross-Language Topics in Speech Communication (Poster Session) - Contributed Paper: 4aSC6. Scottish Gaelic (Gàidhlig, henceforth SG) exhibits a rich system of consonant mutation, which is mostly governed by its morphology (Ladefoged et al. 1998; Gillies 2002; Stewart 2004). For instance, the initial [b] of bàta “boat” changes to [v] when the word undergoes morphological inflection, e.g., a bhàta “his boat”, in which the sound spelled bh is pronounced as [v]. Using ultrasound imaging, the present study investigates palatalization in SG, which is considered one of the lexicalized consonant mutation types. Experimental data were collected at Sabhal Mòr Ostaig, a college on the Isle of Skye. Preliminary results show a clear sign of palatalization across different consonant types in palatalization environments (i.e., when morphologically conditioned), represented by higher tongue contours in the front region of the tongue. While the articulatory distinction between plain and palatalized consonants is significant, different syllabic positions (i.e., word-initial vs. word-final palatalization) often yield individualized patterns.
Comparing phoneme frequency, age of acquisition, and loss in aphasia: Implications for phonological universals
Phonological complexity may be central to the nature of human language. It may shape the distribution of phonemes and phoneme sequences within languages, but also determine age of acquisition and susceptibility to loss in aphasia. We evaluated this claim using frequency statistics derived from a corpus of phonologically transcribed Italian words (phonitalia, available at phonitalia.org), rankings of phoneme age of acquisition (AoA) and rate of phoneme errors in patients with apraxia of speech (AoS) as an indication of articulatory complexity. These measures were related to cross-linguistically derived markedness rankings. We found strong correspondences. AoA, however, was predicted by both apraxic errors and frequency, suggesting independent contributions of these variables. Our results support the reality of universal principles of complexity. In addition, they suggest that these complexity principles have articulatory underpinnings since they modulate the production of patients with AoS, but not the production of patients with more central phonological difficulties.
Syllabification and prosodic templates in Yawelmani
This article addresses the interaction of syllabification and templatic morphology in Yawelmani. The morphological templates (in CV terms, CVCC, CVVCC, and CVCVVC) do not parse directly into well-formed Yawelmani surface syllables (CV, CVV, CVC). Nonetheless, as argued here, these templates can be expressed in terms of legitimate prosodic units, thereby supporting the prosodic morphology hypothesis (McCarthy and Prince 1986, 1987, 1990). The basic idea is that segments map from left to right to the template, but if a template is too small, any leftover stem consonants simply undergo right to left syllabification. This analysis accounts for the general templatic mapping of verbs and nouns as well as the different kinds of reduplication in Yawelmani. It also provides a more explanatory account of the 'ghost' consonants - initial consonants of some of the suffixes which surface only when the stem is biconsonantal, but not if the stem is larger. The analysis not only provides support for the prosodic morphology hypothesis, it also argues in favor of a templatic view of syllabification (Itô 1986, 1989) and a rule of Weight-by-Position (Hayes 1989) operating independently of the general syllabification process. © 1991 Kluwer Academic Publishers.
The root CV-template as a property of the affix: Evidence from Yawelmani
In this article, I have provided support for a skeletal core independent of any phonemic material. This is not a new theoretical claim, but rather adds to a small but growing literature (McCarthy 1979, 1981, Halle and Vergnaud 1980, Harris 1980, Marantz 1982, Yip 1982). However, the analysis here is an important addition because the skeleta are added to the grammar in an unfamiliar manner: affixes may determine the skeletal template of a root; if not, a default template is supplied, determined by a lexical diacritic on each verb root. Interestingly, recent work on Norwegian tone by Withgott and Halvorsen (in prep) suggests that when a suffix bears tone in Norwegian, the suffixal tone pattern surfaces on the word. With no affixes or with a toneless suffix, the underlying (or default) tone of the word surfaces. This parallels in tone the example in templates that Yokuts provides. In section 3, a CV-template pool consisting of the three default templates of verbs in Yawelmani was established. Certain affixes supply templates from this pool, and the phonemic melody of the root associates with the selected template according to universal conventions and the rule of V Spread (43). The assumption of a pool containing only three templates accounts for the pairing of bi- and triconsonantal forms when a template is selected by an affix, that is, the CVC-CVCC, CVVC-CVVCC, and CVCVV-CVCVVC pairings. The triconsonantal template is selected in all cases. With biconsonantal roots, the third C-slot of the template has no segment associated with it, and so cannot surface. This explanation is elegant and concise, but is not available without the existence of an independent skeletal tier. © 1983 D. Reidel Publishing Company.
- …