56 research outputs found

    Entropy coding for training deep belief networks with imbalanced and unlabeled data

    Session 1aSCb - Speech Communication: Speech Processing Potpourri (Poster Session), no. 1aSCb1. Training deep belief networks (DBNs) normally requires large data sets. In this work, the goal is to predict traces of the tongue surface in ultrasound images of the mouth during speech. Performance on this task can be dramatically enhanced by pre-training a DBN jointly on human-supplied traces and ultrasound images, then training a modified version of the network to predict traces from ultrasound alone. However, hand-tracing the entire dataset of ultrasound images is extremely labor-intensive. Moreover, the dataset is highly imbalanced, since many images are extremely similar. This work presents a bootstrapping method that takes advantage of this imbalance, iteratively selecting a small subset of images to be hand-traced and then (re)training the DBN, using an entropy-based diversity measure for the initial selection. With this approach, a three-fold reduction in the human time required to trace an entire dataset with human-level accuracy was achieved.
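    The entropy-based initial selection described above could be sketched roughly as follows. This is a minimal illustration, not the authors' method: the histogram-entropy measure and the greedy spread-maximizing selection are assumptions made here for concreteness.

    ```python
    import numpy as np

    def image_entropy(img, bins=32):
        """Shannon entropy of an image's intensity histogram (a simple diversity proxy)."""
        hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
        p = hist / hist.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def select_diverse_subset(images, k):
        """Greedily pick k images whose entropy values are maximally spread out."""
        ent = np.array([image_entropy(im) for im in images])
        chosen = [int(np.argmax(ent))]  # seed with the highest-entropy image
        while len(chosen) < k:
            # for each candidate, distance to the nearest already-chosen entropy
            dists = np.min(np.abs(ent[:, None] - ent[chosen]), axis=1)
            dists[chosen] = -1.0  # never re-select a chosen image
            chosen.append(int(np.argmax(dists)))
        return chosen
    ```

    The selected indices would then identify the frames to hand-trace before each (re)training round.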

    Emergent phonological representations: no need for autosegmental architecture

    This paper examines implications for autosegmental representations of a model that minimizes the role of an innate linguistic endowment in grammar formation. If the innate linguistic endowment is minimized, language learning is from the bottom up and cannot rely on universal structures. Bottom-up grammars share common goals with top-down grammars, among them to identify and characterize phonological patterns. In this paper, we examine vowel distribution in Tiv, a Niger-Congo language of Nigeria. The six Tiv vowels occur in restricted positions in verbs: only 10 of the 36 possible V1(C)V2 sequences occur with any frequency. Tiv vowel distribution has been explained in terms of feature geometry, association rules, and spreading rules. We show that while the vowel distribution can be expressed using such an architecture, it can also be expressed in simpler terms, relying only on nonlinguistic capabilities such as the ability to evaluate input based on similarity and frequency, and the ability to construct symbolic representations of such data. In addition to conceptual arguments in favor of a bottom-up, emergent phonology, the paper provides an example of the analysis of a phonological system under the Emergence hypothesis.

    The articulation of lexical palatalization in Scottish Gaelic

    Session 4aSC - Speech Communication: Cross-Language Topics in Speech Communication (Poster Session) - Contributed Paper: 4aSC6. Scottish Gaelic (Gàidhlig, henceforth SG) exhibits a rich system of consonant mutation, mostly governed by its morphology (Ladefoged et al. 1998; Gillies 2002; Stewart 2004). For instance, the initial consonant of bàta "boat" changes to [v] when the word undergoes morphological inflection, e.g., a bhàta "his boat", in which the sound spelled bh is pronounced as [v]. Using ultrasound imaging, the present study investigates palatalization in SG, which is considered one of the lexicalized types of consonant mutation. Experimental data were collected at Sabhal Mòr Ostaig, a college on the Isle of Skye. Preliminary results show a clear sign of palatalization across different consonant types in palatalization environments (i.e., when morphologically conditioned), reflected in higher tongue contours in the front region of the tongue. While the articulatory distinction between plain and palatalized consonants is significant, different syllabic positions (i.e., word-initial vs. word-final palatalization) often yield individualized patterns.

    Syllabification and prosodic templates in Yawelmani

    This article addresses the interaction of syllabification and templatic morphology in Yawelmani. The morphological templates (in CV terms, CVCC, CVVCC, and CVCVVC) do not parse directly into well-formed Yawelmani surface syllables (CV, CVV, CVC). Nonetheless, as argued here, these templates can be expressed in terms of legitimate prosodic units, thereby supporting the prosodic morphology hypothesis (McCarthy and Prince 1986, 1987, 1990). The basic idea is that segments map from left to right to the template, but if a template is too small, any leftover stem consonants simply undergo right to left syllabification. This analysis accounts for the general templatic mapping of verbs and nouns as well as the different kinds of reduplication in Yawelmani. It also provides a more explanatory account of the 'ghost' consonants - initial consonants of some of the suffixes which surface only when the stem is biconsonantal, but not if the stem is larger. The analysis not only provides support for the prosodic morphology hypothesis, it also argues in favor of a templatic view of syllabification (Itô 1986, 1989) and a rule of Weight-by-Position (Hayes 1989) operating independently of the general syllabification process. © 1991 Kluwer Academic Publishers.

    The root CV-template as a property of the affix: Evidence from Yawelmani

    In this article, I have provided support for a skeletal core independent of any phonemic material. This is not a new theoretical claim, but rather adds to a small but growing literature (McCarthy 1979, 1981, Halle and Vergnaud 1980, Harris 1980, Marantz 1982, Yip 1982). However, the analysis here is an important addition because the skeleta are added to the grammar in an unfamiliar manner: affixes may determine the skeletal template of a root; if not, a default template is supplied, determined by a lexical diacritic on each verb root. Interestingly, recent work on Norwegian tone by Withgott and Halvorsen (in prep) suggests that when a suffix bears tone in Norwegian, the suffixal tone pattern surfaces on the word. With no affixes or with a toneless suffix, the underlying (or default) tone of the word surfaces. This parallels in tone the templatic example that Yokuts provides. In section 3, a CV-template pool consisting of the three default templates of verbs in Yawelmani was established. Certain affixes supply templates from this pool, and the phonemic melody of the root associates with the selected template according to universal conventions and the rule of V Spread (43). The assumption of a pool containing only three templates accounts for the pairing of bi- and triconsonantal forms when a template is selected by an affix, that is, the CVC-CVCC, CVVC-CVVCC, and CVCVV-CVCVVC pairings. The triconsonantal template is selected in all cases. With biconsonantal roots, the third C-slot of the template has no segment associated with it, and so cannot surface. This explanation is elegant and concise, but is not available without the existence of an independent skeletal tier. © 1983 D. Reidel Publishing Company.

    Emergent morphology and phonology: an example from Assamese

    Parallel Session 1

    Kinande vowel harmony: Domains, grounded conditions and one-sided alignment

    The canonical image of vowel harmony is of a particular feature distributed throughout a word, leading to symmetric constraints like AGREE or SPREAD. Examination of the distribution of tongue-root advancement in Kinande demonstrates that harmonic feature distribution is asymmetric. The data argue that a formal (yet asymmetric) constraint (like ALIGN) is exactly half right: such a constraint correctly characterises the left edge of the harmonic domain. By contrast, the right edge is necessarily characterised by phonetically grounded restrictions on feature co-occurrence. Of further interest is the role of morphological domains: the interaction between domain restrictions on specific constraints and unrestricted constraints suggests a formal means of characterising the overwhelming similarity between constraint hierarchies at different morphological levels while at the same time characterising the distinctions between levels. © 2002 Cambridge University Press.

    Testing autotrace

    While ultrasound provides a remarkable tool for tracking the tongue's movements during speech, it has yet to emerge as the powerful research tool it could be. A major roadblock is that the means of appropriately labeling images is a laborious, time-intensive undertaking. In earlier work, Fasel and Berry (2010) introduced a 'translational' deep belief network (tDBN) approach to automated labeling of ultrasound images of the tongue, and tested it against a single-speaker set of 3209 images. This study tests the same methodology against a much larger data set (about 40,000 images), using data collected for different studies with multiple speakers and multiple languages. Retraining a "generic" network with a small set of the most erroneously labeled images from language-specific development sets resulted in an almost three-fold increase in precision in the three test cases examined. © 2014 Acoustical Society of America.

    Autotrace: an automatic system for tracing tongue contours

    Ultrasound imaging of the tongue is used for analyzing the articulatory features of speech sounds. In order to be able to study the movements of the tongue, the tongue surface contour has to be traced for each recorded image. In order to capture the details of the tongue's movement during speech, the ultrasound video is generally recorded at the highest frame rate available. Detail comes at a price. The number of frames produced from even a single non-trivial experiment is often far too large to trace manually. The Arizona Phonological Imaging Lab (APIL) at the University of Arizona has developed a suite of tools to simplify the labeling and analysis of tongue contours. AutoTrace is a state-of-the-art automatic method for tracing tongue contours that is robust across speakers and languages and operates independently of frame order. The workshop will outline the software installation procedure, introduce the included tools for selecting and preparing training data, provide instructions for automated tracing, and overview a method for measuring the network's accuracy using the Mean Sum of Distances (MSD) metric described by Li et al. (2005). © 2014 Acoustical Society of America.
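    One common reading of the MSD metric mentioned above is the symmetric mean of nearest-point distances between a traced contour and a reference contour. The sketch below assumes that reading and a simple (N, 2) point-array representation; the exact normalisation used by Li et al. (2005) may differ.

    ```python
    import numpy as np

    def mean_sum_of_distances(contour_a, contour_b):
        """Symmetric mean of nearest-point distances between two (N, 2) contours.

        For every point on one contour, take the Euclidean distance to its
        nearest point on the other contour; average over both directions.
        """
        a = np.asarray(contour_a, dtype=float)
        b = np.asarray(contour_b, dtype=float)
        # pairwise distance matrix, shape (len(a), len(b))
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return (d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(a) + len(b))
    ```

    Identical contours score 0, and a contour shifted vertically by one unit against a flat reference scores 1.0, which makes the metric easy to sanity-check on synthetic traces.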