    Articulatory features for conversational speech recognition

    Why the Left Hemisphere Is Dominant for Speech Production: Connecting the Dots

    Evidence from seemingly disparate areas of speech/language research is reviewed to form a unified theoretical account for why the left hemisphere is specialized for speech production. Research findings from studies investigating hemispheric lateralization of infant babbling, the primacy of the syllable in phonological structure, rhyming performance in split-brain patients, rhyming ability and phonetic categorization in children diagnosed with developmental apraxia of speech, rules governing exchange errors in spoonerisms, organizational principles of neocortical control of learned motor behaviors, and multi-electrode recordings of human neuronal responses to speech sounds are described and common threads highlighted. It is suggested that the emergence, in developmental neurogenesis, of a hard-wired, syllabically organized neural substrate representing the phonemic sound elements of one's language, particularly the vocalic nucleus, is the crucial factor underlying the left hemisphere's dominance for speech production.

    Articulatory features for robust visual speech recognition

    Exploiting phonological constraints for handshape recognition in sign language video

    The ability to recognize handshapes in signing video is essential in algorithms for sign recognition and retrieval. Handshape recognition from isolated images is, however, an insufficiently constrained problem. Many handshapes share similar 3D configurations and are indistinguishable for some hand orientations in 2D image projections. Additionally, the articulated structure of the hand and the variants produced by different signers induce significant differences in handshape appearance. Linguistic rules involved in the production of signs impose strong constraints on the articulations of the hands, yet little attention has been paid to exploiting these constraints in previous work on sign recognition. Among the different classes of signs in any signed language, lexical signs constitute the prevalent class. Morphemes (i.e., meaningful units) for signs in this class involve a combination of particular handshapes, palm orientations, locations for articulation, and movement type. Many sign linguists therefore analyze these as analogues of phonemes in spoken languages. Phonological constraints govern the ways in which phonemes combine in American Sign Language (ASL), as in other signed and spoken languages; utilizing these constraints for handshape recognition in ASL is the focus of the proposed thesis. Handshapes in monomorphemic lexical signs are specified at the start and end of the sign. The handshape transitions within a sign are constrained to involve either closing or opening of the hand (i.e., exclusively folding or exclusively unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variation in spoken languages, both inter- and intra-signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation that exploits handshape co-occurrence constraints and also uses information about allophonic variation to aid handshape recognition.
We propose a fast non-rigid image alignment method to improve robustness to handshape appearance variation when computing observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes improves handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of ASL video annotated with linguistic information such as glosses, morphological properties and variations, and the start/end handshapes associated with each ASL sign.
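The abstract combines a hard constraint (a sign's handshape transition is either closing or opening) with a soft co-occurrence prior over start/end handshape pairs. A minimal sketch of both ideas in miniature follows, assuming hypothetical handshapes encoded as per-finger flexion levels and made-up probabilities. It is a two-variable joint decoding, far simpler than the Bayesian network the thesis proposes, and the names `HANDSHAPES`, `is_closing_or_opening` and `decode` are illustrative, not taken from the thesis.

```python
from itertools import product

# Hypothetical encoding (not from the thesis): flexion level per finger
# (thumb, index, middle, ring, pinky); 0 = extended, 2 = fully bent.
HANDSHAPES = {
    "B": (0, 0, 0, 0, 0),  # flat hand
    "A": (0, 2, 2, 2, 2),  # fist, thumb alongside
    "1": (2, 0, 2, 2, 2),  # index finger extended
}


def is_closing_or_opening(start, end):
    """True if every finger moves in the same direction (or stays put),
    i.e. the transition is exclusively folding or exclusively unfolding."""
    deltas = [e - s for s, e in zip(HANDSHAPES[start], HANDSHAPES[end])]
    return all(d >= 0 for d in deltas) or all(d <= 0 for d in deltas)


def decode(lik_start, lik_end, prior):
    """Pick the (start, end) pair maximising prior * both observation
    likelihoods, skipping pairs that violate the closing/opening constraint.
    Returns None if no valid pair has a positive score."""
    best, best_score = None, 0.0
    for s, e in product(lik_start, lik_end):
        if not is_closing_or_opening(s, e):
            continue
        score = prior.get((s, e), 0.0) * lik_start[s] * lik_end[e]
        if score > best_score:
            best, best_score = (s, e), score
    return best


if __name__ == "__main__":
    # Illustrative numbers only: the end frame alone slightly prefers "1".
    prior = {("B", "A"): 0.5, ("B", "1"): 0.3, ("B", "B"): 0.2}
    lik_start = {"B": 0.7, "A": 0.2, "1": 0.1}
    lik_end = {"A": 0.4, "1": 0.45, "B": 0.15}
    print(decode(lik_start, lik_end, prior))  # ('B', 'A')
```

Here the prior overrides the isolated end-frame decision: (B, A) scores 0.5 × 0.7 × 0.4 = 0.14 against 0.3 × 0.7 × 0.45 ≈ 0.095 for (B, 1), which is the kind of correction co-occurrence constraints are meant to supply.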

    Towards Formal Structural Representation of Spoken Language: An Evolving Transformation System (ETS) Approach

    Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, practitioners in the field generally agree that the performance of current speech recognition systems is rather suboptimal and that new approaches are needed. This research is motivated by the observation that the notion of representation of objects and concepts, once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. Because the predominantly statistical approach to speech recognition relies on numeric, feature-vector-based representations, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework: decision surfaces or probability distributions are difficult to analyse linguistically. Because of this limitation, it is doubtful that numeric representations can bridge the gap between speech recognition and linguistic research. This thesis investigates an alternative, structural approach to spoken language representation and categorisation. The approach is based on a consistent program, known as the Evolving Transformation System (ETS), motivated by the development and clarification of the concept of structural representation in pattern recognition and artificial intelligence from both theoretical and applied points of view. The thesis consists of two parts. The first part presents a similarity-based approach to structural representation of speech. First, a linguistically well-motivated structural representation of phones based on distinctive phonological features recovered from speech is proposed. The representation consists of string templates representing phones together with a similarity measure.
The set of phonological templates together with a similarity measure defines a symbolic metric space. Representation and ETS-inspired categorisation in the symbolic metric spaces corresponding to the phonological structural representation are then investigated by constructing appropriate symbolic space classifiers and evaluating them on a standard corpus of read speech. In addition, a similarity-based isometric transition from phonological symbolic metric spaces to the corresponding non-Euclidean vector spaces is investigated. The second part of the thesis deals with a formal approach to the structural representation of spoken language. Unlike the approach adopted in the first part, the representation developed in the second part is based on the mathematical language of the ETS formalism. This formalism has been specifically developed for structural modelling of dynamic processes. In particular, it allows both objects and classes to be represented in a uniform event-based hierarchical framework. In this thesis, the latter property of the formalism allows the adoption of a more physiologically concrete approach to structural representation. The proposed representation is based on gestural structures and encapsulates speech processes at the articulatory level. Algorithms for deriving the articulatory structures from the data are presented and evaluated.
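To make "string templates plus a similarity measure define a symbolic metric space" concrete, here is a minimal sketch of a nearest-template classifier. The abstract does not name the similarity measure, so Hamming distance over equal-length feature strings is used as a stand-in (it is a true metric); the five-feature inventory, the templates and the function names are hypothetical illustrations, not the thesis's representation.

```python
# Hypothetical phone templates (not from the thesis): strings of binary
# distinctive features in the assumed order
# consonantal, sonorant, voice, continuant, nasal.
TEMPLATES = {
    "p": "10000",
    "b": "10100",
    "m": "11101",
    "s": "10010",
    "z": "10110",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length feature strings differ.
    Hamming distance is a metric, so TEMPLATES with this function form a
    finite symbolic metric space (a stand-in for the thesis's measure)."""
    if len(a) != len(b):
        raise ValueError("feature strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify(observed: str) -> str:
    """Nearest-template classifier: return the phone whose template is closest
    to the observed feature string (ties go to the first template listed)."""
    return min(TEMPLATES, key=lambda phone: hamming(observed, TEMPLATES[phone]))


if __name__ == "__main__":
    # A noisy [s] whose nasal detector fired spuriously: distance 1 to "s",
    # 2 to "p" and "z", 3 to "b" and "m".
    print(classify("10011"))  # s
```

Because the distance is computed directly on symbols, a misclassification can be read off as a specific feature mismatch, which is the linguistic interpretability the abstract contrasts with statistical decision surfaces.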

    Feature extraction and event detection for automatic speech recognition

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.

    The Emergence Of Phonological Categories

    While phonological features are often assumed to be innate and universal (Chomsky and Halle, 1968), recent work argues for an alternative view that phonological features are emergent and acquired from linguistic input (e.g., Dresher, 2004; Mielke, 2008; Clements and Ridouane, 2011). This dissertation provides support for the emergent view of phonological features and proposes that the structure of the lexicon is the primary driving force in the emergence of phonological categories. Chapter 2 reviews the relevant developmental and theoretical literature on phonological acquisition and offers a reconsideration of the experimental findings in light of a clear distinction between phonetic and phonological knowledge. Chapter 3 presents a model of phonological category emergence in first language acquisition. In this model, the learner acquires phonological categories by creating lexically meaningful divisions in the acoustic space, and phonological categories adjust or increase in number to accommodate the representational needs of the learner's growing vocabulary. A computational experiment was run to test the validity of this model using acoustic measurements from the Philadelphia Neighborhood Corpus as the input. To provide evidence in support of a lexically based acquisition model, Chapter 4 uses the Providence Corpus to investigate developmental patterns in phonological acquisition. This corpus study shows that lexical contrast, not frequency, contributes to the development of production accuracy at both the word and phoneme levels in 1- to 3-year-old English-learning children. Chapter 5 extends the phonological acquisition model to study the role of lexical frequency and phonetic variation in the initiation and perpetuation of sound change. The results indicate that phonological change is overwhelmingly regular and categorical, with little effect of frequency.
Overall, this dissertation provides substantive evidence for a lexically based account of phonological category emergence.