
    Do colourless green voices speak furiously? Linkages between phonetic and visual perception in synaesthesia

    Synaesthesia is an unusual phenomenon in which additional sensory perceptions are triggered by apparently unrelated sensory or conceptual stimuli. The main foci of this thesis are speech-sound-colour and voice-induced synaesthesia. While grapheme-colour synaesthesia has been intensively researched, few studies have approached types of synaesthesia based on vocal inducers with detailed acoustic-phonetic and colorimetric analyses. That approach is taken here. First, a thorough examination of speech-sound-colour synaesthesia was conducted. An experiment is reported that tested to what extent vowel acoustics influence colour associations for synaesthetes and non-synaesthetes. Systematic association patterns between vowel formants and colour measures were found in both groups, but most strongly in synaesthetes. Synaesthetes also showed a more consistent pattern of vowel-colour associations. Whether speech-sound-colour synaesthesia is a discrete type of synaesthesia independent of grapheme-colour synaesthesia, and how the two might influence each other, is then discussed. Next, two experiments are introduced to explore voice-induced synaesthesia. First, a comprehensive voice description task was conducted with voice synaesthetes, phoneticians and controls to investigate their verbal voice quality descriptions and the colour and texture associations that they have with voices. Qualitative analyses characterised the nature of the associations made by each participant group, while quantitative analyses revealed that for all groups, acoustic parameters such as pitch, pitch range, vowel formants and other spectral properties influenced colour and texture associations in a systematic way. Above all, a strong connection was found between these measures and luminance. Finally, voice-induced synaesthetes, other synaesthetes and controls participated in a voice line-up of the kind used in forensic phonetic casework. This experiment, motivated by previous findings of memory advantages for synaesthetes in certain areas, tested whether synaesthetes' voice memory is influenced by their condition. While no difference in performance was found between groups for normal speech, voice-induced synaesthetes outperformed the other groups in identifying a whispering speaker. These are the first group studies of the otherwise under-researched type of voice-induced synaesthesia, with a focus on acoustic rather than semantic analysis, adding knowledge to the growing field of synaesthesia research from a largely neglected phonetic angle. The debate around (re)defining synaesthesia is also taken up. The voice description experiment in particular leads to a discussion of a synaesthesia spectrum in the population, as many common mechanisms and associations were found, and it emerged that less common types of synaesthesia are often difficult to define rigidly using traditional criteria. Finally, the interplay of different types of synaesthesia is discussed and the findings are evaluated against the background of existing theories of synaesthesia.
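As a rough illustration of the formant-colour association analysis described in this abstract, the sketch below correlates second-formant (F2) values with the luminance of associated colours. All formant and luminance values here are invented for illustration; they are not the thesis data, and the real analysis runs over per-participant responses and several colorimetric measures.

```python
import math

# Approximate F1/F2 midpoints (Hz) for five vowels (illustrative values).
vowels = {
    "i": (280, 2250),
    "e": (440, 2100),
    "a": (730, 1100),
    "o": (500, 900),
    "u": (310, 870),
}

# Invented mean luminance (0-100) of the colours chosen for each vowel.
luminance = {"i": 82, "e": 70, "a": 55, "o": 41, "u": 33}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

f2 = [vowels[v][1] for v in vowels]
lum = [luminance[v] for v in vowels]
r = pearson(f2, lum)
print(f"F2 vs luminance: r = {r:.2f}")
```

A toy correlation like this only shows the shape of the test; the reported finding is that such formant-luminance links hold across groups but most strongly for synaesthetes.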

    Production and perception of Libyan Arabic vowels

    PhD Thesis. This study investigates the production and perception of Libyan Arabic (LA) vowels by native speakers and the relation between these two major aspects of speech. The aim was to provide a detailed acoustic and auditory description of the vowels in the LA inventory and to compare their phonetic features with those of other Arabic varieties. A review of the relevant literature showed that the LA dialect had not previously been investigated experimentally: the small number of studies conducted in recent decades have been based mainly on impressionistic accounts. This study consists of two main investigations, one concerned with vowel production and the other with vowel perception. In terms of production, the study focused on gathering the data necessary to define the vowel inventory of the dialect and to explore the qualitative and quantitative characteristics of its vowels. Twenty native speakers of LA were recorded while reading target monosyllabic words in carrier sentences. Acoustic and auditory analyses were used in order to provide a fairly comprehensive and objective description of the vocalic system of LA. The results showed that phonologically short and long Arabic vowels vary significantly in quality as well as quantity, a finding increasingly reported in experimental studies of other Arabic dialects. Short vowels in LA tend to be more centralised than has been reported for other Arabic varieties, especially with regard to short /a/. The study also examined the effects of voicing in neighbouring consonants and of vowel height on vowel duration, and the findings were compared with those of other varieties and languages. The perception part of the study explored the extent to which listeners use the same acoustic cues of length and quality in vowel perception that are evident in their production. This involved continua of synthesised vowels varying along duration and/or formant frequency dimensions. The continua were randomised and played to 20 native listeners who took part in an identification task. The results show that, in perception, Arabic listeners rely mainly on quantity to distinguish phonologically long and short vowels: when presented with stimuli containing conflicting acoustic cues (formant frequencies typical of long vowels but with short duration, or formant frequencies typical of short vowels but with long duration), listeners responded consistently to duration rather than formant frequency. Together, the two parts of the study provide a clearer understanding of the LA vowel system. The production data allowed a detailed description of the phonetic characteristics of LA vowels, and the acoustic space they occupy was compared with those of other Arabic varieties. The perception data showed that production and perception do not always go hand in hand and that the primary acoustic cues for the identification of vowels are dialect- and language-specific.
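The conflicting-cue design can be sketched as a stimulus grid crossing duration steps with formant (quality) steps, plus a listener model. The "listener" below is a deliberately simplified stand-in that weights duration only, mirroring the duration-dominant responses the study reports; the step values and category boundary are illustrative, not the actual stimuli.

```python
# Stimulus dimensions: duration steps crossed with F1 (quality) steps.
durations_ms = [80, 120, 160, 200, 240]   # short ... long
f1_steps_hz = [700, 650, 600, 550, 500]   # short-vowel ... long-vowel quality

def classify(duration_ms, f1_hz, dur_boundary_ms=160):
    """Toy duration-dominant classifier: a vowel is heard as 'long'
    iff its duration passes the category boundary, regardless of the
    formant quality it carries (f1_hz is deliberately ignored)."""
    return "long" if duration_ms >= dur_boundary_ms else "short"

# Conflicting-cue stimulus: long-vowel quality (low F1) but short duration.
print(classify(100, 500))   # duration wins over quality -> "short"
```

In the real experiment the boundary is what the identification responses reveal, rather than a fixed constant; the sketch only encodes the reported outcome that duration overrides quality.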

    Variation and change in the vowel system of Tyneside English

    PhD Thesis. This thesis presents a variationist account of phonological variation and change in the vowel system of Tyneside English. The distributions of the phonetic exponents of five vowel variables are assessed with respect to the social variables sex, age and social class. Using a corpus of conversational and word-list material, for which 32 speakers of Tyneside English were recorded, between 30 and 40 tokens per speaker of the variables (i), (u), (e), (o) and (3) were transcribed impressionistically and subclassified by following phonological context. The results of this analysis are significant on several counts. First, the speakers sampled appear to differentiate themselves within the speech community through the variable use of certain socially marked phonetic variants, which can be correlated with the sex, age and class variables. Second, the speakers style-shift to a greater or lesser degree according to combinations of the three social factors, such that surface variability is reduced as a function of increased formality. Third, the overall pattern among the sample population seems to be one of increasing uniformity or convergence: it is speculated that social mobility among upper working- and lower middle-class groups may lead to accent levelling, whereby local speech forms are supplanted by supra-local or innovative intermediate ones; that is, the patterns observed here may be indicative of change in progress. Finally, a comparison of the results for the (phonologically) paired variables (i u) and (e o) shows a strong tendency for Tyneside speakers to use these 'symmetrically', in that the choice of variant in one variable predicts the choice of variant in the other. It is suggested that this symmetry in the system is exploited by Tyneside speakers for the purposes of indicating social affiliation and identity, and is in this sense an extra sociolinguistic resource upon which speakers can draw. In addition, the variants of (3) are discussed with reference to the reported merger of this variable with (a); it is suggested that the apparent 'unmerging' of these two classes is unproblematic from a structural point of view, as the putative (3)-(a) merger appears never to have been completed. UK Economic and Social Research Council (award number R00429524350).
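The 'symmetry' observation for the paired variables can be made concrete as a per-speaker tally: how often does a speaker's dominant variant of (i) match their dominant variant of (u)? The speakers and variant labels below are invented for illustration and are not the thesis corpus.

```python
# Invented dominant-variant choices per speaker for the paired
# variables (i) and (u); labels are illustrative stand-ins.
speakers = {
    "s01": {"i": "local", "u": "local"},
    "s02": {"i": "supra-local", "u": "supra-local"},
    "s03": {"i": "local", "u": "local"},
    "s04": {"i": "supra-local", "u": "local"},  # the asymmetric case
}

# A speaker is 'symmetric' if the variant chosen for (i) predicts
# the variant chosen for (u).
matches = sum(1 for v in speakers.values() if v["i"] == v["u"])
rate = matches / len(speakers)
print(f"symmetric speakers: {rate:.0%}")  # 75% in this toy sample
```

The thesis works with token-level distributions rather than a single dominant variant per speaker, so this tally only illustrates the shape of the claim.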

    Discovering Dynamic Visemes

    This thesis introduces a set of new, dynamic units of visual speech which are learnt using computer vision and machine learning techniques. Rather than clustering phoneme labels as is done traditionally, the visible articulators of a speaker are tracked and automatically segmented into short, visually intuitive speech gestures based on the dynamics of the articulators. The segmented gestures are clustered into dynamic visemes, such that movements relating to the same visual function appear within the same cluster. Speech animation can then be generated on any facial model by mapping a phoneme sequence to a sequence of dynamic visemes and stitching together an example of each viseme in the sequence. Dynamic visemes model coarticulation and maintain the dynamics of the original speech, so simple blending at the concatenation boundaries ensures a smooth transition. The efficacy of dynamic visemes for computer animation is formally evaluated both objectively and subjectively, and compared with traditional phoneme-to-static-lip-pose interpolation.
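A minimal sketch of the clustering step: short articulator gestures, each summarised by a feature vector (here just invented 2-D summaries of lip movement), are grouped so that similar movements share a cluster label. Real systems track many facial landmarks and use richer features; this toy k-means is an illustration of the idea, not the thesis pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented gesture features forming two tight groups = two "visemes".
open_gestures = rng.normal([1.0, 0.2], 0.05, size=(10, 2))
round_gestures = rng.normal([0.2, 1.0], 0.05, size=(10, 2))
gestures = np.vstack([open_gestures, round_gestures])

def kmeans(X, k, iters=20):
    """Plain k-means: assign each point to its nearest centroid,
    recompute centroids, repeat."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)  # spread-out init
    centroids = X[idx].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

labels, _ = kmeans(gestures, k=2)
# Gestures with the same visual function end up with the same label,
# which is what lets a phoneme sequence be mapped to viseme examples.
```

Concatenating one stored example per cluster label, with blending at the joins, is then the animation step the abstract describes.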

    How touch and hearing influence visual processing in sensory substitution, synaesthesia and cross-modal correspondences

    Sensory substitution devices (SSDs) systematically turn visual dimensions into patterns of tactile or auditory stimulation. After training, a user of these devices learns to translate the audio or tactile sensations back into a mental visual picture. Most previous SSDs translate greyscale images using intuitive cross-sensory mappings to help users learn the devices; more recent SSDs, however, have started to incorporate additional colour dimensions such as saturation and hue. Chapter two examines how previous SSDs have translated the complexities of colour into hearing or touch, exploring whether colour is useful for SSD users, how SSD and veridical colour perception differ, and how optimal cross-sensory mappings might be chosen. After long-term training, some blind users of SSDs report visual sensations from tactile or auditory stimulation. A related phenomenon is synaesthesia, a condition in which stimulation of one modality (e.g. touch) produces an automatic, consistent and vivid sensation in another modality (e.g. vision). Tactile-visual synaesthesia is an extremely rare variant that can shed light on how the tactile-visual system is altered when touch can elicit visual sensations. Chapter three reports a series of investigations of the tactile discrimination abilities and phenomenology of tactile-vision synaesthetes, alongside questionnaire data from synaesthetes unavailable for testing. Chapter four introduces a new SSD to test whether the presentation of colour information in sensory substitution affects object and colour discrimination. Chapter five presents experiments on intuitive auditory-colour mappings across a wide variety of sounds; these findings are used to predict the colour hallucinations reported under LSD while listening to the same sounds. Chapter six uses a new sensory substitution device designed to test the utility of these intuitive sound-colour links for visual processing. The findings are discussed with reference to how cross-sensory links, LSD and synaesthesia can inform optimal SSD design for visual processing.
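The greyscale-to-sound mapping used by many earlier SSDs can be sketched as follows: an image is scanned column by column, each pixel row drives a sinusoid whose frequency encodes vertical position (higher rows give higher pitch) and whose amplitude encodes brightness. The frequency range and timing parameters below are illustrative assumptions, not the specification of any particular device.

```python
import numpy as np

def column_to_wave(column, sr=16000, ms=50, fmin=500.0, fmax=5000.0):
    """Mix one sinusoid per pixel row of an image column;
    brightness (0-1) scales each sinusoid's amplitude."""
    t = np.arange(int(sr * ms / 1000)) / sr
    n = len(column)
    wave = np.zeros_like(t)
    for row, brightness in enumerate(column):
        # top row (index 0) gets the highest frequency
        f = fmax - (fmax - fmin) * row / max(n - 1, 1)
        wave += brightness * np.sin(2 * np.pi * f * t)
    peak = np.abs(wave).max()
    return wave / peak if peak > 0 else wave

# A single bright pixel at the top of the column -> one high tone.
wave = column_to_wave(np.array([1.0, 0.0, 0.0, 0.0]))
```

Extending such a mapping with extra audio dimensions for saturation and hue is the kind of design question the colour chapters of the thesis take up.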

    Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach

    Current research in computational linguistics and natural language processing (NLP) requires the existence of language resources. Whereas these resources are available for a few well-resourced languages, many languages have been neglected. Among the neglected and/or under-resourced languages are Runyankore and Rukiga (henceforth Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority, not least because the few well-resourced languages do not represent the structural diversity of the world's languages. The central focus of this thesis is enabling the computational analysis and generation of utterances in Ry/Rk, two closely related languages spoken by about 3.4 and 2.4 million people respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu of the Niger-Congo language family. The computational processing of these languages is achieved by formalising their grammars using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammars, a general-purpose computational lexicon for the two languages is developed; although it is used here to substantially increase the lexical coverage of the grammars, the lexicon can also serve other NLP tasks. A symbolic/rule-based approach is taken throughout, because the lack of adequate language resources makes data-driven NLP approaches unsuitable for these languages.
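To give a flavour of the kind of symbolic rule a Bantu resource grammar encodes, the sketch below pairs singular and plural noun-class prefixes and inflects nouns by prefix substitution. The class pairings follow the common Bantu pattern (class 1/2 for humans, class 7/8 for things); treat the specific forms as illustrative only, not as output of the actual GF grammar, whose rules also handle agreement, tone and morphophonology.

```python
# Singular-class prefix -> paired plural-class prefix (illustrative).
CLASS_PAIRS = {
    "omu": "aba",   # class 1 -> 2 (humans), e.g. omuntu -> abantu
    "eki": "ebi",   # class 7 -> 8 (things), e.g. ekitabo -> ebitabo
}

def pluralise(noun):
    """Swap a singular noun-class prefix for its paired plural prefix."""
    for sg, pl in CLASS_PAIRS.items():
        if noun.startswith(sg):
            return pl + noun[len(sg):]
    raise ValueError(f"no class rule for {noun!r}")

print(pluralise("omuntu"))   # person -> people
print(pluralise("ekitabo"))  # book -> books
```

In GF such rules live in the concrete syntax as inflection tables, and the noun class also drives agreement on verbs, adjectives and determiners, which is where the rule-based approach pays off.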

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop was established in 1999 out of a strongly felt need to share know-how, objectives and results between areas that had until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown to encompass other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze (Florence), Italy.

    A Comparative Study of Spectral Peaks Versus Global Spectral Shape as Invariant Acoustic Cues for Vowels

    The primary objective of this study was to compare two sets of vowel spectral features, formants and global spectral shape parameters, as invariant acoustic cues to vowel identity. Both automatic vowel recognition experiments and perceptual experiments were performed to evaluate these two feature sets. First, these features were compared using the static spectrum sampled in the middle of each steady-state vowel versus features based on dynamic spectra. Second, the role of dynamic and contextual information was investigated in terms of improvements in automatic vowel classification rates. Third, several speaker normalizing methods were examined for each of the feature sets. Finally, perceptual experiments were performed to determine whether vowel perception is more correlated with formants or global spectral shape. Results of the automatic vowel classification experiments indicate that global spectral shape features contain more information than do formants. For both feature sets, dynamic features are superior to static features. Spectral features spanning a time interval beginning with the start of the on-glide region of the acoustic vowel segment and ending at the end of the off-glide region of the acoustic vowel segment are required for maximum vowel recognition accuracy. Speaker normalization of both static and dynamic features can also be used to improve the automatic vowel recognition accuracy. Results of the perceptual experiments with synthesized vowel segments indicate that if formants are kept fixed, global spectral shape can, at least for some conditions, be modified such that the synthetic speech token will be perceived according to spectral shape cues rather than formant cues. This result implies that overall spectral shape may be more important perceptually than the spectral prominences represented by the formants. The results of this research contribute to a fundamental understanding of the information-encoding process in speech. 
The signal processing techniques used and the acoustic features found in this study can also be used to improve the preprocessing of acoustic signals in the front-end of automatic speech recognition systems.
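To make the two feature families concrete, the sketch below contrasts peak-style features (frequencies of the largest spectral maxima, standing in for formants) with a global-shape description (low-order cosine coefficients of the log spectrum, a cepstrum-like summary of the whole envelope). The toy signal is two sinusoids standing in for two formants; all parameter choices are illustrative, not those of the study.

```python
import numpy as np

sr = 8000
t = np.arange(2000) / sr
# Two sinusoids standing in for formant peaks at 700 and 1200 Hz.
signal = np.sin(2 * np.pi * 700 * t) + 0.6 * np.sin(2 * np.pi * 1200 * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), 1 / sr)

# "Formant-style" features: frequencies of the two largest spectral peaks.
top_two = sorted(float(f) for f in freqs[spectrum.argsort()[-2:]])

# "Global shape" features: low-order cosine coefficients of the log
# spectrum, describing the whole envelope rather than peak locations.
log_spec = np.log(spectrum + 1e-9)
k = np.arange(len(log_spec))
shape = [float((log_spec * np.cos(np.pi * q * (k + 0.5) / len(k))).sum())
         for q in range(4)]

print(top_two)   # -> [700.0, 1200.0]
```

The study's comparison asks which of these two descriptions better predicts vowel identity, for machines and for listeners; the finding that global shape carries more information than the peaks alone is what the perceptual experiments with conflicting cues then probe.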