345 research outputs found

    The Zurich Corpus of Vowel and Voice Quality, Version 1.0

    Existing databases of isolated vowel sounds, or of vowel sounds embedded in consonantal context, generally document only limited variation of basic production parameters. Concerning the possible range of variation in vowel- and voice-quality-related sound characteristics, there is therefore a lack of broad phenomenological and descriptive references that would allow for a comprehensive understanding of vowel acoustics and for an evaluation of the extent to which existing approaches and models can be generalised. To contribute to building such references, a novel database of vowel sounds is presented here that exceeds any existing collection in size and diversity of vocalic characteristics, comprising c. 34,600 utterances from 70 speakers (46 nonprofessional speakers, children, women and men, and 24 professional actors/actresses and singers of straight theatre, contemporary singing, and European classical singing). The database focuses on sounds of the long Standard German vowels /i-y-e-ø-a-o-u/ produced with varying basic production parameters such as phonation type, vocal effort, fundamental frequency, vowel context, and speaking or singing style. In addition, a read text and, for the professionals, songs are also included. The database is accessible for scientific use, and further extensions are in progress.
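    As a purely illustrative sketch of how such a corpus might be queried, the Python snippet below filters a hypothetical per-utterance metadata table; the file name and the column names (speaker_group, vowel, phonation_type, vocal_effort, f0_hz, speaker_id) are assumptions made for illustration and are not part of the corpus's documented release format.

        # Illustrative only: the file name and column names are assumptions,
        # not the corpus's actual release format.
        import pandas as pd

        # Load a hypothetical metadata table with one row per utterance.
        meta = pd.read_csv("zurich_corpus_metadata.csv")

        # Example query: breathy /i/ tokens produced by professional speakers
        # at low vocal effort.
        subset = meta[
            (meta["vowel"] == "i")
            & (meta["phonation_type"] == "breathy")
            & (meta["vocal_effort"] == "low")
            & (meta["speaker_group"] == "professional")
        ]

        # Summarise how the selected tokens are distributed over fundamental frequency.
        print(subset.groupby("speaker_id")["f0_hz"].describe())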

    Reorganization of the auditory-perceptual space across the human vocal range

    We analyzed the auditory-perceptual space across a substantial portion of the human vocal range (220-1046 Hz) using multidimensional scaling analysis of cochlea-scaled spectra from 250-ms vowel segments, initially studied in Friedrichs et al. (2017), J. Acoust. Soc. Am. 142, 1025-1033. The dataset comprised the vowels /i y e ø ɛ a o u/ (N=240) produced by three native German female speakers, encompassing a broad range of their respective voice frequency ranges. The initial study demonstrated that, during a closed-set identification task involving 21 listeners, the point vowels /i a u/ were significantly recognized at fundamental frequencies (fo) nearing 1 kHz, whereas the recognition of the other vowels decreased at higher pitches. Building on these findings, our study revealed systematic spectral shifts associated with vowel height and frontness as fo increased, with a notable clustering around /i a u/ above 523 Hz. These observations underscore the pivotal role of spectral shape in vowel perception, illustrating the reliance on acoustic anchors at higher pitches. Furthermore, this study sheds light on the quantal nature of these vowels and their potential impact on language evolution, offering a plausible explanation for their widespread presence in the world's languages.
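    To make the analysis idea concrete, here is a minimal sketch of multidimensional scaling applied to spectral feature vectors; the random matrix stands in for the study's cochlea-scaled spectra and the label list is a placeholder, so nothing here reproduces the published results.

        # Minimal sketch of the analysis idea (not the authors' exact pipeline):
        # embed vowel spectra in a low-dimensional perceptual space with
        # multidimensional scaling (MDS). The random matrix below is a
        # placeholder for real cochlea-scaled spectra (240 tokens x 64 channels).
        import numpy as np
        from sklearn.manifold import MDS

        rng = np.random.default_rng(0)
        spectra = rng.normal(size=(240, 64))                     # placeholder spectra
        labels = ["i", "y", "e", "ø", "ɛ", "a", "o", "u"] * 30   # placeholder vowel labels

        # Two-dimensional MDS configuration from Euclidean distances between spectra.
        coords = MDS(n_components=2, dissimilarity="euclidean",
                     random_state=0).fit_transform(spectra)

        # Inspect where each vowel category ends up in the recovered space.
        for vowel in sorted(set(labels)):
            idx = [i for i, v in enumerate(labels) if v == vowel]
            print(f"/{vowel}/ centroid: {coords[idx].mean(axis=0).round(2)}")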

    Investigating speaker individuality in Swiss Standard German across four Alemannic dialect regions: consonant quantity, vowel quality, and temporal variables

    While German-speaking Switzerland manifests a considerable amount of dialectal diversity, to date the phonetic interrelation of Alemannic (ALM) dialects and spoken Swiss Standard German (SSG) has not been studied with an acoustic phonetic approach at the speaker level. In this study, out of a pool of 32 speakers (controlled for sex, age, and education level) from 4 dialectologically distinct ALM areas, 16 speakers of 2 dialects were analysed for SSG consonant duration (in words whose ALM equivalents may or may not have a geminate), 8 speakers from the city of Bern (BE) were analysed for vowel quality, and all 32 speakers were analysed for temporal variables, i.e. articulation rate (AR) and vocalic-speech percentage (%V). Results reveal considerable intradialectal inter- and intraspeaker variation in all three aspects scrutinised, especially regarding the vowel quality of BE SSG mid vowels and the temporal variables. As for consonant quantity, while intradialectal interspeaker variation was observed, speakers showed a tendency towards normalised SSG consonant durations that resemble the normalised consonant durations in their ALM dialect. In general, these results suggest that a speaker's dialect background is only one factor amongst many that influence the way in which Swiss Standard German is spoken.
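    As an illustration of the two temporal variables mentioned above, the sketch below computes articulation rate (AR) and the vocalic-speech percentage (%V) from a hypothetical list of segment annotations; the Segment fields and the toy values are assumptions for illustration, not the study's actual data or scripts.

        # Minimal sketch, not the study's actual analysis code: compute AR and %V
        # from hypothetical (label, start, end, is_vowel, n_syllables) annotations.
        from dataclasses import dataclass

        @dataclass
        class Segment:
            label: str
            start: float       # seconds
            end: float         # seconds
            is_vowel: bool
            n_syllables: int   # syllables carried by this segment (0 for consonants)

            @property
            def duration(self) -> float:
                return self.end - self.start

        def articulation_rate(segments: list[Segment]) -> float:
            """Syllables per second of net speaking time (pauses excluded)."""
            speech_time = sum(s.duration for s in segments)
            syllables = sum(s.n_syllables for s in segments)
            return syllables / speech_time

        def vocalic_percentage(segments: list[Segment]) -> float:
            """%V: share of net speaking time taken up by vocalic intervals."""
            speech_time = sum(s.duration for s in segments)
            vowel_time = sum(s.duration for s in segments if s.is_vowel)
            return 100.0 * vowel_time / speech_time

        # Toy annotation of a single syllable "das" (values invented).
        demo = [
            Segment("d", 0.00, 0.05, False, 0),
            Segment("a", 0.05, 0.18, True, 1),
            Segment("s", 0.18, 0.28, False, 0),
        ]
        print(f"AR = {articulation_rate(demo):.2f} syll/s")
        print(f"%V = {vocalic_percentage(demo):.1f} %")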

    The function and evolution of child-directed communication

    Funding: Writing this article was supported by the National Centre of Competence in Research (NCCR) Evolving Language, Swiss National Science Foundation Agreement 51NF40 180888 for JS, CF, FW, KZ, CPvS, SWT and SS. SWT was additionally funded by Swiss National Science Foundation grant PP00P3_198912.

    Humans communicate with small children in unusual and highly conspicuous ways (child-directed communication (CDC)), which enhance social bonding and facilitate language acquisition. CDC-like inputs are also reported for some vocally learning animals, suggesting similar functions in facilitating communicative competence. However, adult great apes, our closest living relatives, rarely signal to their infants, implicating communication surrounding the infant as the main input for infant great apes and early humans. Given cross-cultural variation in the amount and structure of CDC, we suggest that child-surrounding communication (CSC) provides essential compensatory input when CDC is less prevalent, a paramount topic for future studies.

    Akustische Phonetik und ihre multidisziplinären Aspekte

    The aim of this book is to honor the multidisciplinary work of Doz. Dr. Sylvia Moosmüller (†) in the field of acoustic phonetics. The essays in this volume cover sociophonetics, language diagnostics, dialectology, and language technology. They thus exemplify the breadth of acoustic phonetics, which has been shaped by influences from both the humanities and the technical sciences since its beginnings.

    Improving Searchability of Automatically Transcribed Lectures Through Dynamic Language Modelling

    Recording university lectures through lecture capture systems is increasingly common. However, a single continuous audio recording is often unhelpful for users, who may wish to navigate quickly to a particular part of a lecture, or to locate a specific lecture within a set of recordings. A transcript of the recording can enable faster navigation and searching. Automatic speech recognition (ASR) technologies may be used to create automated transcripts, avoiding the significant time and cost involved in manual transcription. The low accuracy of ASR-generated transcripts may, however, limit their usefulness. In particular, ASR systems optimized for general speech recognition may not recognize the many technical or discipline-specific words occurring in university lectures. To improve the usefulness of ASR transcripts for the purposes of information retrieval (search) and navigation within recordings, the lexicon and language model used by the ASR engine may be dynamically adapted to the topic of each lecture. A prototype is presented which uses the English Wikipedia as a semantically dense, large language corpus to generate a custom lexicon and language model for each lecture from a small set of keywords. Two strategies for extracting a topic-specific subset of Wikipedia articles are investigated: a naïve crawler which follows all article links from a set of seed articles produced by a Wikipedia search on the initial keywords, and a refinement which follows only links to articles sufficiently similar to the parent article. Pairwise article similarity is computed from a pre-computed vector space model of Wikipedia article term scores generated using latent semantic indexing. The CMU Sphinx4 ASR engine is used to generate transcripts for thirteen recorded lectures from Open Yale Courses, using the English HUB4 language model as a reference and the two topic-specific language models generated for each lecture from Wikipedia.
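    The refined crawling strategy can be illustrated with a short sketch: follow a link only when the linked article's precomputed LSI vector is sufficiently similar to the parent's. The link graph, the LSI vectors, and the 0.5 threshold below are invented toy stand-ins, not the prototype's actual data structures or settings.

        # Illustrative sketch of similarity-gated crawling (not the prototype's code).
        from collections import deque
        import numpy as np

        def cosine(a: np.ndarray, b: np.ndarray) -> float:
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        def crawl(seeds, links, lsi_vectors, threshold=0.5, max_articles=1000):
            """Breadth-first crawl that keeps only topically similar articles."""
            selected, queue = set(seeds), deque(seeds)
            while queue and len(selected) < max_articles:
                parent = queue.popleft()
                for child in links.get(parent, ()):
                    if child in selected:
                        continue
                    if cosine(lsi_vectors[parent], lsi_vectors[child]) >= threshold:
                        selected.add(child)
                        queue.append(child)
            return selected

        # Toy data standing in for real Wikipedia articles and their LSI vectors.
        lsi_vectors = {
            "Phonetics": np.array([0.9, 0.1]),
            "Vowel": np.array([0.8, 0.2]),
            "Baseball": np.array([0.1, 0.9]),
        }
        links = {"Phonetics": ["Vowel", "Baseball"], "Vowel": []}
        print(crawl(["Phonetics"], links, lsi_vectors))  # keeps Phonetics and Vowel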

    Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum


    The Impact of Arabic Diacritization on Word Embeddings

    Word embeddings are used to represent words for text analysis. They play an essential role in many Natural Language Processing (NLP) studies and have contributed hugely to the extraordinary developments in the field over the last few years. In Arabic, diacritic marks are a vital feature for the readability and understandability of the language, yet current Arabic word embeddings are non-diacritized. In this paper, we aim to develop and compare word embedding models based on diacritized and non-diacritized corpora in order to study the impact of Arabic diacritization on word embeddings. We propose evaluating the models in four different ways: clustering of the nearest words; morphological semantic analysis; part-of-speech (POS) tagging; and semantic analysis. For a better evaluation, we took on the challenge of creating three new datasets from scratch for the three downstream tasks. We conducted the downstream tasks with eight machine learning algorithms and two deep learning algorithms. Experimental results show that the diacritized model exhibits a better ability to capture syntactic and semantic relations and to cluster words of similar categories. Overall, the diacritized model outperforms the non-diacritized model. We also obtained further noteworthy findings: for example, the morphological semantic analysis shows that as the number of target words increases, the advantages of the diacritized model become more obvious, and the diacritic marks carry more significance in POS tagging than in the other tasks.
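    A minimal sketch of the underlying comparison, under the assumption that something like gensim's Word2Vec is used: one model is trained on diacritized text and one on the same text with diacritics stripped, and their nearest neighbours are compared. The two tiny corpora below are toy examples, not the paper's training data, and the hyperparameters are arbitrary.

        # Minimal sketch of the comparison idea, not the paper's actual setup.
        import re
        from gensim.models import Word2Vec

        DIACRITICS = re.compile(r"[\u064B-\u0652]")  # Arabic harakat range

        def strip_diacritics(token: str) -> str:
            return DIACRITICS.sub("", token)

        # Toy corpora: the diacritics distinguish ذَهَبَ (went) from ذَهَبٌ (gold);
        # stripping them collapses both into the single form ذهب.
        diacritized_corpus = [
            ["ذَهَبَ", "الوَلَدُ", "إِلى", "المَدْرَسَةِ"],
            ["ذَهَبٌ", "ثَمِينٌ", "في", "الخِزانَةِ"],
        ]
        plain_corpus = [[strip_diacritics(t) for t in sent] for sent in diacritized_corpus]

        diac_model = Word2Vec(diacritized_corpus, vector_size=50, min_count=1, epochs=50, seed=1)
        plain_model = Word2Vec(plain_corpus, vector_size=50, min_count=1, epochs=50, seed=1)

        # Compare nearest neighbours of the ambiguous form in each model.
        print(diac_model.wv.most_similar("ذَهَبَ", topn=3))
        print(plain_model.wv.most_similar("ذهب", topn=3))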