
    Pauses and the temporal structure of speech

    Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure, in particular the prolongation of final syllables of lexemes, as well as for the pausal structure of the utterance. In this chapter, a description of the temporal structure and a summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated.

    Sperry Univac speech communications technology

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word-spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

    ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning

    Large bioacoustic archives of wild animals are an important source for identifying reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication in non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis – particularly important for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository – the Orchive – comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years of audio) took approximately 8 days. It achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables automated annotation of large bioacoustic databases to extract killer whale sounds, which are essential for the subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
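The two reported figures can be computed directly from a detector's output. A minimal sketch (illustrative helper functions, not the ORCA-SPOT code): PPV is the fraction of detections that are true hits, and AUC can be estimated via the Mann-Whitney statistic as the probability that a randomly chosen positive segment outscores a randomly chosen noise segment.

```python
def ppv(tp, fp):
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp)

def auc(pos_scores, neg_scores):
    """Estimate the area under the ROC curve as the probability that a
    random positive scores higher than a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, 932 true detections out of 1,000 total detections gives a PPV of 0.932; the quadratic pairwise AUC estimate above is fine for a sketch, while production code would sort scores once instead.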

    Distinctive features


    Complex sequencing rules of birdsong can be explained by simple hidden Markov processes

    Complex sequencing rules observed in birdsongs provide an opportunity to investigate the neural mechanisms for generating complex sequential behaviors. To relate findings from birdsong studies to other sequential behaviors, it is crucial to characterize the statistical properties of the sequencing rules in birdsongs. However, these properties have not yet been fully addressed. In this study, we investigate the statistical properties of the complex birdsong of the Bengalese finch (Lonchura striata var. domestica). Based on manually annotated syllable sequences, we first show that there are significant higher-order context dependencies in Bengalese finch songs; that is, which syllable appears next depends on more than one previous syllable. This property is shared with other complex sequential behaviors. We then analyze acoustic features of the song and show that the higher-order context dependencies can be explained using first-order hidden state transition dynamics with redundant hidden states. This model corresponds to hidden Markov models (HMMs), well-known statistical models widely applied to time-series modeling. Song annotation with these first-order hidden-state models agreed well with manual annotation; the score was comparable to that of a second-order HMM and surpassed the zeroth-order model (a Gaussian mixture model, GMM), which does not use context information. Our results imply that a hierarchical representation with hidden state dynamics may underlie the neural implementation for generating complex sequences with higher-order dependencies.
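The key idea – that strictly first-order hidden dynamics with redundant hidden states can produce higher-order dependencies at the surface level – can be illustrated with a toy model (hypothetical states and syllables, not the paper's fitted HMM):

```python
# First-order hidden-state transitions (deterministic here for clarity).
# States X and Y are "redundant": both emit the same syllable 'a'.
TRANS = {'X': 'B', 'B': 'Y', 'Y': 'C', 'C': 'X'}
EMIT = {'X': 'a', 'Y': 'a', 'B': 'b', 'C': 'c'}

def generate(n, state='X'):
    """Emit n syllables by walking the hidden-state chain."""
    out = []
    for _ in range(n):
        out.append(EMIT[state])
        state = TRANS[state]
    return ''.join(out)

seq = generate(12)  # → 'abacabacabac'
```

At the surface level, the syllable after each 'a' is 'b' or 'c' depending on the syllable before that 'a' – a second-order dependency – even though the hidden transitions are first-order, because the two 'a'-emitting states X and Y carry the extra context.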

    The attitude of Polish learners towards learning English pronunciation: analysing anew

    It is widely agreed that acquisition of the sound system of a second language always presents a great challenge for L2 learners (e.g. Rojczyk, 2010). Numerous studies (e.g. Nowacka, 2010; Flege, 1991) show that L2 learners whose first language has a small number of sounds encounter difficulties in distinguishing L2 sound categories and tend to apply their L1 segments in new contexts. There is an abundance of studies examining L2 learners' successes and failures in the production of L1 and L2 sounds, especially vowels (e.g. Flege, 1992; Nowacka, 2010; Rojczyk, 2010). However, the situation becomes more complicated when we consider third language production. While in the case of L2 segmental production the number of factors affecting L2 sounds is rather limited (either interference from the learners' L1 or some kind of L2 intralingual influence), in the case of L3 segmental production we may encounter L1→L3, L2→L3, L1+L2→L3 or L3 intralingual interference. This makes separation of L3 sounds a much more complex process. The aim of this paper is to examine whether speakers of L1 Polish, L2 English and L3 German are able to separate new, L3 vowel categories from their native and L2 categories. The research presented in this article is part of a larger project assessing the production of L3 segments. This time the focus is on German /y/. This vowel was chosen because it is regarded as especially difficult for Polish learners of German and is frequently substituted with other sounds. A group of English philology students (Polish-English-German translation and interpretation programme) was chosen to participate in this study. They were native speakers of Polish, advanced speakers of English and upper-intermediate users of German. They had been taught both English and German pronunciation courses during their studies at the University of Silesia. The subjects were asked to produce words containing the analysed vowels, namely: P /u/, P /i/, E /uː/, E /iː/, E /ɪ/ and G /y/. All examined vowels were embedded in a /bVt/ context. The target /bVt/ words were then embedded in carrier sentences: I said /bVt/ this time in English, Ich sag' /bVt/ diesmal in German and Mówię /bVt/ teraz in Polish, in a non-final position. The sentences were presented to the subjects on a computer screen, and the produced chunks were stored on a notebook as .wav files ready for inspection. The Praat 5.3.12 speech-analysis software package (Boersma, 2001) was used to measure and analyse the recordings. The obtained results suggest that L2 affects L3 segmental production to a significant extent. Learners find it difficult to separate all "new" and "old" vowel categories, especially if they are perceived as "similar" to one another and when learners strive to sound "foreign".
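The stimulus design described above – each /bVt/ target embedded non-finally in a language-specific carrier sentence – can be sketched as follows (the carrier sentences are from the study; the example target words are illustrative placeholders, not the study's actual stimuli):

```python
# Carrier sentences from the study; {} marks the non-final /bVt/ slot.
CARRIERS = {
    'English': 'I said {} this time',
    'German':  "Ich sag' {} diesmal",
    'Polish':  'Mówię {} teraz',
}

def build_stimuli(targets_by_language):
    """Embed each /bVt/ target word in its language's carrier sentence."""
    return [CARRIERS[lang].format(word)
            for lang, words in targets_by_language.items()
            for word in words]
```

Keeping the consonantal frame and carrier position constant across all three languages is what lets the vowel measurements be compared directly.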

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the word level, which are clearly inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on morpheme-level speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous eojeol (Korean word) recognition and integrated morphological analysis can be achieved with a success rate of over 80.6% directly from speech input for middle-level vocabularies. (To be published in the Computer Processing of Oriental Languages journal.)
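The Viterbi-based lexical decoding step can be illustrated generically. A minimal sketch of Viterbi decoding over a first-order model (toy states and probabilities, not SKOPE's actual lexicon or TDNN scores):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""
    # Each cell holds (log-probability of the best path ending here, that path).
    V = [{s: (math.log(start_p[s] * emit_p[s][obs[0]]), [s]) for s in states}]
    for o in obs[1:]:
        prev, cur = V[-1], {}
        for s in states:
            # Best predecessor for state s under the first-order transition model.
            score, path = max(
                (prev[p][0] + math.log(trans_p[p][s] * emit_p[s][o]),
                 prev[p][1] + [s])
                for p in states)
            cur[s] = (score, path)
        V.append(cur)
    return max(V[-1].values())[1]
```

In a morpheme-level decoder the "states" would be lexicon entries and the emission scores would come from the diphone recognizer; the dynamic-programming recurrence stays the same.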

    Discourse and information structure in Kadorih
