Learning the hidden structure of speech: from communicative functions to prosody
This article introduces a new method, driven by modelling and by interaction with behavioural data, for generating prosodic patterns from metalinguistic information. We refer here to the general ability of intonation to demarcate speech units and to convey information about the propositional and interactional functions of these units in discourse. Our strong hypotheses are that (1) these functions are directly implemented as prototypical prosodic contours that are co-extensive with the units to which they apply, and (2) the prosodic pattern of the message is obtained by superposing and adding all the elementary contours (Aubergé & Bailly, 1995). We describe an analysis-by-synthesis scheme that consists of identifying these prototypical contours and separating their respective contributions to the prosodic contours of the training data. The scheme is applied to databases designed to bring out various intonational functions. Experimental results show that the model generates adequate prosodic contours with very few prototypical movements
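The superposition hypothesis in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each function contributes a fixed prototype contour which is linearly stretched to span its unit and that contours are sampled once per syllable. The names `stretch` and `superpose` and the numeric values are hypothetical; the paper's own contour generators are learned from data and are not reproduced here.

```python
def stretch(prototype, length):
    """Linearly resample a prototype contour to `length` points."""
    if length == 1:
        return [float(prototype[0])]
    out = []
    for i in range(length):
        pos = i * (len(prototype) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(prototype) - 1)
        frac = pos - lo
        out.append(prototype[lo] * (1 - frac) + prototype[hi] * frac)
    return out


def superpose(n_syllables, functions):
    """Add up elementary contours, each co-extensive with its unit.

    `functions` is a list of (start, end, prototype) tuples, where
    start/end are syllable indices (end exclusive) delimiting the unit.
    """
    contour = [0.0] * n_syllables
    for start, end, prototype in functions:
        for offset, value in enumerate(stretch(prototype, end - start)):
            contour[start + offset] += value
    return contour


# Hypothetical example: a rising contour over the whole 4-syllable
# utterance plus a flat offset over the last two syllables.
utterance = superpose(4, [(0, 4, [0.0, 3.0]), (2, 4, [1.0, 1.0])])
```

In this sketch the analysis-by-synthesis step described in the abstract would run in the opposite direction: given observed contours, it would estimate the prototypes whose sum best reproduces them.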
Speech rhythm: the language-specific integration of pitch and duration
Experimental phonetic research on speech rhythm seems to have reached an impasse. Recently, this research field has tended to investigate produced (rather than perceived) rhythm, focussing on timing, i.e. duration as an acoustic cue, and has not considered that rhythm perception might be influenced by native language. Yet evidence from other areas of phonetics, and other disciplines, suggests that an investigation of rhythm is needed which (i) focuses on listeners’ perception, (ii) acknowledges the role of several acoustic cues, and (iii) explores whether the relative significance of these cues differs between languages. This thesis, the originality of which derives from its adoption of these three perspectives combined, indicates new directions for progress. A series of perceptual experiments investigated the interaction of duration and f0 as perceptual cues to prosody in languages with different prosodic structures – Swiss German, Swiss French, and French (i.e. from France). The first experiment demonstrated that a dynamic f0 increases perceived syllable duration in contextually isolated pairs of monosyllables, for all three language groups. The second experiment found that dynamic f0 and increased duration interact as cues to rhythmic groups in series of monosyllabic digits and letters; the two cues were significantly more effective than one when heard simultaneously, but significantly less effective than one when heard in conflicting positions around the rhythmic-group boundary location, and native language influenced whether f0 or duration was the more effective cue.
These two experiments laid the basis for the third, which directly addressed rhythm. Listeners were asked to judge the rhythmicality of sentences with systematic duration and f0 manipulations; the results provide evidence that duration and f0 are interdependent cues in rhythm perception, and that the weighting of each cue varies between languages. A fourth experiment applied the perceptual results to production data, to develop a rhythm metric which captures the multi-dimensional and language-specific nature of perceived rhythm in speech production. These findings have the important implication that if future phonetic research on rhythm follows these new perspectives, it may circumvent the impasse and advance our knowledge and model of speech rhythm. This work was funded by an AHRC doctoral award to the author
CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania
The Computational Linguistics Feedback Forum (CLIFF) is a group of students and faculty who gather once a week to discuss the members' current research. As the word feedback suggests, the group's purpose is the sharing of ideas. The group also promotes interdisciplinary contacts between researchers who share an interest in Cognitive Science.
There is no single theme describing the research in Natural Language Processing at Penn. There is work done in CCG, Tree adjoining grammars, intonation, statistical methods, plan inference, instruction understanding, incremental interpretation, language acquisition, syntactic parsing, causal reasoning, free word order languages, ... and many other areas. With this in mind, rather than trying to summarize the varied work currently underway here at Penn, we suggest reading the following abstracts to see how the students and faculty themselves describe their work. Their abstracts illustrate the diversity of interests among the researchers, explain the areas of common interest, and describe some very interesting work in Cognitive Science.
This report is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We pride ourselves on the close working relations between these groups, as we believe that the communication among the different departments and the ongoing inter-departmental research not only improves the quality of our work, but makes much of that work possible
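Temporal tracking of interfering dynamic stimuli by labelling the time-frequency plane using a zero-crossing statistic (Suivi temporel de stimuli dynamiques interférants par marquage du plan temps-fréquence utilisant une statistique de passages par zéro)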
Within a Computational Auditory Scene Analysis (CASA) framework, this paper presents a model that labels the time-frequency plane by detecting harmonicity. The originality of the model lies in its use of a statistic of the zero crossings of the time-domain signal for labelling, a statistic that provides a measure of labelling reliability through the standard deviation of the lengths of first-order inter-zero intervals. After presenting the model and its behaviour, we show that it can be used to track dynamic stimuli with strong prosodic variations
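As a rough illustration of the statistic this abstract mentions, the sketch below measures the intervals between consecutive zero crossings of a single time-domain signal and reports their mean and standard deviation. It assumes that a low standard deviation indicates a regular, periodic signal and therefore a more reliable label. The function names are hypothetical, and the paper's actual labelling of the time-frequency plane is not reproduced here.

```python
import math
import statistics


def zero_crossing_intervals(signal):
    """Return the lengths, in samples, between consecutive sign changes."""
    crossings = [
        i for i in range(1, len(signal))
        if (signal[i - 1] < 0) != (signal[i] < 0)
    ]
    return [b - a for a, b in zip(crossings, crossings[1:])]


def interval_stats(signal):
    """Mean and population standard deviation of the interval lengths.

    Returns None when fewer than two intervals are available.
    """
    intervals = zero_crossing_intervals(signal)
    if len(intervals) < 2:
        return None
    return statistics.mean(intervals), statistics.pstdev(intervals)


# A sinusoid with a 20-sample period; the half-sample phase offset keeps
# samples away from exact zeros.
sine = [math.sin(2 * math.pi * (i + 0.5) / 20) for i in range(200)]
mean_len, spread = interval_stats(sine)
```

For the sinusoid every interval is half a period long, so the spread is zero; an irregular signal gives a larger spread, which this sketch would read as a less reliable label.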
Decorative Timbre: Integrating characteristics of Spectral and Dastgah music
Decorative Timbre is a portfolio of original compositions and an accompanying written dissertation. In this thesis, I propose a new musical language synthesising the expressive element of Western Spectral and Persian Dastgah music via the marriage of timbre and ornamentation.
Persian and Spectral music are two fundamentally distinct musical approaches derived from different philosophies and traditions, each possessing a particular value and aesthetic. However, in researching mutual characteristics and modalities, I draw connections between the two forms of music under the concept of decorative timbre. I discuss approaches to 'converting a melody to timbre and vice versa' and offer a new compositional technique of 'excessive multilayering' that is inspired by shared commonalities in both traditions.
The portfolio comprises four works that explore the application of excessive multilayering: Abalfazl, War is Peace, Let me Tune, and Beautifully Untuned Mind. The centrepiece of my creative portfolio, Panbe Zan (the cotton beater), is an experimental electroacoustic opera that recreates and recontextualizes the forgotten sounds of an obsolete profession, 'Panbe Zani (Cotton Beating).' Featuring a redesigned bow-shaped instrument together with live musicians, pre-recorded and manipulated sounds, and staging, the work portrays this nostalgic scene in a modern context
Negative vaccine voices in Swedish social media
Vaccinations are one of the most significant interventions in public health, but vaccine hesitancy is a concern for a portion of the population in many countries, including Sweden. Since discussions on vaccine hesitancy often take place on social networking sites, data from Swedish social media are used to study and quantify the sentiment among the discussants on the vaccination-or-not topic during phases of the COVID-19 pandemic. A majority of the posts analyzed showed a stronger negative sentiment, which prevailed throughout the examined period, with some spikes that coincide with certain vaccine-related events distinguishable in the results. Sentiment analysis can be a valuable tool to track public opinions regarding the use, efficacy, safety, and importance of vaccination
Jeddah Arabic intonation : an autosegmental-metrical approach
PhD thesis. This thesis is a theoretical and instrumental investigation of intonation in Jeddah Arabic (JA), an urban Arabic variety spoken in western Saudi Arabia. The study aims to establish the dialect’s prosodic properties and to widen the scope and volume of the literature on Arabic prosody, which would in turn aid the cross-dialectal comparison of prosodic and intonational patterns. The investigation is carried out in light of the Autosegmental-Metrical (AM) theory of intonation, a theory that has been reported to account for the intonational patterns of many languages. In AM theory, intonation is manifested via prominent F0 behaviour in interaction with phonological structure, and hence maintains a close relationship between accent distribution and phonological/metrical structure. This F0 behaviour is examined acoustically through pitch level, range and excursion size, in the form of increased peak height and excursion, pitch compression or absence thereof to mark intonational structure. In addition to pitch, other acoustic correlates such as duration and amplitude are examined as well. The thesis includes the examination of the different tunes, postlexical phrasing, and accent categories (contour shapes) that occur in the dialect. Moreover, and as an integral part of AM analysis, the thesis closely examines, both theoretically and acoustically, the concepts of tonal alignment and accentuation and information structure in this Arabic dialect. Data for the study were collected from 20 native male and female speakers of Jeddah Arabic. Data were then semi-automatically segmented and manually transcribed using a modified ToBI system for Arabic. It is found that JA speakers rely on both qualitative and quantitative detail to enhance intonationally important material that is conveyed prosodically. The results also indicate that JA is a stress-accent language that, although similar to other languages in this group, contributes differently to the general cross-language prosodic variation. The dialect demonstrates prominent pitch accents that faithfully associate and align with stressed syllables and are distributed in two intonational levels above the prosodic word: the intermediate phrase and the intonational phrase. These two intonational levels are found to be marked by both tonal and non-tonal correlates. Experimental evidence shows that, contrary to the typically reported correlates of those prosodic constituents, in JA intermediate phrase boundaries demonstrate longer pre-boundary units than intonational phrases. This non-tonal pattern at intermediate phrase boundaries correlates with later alignment of the tone with respect to the onset of the stressed syllable
The Perception of Emotion from Acoustic Cues in Natural Speech
Knowledge of human perception of emotional speech is imperative for the development of emotion in speech recognition systems and emotional speech synthesis. Owing to the fact that there is a growing trend towards research on spontaneous, real-life data, the aim of the present thesis is to examine human perception of emotion in naturalistic speech. Although there are many available emotional speech corpora, most contain simulated expressions. Therefore, there remains a compelling need to obtain naturalistic speech corpora that are appropriate and freely available for research. In that regard, our initial aim was to acquire suitable naturalistic material and examine its emotional content based on listener perceptions. A web-based listening tool was developed to accumulate ratings based on large-scale listening groups. The emotional content present in the speech material was demonstrated by performing perception tests on conveyed levels of Activation and Evaluation. As a result, labels were determined that signified the emotional content, and thus contribute to the construction of a naturalistic emotional speech corpus. In line with the literature, the ratings obtained from the perception tests suggested that Evaluation (or hedonic valence) is not identified as reliably as Activation is. Emotional valence can be conveyed through both semantic and prosodic information, for which the meaning of one may serve to facilitate, modify, or conflict with the meaning of the other—particularly with naturalistic speech. The subsequent experiments aimed to investigate this concept by comparing ratings from perception tests of non-verbal speech with verbal speech. The method used to render non-verbal speech was low-pass filtering, and for this, suitable filtering conditions were determined by carrying out preliminary perception tests. The results suggested that nonverbal naturalistic speech provides sufficiently discernible levels of Activation and Evaluation. 
It appears that the perception of Activation and Evaluation is affected by low-pass filtering, but that the effect is relatively small. Moreover, the results suggest that there is a similar trend in agreement levels between verbal and non-verbal speech. To date it still remains difficult to determine unique acoustical patterns for hedonic valence of emotion, which may be due to inadequate labels or the incorrect selection of acoustic parameters. This study has implications for the labelling of emotional speech data and the determination of salient acoustic correlates of emotion