Sound Source Separation
This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14
A computational framework for sound segregation in music signals
Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia, Universidade do Porto. 200
An exploration of the rhythm of Malay
In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but debate is ongoing about the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing.
The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (ΔC, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English.
Further analysis has been carried out in light of Fletcher's (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as there are many other factors that can influence values of consonantal and vocalic intervals, and Arvaniti's (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners' perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima.
This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
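The metrics named in the abstract above are simple statistics over measured vocalic and consonantal interval durations. As a rough illustration (not the authors' code; the function name and the example durations are hypothetical), they could be computed as:

```python
import statistics

def rhythm_metrics(vocalic, consonantal):
    """Compute ΔC, %V, rPVI and nPVI from interval durations in seconds."""
    # ΔC: standard deviation of consonantal interval durations (Ramus et al., 1999)
    delta_c = statistics.stdev(consonantal)
    # %V: share of total utterance duration that is vocalic (Ramus et al., 1999)
    pct_v = 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))
    # rPVI: mean absolute difference between successive consonantal intervals
    rpvi = sum(abs(a - b) for a, b in zip(consonantal, consonantal[1:])) / (len(consonantal) - 1)
    # nPVI: as rPVI over vocalic intervals, each difference normalised by the pair mean
    npvi = 100 * sum(abs(a - b) / ((a + b) / 2)
                     for a, b in zip(vocalic, vocalic[1:])) / (len(vocalic) - 1)
    return delta_c, pct_v, rpvi, npvi
```

Higher ΔC and rPVI with lower %V is the pattern typically reported for stress-timed languages; syllable-timed languages like French and Spanish tend toward the opposite, which is where the Malay data clustered.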
On the Perceptual Organization of Speech
A general account of auditory perceptual organization has developed in the past 2 decades. It relies on primitive devices akin to the Gestalt principles of organization to assign sensory elements to probable groupings and invokes secondary schematic processes to confirm or to repair the possible organization. Although this conceptualization is intended to apply universally, the variety and arrangement of acoustic constituents of speech violate Gestalt principles at numerous junctures, cohering perceptually, nonetheless. The authors report 3 experiments on organization in phonetic perception, using sine wave synthesis to evade the Gestalt rules and the schematic processes alike. These findings falsify a general auditory account, showing that phonetic perceptual organization is achieved by specific sensitivity to the acoustic modulations characteristic of speech signals
Spectral and temporal implementation of Japanese speakers' English vowel categories: a corpus-based study
This study investigates the predictions of second language (L2) speech acquisition models (SLM(-r), PAM(-L2), and L2LP) on how native (L1) Japanese speakers implement the spectral and temporal aspects of L2 American English vowel categories. Data were obtained from 102 L1 Japanese speakers in the J-AESOP corpus, which also includes nativelikeness judgments by trained phoneticians. Spectrally, speakers judged to be non-nativelike showed a strong influence from L1 categories, except L2 /ʌ/, which could be deflected away from L1 /a/ according to SLM(-r), and L2 /ɔː/, which seemed orthographically assimilated to L1 /o/ according to PAM(-L2). More nativelike speakers showed vowel spectra similar to those of native English speakers across all vowels, in accordance with L2LP. Temporally, although speakers tended to equate the phonetic length of English vowels with Japanese phonemic length distinctions, segment-level L1-L2 category similarity was not a significant predictor of the speakers' nativelikeness. Instead, the implementation of prosodic-level factors such as stress and phrase-final lengthening were better predictors. The results highlight the importance of suprasegmental factors in successful category learning and also reveal a weakness in current models of L2 speech acquisition, which focus primarily on the segmental level. Theoretical and pedagogical implications are discussed.
Speech rhythm: the language-specific integration of pitch and duration
Experimental phonetic research on speech rhythm seems to have reached an impasse. Recently, this research field has tended to investigate produced (rather than perceived) rhythm, focussing on timing, i.e. duration as an acoustic cue, and has not considered that rhythm perception might be influenced by native language. Yet evidence from other areas of phonetics, and other disciplines, suggests that an investigation of rhythm is needed which (i) focuses on listeners' perception, (ii) acknowledges the role of several acoustic cues, and (iii) explores whether the relative significance of these cues differs between languages. This thesis, the originality of which derives from its adoption of these three perspectives combined, indicates new directions for progress. A series of perceptual experiments investigated the interaction of duration and f0 as perceptual cues to prosody in languages with different prosodic structures: Swiss German, Swiss French, and French (i.e. from France). The first experiment demonstrated that a dynamic f0 increases perceived syllable duration in contextually isolated pairs of monosyllables, for all three language groups. The second experiment found that dynamic f0 and increased duration interact as cues to rhythmic groups in series of monosyllabic digits and letters; the two cues were significantly more effective than one when heard simultaneously, but significantly less effective than one when heard in conflicting positions around the rhythmic-group boundary location, and native language influenced whether f0 or duration was the more effective cue.
These two experiments laid the basis for the third, which directly addressed rhythm. Listeners were asked to judge the rhythmicality of sentences with systematic duration and f0 manipulations; the results provide evidence that duration and f0 are interdependent cues in rhythm perception, and that the weighting of each cue varies in different languages. A fourth experiment applied the perceptual results to production data, to develop a rhythm metric which captures the multi-dimensional and language-specific nature of perceived rhythm in speech production. These findings have the important implication that if future phonetic research on rhythm follows these new perspectives, it may circumvent the impasse and advance our knowledge and model of speech rhythm. This work was funded by an AHRC doctoral award to the author.
Audio content identification
The development and research of content-based music information retrieval (MIR) applications in recent years has shown that the automatic generation of content descriptions enabling the identification or classification of music, or parts of music, is a manageable task. Because of the large volume of digital music available and the enormous growth of the corresponding databases, investigations are being carried out into automating the typical management processes for digital music as far as possible.
In this work I present a general introduction to the field of music information retrieval, in particular the automatic identification of audio material and the comparison of similarity-based approaches with purely content-based fingerprint technologies. On the one hand, systems attempt to model the human auditory apparatus, i.e. the perception and definition of "similarity", in order to classify related music titles into groups and, further, to enable identification. On the other hand, the focus is on creating signatures that aim at unambiguous recognition, without any statement about similar-sounding alternatives. The thesis carries out a series of tests intended to show how robustly, reliably and adaptably recognition systems should work, the goal being the highest possible rate of correctly identified pieces. For this purpose, two algorithms, Rhythm Patterns (a similarity-based approach) and FDMF (a freely available fingerprint extraction algorithm), are compared across 24 test cases to contrast how the methods operate. These investigations aim at the highest possible recognition accuracy. Similarity-based approaches such as Rhythm Patterns achieve recognition rates of up to 89.53% in identification and thus clearly outperform the investigated fingerprint approach in the test scenarios conducted.
A careful selection of the relevant features used to compute similarity leads to highly promising results, both for varied excerpts of the music pieces and after substantial signal modifications.
The development and research of content-based music information retrieval (MIR) applications in the last years have shown that the generation of descriptions enabling the identification and classification of pieces of musical audio is a challenge that can be coped with. Due to the huge amount of digital music available and the growth of the respective databases, there are investigations of how to automatically perform tasks concerning the management of audio data.
In this thesis I provide a general introduction to music information retrieval techniques, especially the identification of audio material and the comparison of similarity-based approaches with content-based fingerprint technology. On the one hand, similarity retrieval systems try to model the human auditory system in various aspects and therewith a model of perceptual similarity. On the other hand, there are fingerprints or signatures which try to identify music exactly, without any assessment of the similarity of sound titles. To figure out the differences and consequences of using these approaches, I performed several experiments that make clear how robustly and adaptably an identification system must work. Rhythm Patterns, a similarity-based feature extraction scheme, and FDMF, a free fingerprint algorithm, were investigated in 24 test cases in order to compare the principles behind them. This evaluation was also done with a focus on the greatest possible accuracy. The results show that similarity features like Rhythm Patterns can also identify audio titles with promising accuracy (up to 89.53%) in the introduced test scenarios. A proper choice of features means that music tracks are identified best when focusing on the highest similarity between the candidates, both for varied excerpts and under signal modifications.
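Both families of approaches ultimately reduce identification to comparing feature vectors extracted from audio. As a generic sketch of the similarity-retrieval side (the toy vectors and names below are hypothetical, not the actual Rhythm Patterns or FDMF features), nearest-neighbour identification by cosine similarity could look like:

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(query, database):
    # return the stored title whose feature vector best matches the query
    return max(database, key=lambda title: cosine(query, database[title]))

# toy database of hypothetical feature vectors
database = {"track_a": [1.0, 0.0, 0.2], "track_b": [0.1, 1.0, 0.9]}
print(identify([0.9, 0.1, 0.3], database))  # closest match: track_a
```

A fingerprint system, by contrast, aims at exact hash-style matching of a compact signature, so a near miss returns no match rather than the most similar candidate; this is the trade-off the thesis's 24 test cases probe.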
Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019
International audience