1,889 research outputs found

    Sound Source Separation

    This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14

    A computational framework for sound segregation in music signals

    Doctoral thesis (Tese de doutoramento). Electrical and Computer Engineering. Faculdade de Engenharia, Universidade do Porto. 200

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed as means of measuring rhythm at the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English.
    Further analysis was carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as there are many other factors that can influence the values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity across all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
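    The four metrics named above are simple statistics over labelled interval durations. A minimal sketch in Python (function names are mine, and the segmentation into vocalic and consonantal intervals is assumed to have been done beforehand, e.g. by hand or forced alignment):

    ```python
    # Hypothetical sketch of the rhythm metrics; inputs are lists of
    # interval durations in seconds, already segmented.

    def percent_v(vocalic, consonantal):
        """%V: proportion of total duration that is vocalic (Ramus et al., 1999)."""
        total = sum(vocalic) + sum(consonantal)
        return 100.0 * sum(vocalic) / total

    def delta_c(consonantal):
        """Delta-C: standard deviation of consonantal interval durations."""
        n = len(consonantal)
        mean = sum(consonantal) / n
        return (sum((d - mean) ** 2 for d in consonantal) / n) ** 0.5

    def rpvi(intervals):
        """Raw Pairwise Variability Index: mean absolute difference
        between successive interval durations (Grabe & Low, 2002)."""
        diffs = [abs(a - b) for a, b in zip(intervals, intervals[1:])]
        return sum(diffs) / len(diffs)

    def npvi(intervals):
        """Normalised PVI: each pairwise difference is divided by the
        mean of the pair, reducing the influence of speech rate."""
        terms = [abs(a - b) / ((a + b) / 2)
                 for a, b in zip(intervals, intervals[1:])]
        return 100.0 * sum(terms) / len(terms)
    ```

    Syllable-timed languages tend toward higher %V and lower ∆C and PVI values than stress-timed ones, which is the clustering the study reports for Malay.
    
    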

    On the Perceptual Organization of Speech

    A general account of auditory perceptual organization has developed in the past 2 decades. It relies on primitive devices akin to the Gestalt principles of organization to assign sensory elements to probable groupings and invokes secondary schematic processes to confirm or to repair the possible organization. Although this conceptualization is intended to apply universally, the variety and arrangement of acoustic constituents of speech violate Gestalt principles at numerous junctures, cohering perceptually, nonetheless. The authors report 3 experiments on organization in phonetic perception, using sine wave synthesis to evade the Gestalt rules and the schematic processes alike. These findings falsify a general auditory account, showing that phonetic perceptual organization is achieved by specific sensitivity to the acoustic modulations characteristic of speech signals.
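    Sine-wave synthesis replaces the time-varying formants of an utterance with a few frequency- and amplitude-modulated sinusoids, stripping away the harmonic structure that Gestalt-style grouping would rely on. A minimal sketch, assuming frame-wise formant frequency and amplitude tracks as input (the function name and frame-based interface are hypothetical, not the authors' actual tooling):

    ```python
    import math

    def sine_wave_speech(formant_tracks, amp_tracks, frame_rate, sample_rate):
        """Sum one sinusoid per formant track, updating frequency and
        amplitude once per analysis frame.

        formant_tracks: list of per-formant frequency lists (Hz), one value per frame
        amp_tracks:     matching amplitude lists (0..1)
        """
        samples_per_frame = sample_rate // frame_rate
        n_frames = len(formant_tracks[0])
        out = [0.0] * (n_frames * samples_per_frame)
        for freqs, amps in zip(formant_tracks, amp_tracks):
            phase = 0.0  # accumulate phase so frequency jumps stay click-free
            for f_idx in range(n_frames):
                for s in range(samples_per_frame):
                    i = f_idx * samples_per_frame + s
                    phase += 2 * math.pi * freqs[f_idx] / sample_rate
                    out[i] += amps[f_idx] * math.sin(phase)
        return out
    ```

    Three such tracks, following the first three formants of a sentence, are intelligible as speech to many listeners despite sounding nothing like a voice, which is what makes the stimulus useful for isolating phonetic organization.
    
    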

    Spectral and temporal implementation of Japanese speakers' English vowel categories : a corpus-based study

    This study investigates the predictions of the second language (L2) speech acquisition models SLM(-r), PAM(-L2), and L2LP on how native (L1) Japanese speakers implement the spectral and temporal aspects of L2 American English vowel categories. Data were obtained from 102 L1 Japanese speakers in the J-AESOP corpus, which also includes nativelikeness judgments by trained phoneticians. Spectrally, speakers judged to be non-nativelike showed a strong influence from L1 categories, except L2 /ʌ/, which could be deflected away from L1 /a/ according to SLM(-r), and L2 /ɑː/, which seemed orthographically assimilated to L1 /o/ according to PAM(-L2). More nativelike speakers showed vowel spectra similar to those of native English speakers across all vowels, in accordance with L2LP. Temporally, although speakers tended to equate the phonetic length of English vowels with Japanese phonemic length distinctions, segment-level L1-L2 category similarity was not a significant predictor of the speakers’ nativelikeness. Instead, the implementation of prosodic-level factors such as stress and phrase-final lengthening was a better predictor. The results highlight the importance of suprasegmental factors in successful category learning and also reveal a weakness in current models of L2 speech acquisition, which focus primarily on the segmental level. Theoretical and pedagogical implications are discussed.

    Audio content identification

    The development and research of content-based music information retrieval (MIR) applications in recent years have shown that automatically generating content descriptions which enable the identification or classification of music, or parts of music, is a manageable task. Because of the huge amount of digital music available and the enormous growth of the corresponding databases, there is ongoing work on automating the typical management tasks for digital audio. This thesis provides a general introduction to music information retrieval, in particular the automatic identification of audio material and a comparison of similarity-based approaches with purely content-based fingerprint technologies. On the one hand, similarity-based systems try to model the human auditory system, and with it the perception and definition of similarity, in order to group related music titles and, beyond that, to identify them. On the other hand, fingerprints are signatures aimed at exact recognition, without any claim about similar-sounding alternatives. 
    To examine how robustly, reliably and adaptably an identification system must work while aiming for the highest possible rate of correctly recognized tracks, two algorithms were compared across 24 test cases: Rhythm Patterns, a similarity-based feature extraction scheme, and FDMF, a freely available fingerprint extraction algorithm. Similarity-based approaches such as Rhythm Patterns achieved recognition rates of up to 89.53% and thus clearly outperformed the investigated fingerprint approach in the test scenarios. A careful choice of the features used to compute similarity leads to very promising results, both for varied excerpts of the music tracks and after substantial signal modifications.
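    The similarity-based identification described above reduces, at its core, to nearest-neighbour search over per-track feature vectors. A minimal sketch (the feature vectors stand in for Rhythm Patterns descriptors; the function names and the flat-dictionary "database" are my own simplification, not the thesis code):

    ```python
    # Hypothetical sketch: identify a query track by finding the database
    # entry whose feature vector is closest in Euclidean distance.

    def euclidean(a, b):
        """Euclidean distance between two equal-length feature vectors."""
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def identify(query_features, database):
        """database: dict mapping track id -> feature vector.
        Returns the id of the nearest track."""
        return min(database, key=lambda tid: euclidean(query_features, database[tid]))
    ```

    A fingerprint system skips the distance ranking entirely: it hashes local signal landmarks and requires an (approximately) exact match, which is why it cannot say anything about similar-sounding alternatives when no match is found.
    
    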

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    International audience
