6 research outputs found

    Stream segregation and pattern matching techniques for polyphonic music databases.

    Szeto, Wai Man. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 81-86). Abstracts in English and Chinese. Contents:
    Chapter 1 Introduction: Motivations and Aims; Thesis Organization
    Chapter 2 Preliminaries: Fundamentals of Music and Terminology; Findings in Auditory Psychology
    Chapter 3 Literature Review: Pattern Matching Techniques for Music Information Retrieval; Stream Segregation; Post-tonal Music Analysis
    Chapter 4 Proposed Method for Stream Segregation: Music Representation; Proposed Method; Application of Stream Segregation to Polyphonic Databases; Experimental Results; Summary
    Chapter 5 Proposed Approaches for Post-tonal Music Analysis: Pitch-Class Set Theory; Sequence-Based Approach (Music Representation, Matching Conditions, Algorithm); Graph-Based Approach (Graph Theory and Its Notations, Music Representation, Matching Conditions, Algorithm); Experiments 1-4
    Chapter 6 Conclusion; Bibliography; A Publications
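    The pitch-class set theory of Chapter 5.1 works with standard invariants of unordered pitch-class sets. As an illustration only (this is not code from the thesis), a minimal sketch of the interval-class vector, the basic fingerprint set theory uses to compare sonorities:

    ```python
    def pitch_classes(notes):
        """Reduce MIDI note numbers to the set of pitch classes 0-11."""
        return sorted(set(n % 12 for n in notes))

    def interval_vector(pcs):
        """Interval-class vector of a pitch-class set: counts of the six
        interval classes (1-6 semitones) over all pairs in the set."""
        vec = [0] * 6
        for i in range(len(pcs)):
            for j in range(i + 1, len(pcs)):
                ic = (pcs[j] - pcs[i]) % 12
                vec[min(ic, 12 - ic) - 1] += 1
        return vec

    print(interval_vector(pitch_classes([60, 64, 67])))  # C major triad -> [0, 0, 1, 1, 1, 0]
    ```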

    Ein physiologisch gehörgerechtes Verfahren zur automatisierten Melodietranskription [A physiologically motivated, auditory-based method for automated melody transcription]

    Abstract: This thesis proposes the implementation of a method for the automatic transcription of music. Current technical systems fall far short of the human ability of musical perception, especially the skills of trained professional musicians. It is therefore a plausible approach to draw on perceptually motivated strategies wherever possible in order to narrow this gap for systems for music analysis and understanding. In the presented work, the basic processing mechanisms of the mammalian auditory periphery, as well as high-level cognitive processes, are applied to the analysis of musical input. A detailed survey describes state-of-the-art algorithms for the detection of fundamental frequencies and the segmentation of musical phrases, and introduces current systems for monophonic and polyphonic melody transcription. The fundamental physiological components of the auditory periphery and principles of Gestalt psychology are illustrated, followed by the models used in this thesis, including the active sound preprocessing of the inner ear. To account for auditory post-processing, principles of pitch perception and a hierarchical model built on assumptions from Gestalt psychology are employed. Besides the development of this hierarchical model, the core of the thesis is the implementation of the selected methods for monophonic and polyphonic transcription strategies. Aurally motivated pitch extraction, psychoacoustically motivated segmentation and post-processing grounded in music theory form the basis of monophonic transcription. The polyphonic parts, such as partial-tone interference, pitch hypotheses and octave detection, are intended as the foundation for subsequent implementations. The thesis concludes with an evaluation of the proposed system across a variety of test series in the context of a metadata search engine.
The results show the potential of the method for (commercial) applications.
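As a rough illustration of the first stage of the monophonic chain, the sketch below estimates a fundamental frequency by plain time-domain autocorrelation. This is an assumed stand-in for the thesis's auditory-model pitch extraction, not the described method itself:

```python
import math

def estimate_f0(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency by picking the lag with the
    strongest autocorrelation inside the allowed pitch range."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, min(max_lag, len(samples) - 1) + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

# A 220 Hz sine sampled at 8 kHz should yield an estimate near 220 Hz.
sr = 8000
tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(1024)]
print(round(estimate_f0(tone, sr)))
```

A real transcription front end would add windowing, voicing decisions and octave-error correction; the point here is only the shape of the lag search.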

    Content-based music structure analysis

    Thesis (Ph.D.), Doctor of Philosophy.

    Tune Retrieval in the Multimedia Library

    Musical scores are traditionally retrieved by title, composer or subject classification. Just as multimedia computer systems increase the range of opportunities available for presenting musical information, so they also offer new ways of posing musically oriented queries. This paper shows how scores can be retrieved from a database on the basis of a few notes sung or hummed into a microphone. The design of such a facility raises several interesting issues pertaining to music retrieval. We first describe an interface that transcribes acoustic input into standard music notation. We then analyze string matching requirements for ranked retrieval of music and present the results of an experiment which tests how accurately people sing well-known melodies. The performance of several string matching criteria is analyzed using two folk song databases. Finally, we describe a prototype system which has been developed for retrieval of tunes from acoustic input and evaluate its performance.
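    The matching pipeline the paper analyzes, transposition-invariant comparison of a sung query against database tunes, can be sketched with semitone-interval contours ranked by edit distance. The tune names and pitch sequences below are hypothetical, and plain dynamic-programming edit distance stands in for the several matching criteria the paper compares:

    ```python
    def intervals(pitches):
        """Melodic contour as semitone intervals, so matching is key-invariant."""
        return [b - a for a, b in zip(pitches, pitches[1:])]

    def edit_distance(a, b):
        """Classic dynamic-programming edit distance over interval sequences."""
        m, n = len(a), len(b)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,
                              d[i][j - 1] + 1,
                              d[i - 1][j - 1] + cost)
        return d[m][n]

    def rank(query_pitches, tunes):
        """Rank database tunes by contour distance to the query."""
        q = intervals(query_pitches)
        return sorted(tunes, key=lambda t: edit_distance(q, intervals(t[1])))

    # Hypothetical mini-database of (name, MIDI pitch sequence) pairs.
    db = [("tune-a", [60, 62, 64, 65, 67]),   # stepwise ascent
          ("tune-b", [60, 60, 67, 67, 69])]   # leap-based opening
    query = [62, 64, 66, 67, 69]              # tune-a's contour, transposed up
    print(rank(query, db)[0][0])              # prints "tune-a"
    ```

    Edit distance tolerates the insertion, deletion and substitution errors the singing experiment measures, which is why approximate rather than exact matching is required.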

    Automatic mood detection from electronic music data

    No full text
    Automatic mood detection from music has two main benefits: knowing the mood in advance allows possible enhancement of the music experience (such as mood-based visualizations), and it makes 'query by mood' over music data-banks possible. This research is concerned with the automatic detection of mood in the electronic music genre, in particular drum and bass. The methodology was relatively simple: first the music was sampled, then a human pre-classification (used for training a classifier) was assigned to each piece as a point on a Thayer-model mood map. Low-level signal-processing features, mel-frequency cepstral coefficients, psychoacoustic features and pitch-image summary features were then extracted from the samples. These were verified as useful via self-organising maps and ranked with the feature selection techniques of information gain, gain ratio and symmetric uncertainty. The verified features were then used as training and testing data (via cross-validation) for a three-layer perceptron neural network. Two approaches to feature extraction were used because the first performed poorly in self-organising-map cluster analysis, and the mood classification scheme was simplified from 25 moods to four; the main difference between the two approaches, however, lay in the feature extraction window duration and the features themselves. The second approach's features were used to train the neural network, and classification accuracy rates were no less than 84%. This research shows how one human's approximated perception can be captured and used to determine mood classifications from music. (Unpublished)
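    Of the three feature-ranking criteria named in the abstract, information gain is the simplest to sketch. The toy features and mood labels below are illustrative, not data from the study:

    ```python
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a label sequence, in bits."""
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(feature_values, labels):
        """Reduction in label entropy from splitting on a discrete feature,
        the quantity used to rank candidate features."""
        n = len(labels)
        remainder = 0.0
        for v in set(feature_values):
            subset = [l for f, l in zip(feature_values, labels) if f == v]
            remainder += len(subset) / n * entropy(subset)
        return entropy(labels) - remainder

    # Toy data: one feature tracks the mood label perfectly, one is noise.
    moods = ["calm", "calm", "energetic", "energetic"]
    good = ["low", "low", "high", "high"]
    bad = ["low", "high", "low", "high"]
    print(information_gain(good, moods), information_gain(bad, moods))  # -> 1.0 0.0
    ```

    Gain ratio and symmetric uncertainty are normalised variants of the same quantity, penalising features that split the data into many small groups.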
