Stream segregation and pattern matching techniques for polyphonic music databases.
Szeto, Wai Man. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 81-86). Abstracts in English and Chinese.
Contents:
1. Introduction: Motivations and Aims; Thesis Organization
2. Preliminaries: Fundamentals of Music and Terminology; Findings in Auditory Psychology
3. Literature Review: Pattern Matching Techniques for Music Information Retrieval; Stream Segregation; Post-tonal Music Analysis
4. Proposed Method for Stream Segregation: Music Representation; Proposed Method; Application of Stream Segregation to Polyphonic Databases; Experimental Results; Summary
5. Proposed Approaches for Post-tonal Music Analysis: Pitch-Class Set Theory; Sequence-Based Approach (Music Representation, Matching Conditions, Algorithm); Graph-Based Approach (Graph Theory and Its Notations, Music Representation, Matching Conditions, Algorithm); Experiments 1-4
6. Conclusion
Bibliography; Appendix A: Publications
Ein physiologisch gehörgerechtes Verfahren zur automatisierten Melodietranskription (A physiologically based, auditory-oriented method for automated melody transcription)
Abstract
This thesis proposes the implementation of a method for the automatic transcription of music. The human ability of musical perception, and especially the tasks performed by skilled professional musicians, is far beyond what current technical systems can reproduce. It is therefore a plausible approach to make use of perceptually motivated strategies wherever possible in order to narrow this gap for systems for music analysis and understanding. In the present work, the basic processing mechanisms of the mammalian auditory periphery, as well as higher-level cognitive processes, are applied to the analysis of musical input.
A detailed survey describes state-of-the-art algorithms for the detection of fundamental frequencies and for the segmentation of musical phrases, and introduces current systems for monophonic and polyphonic melody transcription.
The fundamental physiological components of the auditory periphery and principles from Gestalt psychology are illustrated. Furthermore, the models used in this thesis, including the active sound preprocessing of the inner ear, are described. To account for auditory postprocessing, principles of pitch perception and a hierarchical model based on assumptions from Gestalt psychology are employed.
Besides the development of this hierarchical model, the core of the thesis consists of implementing the selected methods for monophonic and polyphonic transcription strategies. Aurally motivated pitch extraction, psychoacoustically motivated segmentation, and postprocessing grounded in music theory form the basis of monophonic transcription. The polyphonic components, such as partial-tone interference, pitch hypotheses and octave detection, are intended as foundations for subsequent implementations. The thesis concludes with an evaluation of the proposed system: a variety of test series in the context of a metadata search engine are described, and the results demonstrate the method's potential for (commercial) applications.
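The abstract does not reproduce the thesis's actual algorithms. As an illustrative sketch only, the core idea behind pitch extraction, finding the dominant periodicity of the signal, can be approximated with a naive autocorrelation estimator; every name, parameter and signal below is invented for the example and is not taken from the thesis:

```python
import math

def estimate_f0(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Naive autocorrelation pitch estimate: pick the lag with the strongest
    self-similarity within the plausible fundamental-period range."""
    lo = int(sample_rate / fmax)   # shortest period considered, in samples
    hi = int(sample_rate / fmin)   # longest period considered, in samples
    best_lag, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# A 440 Hz sine sampled at 8 kHz; the estimate lands on the nearest
# integer lag (8000/18 = 444.4 Hz), about 1% from the true pitch.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(2048)]
print(round(estimate_f0(tone, sr)))
```

Real systems of the kind the thesis describes add auditory filterbanks, psychoacoustic weighting and octave-error handling on top of such a periodicity core; this sketch only shows the basic principle.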
Tune Retrieval in the Multimedia Library
Musical scores are traditionally retrieved by title, composer or subject classification. Just as multimedia computer systems increase the range of opportunities available for presenting musical information, so they also offer new ways of posing musically oriented queries. This paper shows how scores can be retrieved from a database on the basis of a few notes sung or hummed into a microphone. The design of such a facility raises several interesting issues pertaining to music retrieval. We first describe an interface that transcribes acoustic input into standard music notation. We then analyze string matching requirements for ranked retrieval of music and present the results of an experiment which tests how accurately people sing well-known melodies. The performance of several string matching criteria is analyzed using two folk song databases. Finally, we describe a prototype system which has been developed for retrieval of tunes from acoustic input, and evaluate its performance.
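The paper's exact matching criteria are not reproduced in this abstract. As a hedged sketch of the general family of techniques it analyzes, approximate string matching of melodies can be illustrated with an edit distance over transposition-invariant interval sequences; the helper names and toy melodies are invented for the example:

```python
def intervals(pitches):
    """Melody as successive semitone intervals (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute, unit costs)."""
    dp = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i    # prev = value diagonally up-left
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete x
                                     dp[j - 1] + 1,    # insert y
                                     prev + (x != y))  # substitute (or match)
    return dp[-1]

# A sung query matches a transposed tune exactly in interval space.
query = intervals([60, 62, 64, 60])   # C D E C, as MIDI note numbers
stored = intervals([62, 64, 66, 62])  # same tune, a whole tone higher
print(edit_distance(query, stored))   # 0: identical contours
```

Ranked retrieval then amounts to sorting database tunes by this distance to the query; the actual system also has to cope with singing errors, rhythm and contour-only matching, which this sketch omits.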
Automatic mood detection from electronic music data
Automatic mood detection from music has two main benefits: knowing the mood in advance allows possible enhancements of the music experience (such as mood-based visualizations), and it makes 'query by mood' over music databases possible. This research is concerned with the automatic detection of mood in the electronic music genre, in particular drum and bass. The methodology was relatively simple: the music was first sampled, and a human pre-classification (used for training a classifier) was assigned to each piece as a point on a Thayer's-model mood map. Low-level signal-processing features, mel-frequency cepstral coefficients, psychoacoustic features and pitch-image summary features were then extracted from the samples. These were verified as useful via self-organising maps and ranked via the feature selection techniques of information gain, gain ratio and symmetric uncertainty. The verified features were then used as training and testing data (via cross-validation) for a three-layer perceptron neural network.
Two approaches to feature extraction were used, because the first approach performed poorly in self-organising-map-based cluster analysis. The mood classification scheme was also simplified from 25 moods to four. The main difference between the two approaches, however, lay in the feature-extraction window duration and the features themselves. The second approach's features were used to train the neural network, and classification accuracy rates of no less than 84% were achieved.
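As a minimal illustration of one of the feature-ranking criteria named above, information gain can be computed as the reduction in label entropy after splitting on a (discretized) feature. The toy feature values and mood labels below are invented, not taken from the thesis:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the size-weighted entropy of each feature group."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data: one binary feature perfectly separates the two moods,
# the other carries no information about them.
moods = ["calm", "calm", "energetic", "energetic"]
print(information_gain([0, 0, 1, 1], moods))  # 1.0 bit
print(information_gain([0, 1, 0, 1], moods))  # 0.0 bits
```

Ranking many extracted features by this score (or by gain ratio and symmetric uncertainty, which normalize it differently) is what lets the classifier train on only the informative ones.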
This research shows how one person's approximated perception of mood can be captured and used to determine mood classifications from music.

References:
Ahrendt, P., Meng, A. and Larsen, J. "Decision time horizon for music genre classification using short time features". Submitted for EUSIPCO, 2004.
Bishop, C. M. "Neural Networks for Pattern Recognition". Oxford University Press, 1995. http://www.ncrg.aston.ac.uk/NNPR/
Cheng, K., Nazer, B., Uppuluri, J. and Verret, R. "Beat This: A Beat Synchronization Project", Owlnet Group, Rice University (retrieved on 7 May 2004 from http://www.owlnet.rice.edu/~elec301/Projects01/beat_sync/index.html ), 2003.
Dahlhaus, C. (trans. Gjerdingen, R. O.) "Studies in the Origin of Harmonic Tonality", Princeton University Press, ISBN 0691091358, 1990.
Demuth, H. and Beale, M. "Neural Network Toolbox for Use with Matlab: Documentation", MathWorks, http://www.mathworks.com , 1998.
Deva, B. C. "Psychoacoustics of Music and Speech". I. M. H. Press Ltd, 1967.
Golub, S. "Classifying recorded music". Unpublished master's thesis, University of Edinburgh (retrieved 28 May 2004 from http://www.aigeek.com/aimsc/ ), 2000.
Grimaldi, M., Cunningham, P., Kokaram, A. "An Evaluation of Alternative Feature
Selection Strategies and Ensemble Techniques for Classifying Music", to appear in
Workshop in Multimedia Discovery and Mining, ECML/PKDD03, Dubrovnik, Croatia,
September, 2003.
Haykin, S. "Neural Networks: A Comprehensive Foundation". Upper Saddle River, N.J.: Prentice Hall, 1999.
Healey, J., Picard, R. and Dabek, F. "A new affect-perceiving interface and its application to personalized music selection". Technical Report 478, Massachusetts Institute of Technology, Media Laboratory Perceptual Computing Section. http://www-white.media.mit.edu/tech-reports/TR-478/TR-478.html , 1998.
Huron, D. and Aarden, B. "Cognitive Issues and Approaches in Music Information Retrieval", edited by S. Downie and D. Byrd (retrieved on 7 May 2004 from http://www.music-cog.ohio-state.edu/Huron/Publications/huron.MIR.conference.html ), 2002.
Juslin, P. N. "Cue utilization in communication of emotion in music performance: Relating performance to perception", Journal of Experimental Psychology: Human Perception and Performance, 26, pp. 1797-1813, 2000.
Kohonen, T. "Self-Organizing Maps", Second Edition, Springer, 2001.
Krumhansl, C. L. Cognitive Foundations of Musical Pitch, Oxford Psychology Series
17, Oxford University Press, New York and Oxford, 1990.
Larsen, J. "Introduction to Artificial Neural Networks" IMM, 1999.
Leman, M., Lesaffre, M., Tanghe, K. "An introduction to the IPEM Toolbox for
Perception Based Music Analysis", Mikropolyphonie - The Online Contemporary
Music Journal, Volume 7, 2001.
Yu, L. and Liu, H. "Efficiently handling feature redundancy in high-dimensional data", Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, D.C., August 24-27, 2003.
Liu, D., Lu, L. and Zhang, H.J. "Automatic mood detection from acoustic music data",
International Symposium on Music Information Retrieval, Baltimore, Maryland (USA),
2003.
Logan, B. "Mel Frequency Cepstral Coefficients for Music Modeling" in Proc. of the
International Symposium on Music Information Retrieval 2000, Plymouth, USA, Oct,
2000.
Lyons, A. "Synaesthesia - A Cognitive Model of Cross Modal Association."
Consciousness, Literature and the Arts 2, 2001.
McKinney, M. F. and Breebaart, J. "Features for Audio and Music Classification", in 4th International Conference on Music Information Retrieval (ISMIR 2003), http://ismir2003.ismir.net/papers/McKinney.PDF , 2003.
Metois, E. "Musical Sound Information: Musical Gestures and Embedding Systems",
PhD Thesis, MIT Media Lab, 1996.
Nyquist, H. "Certain topics in telegraph transmission theory," Trans. AIEE, vol. 47, pp.
617-644, Apr, 1928.
Pampalk, E., Rauber, A. and Merkl, D. "Content-based organization and visualization of music archives", ACM Multimedia, pp. 570-579, 2002.
McNab, R. J., Smith, L. A., Witten, I. H. and Henderson, C. L. "Tune Retrieval in the Multimedia Library", Multimedia Tools and Applications, 10(2-3): 113-132, 2000.
Schmidt, A. and Stone, T. "Music Classification and Identification System". University of Colorado (retrieved on 7 May 2004 from http://www.flwvd.dhs.org/school/MusicRecognitionDatabase.pdf ).
Schubert, E., Wolfe, J. and Tarnopolsky, A. "Spectral centroid and timbre in complex, multiple instrumental textures", International Conference on Music Perception and Cognition, Northwestern University, Illinois, pp. 654-657, 2004.
Scott, P. and Widrow, B. "Music Classification using Neural Networks", Stanford
University (Retrieved on 7 May 2004 from
http://www.stanford.edu/class/ee373a/musicclassification.pdf)
Shannon, C., "Communication in the presence of noise," Proc. Institute of Radio
Engineers, vol. 37, no.1, pp. 10-21, Jan, 1949.
Slaney, M. "Auditory Toolbox (Tech. Rep. No. 1998-010)". Interval Research Corporation (retrieved 28 May 2004 from http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/ ), 1998.
Sondhi, M.M., "New Methods of Pitch Extraction". IEEE Trans. Audio and
Electroacoustics, Vol. AU-16, No.2, pp.262-266, June, 1968.
Li, T., Ogihara, M. and Li, Q. "A comparative study on content-based music genre classification", in Proc. ACM SIGIR '03, Toronto, Canada, July, pp. 282-289, 2003.
Thayer, R.E. The Biopsychology of Mood and Arousal. New York: Oxford University
Press, 1989.
Tzanetakis, G. and Cook, P. "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing, 10(5): 293-302, 2002.
Wessel, D. "Timbre Space as a Musical Control Structure", in Foundations of Computer Music, Curtis Roads (Ed.), MIT Press, pp. 640-657, 1997.
Witten, I. H. and Frank, E. "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Morgan Kaufmann, San Francisco, 2000.
Feng, Y., Zhuang, Y. and Pan, Y. "Music Information Retrieval by Detecting Mood via Computational Media Aesthetics", Web Intelligence, pp. 235-24, 2003.
Zillmann, D. "Mood management in the context of selective exposure theory", in M. E. (Ed.), Communication Yearbook 23, Thousand Oaks, CA: Sage, pp. 103-123, 2000.