Automatic classification of latin music : some experiments on musical genre classification
Internship carried out at INESC Porto. Integrated master's thesis. Electrical and Computer Engineering. Faculty of Engineering. University of Porto. 200
Efficient Analysis in Multimedia Databases
The rapid progress of digital technology has led to a situation
where computers have become ubiquitous tools. Now we can find them
in almost every environment, be it industrial or even private. With
ever increasing performance computers assumed more and more vital
tasks in engineering, climate and environmental research, medicine
and the content industry. Previously, these tasks could only be
accomplished by spending enormous amounts of time and money. By
using digital sensor devices, like earth observation satellites,
genome sequencers or video cameras, the amount and complexity of
data with a spatial or temporal relation has grown enormously. This
has led to new challenges for the data analysis and requires the use
of modern multimedia databases.
This thesis aims at developing efficient techniques for the analysis
of complex multimedia objects such as CAD data, time series and
videos. It is assumed that the data is modeled by commonly used
representations. For example, CAD data is represented as a set of
voxels, while audio and video data are represented as multi-represented,
multi-dimensional time series.
The main part of this thesis focuses on finding efficient methods
for collision queries of complex spatial objects. One way to speed
up those queries is to employ a cost-based decomposition,
which uses interval groups to approximate a spatial object. For
example, this technique can be used for the Digital Mock-Up (DMU)
process, which helps engineers to ensure short product cycles. This
thesis defines and discusses a new similarity measure for time
series called threshold-similarity. Two time series are
considered similar if they exhibit a similar behavior regarding the
transgression of a given threshold value. Another part of the thesis
is concerned with the efficient calculation of reverse
k-nearest neighbor (RkNN) queries in general metric spaces
using conservative and progressive approximations. The aim of such
RkNN queries is to determine the impact of single objects on the
whole database. At the end, the thesis deals with video
retrieval and hierarchical genre classification of music
using multiple representations. The practical relevance of the
discussed genre classification approach is highlighted with a
prototype tool that helps the user to organize large music
collections.
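The reverse k-nearest neighbor query mentioned above can be illustrated with a brute-force sketch. It uses only the standard definition (an object belongs to the RkNN result of q if q is among that object's k nearest neighbors) and is not the approximation-based method of the thesis. The function names and the toy points are assumptions made for illustration; any metric can be passed as `distance`.

```python
from math import dist  # Euclidean distance, used here as a stand-in metric

def knn(points, i, k, distance=dist):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: distance(points[i], points[j]))
    return others[:k]

def rknn(points, q, k, distance=dist):
    """Indices of all objects that count points[q] among their k nearest neighbours."""
    return [i for i in range(len(points))
            if i != q and q in knn(points, i, k, distance)]

# Hypothetical toy data on a line: two tight pairs far apart.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
influenced_by_1 = rknn(points, q=1, k=1)
```

This naive version computes a full kNN list for every object, which costs quadratic time per query; avoiding that cost is the reason for using approximations.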
Both the efficiency and the effectiveness of the presented
techniques are thoroughly analyzed. The benefits over traditional
approaches are shown by evaluating the new methods on real-world
test datasets.
Automatic music genre classification
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science. 2014. No abstract provided
Automatic Transcription of Bass Guitar Tracks applied for Music Genre Classification and Sound Synthesis
Music recordings most often consist of multiple instrument signals, which
overlap in time and frequency. In the field of Music Information Retrieval
(MIR), existing algorithms for the automatic transcription and analysis of
music recordings aim to extract semantic information from mixed audio
signals. In recent years, it has frequently been observed that the algorithm
performance is limited due to the signal interference and the resulting
loss of information. One common approach to solve this problem is to first
apply source separation algorithms to isolate the present musical
instrument signals before analyzing them individually. The performance of
source separation algorithms strongly depends on the number of instruments
as well as on the amount of spectral overlap.
In this thesis, isolated
instrumental tracks are analyzed in order to circumvent the challenges of
source separation. Instead, the focus is on the development of
instrument-centered signal processing algorithms for music transcription,
musical analysis, as well as sound synthesis. The electric bass guitar is
chosen as an example instrument. Its sound production principles are
closely investigated and considered in the algorithmic design.
In the first
part of this thesis, an automatic music transcription algorithm for
electric bass guitar recordings will be presented. The audio signal is
interpreted as a sequence of sound events, which are described by various
parameters. In addition to the conventionally used score-level parameters
note onset, duration, loudness, and pitch, instrument-specific parameters
such as the applied instrument playing techniques and the geometric
position on the instrument fretboard will be extracted. Different
evaluation experiments confirmed that the proposed transcription algorithm
outperformed three state-of-the-art bass transcription algorithms for the
transcription of realistic bass guitar recordings. The estimation of the
instrument-level parameters works with high accuracy, in particular for
isolated note samples.
In the second part of the thesis, it will be
investigated whether the sole analysis of the bassline of a music piece
allows its music genre to be classified automatically. Different score-based
audio features will be proposed that quantify tonal, rhythmic, and
structural properties of basslines. Based on a novel data set of 520
bassline transcriptions from 13 different music genres, three approaches
for music genre classification were compared. A rule-based classification
system could achieve a mean class accuracy of 64.8 % by only taking
features into account that were extracted from the bassline of a music
piece.
The re-synthesis of bass guitar recordings using the previously
extracted note parameters will be studied in the third part of this thesis.
Based on the physical modeling of string instruments, a novel sound
synthesis algorithm tailored to the electric bass guitar will be presented.
The algorithm mimics different aspects of the instrument’s sound
production mechanism such as string excitation, string damping, string-fret
collision, and the influence of the electromagnetic pickup. Furthermore, a
parametric audio coding approach will be discussed that makes it possible to encode
and transmit bass guitar tracks with a significantly smaller bit rate than
conventional audio coding algorithms do. The results of different listening
tests confirmed that a higher perceptual quality can be achieved if the
original bass guitar recordings are encoded and re-synthesized using the
proposed parametric audio codec instead of being encoded using conventional
audio codecs at very low bit rate settings.
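The "mean class accuracy" reported above can be made concrete with a short sketch. It assumes the term denotes the average of the per-class accuracies (per-class recall), which is the usual reading but is not spelled out in the abstract. The labels and predictions are made-up placeholders, not data from the thesis.

```python
from collections import defaultdict

def mean_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical, imbalanced example: four items of genre_a, one of genre_b.
y_true = ["genre_a", "genre_a", "genre_a", "genre_a", "genre_b"]
y_pred = ["genre_a", "genre_a", "genre_a", "genre_b", "genre_b"]
score = mean_class_accuracy(y_true, y_pred)  # (3/4 + 1/1) / 2
```

Unlike plain accuracy (4 of 5 correct here), this measure is not dominated by the largest class, which matters when genres are unevenly represented.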
Recommender systems in industrial contexts
This thesis consists of four parts:
- An analysis of the core functions and the prerequisites for recommender
systems in an industrial context: we identify four core functions for
recommender systems: Help to Decide, Help to Compare, Help to Explore, Help to
Discover. The implementation of these functions has implications for the
choices at the heart of algorithmic recommender systems.
- A state of the art, which deals with the main techniques used in automated
recommender systems: the two most commonly used algorithmic methods, the
K-Nearest-Neighbor (KNN) methods and the fast factorization methods, are
detailed. The state of the art also presents purely content-based methods,
hybridization techniques, and the classical performance metrics used to
evaluate recommender systems. It then gives an overview of several systems,
from both academia and industry (Amazon, Google ...).
- An analysis of the performance and implications of a recommender system
developed during this thesis: this system, Reperio, is a hybrid recommender
engine using KNN methods. We study the performance of the KNN methods,
including the impact of the similarity functions used. Then we study the
performance of the KNN method in critical use cases in cold-start situations.
- A methodology for analyzing the performance of recommender systems in an
industrial context: this methodology assesses the added value of algorithmic
strategies and recommender systems according to their core functions.
Comment: version 3.30, May 201
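To make the role of the similarity function in KNN recommendation concrete, here is a minimal user-based KNN sketch with cosine similarity. It is not Reperio's implementation; the function names, the rating dictionary and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def predict(ratings, user, item, k=2, similarity=cosine):
    """Similarity-weighted mean rating of `item` among the k most similar users."""
    neighbours = [(similarity(ratings[user], r), r[item])
                  for other, r in ratings.items()
                  if other != user and item in r]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, x) for s, x in neighbours[:k] if s > 0]
    weight = sum(s for s, _ in top)
    return sum(s * x for s, x in top) / weight if weight else None

# Hypothetical ratings: ann has not rated item "c".
ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 2},
}
estimate = predict(ratings, "ann", "c", k=1)
```

Because `similarity` is a parameter, another function (for example a Pearson-style correlation) can be swapped in to compare its effect. A user with no ratings gets `None`, which is a simple way to see the cold-start problem.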
Speech Mode Classification using the Fusion of CNNs and LSTM Networks
Speech mode classification is an area that has not been as widely explored in the field of sound classification as others such as environmental sounds, music genre, and speaker identification. But what is speech mode? While mode is defined as the way or the manner in which something occurs or is expressed or done, speech mode is defined as the style in which the speech is delivered by a person.
Some reports exist on classifying speech modes, such as whispering and normally phonated speech, with conventional methods. However, to the best of our knowledge, deep learning-based methods have not been reported in the open literature for this classification scenario. Specifically, in this work we assess the performance of image-based classification algorithms on this challenging speech mode classification problem, including the use of pre-trained deep neural networks, namely AlexNet, ResNet18 and SqueezeNet. We compare the classification accuracy of a set of deep learning (DL) classifiers, and we also assess the impact of different 2D image representations (spectrograms, mel-spectrograms, and their image-based fusion) on classification accuracy. These representations are generated from the original audio signals and used as input to the networks. Next, we compare the accuracy of the DL-based classifiers to that of a set of machine learning (ML) classifiers that take Mel-Frequency Cepstral Coefficient (MFCC) features as input. Then, after determining the most efficient sampling rate for our classification problem (32 kHz), we study the performance of our proposed method of combining CNNs with LSTM (Long Short-Term Memory) networks. For this purpose, we use the features extracted from the deep networks of the previous step. We conclude our study by evaluating the role of sampling rate on classification accuracy by generating two sets of 2D image representations, one with 32 kHz and the other with 16 kHz sampling. Experimental results show that, after cross-validation, the accuracy of DL-based approaches is 15% higher than that of ML ones, with SqueezeNet yielding an accuracy of more than 91% at 32 kHz, whether we use transfer learning, feature-level fusion or score-level fusion (92.5%). Our proposed method using LSTMs further increased that accuracy by more than 3%, resulting in an average accuracy of 95.7%.
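Score-level fusion, as mentioned in the abstract, combines the class scores of several classifiers before the final decision. The sketch below assumes a weighted average of per-class probabilities; the abstract does not state which combination rule the authors use. The labels and the score values are invented placeholders.

```python
def fuse_scores(score_lists, weights=None):
    """Weighted average of per-class scores from several classifiers."""
    n_models = len(score_lists)
    weights = weights or [1.0 / n_models] * n_models
    n_classes = len(score_lists[0])
    return [sum(w * scores[c] for w, scores in zip(weights, score_lists))
            for c in range(n_classes)]

def predict(score_lists, labels, weights=None):
    """Label with the highest fused score."""
    fused = fuse_scores(score_lists, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]

# Hypothetical class probabilities from two networks, one fed with
# spectrograms and one fed with mel-spectrograms.
labels = ["normal", "whisper"]
spectrogram_scores = [0.6, 0.4]
mel_scores = [0.2, 0.8]
decision = predict([spectrogram_scores, mel_scores], labels)
```

Feature-level fusion differs in that the representations are merged before classification, so only one classifier produces scores.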
Multimodal Music Information Retrieval: From Content Analysis to Multimodal Fusion
Ph.D., Doctor of Philosophy
Supervised And Semi-supervised Learning Using Informative Feature Subspaces
Thesis (PhD) -- İstanbul Technical University, Institute of Science and Technology, 2010
In many different fields, such as web mining, bioinformatics and speech recognition, there is an abundance of unlabeled data and different feature views. Semi-supervised learning algorithms such as Co-training aim to make use of unlabeled data. Random (feature) subspace (RAS) methods aim to use different feature subspaces to train different classifiers and combine them in an ensemble. In this thesis, we obtain informative and diverse feature subspaces for classifier ensembles by randomly drawing relevant feature subspaces. We then use these ensembles for supervised and semi-supervised learning. Our first algorithm produces relevant random subspaces using mutual-information-based relevance values. This method is used in the Rel-RAS (supervised) and Rel-RASCO (semi-supervised) algorithms. The second algorithm modifies the mRMR (Minimum Redundancy Maximum Relevance) feature selection algorithm to produce random feature subsets that are both relevant and non-redundant. This method is used in the mRMR-RAS (supervised) and mRMR-RASCO (semi-supervised) algorithms. We perform an experimental analysis of our methods on a number of datasets and compare them to existing methods. We also carry out a theoretical analysis of the classifier ensembles produced by our methods using Kohavi-Wolpert (KW) variance, information-theory-based low order diversity (LOD) and information theoretic scores (ITS). We find that LOD behaves similarly to KW variance and that the ensemble accuracy of the algorithms can be explained using ITS.
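The idea of drawing relevant random subspaces can be sketched as follows: estimate each feature's mutual information with the class label, then sample features with probability proportional to that relevance. This is only an illustration of the principle; the exact sampling scheme of Rel-RAS is not given in the abstract, so the function names, the sampling without replacement and the small smoothing constant are assumptions.

```python
import random
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def relevant_random_subspace(columns, labels, size, rng):
    """Draw `size` distinct feature indices, weighted by relevance to the labels."""
    relevance = [mutual_information(col, labels) for col in columns]
    candidates = list(range(len(columns)))
    chosen = []
    for _ in range(size):
        # Small constant keeps every feature selectable (an assumption).
        weights = [relevance[j] + 1e-9 for j in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return sorted(chosen)

# Hypothetical data: features 0 and 2 determine the label, feature 1 is noise.
labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
subspace = relevant_random_subspace(columns, labels, size=2, rng=random.Random(0))
```

Repeating the draw with different random states yields several subspaces, each of which could train one member of an ensemble; the randomness supplies diversity while the weighting favors informative features.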