Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders
Supervised multi-channel audio source separation requires extracting useful
spectral, temporal, and spatial features from the mixed signals. The success of
many existing systems is therefore largely dependent on the choice of features
used for training. In this work, we introduce a novel multi-channel,
multi-resolution convolutional auto-encoder neural network that works on raw
time-domain signals to determine appropriate multi-resolution features for
separating the singing-voice from stereo music. Our experimental results show
that the proposed method can achieve multi-channel audio source separation
without the need for hand-crafted features or any pre- or post-processing.
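As a rough illustration of what "multi-resolution features from raw time-domain signals" can mean, the sketch below filters a mono waveform with kernels of several lengths and keeps one output sequence per length. It is not the authors' network: the kernel sizes, the fixed moving-average weights, and the function names are assumptions made for this example, and a convolutional auto-encoder would learn its kernels and process both stereo channels.

```python
def conv1d_same(signal, kernel):
    """Slide `kernel` over `signal` with zero padding so the output keeps the input length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * (len(kernel) - 1 - half)
    return [
        sum(weight * padded[i + j] for j, weight in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_resolution_features(signal, kernel_sizes=(3, 9, 27)):
    """Return a dict mapping each kernel size to the filtered signal.

    Short kernels follow fast changes, long kernels capture slower trends.
    The averaging weights are a placeholder for learned filters (assumption).
    """
    features = {}
    for size in kernel_sizes:
        kernel = [1.0 / size] * size
        features[size] = conv1d_same(signal, kernel)
    return features
```

Calling `multi_resolution_features(waveform)` on a list of samples returns three sequences of the same length as the input, one per temporal resolution.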
Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation
Singing voice detection is the task of identifying which frames contain the
singer's vocals. It has been one of the main components in music
information retrieval (MIR), and is applicable to melody extraction,
artist recognition, and music discovery in popular music. Although there are
several methods which have been proposed, a more robust and more complete
system is desired to improve the detection performance. In this paper, our
motivation is to provide an extensive comparison in different stages of singing
voice detection. Based on this analysis, a novel method was proposed to build a
more efficient singing voice detection system. The proposed system has three
main parts. The first is a singing voice separation pre-processing step that
extracts the vocals from the music. Several singing voice
separation methods were compared to select the best one to integrate into the
singing voice detection system. The second is a deep neural network based
classifier to identify the given frames. Different deep models for
classification were also compared. The last is a post-processing step that
filters out anomalous frames in the classifier's predictions. A median filter
and a Hidden Markov Model (HMM) based filter were compared for this step.
Through this step-by-step module extension, the different methods were compared
and analyzed. Finally, classification performance on two public datasets
indicates that the proposed approach, which is based on the Long-term Recurrent
Convolutional Networks (LRCN) model, is a promising alternative. Comment: 15 pages
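The median-filter post-processing stage mentioned above can be sketched as follows. This is a minimal illustration, assuming the classifier emits one binary label per frame (1 for vocal, 0 for non-vocal); the window length and the function name are choices made for this example, not values taken from the paper.

```python
from statistics import median_low


def median_smooth(labels, window=5):
    """Smooth a sequence of 0/1 frame labels with a sliding median.

    The window shrinks at the sequence edges. `median_low` keeps the result
    an actual label (0 or 1) even when a shrunken window has even length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        start = max(0, i - half)
        stop = min(len(labels), i + half + 1)
        smoothed.append(median_low(labels[start:stop]))
    return smoothed
```

With `window=3`, an isolated vocal frame inside a non-vocal region (or the reverse) is flipped to match its neighbours, which is the kind of anomalous frame the post-processing is meant to remove.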
Text-independent speaker recognition
This research presents a new text-independent speaker recognition system with multivariate tools such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) embedded into the recognition system after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments. Additive white Gaussian noise and convolutive noise are added. Experiments were carried out to investigate the robustness of PCA and ICA using the designed approach. The application of ICA improved the performance of the speaker recognition model when compared to PCA. Experimental results show that the use of ICA enabled extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system. The results show that ICA has better de-correlation and dimension reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with different types of noise and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing the recognition accuracy rates obtained when the designed system was tested for different train and test SNR conditions with additive white Gaussian noise, and for test delay conditions with an echo effect.
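Contaminating a speech signal with additive white Gaussian noise at a chosen SNR, as in the train/test conditions described above, can be sketched as below. This is an illustrative sketch, not the authors' code: the function names and the fixed seed are assumptions, and the noise is rescaled so that the requested SNR holds exactly for the given samples rather than only on average.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    if signal_power == 0.0:
        raise ValueError("signal must not be silent")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for the noise power.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """Compute the SNR in dB of `noisy` relative to the `clean` reference."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, `add_white_gaussian_noise(samples, snr_db=10.0)` returns a noisy copy for which `measure_snr_db(samples, noisy)` is 10 dB up to floating-point rounding.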
Pitch-Informed Solo and Accompaniment Separation
The subject of this dissertation is the development of a system for
pitch-informed source separation of music signals into solo instrument
and accompaniment. The system is suited to isolating the dominant
instruments from a piece of music, regardless of the type of instrument,
the accompaniment, or the musical style. Only monophonic melody
instruments are considered. The music recordings are monaural, so no
additional information can be gained from the distribution of the
instruments in the stereo panorama.
The developed method uses pitch information as the basis for sinusoidal
modeling of the spectral properties of the solo instrument from the music
mixture signal. Instead of determining the spectral information per
frame, the proposed method uses tone objects for the separation.
Tone-object-based processing makes it possible to additionally refine the
note onsets, reduce transient artifacts, incorporate common amplitude
modulation (CAM), and better estimate the non-harmonic elements of the
tones. The presented algorithm for separating solo instrument and
accompaniment enables real-time processing and is therefore relevant for
practical use.
An experiment was conducted to better model the relationships between
magnitude, phase, and fine frequency of isolated instrument tones. As a
result, the continuity of the temporal envelope, the inharmonicity of
certain musical instruments, and the evaluation of the phase progression
could be exploited in the presented method. In addition, an algorithm for
source separation into percussive and harmonic signal components based on
phase progression was developed. It achieves improved perceptual quality
of the harmonic and percussive signals compared with comparable
state-of-the-art methods.
The presented method for sound source separation into solo instrument and
accompaniment was submitted to the SiSEC 2011 and SiSEC 2013 evaluation
campaigns. There, comparable results were achieved with regard to
perceptual evaluation measures. The quality of a reference algorithm was
surpassed on the instrumental dataset described in this dissertation.
As an application scenario for sound source separation into solo and
accompaniment, a listening test was conducted to assess the quality
requirements for source separation in the context of music learning
software. The results of this listening test show that the solo and
accompaniment tracks should be separated according to different quality
criteria. The music learning software Songs2See already integrates the
presented sound source separation in a commercially available application.
This thesis addresses the development of a system for pitch-informed solo
and accompaniment separation capable of separating main instruments from
music accompaniment regardless of the musical genre of the track, or type
of music accompaniment. For the solo instrument, only pitched monophonic
instruments were considered in a single-channel scenario where no panning
or spatial location information is available.
In the proposed method, pitch information is used as an initial stage of a
sinusoidal modeling approach that attempts to estimate the spectral
information of the solo instrument from a given audio mixture. Instead of
estimating the solo instrument on a frame by frame basis, the proposed
method gathers information of tone objects to perform separation.
Tone-based processing allowed the inclusion of novel processing stages for
attack refinement, transient interference reduction, common amplitude
modulation (CAM) of tone objects, and for better estimation of non-harmonic
elements that can occur in musical instrument tones. The proposed solo and
accompaniment algorithm is an efficient method suitable for real-world
applications.
A study was conducted to better model magnitude, frequency, and phase of
isolated musical instrument tones. As a result of this study, temporal
envelope smoothness, inharmonicity of musical instruments, and phase
expectation were exploited in the proposed separation method. Additionally,
an algorithm for harmonic/percussive separation based on phase expectation
was proposed. The algorithm shows improved perceptual quality with respect
to state-of-the-art methods for harmonic/percussive separation.
The proposed solo and accompaniment method obtained perceptual quality
scores comparable to other state-of-the-art algorithms under the SiSEC 2011
and SiSEC 2013 campaigns, and outperformed the comparison algorithm on the
instrumental dataset described in this thesis.
As a use case of solo and
accompaniment separation, a listening test procedure was conducted to
assess separation quality requirements in the context of music education.
Results from the listening test showed that solo and accompaniment tracks
should be optimized differently to suit quality requirements of music
education. The Songs2See application was presented as commercial music
learning software which includes the proposed solo and accompaniment
separation method.
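To make the link between pitch information and sinusoidal modeling concrete, the sketch below lists the partial frequencies a sinusoidal model would track for one tone, given its fundamental frequency. The abstract does not say how the thesis models inharmonicity; the stiff-string relation f_k = k * f0 * sqrt(1 + B * k^2) used here is a standard textbook model adopted as an assumption, and the function name and the coefficient `B` are hypothetical.

```python
import math


def partial_frequencies(f0, n_partials, inharmonicity=0.0):
    """Return the frequencies in Hz of the first `n_partials` partials of a tone.

    With inharmonicity == 0 the partials are exact multiples of f0.
    A positive coefficient stretches higher partials upward, following the
    stiff-string model f_k = k * f0 * sqrt(1 + B * k**2) (assumed here).
    """
    if f0 <= 0 or n_partials < 0 or inharmonicity < 0:
        raise ValueError("f0 must be positive; the other arguments non-negative")
    return [
        k * f0 * math.sqrt(1.0 + inharmonicity * k * k)
        for k in range(1, n_partials + 1)
    ]
```

For a detected pitch of 220 Hz, `partial_frequencies(220.0, 3)` gives the harmonic series 220, 440, and 660 Hz, while a small positive `inharmonicity` shifts each partial slightly sharp, more so for higher partials.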
ELAIA 2018
Over the years, the Program has continued to grow and flourish, and the depth of its research continues to increase. This inaugural journal represents the fruits of that development, containing capstone research projects from the 2018 Honors Program senior class and their faculty mentors. The Table of Contents is diverse, and in that way it is a crystal clear reflection of our program’s community of scholars.
I, along with the members of the Honors Council, am gratified by the work of each student and faculty mentor printed within these pages. Congratulations, everyone!
- Stephen Lowe, Honors Program Director
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation
Monaural Singing Voice Separation (MSVS) is a challenging task and has been
studied for decades. Deep neural networks (DNNs) are the current
state-of-the-art methods for MSVS. However, the existing DNNs are often
designed manually, which is time-consuming and error-prone. In addition, the
network architectures are usually pre-defined, and not adapted to the training
data. To address these issues, we introduce a Neural Architecture Search (NAS)
method to the structure design of DNNs for MSVS. Specifically, we propose a new
multi-resolution Convolutional Neural Network (CNN) framework for MSVS, namely
Multi-Resolution Pooling CNN (MRP-CNN), which uses various-size pooling
operators to extract multi-resolution features. Based on the NAS, we then
develop an evolving framework, namely Evolving MRP-CNN (E-MRP-CNN), by
automatically searching the effective MRP-CNN structures using genetic
algorithms, optimized in terms of a single objective considering only
separation performance, or multiple objectives considering both the separation
performance and the model complexity. The multi-objective E-MRP-CNN gives a set
of Pareto-optimal solutions, each providing a trade-off between separation
performance and model complexity. Quantitative and qualitative evaluations on
the MIR-1K and DSD100 datasets are used to demonstrate the advantages of the
proposed framework over several recent baselines.
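The notion of a Pareto-optimal set used above can be illustrated with a small sketch. It assumes each candidate architecture is summarized by a separation score (higher is better) and a complexity measure (lower is better); the function name and the tuple layout are hypothetical, and the genetic search that would generate the candidates is not shown.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a (name, separation_score, complexity) tuple.
    One candidate dominates another when it is at least as good on both
    criteria and strictly better on at least one of them.
    """
    front = []
    for name, score, cost in candidates:
        dominated = any(
            other_score >= score
            and other_cost <= cost
            and (other_score > score or other_cost < cost)
            for _, other_score, other_cost in candidates
        )
        if not dominated:
            front.append((name, score, cost))
    return front
```

Every entry in the returned list represents a different trade-off: moving from one to another improves either separation performance or model complexity only at the expense of the other.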