Estimating Single-Channel Source Separation Masks: Relevance Vector Machine Classifiers vs. Pitch-Based Masking
Audio sources frequently concentrate much of their energy into a relatively small proportion of the available time-frequency cells in a short-time Fourier transform (STFT). This sparsity makes it possible to separate sources, to some degree, simply by selecting STFT cells dominated by the desired source, setting all others to zero (or to an estimate of the obscured target value), and inverting the STFT to a waveform. The problem of source separation then becomes one of identifying the cells containing good target information. We treat this as a classification problem, and train a Relevance Vector Machine (a probabilistic relative of the Support Vector Machine) to perform this task. We compare this classifier both against SVMs (which achieve similar accuracy but are less efficient than RVMs), and against a traditional Computational Auditory Scene Analysis (CASA) technique based on a noise-robust pitch tracker, which the RVM outperforms significantly. Differences between the RVM- and pitch-tracker-based mask estimates suggest benefits could be obtained by combining both approaches.
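The select-and-invert pipeline described above can be sketched as follows. This is a minimal illustration using an oracle mask on two synthetic tones, not the RVM classifier from the paper; the sample rate, frequencies, and STFT parameters are all invented for the demo.

```python
# Sketch of mask-based source separation in the STFT domain, assuming the
# binary mask has already been estimated (here by an oracle that compares
# the isolated sources; in the paper, by an RVM classifier).
import numpy as np
from scipy.signal import stft, istft

def separate_with_mask(mixture, mask, fs=16000, nperseg=512):
    """Apply a binary time-frequency mask and resynthesize a waveform."""
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
    assert mask.shape == X.shape
    X_masked = X * mask  # keep cells dominated by the target, zero the rest
    _, y = istft(X_masked, fs=fs, nperseg=nperseg)
    return y

# Toy demo: two well-separated tones standing in for target and interferer.
fs = 16000
time = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * time)       # hypothetical target source
interferer = np.sin(2 * np.pi * 1320 * time)  # hypothetical interfering source

# "Oracle" mask: keep STFT cells where the target dominates the interferer.
_, _, T = stft(target, fs=fs, nperseg=512)
_, _, I = stft(interferer, fs=fs, nperseg=512)
oracle_mask = (np.abs(T) > np.abs(I)).astype(float)

estimate = separate_with_mask(target + interferer, oracle_mask, fs=fs)
```

Because the two tones occupy disjoint STFT bins, the masked resynthesis recovers the target with far less interferer energy than the raw mixture contains.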
A variational EM algorithm for learning eigenvoice parameters in mixed signals
We derive an efficient learning algorithm for model-based source separation for use on single channel speech mixtures where the precise source characteristics are not known a priori. The sources are modeled using factor-analyzed hidden Markov models (HMMs) where source specific characteristics are captured by an "eigenvoice" speaker subspace model. The proposed algorithm is able to learn adaptation parameters for two speech sources when only a mixture of the signals is observed. We evaluate the algorithm on the 2006 Speech Separation Challenge data set and show that it is significantly faster than our earlier system, at a small cost in performance.
Learning, Using, and Adapting Models in Scene Analysis
Discusses models of source behavior as a way to overcome the uncertainty inherent in mixtures.
Monaural speech separation using source-adapted models
We propose a model-based source separation system for use on single channel speech mixtures where the precise source characteristics are not known a priori. We do this by representing the space of source variation with a parametric signal model based on the eigenvoice technique for rapid speaker adaptation. We present an algorithm to infer the characteristics of the sources present in a mixture, allowing for significantly improved separation performance over that obtained using unadapted source models. The algorithm is evaluated on the task defined in the 2006 Speech Separation Challenge [1] and compared with separation using source-dependent models.
Combining Localization Cues and Source Model Constraints for Binaural Source Separation
We describe a system for separating multiple sources from a two-channel recording based on interaural cues and prior knowledge of the statistics of the underlying source signals. The proposed algorithm effectively combines information derived from low level perceptual cues, similar to those used by the human auditory system, with higher level information related to speaker identity. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is able to separate more sound sources than there are observed channels in the presence of reverberation. In simulated mixtures of speech from two and three speakers, the proposed algorithm gives a signal-to-noise ratio improvement of 1.7 dB over a baseline algorithm which uses only interaural cues. Further improvement is obtained by incorporating eigenvoice speaker adaptation to enable the source model to better match the sources present in the signal. This improves performance over the baseline by 2.7 dB when the speakers used for training and testing are matched. However, the improvement is minimal when the test data is very different from that used in training.
Source Separation Based on Binaural Cues and Source Model Constraints
We describe a system for separating multiple sources from a two-channel recording based on interaural cues and known characteristics of the source signals. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is able to separate more sound sources than there are observed channels. In simulated reverberant mixtures of three speakers, the proposed algorithm gives a signal-to-noise ratio improvement of 2.1 dB over a baseline algorithm using only interaural cues.
CNN Architectures for Large-Scale Audio Classification
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and that larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

Comment: Accepted for publication at ICASSP 2017.
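The embeddings-versus-raw-features idea from the abstract can be illustrated with a toy numpy sketch. The random ReLU projection below merely stands in for a trained CNN's penultimate layer, and the data, dimensions, and linear classifier are all invented for illustration.

```python
# Toy sketch: frame-level "embeddings" from a stand-in network are
# average-pooled per clip and fed to a shallow linear classifier.
import numpy as np

rng = np.random.default_rng(0)
n_clips, n_frames, n_mel, n_emb, n_classes = 200, 10, 64, 128, 3

# Synthetic log-mel "spectrograms": class k gets a class-specific offset.
labels = rng.integers(0, n_classes, size=n_clips)
X = rng.normal(size=(n_clips, n_frames, n_mel)) + labels[:, None, None]

# Stand-in for a pretrained CNN embedding layer: random projection + ReLU.
W_emb = rng.normal(size=(n_mel, n_emb)) / np.sqrt(n_mel)
emb = np.maximum(X @ W_emb, 0.0)       # frame-level embeddings
clip_features = emb.mean(axis=1)       # average pooling over time

# One-vs-rest least-squares linear classifier on the pooled embeddings.
Y = np.eye(n_classes)[labels]
W_clf, *_ = np.linalg.lstsq(clip_features, Y, rcond=None)
pred = (clip_features @ W_clf).argmax(axis=1)
train_acc = float((pred == labels).mean())
```

The point of the design is the pipeline shape, not the numbers: a fixed feature extractor turns variable-length audio into pooled clip-level vectors, so only the small classifier on top needs task-specific training.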
Clustering beat-chroma patterns in a large music database
A musical style or genre implies a set of common conventions and patterns combined and deployed in different ways to make individual musical pieces; for instance, most would agree that contemporary pop music is assembled from a relatively small palette of harmonic and melodic patterns. The purpose of this paper is to use a database of tens of thousands of songs in combination with a compact representation of melodic-harmonic content (the beat-synchronous chromagram) and data-mining tools (clustering) to attempt to explicitly catalog this palette, at least within the limitations of the beat-chroma representation. We use online k-means clustering to summarize 3.7 million 4-beat bars in a codebook of a few hundred prototypes. By measuring how accurately such a quantized codebook can reconstruct the original data, we can quantify the degree of diversity (distortion as a function of codebook size) and temporal structure (i.e. the advantage gained by jointly quantizing multiple frames) in this music. The most popular codewords themselves reveal the common chords used in the music. Finally, the quantized representation of music can be used for music retrieval tasks such as artist and genre classification, and identifying songs that are similar in terms of their melodic-harmonic content.
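The codebook-and-distortion measurement described above can be sketched on toy data. The online k-means below and the synthetic "beat-chroma" patches are illustrative assumptions, not the paper's 3.7-million-bar pipeline.

```python
# Sketch: online (one-sample-at-a-time) k-means over 4-beat x 12-bin
# chroma patches, then distortion as a function of codebook size.
import numpy as np

rng = np.random.default_rng(1)

def online_kmeans(patches, k, n_passes=5):
    """Online k-means: nearest-center assignment with a running-mean update."""
    codebook = patches[rng.choice(len(patches), k, replace=False)].copy()
    counts = np.ones(k)
    for _ in range(n_passes):
        for x in patches[rng.permutation(len(patches))]:
            j = int(np.argmin(((codebook - x) ** 2).sum(axis=1)))
            counts[j] += 1
            codebook[j] += (x - codebook[j]) / counts[j]
    return codebook

def distortion(patches, codebook):
    """Mean squared error of quantizing each patch to its nearest codeword."""
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())

# Toy "beat-chroma" data: 4 beats x 12 chroma bins (48-dim), drawn as
# noisy copies of a few underlying prototype patterns.
protos = rng.random((8, 48))
patches = protos[rng.integers(0, 8, 2000)] + 0.05 * rng.normal(size=(2000, 48))

d_small = distortion(patches, online_kmeans(patches, k=2))
d_large = distortion(patches, online_kmeans(patches, k=16))
```

As in the paper, plotting distortion against codebook size would quantify the diversity of the data: a larger codebook reconstructs the patches more accurately, with diminishing returns once the true number of recurring patterns is covered.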
Impact of EMA regulatory label changes on systemic diclofenac initiation, discontinuation, and switching to other pain medicines in Scotland, England, Denmark, and The Netherlands
Purpose: In June 2013 a European Medicines Agency referral procedure concluded that diclofenac was associated with an elevated risk of acute cardiovascular events, and contraindications, warnings, and changes to the product information were implemented across the European Union. This study measured the impact of the regulatory action on the prescribing of systemic diclofenac in Denmark, The Netherlands, England, and Scotland. Methods: Quarterly time series analyses measuring diclofenac prescription initiation, discontinuation, and switching to other systemic nonsteroidal anti-inflammatory drugs (NSAIDs), topical NSAIDs, paracetamol, opioids, and other chronic pain medication in those who discontinued diclofenac. Absolute effects were estimated using interrupted time series regression. Results: Overall, diclofenac prescription initiations fell during the observation periods of all countries. Compared with Denmark, where there appeared to be a more limited effect, the regulatory action was associated with significant immediate reductions in diclofenac initiation in The Netherlands (−0.42%, 95% CI, −0.66% to −0.18%), England (−0.09%, 95% CI, −0.11% to −0.08%), and Scotland (−0.67%, 95% CI, −0.79% to −0.55%); and falling trends in diclofenac initiation in The Netherlands (−0.03%, 95% CI, −0.06% to −0.01% per quarter) and Scotland (−0.04%, 95% CI, −0.05% to −0.02% per quarter). There was no significant impact on diclofenac discontinuation in any country. The regulatory action was associated with modest differences in switching to other pain medicines following diclofenac discontinuation. Conclusions: The regulatory action was associated with significant reductions in overall diclofenac initiation which varied by country and type of exposure. There was no impact on discontinuation and variable impact on switching.
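The interrupted (segmented) time-series regression used in such studies can be sketched as follows: the outcome is regressed on a pre-intervention trend, a post-intervention level-shift indicator, and a post-intervention slope-change term. The quarterly data below are simulated, and the coefficients bear no relation to the study's actual estimates.

```python
# Sketch of an interrupted time series regression: quarterly initiation
# rates with a level shift and slope change at the intervention quarter.
import numpy as np

rng = np.random.default_rng(2)
n_q = 24
t = np.arange(n_q)                   # quarter index
post = (t >= 12).astype(float)       # 1 after the (hypothetical) intervention
t_post = post * (t - 12)             # quarters elapsed since the intervention

# Simulated initiation rate (%): gentle pre-trend, then an immediate drop
# and a steeper downward slope after the intervention, plus noise.
y = 5.0 - 0.02 * t - 0.6 * post - 0.04 * t_post + rng.normal(0, 0.05, n_q)

# Ordinary least squares on [intercept, trend, level shift, slope change].
X = np.column_stack([np.ones(n_q), t, post, t_post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
level_change, slope_change = float(beta[2]), float(beta[3])
```

The "immediate reductions" reported per country correspond to the level-shift coefficient, and the "falling trends" to the slope-change coefficient; a full analysis would additionally model autocorrelation and seasonality.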
Why have asset price properties changed so little in 200 years
We first review empirical evidence that asset prices have had episodes of large fluctuations and been inefficient for at least 200 years. We briefly review recent theoretical results, as well as the neurological basis of trend following, and finally argue that these asset price properties can be attributed to two fundamental mechanisms that have not changed for many centuries: an innate preference for trend following, and the collective tendency to exploit as much as possible any detectable price arbitrage, which leads to destabilizing feedback loops.