    Estimating Single-Channel Source Separation Masks: Relevance Vector Machine Classifiers vs. Pitch-Based Masking

    Audio sources frequently concentrate much of their energy into a relatively small proportion of the available time-frequency cells in a short-time Fourier transform (STFT). This sparsity makes it possible to separate sources, to some degree, simply by selecting STFT cells dominated by the desired source, setting all others to zero (or to an estimate of the obscured target value), and inverting the STFT to a waveform. The problem of source separation then becomes identifying the cells containing good target information. We treat this as a classification problem, and train a Relevance Vector Machine (a probabilistic relative of the Support Vector Machine) to perform this task. We compare the performance of this classifier both against SVMs (which have similar accuracy but are not as efficient as RVMs) and against a traditional Computational Auditory Scene Analysis (CASA) technique based on a noise-robust pitch tracker, which the RVM outperforms significantly. Differences between the RVM-based and pitch-tracker-based mask estimates suggest benefits to be obtained by combining both.
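
The masking step described in this abstract can be illustrated with a minimal sketch. It assumes the target and interferer magnitudes are known for every STFT cell (an "oracle" mask), whereas the paper trains a classifier to estimate the mask from the mixture alone. The function names, the 0 dB threshold, and the toy 2x2 grids are illustrative assumptions and are not taken from the paper. The STFT and its inversion are omitted.

```python
import math

def binary_mask(target_mag, interferer_mag, threshold_db=0.0):
    """Mark a cell 1 when the target-to-interferer ratio exceeds threshold_db."""
    eps = 1e-12  # guards against log of zero and division by zero
    mask = []
    for t_row, i_row in zip(target_mag, interferer_mag):
        row = []
        for t, i in zip(t_row, i_row):
            ratio_db = 20.0 * math.log10((t + eps) / (i + eps))
            row.append(1 if ratio_db > threshold_db else 0)
        mask.append(row)
    return mask

def apply_mask(mixture_stft, mask):
    """Keep the selected cells of the mixture and set all others to zero."""
    return [[cell * m for cell, m in zip(c_row, m_row)]
            for c_row, m_row in zip(mixture_stft, mask)]

# Toy magnitudes (hypothetical values), rows = frequency bins, columns = frames.
target = [[1.0, 0.1], [0.2, 2.0]]
interferer = [[0.1, 1.0], [0.5, 0.3]]
# Toy complex mixture STFT cells (hypothetical values).
mixture = [[complex(1.1, 0.0), complex(0.0, 1.1)],
           [complex(0.7, 0.0), complex(2.3, 0.0)]]

mask = binary_mask(target, interferer)
masked = apply_mask(mixture, mask)
```

Replacing the oracle comparison with a per-cell classifier decision would give the kind of estimated mask the abstract refers to.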

    A variational EM algorithm for learning eigenvoice parameters in mixed signals

    We derive an efficient learning algorithm for model-based source separation for use on single channel speech mixtures where the precise source characteristics are not known a priori. The sources are modeled using factor-analyzed hidden Markov models (HMM) where source specific characteristics are captured by an "eigenvoice" speaker subspace model. The proposed algorithm is able to learn adaptation parameters for two speech sources when only a mixture of signals is observed. We evaluate the algorithm on the 2006 speech separation challenge data set and show that it is significantly faster than our earlier system at a small cost in terms of performance.

    Monaural speech separation using source-adapted models

    We propose a model-based source separation system for use on single channel speech mixtures where the precise source characteristics are not known a priori. We do this by representing the space of source variation with a parametric signal model based on the eigenvoice technique for rapid speaker adaptation. We present an algorithm to infer the characteristics of the sources present in a mixture, allowing for significantly improved separation performance over that obtained using unadapted source models. The algorithm is evaluated on the task defined in the 2006 Speech Separation Challenge [1] and compared with separation using source-dependent models.

    Source Separation Based on Binaural Cues and Source Model Constraints

    We describe a system for separating multiple sources from a two-channel recording based on interaural cues and known characteristics of the source signals. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is able to separate more sound sources than there are observed channels. In simulated reverberant mixtures of three speakers the proposed algorithm gives a signal-to-noise ratio improvement of 2.1 dB over a baseline algorithm using only interaural cues.

    CNN Architectures for Large-Scale Audio Classification

    Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task. Comment: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes of latest Audio Set revision. Changed wording to fit 4 page limit with new additions.

    Clustering beat-chroma patterns in a large music database

    A musical style or genre implies a set of common conventions and patterns combined and deployed in different ways to make individual musical pieces; for instance, most would agree that contemporary pop music is assembled from a relatively small palette of harmonic and melodic patterns. The purpose of this paper is to use a database of tens of thousands of songs in combination with a compact representation of melodic-harmonic content (the beat-synchronous chromagram) and data-mining tools (clustering) to attempt to explicitly catalog this palette, at least within the limitations of the beat-chroma representation. We use online k-means clustering to summarize 3.7 million 4-beat bars in a codebook of a few hundred prototypes. By measuring how accurately such a quantized codebook can reconstruct the original data, we can quantify the degree of diversity (distortion as a function of codebook size) and temporal structure (i.e. the advantage gained by jointly quantizing multiple frames) in this music. The most popular codewords themselves reveal the common chords used in the music. Finally, the quantized representation of music can be used for music retrieval tasks such as artist and genre classification, and identifying songs that are similar in terms of their melodic-harmonic content.
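
The codebook idea in this abstract can be sketched with a sequential (online) k-means update, in which each new point pulls its nearest centroid toward it by a step of 1/count. This is a generic textbook variant and an assumption on our part; the abstract does not give the paper's exact update rule, initialization, or distance measure. The 2-D points below are hypothetical stand-ins for beat-chroma bars, and `online_kmeans` and `distortion` are illustrative names.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def online_kmeans(points, k):
    """One pass of sequential k-means, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    counts = [1] * k
    for p in points[k:]:
        # Find the nearest codeword for this point.
        j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        counts[j] += 1
        rate = 1.0 / counts[j]  # step shrinks as the cluster grows
        centroids[j] = [c + rate * (x - c) for x, c in zip(p, centroids[j])]
    return centroids

def distortion(points, centroids):
    """Mean squared error when each point is replaced by its nearest codeword."""
    total = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return total / len(points)

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0),
          (10.0, 12.0), (2.0, 0.0), (12.0, 10.0)]

codebook_1 = online_kmeans(points, 1)
codebook_2 = online_kmeans(points, 2)
```

Comparing `distortion(points, codebook_1)` with `distortion(points, codebook_2)` mirrors, on a toy scale, the abstract's use of distortion as a function of codebook size.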

    Impact of EMA regulatory label changes on systemic diclofenac initiation, discontinuation, and switching to other pain medicines in Scotland, England, Denmark, and The Netherlands

    Purpose: In June 2013 a European Medicines Agency referral procedure concluded that diclofenac was associated with an elevated risk of acute cardiovascular events, and contraindications, warnings, and changes to the product information were implemented across the European Union. This study measured the impact of the regulatory action on the prescribing of systemic diclofenac in Denmark, The Netherlands, England, and Scotland. Methods: Quarterly time series analyses measured diclofenac prescription initiation, discontinuation, and switching to other systemic nonsteroidal anti-inflammatory drugs (NSAIDs), topical NSAIDs, paracetamol, opioids, and other chronic pain medication in those who discontinued diclofenac. Absolute effects were estimated using interrupted time series regression. Results: Overall, diclofenac prescription initiations fell during the observation periods of all countries. Compared with Denmark, where there appeared to be a more limited effect, the regulatory action was associated with significant immediate reductions in diclofenac initiation in The Netherlands (−0.42%, 95% CI, −0.66% to −0.18%), England (−0.09%, 95% CI, −0.11% to −0.08%), and Scotland (−0.67%, 95% CI, −0.79% to −0.55%); and falling trends in diclofenac initiation in The Netherlands (−0.03%, 95% CI, −0.06% to −0.01% per quarter) and Scotland (−0.04%, 95% CI, −0.05% to −0.02% per quarter). There was no significant impact on diclofenac discontinuation in any country. The regulatory action was associated with modest differences in switching to other pain medicines following diclofenac discontinuation. Conclusions: The regulatory action was associated with significant reductions in overall diclofenac initiation which varied by country and type of exposure. There was no impact on discontinuation and variable impact on switching.
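
The "immediate reduction" and "trend per quarter" effects reported here correspond to the level-change and slope-change terms of a segmented regression. The sketch below fits the common four-term form y = b0 + b1*t + b2*post + b3*post*(t - t0) by ordinary least squares. This model form is an assumption, since the abstract does not state the study's exact specification, and the series is synthetic and noise-free, not the study's data. `fit_its` and `solve` are illustrative names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_its(y, change_point):
    """Return [baseline level, baseline trend, level change, trend change]."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= change_point else 0.0
        rows.append([1.0, float(t), post, post * (t - change_point)])
    k = 4
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic quarterly series: a drop of 0.5 and a slope change of -0.05 at quarter 6.
change_point = 6
y = [5.0 - 0.1 * t + ((-0.5 - 0.05 * (t - change_point)) if t >= change_point else 0.0)
     for t in range(12)]

level, trend, level_change, trend_change = fit_its(y, change_point)
```

A real analysis would also need standard errors, and usually adjustment for autocorrelation and seasonality, none of which this sketch attempts.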

    Why have asset price properties changed so little in 200 years

    We first review empirical evidence that asset prices have had episodes of large fluctuations and been inefficient for at least 200 years. We briefly review recent theoretical results as well as the neurological basis of trend following and finally argue that these asset price properties can be attributed to two fundamental mechanisms that have not changed for many centuries: an innate preference for trend following and the collective tendency to exploit as much as possible detectable price arbitrage, which leads to destabilizing feedback loops. Comment: 16 pages, 4 figures