158 research outputs found

Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction

Authors: Dixon Simon; Ewert Sebastian; Stoller Daniel
Publication date: 06/04/2018

The state of the art in music source separation employs neural networks trained in a supervised fashion on multi-track databases to estimate the sources from a given mixture. With only few datasets available, often extensive data augmentation is used to combat overfitting. Mixing random tracks, however, can even reduce separation performance as instruments in real music are strongly correlated. The key concept in our approach is that source estimates of an optimal separator should be indistinguishable from real source signals. Based on this idea, we drive the separator towards outputs deemed as realistic by discriminator networks that are trained to tell apart real from separator samples. This way, we can also use unpaired source and mixture recordings without the drawbacks of creating unrealistic music mixtures. Our framework is widely applicable as it does not assume a specific network architecture or number of sources. To our knowledge, this is the first adoption of adversarial training for music source separation. In a prototype experiment for singing voice separation, separation performance increases with our approach compared to purely supervised training.

Comment: 5 pages, 2 figures, 1 table. Final version of manuscript accepted for 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Implementation available at https://github.com/f90/AdversarialAudioSeparatio

arXiv.org e-Print Archive

Crossref

Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages

Authors: Durand Simon; Ewert Sebastian; Stoller Daniel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/06/2023

Lyrics alignment gained considerable attention in recent years. State-of-the-art systems either re-use established speech recognition toolkits, or design end-to-end solutions involving a Connectionist Temporal Classification (CTC) loss. However, both approaches suffer from specific weaknesses: toolkits are known for their complexity, and CTC systems use a loss designed for transcription which can limit alignment accuracy. In this paper, we use instead a contrastive learning procedure that derives cross-modal embeddings linking the audio and text domains. This way, we obtain a novel system that is simple to train end-to-end, can make use of weakly annotated training data, jointly learns a powerful text model, and is tailored to alignment. The system is not only the first to yield an average absolute error below 0.2 seconds on the standard Jamendo dataset but it is also robust to other languages, even when trained on English data only. Finally, we release word-level alignments for the JamendoLyrics Multi-Lang dataset.

Comment: 5 pages, accepted at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 202

arXiv.org e-Print Archive

Targeted microbubbles: a novel application for the treatment of kidney stones

Authors: Bailey Michael; Chi Thomas; Grubbs Robert H.; Kenny Thomas; Laser Daniel; Marx Vanessa; Ramaswamy Krishna; Sorensen Mathew D.; Stoller Marshall L.
Publication venue: Wiley-Blackwell
Publication date: 17/03/2015

Kidney stone disease is endemic. Extracorporeal shockwave lithotripsy was the first major technological breakthrough where focused shockwaves were used to fragment stones in the kidney or ureter. The shockwaves induced the formation of cavitation bubbles, whose collapse released energy at the stone, and the energy fragmented the kidney stones into pieces small enough to be passed spontaneously. Can the concept of microbubbles be used without the bulky machine? The logical progression was to manufacture these powerful microbubbles ex vivo and inject these bubbles directly into the collecting system. An external source can be used to induce cavitation once the microbubbles are at their target; the key is targeting these microbubbles to specifically bind to kidney stones. Two important observations have been established: (i) bisphosphonates attach to hydroxyapatite crystals with high affinity; and (ii) there is substantial hydroxyapatite in most kidney stones. The microbubbles can be equipped with bisphosphonate tags to specifically target kidney stones. These bubbles will preferentially bind to the stone and not surrounding tissue, reducing collateral damage. Ultrasound or another suitable form of energy is then applied causing the microbubbles to induce cavitation and fragment the stones. This can be used as an adjunct to ureteroscopy or percutaneous lithotripsy to aid in fragmentation. Randall's plaques, which also contain hydroxyapatite crystals, can also be targeted to pre-emptively destroy these stone precursors. Additionally, targeted microbubbles can aid in kidney stone diagnostics by virtue of being used as an adjunct to traditional imaging methods, especially useful in high-risk patient populations. This novel application of targeted microbubble technology not only represents the next frontier in minimally invasive stone surgery, but also a platform technology for other areas of medicine.

Crossref

PubMed Central

eScholarship - University of California

Caltech Authors