Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction
The state of the art in music source separation employs neural networks
trained in a supervised fashion on multi-track databases to estimate the
sources from a given mixture. With only a few datasets available, extensive
data augmentation is often used to combat overfitting. Mixing random tracks, however,
can even reduce separation performance as instruments in real music are
strongly correlated. The key concept in our approach is that source estimates
of an optimal separator should be indistinguishable from real source signals.
Based on this idea, we drive the separator towards outputs deemed as realistic
by discriminator networks that are trained to tell apart real from separator
samples. This way, we can also use unpaired source and mixture recordings
without the drawbacks of creating unrealistic music mixtures. Our framework is
widely applicable as it does not assume a specific network architecture or
number of sources. To our knowledge, this is the first adoption of adversarial
training for music source separation. In a prototype experiment for singing
voice separation, separation performance increases with our approach compared
to purely supervised training.
Comment: 5 pages, 2 figures, 1 table. Final version of manuscript accepted for
2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP). Implementation available at
https://github.com/f90/AdversarialAudioSeparatio
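The adversarial idea in this abstract can be illustrated with a minimal sketch. It assumes a standard binary cross-entropy GAN objective, which may differ from the objective used in the paper. The function names, the weighting factor `alpha`, and the convention that the discriminator outputs probabilities in (0, 1) are all hypothetical choices made for illustration.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def discriminator_loss(real_scores, fake_scores):
    """Cross-entropy loss for a discriminator.

    real_scores: discriminator outputs in (0, 1) for real source signals.
    fake_scores: discriminator outputs in (0, 1) for separator estimates.
    The loss is low when real signals score near 1 and estimates near 0.
    """
    real_term = -sum(math.log(p + EPS) for p in real_scores) / len(real_scores)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in fake_scores) / len(fake_scores)
    return real_term + fake_term


def separator_adversarial_loss(fake_scores):
    """Loss that rewards the separator when its outputs are rated as real."""
    return -sum(math.log(p + EPS) for p in fake_scores) / len(fake_scores)


def separator_total_loss(supervised_loss, fake_scores, alpha=0.1):
    """Combine a supervised loss on paired data with the adversarial term.

    alpha is an assumed weighting factor, not a value taken from the paper.
    """
    return supervised_loss + alpha * separator_adversarial_loss(fake_scores)
```

In this sketch, the adversarial term needs only separator outputs and a pool of real source recordings, which is why unpaired source and mixture data can contribute to training.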
Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages
Lyrics alignment has gained considerable attention in recent years.
State-of-the-art systems either re-use established speech recognition toolkits,
or design end-to-end solutions involving a Connectionist Temporal
Classification (CTC) loss. However, both approaches suffer from specific
weaknesses: toolkits are known for their complexity, and CTC systems use a loss
designed for transcription which can limit alignment accuracy. In this paper,
we instead use a contrastive learning procedure that derives cross-modal
embeddings linking the audio and text domains. This way, we obtain a novel
system that is simple to train end-to-end, can make use of weakly annotated
training data, jointly learns a powerful text model, and is tailored to
alignment. The system is not only the first to yield an average absolute error
below 0.2 seconds on the standard Jamendo dataset but it is also robust to
other languages, even when trained on English data only. Finally, we release
word-level alignments for the JamendoLyrics Multi-Lang dataset.
Comment: 5 pages, accepted at the International Conference on Acoustics,
Speech, and Signal Processing (ICASSP) 202
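The cross-modal embedding idea in this abstract can be sketched as follows. This is a generic InfoNCE-style contrastive loss over cosine similarities, offered as an assumption rather than the loss actually used in the paper. The function names, the `temperature` value, and the toy embeddings are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def similarity_matrix(audio_embeddings, text_embeddings):
    """Cosine similarity between every audio frame and every text token."""
    audio = [normalize(v) for v in audio_embeddings]
    text = [normalize(v) for v in text_embeddings]
    return [[sum(a * t for a, t in zip(av, tv)) for tv in text] for av in audio]


def info_nce(similarity_row, positive_index, temperature=0.1):
    """Contrastive loss for one audio frame.

    The token at positive_index is the match; all other tokens in the row
    act as negatives. Computed as a numerically stable negative log-softmax.
    """
    logits = [s / temperature for s in similarity_row]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[positive_index]


# Toy example: two audio frames and two text tokens in a 2-D embedding space.
sim = similarity_matrix([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
loss_matched = info_nce(sim[0], positive_index=0)
loss_mismatched = info_nce(sim[0], positive_index=1)
```

A similarity matrix of this kind could then be passed to a separate alignment step; that step is not shown here.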
Targeted microbubbles: a novel application for the treatment of kidney stones
Kidney stone disease is endemic. Extracorporeal shockwave lithotripsy was the first major technological breakthrough where focused shockwaves were used to fragment stones in the kidney or ureter. The shockwaves induced the formation of cavitation bubbles, whose collapse released energy at the stone, and the energy fragmented the kidney stones into pieces small enough to be passed spontaneously. Can the concept of microbubbles be used without the bulky machine? The logical progression was to manufacture these powerful microbubbles ex vivo and inject these bubbles directly into the collecting system. An external source can be used to induce cavitation once the microbubbles are at their target; the key is targeting these microbubbles to specifically bind to kidney stones. Two important observations have been established: (i) bisphosphonates attach to hydroxyapatite crystals with high affinity; and (ii) there is substantial hydroxyapatite in most kidney stones. The microbubbles can be equipped with bisphosphonate tags to specifically target kidney stones. These bubbles will preferentially bind to the stone and not surrounding tissue, reducing collateral damage. Ultrasound or another suitable form of energy is then applied, causing the microbubbles to induce cavitation and fragment the stones. This can be used as an adjunct to ureteroscopy or percutaneous lithotripsy to aid in fragmentation. Randall's plaques, which also contain hydroxyapatite crystals, can also be targeted to pre-emptively destroy these stone precursors. Additionally, targeted microbubbles can aid in kidney stone diagnostics as an adjunct to traditional imaging methods, which is especially useful in high-risk patient populations. This novel application of targeted microbubble technology represents not only the next frontier in minimally invasive stone surgery but also a platform technology for other areas of medicine.