Perspective: Ferromagnetic Liquids
Mechanical jamming of nanoparticles at liquid–liquid interfaces has evolved into a versatile approach to structure liquids with solid-state properties. Ferromagnetic liquids obtain their physical and magnetic properties, including a remanent magnetization that distinguishes them from ferrofluids, from the jamming of magnetic nanoparticles assembled at the interface between two distinct liquids to minimize surface tension. This perspective provides an overview of recent progress and discusses future directions, challenges, and potential applications of jamming magnetic nanoparticles with regard to 3D nano-magnetism. We address the formation and characterization of curved magnetic geometries and spin frustration between dipole-coupled nanostructures, and advance our understanding of particle jamming at liquid–liquid interfaces.
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Deep neural networks have recently achieved breakthroughs in sound generation
with text prompts. Despite their promising performance, current text-to-sound
generation models face issues on small-scale datasets (e.g., overfitting),
significantly limiting their performance. In this paper, we investigate the use
of pre-trained AudioLDM, the state-of-the-art model for text-to-audio
generation, as the backbone for sound generation. Our study demonstrates the
advantages of using pre-trained models for text-to-sound generation, especially
in data-scarcity scenarios. In addition, experiments show that different
training strategies (e.g., training conditions) may affect the performance of
AudioLDM on datasets of different scales. To facilitate future studies, we also
evaluate various text-to-sound generation systems on several frequently used
datasets under the same evaluation protocols, which allow fair comparisons and
benchmarking of these methods on common ground.
Comment: EUSIPCO 202
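The benefit described above, reusing a pre-trained backbone when task data are scarce, can be illustrated with a toy example. The tiny frozen "backbone" and trainable head below are illustrative stand-ins, not AudioLDM or its actual fine-tuning recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: weights that stay frozen during fine-tuning.
W_backbone = rng.normal(size=(8, 4))
W_head = np.zeros((4, 2))              # small task-specific head, trained from scratch

x = rng.normal(size=(16, 8))           # tiny "data-scarce" training set
y = rng.normal(size=(16, 2))

h = np.tanh(x @ W_backbone)            # features from the frozen backbone
loss_before = float(((h @ W_head - y) ** 2).mean())

for _ in range(100):
    # Gradient of the squared-error loss w.r.t. the head (up to a constant factor);
    # the backbone is never updated.
    grad = h.T @ (h @ W_head - y) / len(x)
    W_head -= 0.1 * grad

loss_after = float(((h @ W_head - y) ** 2).mean())
```

Training only the small head keeps the number of updated parameters tiny, which is one way such models avoid overfitting on small datasets.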
Origin and tuning of the magnetocaloric effect for the magnetic refrigerant MnFe(P1-xGex)
Neutron diffraction and magnetization measurements of the magnetic refrigerant
Mn1+yFe1-yP1-xGex reveal that the ferromagnetic and paramagnetic phases
correspond to two very distinct crystal structures, with the magnetic entropy
change as a function of magnetic field or temperature being directly controlled
by the phase fraction of this first-order transition. By tuning the physical
properties of this system we have achieved a maximum magnetic entropy change
exceeding 74 J/kg K for both increasing and decreasing field, more than twice
the value of the previous record.
Comment: 6 Figures. One table
Audio Prompt Tuning for Universal Sound Separation
Universal sound separation (USS) is a task to separate arbitrary sounds from
an audio mixture. Existing USS systems are capable of separating arbitrary
sources, given a few examples of the target sources as queries. However,
separating arbitrary sounds with a single system is challenging, and the
robustness is not always guaranteed. In this work, we propose audio prompt
tuning (APT), a simple yet effective approach to enhance existing USS systems.
Specifically, APT improves the separation performance of specific sources
through training a small number of prompt parameters with limited audio
samples, while maintaining the generalization of the USS model by keeping its
parameters frozen. We evaluate the proposed method on MUSDB18 and ESC-50
datasets. Compared with the baseline model, APT can improve the
signal-to-distortion ratio performance by 0.67 dB and 2.06 dB using the full
training set of two datasets. Moreover, APT with only 5 audio samples even
outperforms the baseline systems utilizing full training data on the ESC-50
dataset, indicating the great potential of few-shot APT.
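The mechanics of prompt tuning against a frozen model can be sketched in a few lines. The linear "model", data shapes, and learning rate below are toy assumptions for illustration, not the APT implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen separation model: a fixed linear map standing in for a trained USS network.
W = rng.normal(size=(5, 5))

def frozen_model(x, prompt):
    # The learnable prompt is added to the input embedding; W is never updated.
    return (x + prompt) @ W

# Few-shot data for one target source (e.g. 5 labelled audio samples).
x = rng.normal(size=(5, 5))
y = rng.normal(size=(5, 5))

prompt = np.zeros(5)                      # the only trainable parameters
loss_init = float(((frozen_model(x, prompt) - y) ** 2).mean())

for _ in range(200):
    residual = frozen_model(x, prompt) - y
    # Squared-error gradient w.r.t. the prompt (up to a constant factor).
    grad = (residual @ W.T).mean(axis=0)
    prompt -= 0.05 * grad

loss = float(((frozen_model(x, prompt) - y) ** 2).mean())
```

Because only the prompt vector is updated, the frozen model's general separation ability is preserved while performance on the specific source improves.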
Text-Driven Foley Sound Generation With Latent Diffusion Model
Foley sound generation aims to synthesise the background sound for multimedia
content. Previous models usually employ a large development set with labels as
input (e.g., single numbers or one-hot vector). In this work, we propose a
diffusion model based system for Foley sound generation with text conditions.
To alleviate the data scarcity issue, our model is initially pre-trained with
large-scale datasets and fine-tuned to this task via transfer learning using
the contrastive language-audio pretraining (CLAP) technique. We have observed
that the feature embedding extracted by the text encoder can significantly
affect the performance of the generation model. Hence, we introduce a trainable
layer after the encoder to improve the text embedding produced by the encoder.
In addition, we further refine the generated waveform by generating multiple
candidate audio clips simultaneously and selecting the best one, which is
determined in terms of the similarity score between the embedding of the
candidate clips and the embedding of the target text label. Using the proposed
method, our system ranks among the systems submitted to DCASE
Challenge 2023 Task 7. The results of the ablation studies illustrate that the
proposed techniques significantly improve sound generation performance. The
codes for implementing the proposed system are available online.
Comment: Submitted to DCASE Workshop 2023. arXiv admin note: text overlap with arXiv:2305.1590
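The candidate-selection step described above (generate several clips, keep the one whose embedding best matches the text) reduces to a cosine-similarity argmax. A minimal sketch, with made-up 3-d vectors standing in for CLAP embeddings:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_best(candidate_embs, text_emb):
    # Score each generated clip against the text label embedding
    # and return the index of the highest-scoring candidate.
    scores = [cosine(c, text_emb) for c in candidate_embs]
    return int(np.argmax(scores)), scores

text_emb = np.array([1.0, 0.0, 0.0])     # embedding of the target text label
candidates = np.array([
    [0.2, 0.9, 0.1],                     # off-target clip
    [0.9, 0.1, 0.0],                     # clip closest to the text embedding
    [0.0, 0.0, 1.0],                     # off-target clip
])
best, scores = pick_best(candidates, text_emb)
# best == 1
```

Generating several candidates and keeping the best-scoring one trades extra sampling compute for output quality at no training cost.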
Adapting Language-Audio Models as Few-Shot Audio Learners
We presented the Treff adapter, a training-efficient adapter for CLAP, to
boost zero-shot classification performance by making use of a small set of
labelled data. Specifically, we designed CALM to retrieve the probability
distribution of text-audio clips over classes using a set of audio-label pairs
and combined it with CLAP's zero-shot classification results. Furthermore, we
designed a training-free version of the Treff adapter by using CALM as a cosine
similarity measure. Experiments showed that the proposed Treff adapter is
comparable to, and even better than, fully-supervised methods and adaptation methods
in low-shot and data-abundant scenarios. While the Treff adapter shows that
combining large-scale pretraining and rapid learning of domain-specific
knowledge is non-trivial for obtaining generic representations for few-shot
learning, it is still limited to audio classification tasks. In the future, we
will explore how to use audio-language models in diverse audio domains.
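The blending of few-shot retrieval with zero-shot predictions can be sketched as follows. The cache-style CALM scoring below is our reading of the description, with toy embeddings and a hypothetical mixing weight alpha, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def calm_distribution(query, support_embs, support_labels, n_classes):
    # Cosine similarity of the query clip to each labelled support clip,
    # summed per class, then normalised into a distribution over classes.
    sims = support_embs @ query / (
        np.linalg.norm(support_embs, axis=1) * np.linalg.norm(query))
    logits = np.zeros(n_classes)
    for s, lab in zip(sims, support_labels):
        logits[lab] += s
    return softmax(logits)

def treff_predict(query, zero_shot_probs, support_embs, support_labels, alpha=0.5):
    # Blend the few-shot distribution with CLAP-style zero-shot probabilities.
    few = calm_distribution(query, support_embs, support_labels, len(zero_shot_probs))
    return alpha * few + (1 - alpha) * zero_shot_probs

# Toy usage: two classes, four labelled support clips, uniform zero-shot prior.
query = np.array([1.0, 0.0, 0.0])
support_embs = np.array([[1.0, 0.0, 0.0],
                         [0.9, 0.1, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.9, 0.1]])
support_labels = [0, 0, 1, 1]
probs = treff_predict(query, np.array([0.5, 0.5]), support_embs, support_labels)
```

Because the retrieval side needs no gradient updates at all, a variant like this can run training-free, which matches the training-free Treff adapter described above.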