1,006 research outputs found
Deep clustering: Discriminative embeddings for segmentation and separation
We address the problem of acoustic source separation in a deep learning
framework we call "deep clustering." Rather than directly estimating signals or
masking functions, we train a deep network to produce spectrogram embeddings
that are discriminative for partition labels given in training data. Previous
deep network approaches provide great advantages in terms of learning power and
speed, but previously it has been unclear how to use them to separate signals
in a class-independent way. In contrast, spectral clustering approaches are
flexible with respect to the classes and number of items to be segmented, but
it has been unclear how to leverage the learning power and speed of deep
networks. To obtain the best of both worlds, we use an objective function to
train embeddings that yield a low-rank approximation to an ideal pairwise
affinity matrix, in a class-independent way. This avoids the high cost of
spectral factorization and instead produces compact clusters that are amenable
to simple clustering methods. The segmentations are therefore implicitly
encoded in the embeddings, and can be "decoded" by clustering. Preliminary
experiments show that the proposed method can separate speech: when trained on
spectrogram features containing mixtures of two speakers, and tested on
mixtures of a held-out set of speakers, it can infer masking functions that
improve signal quality by around 6dB. We show that the model can generalize to
three-speaker mixtures despite training only on two-speaker mixtures. The
framework can be used without class labels, and therefore has the potential to
be trained on a diverse set of sound types, and to generalize to novel sources.
We hope that future work will lead to segmentation of arbitrary sounds, with
extensions to microphone array methods as well as image segmentation and other
domains. Comment: Originally submitted on June 5, 201
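
The objective described in this abstract can be illustrated with a short sketch. It assumes the usual Frobenius-norm formulation, in which an embedding matrix `V` (one row per time-frequency bin) is trained so that `V V^T` approximates the ideal affinity matrix `Y Y^T` built from one-hot partition labels `Y`. The names `V`, `Y`, and both function names are illustrative and do not appear in the abstract.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Affinity loss ||V V^T - Y Y^T||_F^2, computed without N x N matrices.

    V: (N, D) embeddings, one row per time-frequency bin (assumed layout).
    Y: (N, C) one-hot partition labels.
    Uses the expansion ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2, so only
    D x D, D x C and C x C products are formed.
    """
    VtV = V.T @ V
    VtY = V.T @ Y
    YtY = Y.T @ Y
    return float(np.sum(VtV ** 2) - 2.0 * np.sum(VtY ** 2) + np.sum(YtY ** 2))


def naive_loss(V, Y):
    """Same quantity computed directly from the two N x N affinity matrices."""
    diff = V @ V.T - Y @ Y.T
    return float(np.sum(diff ** 2))
```

The low-rank form is what keeps the cost manageable: for a spectrogram with many bins, `N` is far larger than the embedding dimension `D`, so the expanded expression avoids building the full pairwise affinity matrix.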
Deep Clustering and Conventional Networks for Music Separation: Stronger Together
Deep clustering is the first method to handle general audio separation
scenarios with multiple sources of the same type and an arbitrary number of
sources, performing impressively in speaker-independent speech separation
tasks. However, little is known about its effectiveness in other challenging
situations such as music source separation. Contrary to conventional networks
that directly estimate the source signals, deep clustering generates an
embedding for each time-frequency bin, and separates sources by clustering the
bins in the embedding space. We show that deep clustering outperforms
conventional networks on a singing voice separation task, in both matched and
mismatched conditions, even though conventional networks have the advantage of
end-to-end training for best signal approximation, presumably because its more
flexible objective engenders better regularization. Since the strengths of deep
clustering and conventional network architectures appear complementary, we
explore combining them in a single hybrid network trained via an approach akin
to multi-task learning. Remarkably, the combination significantly outperforms
either of its components. Comment: Published in ICASSP 201
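
The separation step described in this abstract (one embedding per time-frequency bin, then clustering in the embedding space) might look like the sketch below. It assumes plain k-means with a deterministic farthest-point initialisation, which is an illustrative choice and not necessarily what the authors use. The function name `kmeans_masks` and the `(T, F, D)` array layout are hypothetical.

```python
import numpy as np


def kmeans_masks(embeddings, n_sources, n_iters=20):
    """Cluster per-bin embeddings and return one binary mask per source.

    embeddings: (T, F, D) array, one D-dimensional vector per bin.
    Returns an (n_sources, T, F) array of 0/1 masks that partition the bins.
    """
    T, F, D = embeddings.shape
    X = embeddings.reshape(-1, D).astype(float)

    # Farthest-point initialisation: start from the first bin, then repeatedly
    # add the bin that is farthest from all centres chosen so far.
    centers = [X[0]]
    for _ in range(1, n_sources):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.stack(centers)

    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assign each bin to its nearest centre.
        dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each centre to the mean of its members (skip empty clusters).
        for k in range(n_sources):
            members = X[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)

    masks = np.stack([(labels == k).reshape(T, F) for k in range(n_sources)])
    return masks.astype(float)
```

Each mask would then be multiplied element-wise with the mixture spectrogram to obtain an estimate of one source.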
A simple and sensitive method for determination of Norfloxacin in pharmaceutical preparations
In this approach, a new voltammetric method for the determination of norfloxacin is proposed, with high sensitivity and a wider linear detection range. The voltammetric sensor was fabricated simply by coating a glassy carbon electrode with a composite film of graphene oxide (GO) and Nafion. The advantage of the proposed method is its sensitive electrochemical response to norfloxacin, which is attributed to the excellent electrical conductivity of GO and the accumulating function of Nafion. Under optimum experimental conditions, the method showed a good linear response for the determination of norfloxacin in the range of 1×10⁻⁸ mol/L to 7×10⁻⁶ mol/L, with a detection limit of 5×10⁻⁹ mol/L. The proposed method was successfully applied to the determination of norfloxacin in capsules with satisfactory results.
Video Background Music Generation: Dataset, Method and Evaluation
Music is essential when editing videos, but selecting music manually is
difficult and time-consuming. Thus, we seek to automatically generate
background music tracks given video input. This is a challenging task since it
requires plenty of paired videos and music to learn their correspondence.
Unfortunately, there exist no such datasets. To close this gap, we introduce a
dataset, benchmark model, and evaluation metric for video background music
generation. We introduce SymMV, a video and symbolic music dataset, along with
chord, rhythm, melody, and accompaniment annotations. To the best of our
knowledge, it is the first video-music dataset with high-quality symbolic music
and detailed annotations. We also propose a benchmark video background music
generation framework named V-MusProd, which utilizes music priors of chords,
melody, and accompaniment along with video-music relations of semantic, color,
and motion features. To address the lack of objective metrics for video-music
correspondence, we propose a retrieval-based metric VMCP built upon a powerful
video-music representation learning model. Experiments show that with our
dataset, V-MusProd outperforms the state-of-the-art method in both music
quality and correspondence with videos. We believe our dataset, benchmark
model, and evaluation metric will boost the development of video background
music generation.
Development of a nonlinear fiber-optic spectrometer for human lung tissue exploration
Several major lung pathologies are characterized by early modifications of the extracellular matrix (ECM) fibrillar collagen and elastin network. We report here the development of a nonlinear fiber-optic spectrometer, compatible with endoscopic use, primarily intended for recording the second-harmonic generation (SHG) signal of collagen and the two-photon excited fluorescence (2PEF) of both collagen and elastin. Fiber dispersion is accurately compensated by a specific grism-pair stretcher, allowing a laser pulse temporal width of around 70 fs and excitation wavelength tunability from 790 to 900 nm. This spectrometer was used to investigate the excitation wavelength dependence (from 800 to 870 nm) of SHG and 2PEF spectra originating from ex vivo human lung tissue samples. The results were compared with the spectral responses of collagen gel and elastin powder reference samples and with data obtained using standard nonlinear microspectroscopy. The excitation-wavelength-tunable nonlinear fiber-optic spectrometer presented in this study enables nonlinear spectroscopy of human lung tissue ECM through the elastin 2PEF and collagen SHG signals. This work opens the way to tunable-excitation nonlinear endomicroscopy based on both distal scanning of a single optical fiber and proximal scanning of a fiber-optic bundle.
A global model for flame pulsation frequency of buoyancy-controlled rectangular gas fuel fire with different boundaries
Pulsation frequency is an important characteristic parameter of buoyancy-controlled fuel diffusion flames. Fire experiments with a rectangular source of different aspect ratios were conducted in an open space and against sidewalls made of calcium silicate board. Because the sidewall restricts air entrainment into the fire plume, it significantly reduced the flame pulsation frequency. Furthermore, the effect of the fuel exit velocity on the pulsation frequency became more pronounced as the aspect ratio of the rectangle was increased to 7.45. Based on a modified hydraulic diameter for a rectangular fire source with a sidewall and corner, a global model was developed for predicting the flame pulsation frequency of the rectangular fire source with free, sidewall, and corner boundaries. The coefficient of determination of this improved model is 0.9991, and its local errors are less than 15% across all of the experimental data from the present work and from the literature. This work provides a method for predicting flame pulsation frequency that accounts for the sidewall effect and aspect ratio.
Antinociception induced by chronic glucocorticoid treatment is correlated to local modulation of spinal neurotransmitter content
Background: While acute effects of stress on pain are well described, those produced by chronic stress are still a matter of dispute. Previously we demonstrated that chronic unpredictable stress results in antinociception in the tail-flick test, an effect that is mediated by increased levels of corticosteroids. In the present study, we evaluated nociception in rats after chronic treatment with corticosterone (CORT) and dexamethasone (DEX) in order to discriminate the role of each type of corticosteroid receptor in antinociception.
Results: Both experimental groups exhibited a pronounced antinociceptive effect after three weeks of treatment when compared to controls (CONT); however, at four weeks the pain threshold in CORT-treated animals returned to basal levels, whereas in DEX-treated rats antinociception was maintained. To assess whether these differences are associated with altered expression of neuropeptides involved in nociceptive transmission, we evaluated the density of substance P (SP), calcitonin gene-related peptide (CGRP), somatostatin (SS) and B2-γ-aminobutyric acid receptor (GABA(B2)) expression in the spinal dorsal horn using light density measurements and stereological techniques. After three weeks of treatment, the expression of CGRP in the superficial dorsal horn was significantly decreased in both CORT and DEX groups, while GABA(B2) was significantly increased; the levels of SP in both experimental groups remained unchanged at this point. At four weeks, CGRP and SP were reduced in DEX-treated animals and GABA(B2) was unchanged, whereas all changes had returned to CONT levels in CORT-treated animals. The expression of SS remained unaltered throughout the experimental period.
Conclusion: These data indicate that corticosteroids modulate nociception, since chronic corticosteroid treatment alters the expression of neuropeptides involved in nociceptive transmission at the spinal cord level. As previously observed in some supraspinal areas, exclusive GR activation resulted in more profound and sustained behavioural and neurochemical changes than those observed with a mixed ligand of corticosteroid receptors. These results might be of relevance for the pharmacological management of certain types of chronic pain in which corticosteroids are used as adjuvant analgesics.
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic
lyrics transcription method achieving state-of-the-art performance on various
lyrics transcription datasets, even in challenging genres such as rock and
metal. Our novel, training-free approach utilizes Whisper, a weakly supervised
robust speech recognition model, and GPT-4, today's most performant chat-based
large language model. In the proposed method, Whisper functions as the "ear" by
transcribing the audio, while GPT-4 serves as the "brain," acting as an
annotator with a strong performance for contextualized output selection and
correction. Our experiments show that LyricWhiz significantly reduces Word
Error Rate compared to existing methods in English and can effectively
transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to
create the first publicly available, large-scale, multilingual lyrics
transcription dataset with a CC-BY-NC-SA copyright license, based on
MTG-Jamendo, and offer a human-annotated subset for noise level estimation and
evaluation. We anticipate that our proposed method and dataset will advance the
development of multilingual lyrics transcription, a challenging and emerging
task. Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202
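
Word Error Rate, the metric cited in this abstract, is conventionally the word-level edit distance between hypothesis and reference divided by the number of reference words. The sketch below shows that standard definition under the assumption of simple whitespace tokenisation. The function name is illustrative, and the text normalisation used in the paper's evaluation is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, dropping one word from a five-word reference gives one deletion over five reference words, so a WER of 0.2.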
Sustaining effective COVID-19 control in Malaysia through large-scale vaccination
Introduction: As of 3rd June 2021, Malaysia is experiencing a resurgence of COVID-19 cases. In response, the federal government has implemented various non-pharmaceutical interventions (NPIs) under a series of Movement Control Orders and, more recently, a vaccination campaign to regain epidemic control. In this study, we assessed the potential for the vaccination campaign to control the epidemic in Malaysia and four high-burden regions of interest, under various public health response scenarios.
Methods: A modified susceptible-exposed-infectious-recovered compartmental model was developed that included two sequential incubation and infectious periods, with stratification by clinical state. The model was further stratified by age and incorporated population mobility to capture NPIs and micro-distancing (behaviour changes not captured through population mobility). Emerging variants of concern (VoC) were included as an additional strain competing with the existing wild-type strain. Several scenarios that included different vaccination strategies (i.e. vaccines that reduce disease severity and/or prevent infection, vaccination coverage) and mobility restrictions were implemented.
Results: The national model and the regional models all fit well to notification data but underestimated ICU occupancy and deaths in recent weeks, which may be attributable to increased severity of VoC or saturation of case detection. However, the true case detection proportion showed wide credible intervals, highlighting incomplete understanding of the true epidemic size. The scenario projections suggested that under current vaccination rates complete relaxation of all NPIs would trigger a major epidemic. The results emphasise the importance of micro-distancing, maintaining mobility restrictions during vaccination roll-out and accelerating the pace of vaccination for future control. Malaysia is particularly susceptible to a major COVID-19 resurgence resulting from its limited population immunity due to the country's historical success in maintaining control throughout much of 2020.
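
As a rough illustration of the compartmental approach described in the Methods, the sketch below integrates a basic single-strain SEIR model with forward Euler steps. It is far simpler than the study's model: it has no age stratification, no second strain, no vaccination, and only one incubation and one infectious compartment. All parameter names, including the `contact_scale` multiplier standing in for mobility restrictions, are hypothetical.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days,
                  contact_scale=1.0, dt=0.1):
    """Forward-Euler SEIR simulation; returns one (S, E, I, R) tuple per day.

    beta: transmission rate per day, sigma: 1 / incubation period,
    gamma: 1 / infectious period. contact_scale (hypothetical) scales beta
    to mimic reduced contact under mobility restrictions.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r
    trajectory = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = contact_scale * beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory
```

Running the function twice with different `contact_scale` values gives a toy version of the scenario comparison in the abstract: lower contact yields fewer cumulative infections over the same period.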