1,006 research outputs found

    Deep clustering: Discriminative embeddings for segmentation and separation

    We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are discriminative for partition labels given in training data. Previous deep network approaches provide great advantages in terms of learning power and speed, but it has been unclear how to use them to separate signals in a class-independent way. In contrast, spectral clustering approaches are flexible with respect to the classes and number of items to be segmented, but it has been unclear how to leverage the learning power and speed of deep networks. To obtain the best of both worlds, we use an objective function to train embeddings that yield a low-rank approximation to an ideal pairwise affinity matrix, in a class-independent way. This avoids the high cost of spectral factorization and instead produces compact clusters that are amenable to simple clustering methods. The segmentations are therefore implicitly encoded in the embeddings, and can be "decoded" by clustering. Preliminary experiments show that the proposed method can separate speech: when trained on spectrogram features containing mixtures of two speakers, and tested on mixtures of a held-out set of speakers, it can infer masking functions that improve signal quality by around 6 dB. We show that the model can generalize to three-speaker mixtures despite training only on two-speaker mixtures. The framework can be used without class labels, and therefore has the potential to be trained on a diverse set of sound types, and to generalize to novel sources. We hope that future work will lead to segmentation of arbitrary sounds, with extensions to microphone array methods as well as image segmentation and other domains. Comment: Originally submitted on June 5, 201
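As a rough illustration of the objective described in this abstract, the sketch below compares a direct pairwise-affinity loss with an equivalent low-rank form that never builds the N×N affinity matrices. It assumes the loss is the squared Frobenius distance between the embedding affinity V·Vᵀ and the ideal affinity Y·Yᵀ, where V holds one embedding per time-frequency bin and Y holds one-hot partition labels. The function names and the toy data are hypothetical, and this is not the authors' implementation.

```python
def gram(A, B):
    """Return A^T B for matrices stored as lists of rows."""
    n_rows = len(A)
    return [[sum(A[n][i] * B[n][j] for n in range(n_rows))
             for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def frob_sq(M):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in M for x in row)


def low_rank_loss(V, Y):
    """|V V^T - Y Y^T|_F^2 computed from small D x D, D x C and C x C products."""
    return frob_sq(gram(V, V)) - 2.0 * frob_sq(gram(V, Y)) + frob_sq(gram(Y, Y))


def naive_loss(V, Y):
    """Same quantity computed directly from the N x N affinity matrices."""
    total = 0.0
    for vi, yi in zip(V, Y):
        for vj, yj in zip(V, Y):
            emb_affinity = sum(a * b for a, b in zip(vi, vj))
            ideal_affinity = sum(a * b for a, b in zip(yi, yj))
            total += (emb_affinity - ideal_affinity) ** 2
    return total


# Toy example: three bins, two sources (hypothetical data).
Y = [[1, 0], [1, 0], [0, 1]]            # one-hot partition labels
V_good = [[1, 0], [1, 0], [0, 1]]       # embeddings that match the partition
V_bad = [[1, 0], [0, 1], [0, 1]]        # second bin embedded with the wrong source
```

The low-rank form only needs products whose size depends on the embedding dimension and the number of sources, which is what makes such an objective practical for the large number of bins in a spectrogram.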

    Deep Clustering and Conventional Networks for Music Separation: Stronger Together

    Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation. This is presumably because the more flexible objective of deep clustering engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore combining them in a single hybrid network trained via an approach akin to multi-task learning. Remarkably, the combination significantly outperforms either of its components. Comment: Published in ICASSP 201

    A simple and sensitive method for determination of Norfloxacin in pharmaceutical preparations

    A new voltammetric method with high sensitivity and a wider linear detection range is proposed for the determination of norfloxacin. The voltammetric sensor was fabricated simply by coating a glassy carbon electrode with a composite film of graphene oxide (GO) and Nafion. The advantage of the proposed method is its sensitive electrochemical response to norfloxacin, which is attributed to the excellent electrical conductivity of GO and the accumulating function of Nafion. Under optimum experimental conditions, the method showed a good linear response for the determination of norfloxacin in the range of 1×10⁻⁸ mol/L to 7×10⁻⁶ mol/L, with a detection limit of 5×10⁻⁹ mol/L. The proposed method was successfully applied to the determination of norfloxacin in capsules, with satisfactory results.

    Video Background Music Generation: Dataset, Method and Evaluation

    Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires plenty of paired videos and music to learn their correspondence. Unfortunately, there exist no such datasets. To close this gap, we introduce a dataset, benchmark model, and evaluation metric for video background music generation. We introduce SymMV, a video and symbolic music dataset, along with chord, rhythm, melody, and accompaniment annotations. To the best of our knowledge, it is the first video-music dataset with high-quality symbolic music and detailed annotations. We also propose a benchmark video background music generation framework named V-MusProd, which utilizes music priors of chords, melody, and accompaniment along with video-music relations of semantic, color, and motion features. To address the lack of objective metrics for video-music correspondence, we propose a retrieval-based metric VMCP built upon a powerful video-music representation learning model. Experiments show that with our dataset, V-MusProd outperforms the state-of-the-art method in both music quality and correspondence with videos. We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation

    Development of a nonlinear fiber-optic spectrometer for human lung tissue exploration

    Several major lung pathologies are characterized by early modifications of the extracellular matrix (ECM) fibrillar collagen and elastin network. We report here the development of a nonlinear fiber-optic spectrometer, compatible with an endoscopic use, primarily intended for the recording of second-harmonic generation (SHG) signal of collagen and two-photon excited fluorescence (2PEF) of both collagen and elastin. Fiber dispersion is accurately compensated by the use of a specific grism-pair stretcher, allowing laser pulse temporal width around 70 fs and excitation wavelength tunability from 790 to 900 nm. This spectrometer was used to investigate the excitation wavelength dependence (from 800 to 870 nm) of SHG and 2PEF spectra originating from ex vivo human lung tissue samples. The results were compared with spectral responses of collagen gel and elastin powder reference samples and also with data obtained using standard nonlinear microspectroscopy. The excitation-wavelength-tunable nonlinear fiber-optic spectrometer presented in this study allows performing nonlinear spectroscopy of human lung tissue ECM through the elastin 2PEF and the collagen SHG signals. This work opens the way to tunable excitation nonlinear endomicroscopy based on both distal scanning of a single optical fiber and proximal scanning of a fiber-optic bundle

    A global model for flame pulsation frequency of buoyancy-controlled rectangular gas fuel fire with different boundaries

    Pulsation frequency is an important characteristic parameter for buoyancy-controlled fuel diffusion flames. Fire experiments with a rectangular source of different aspect ratios were conducted in an open space and against sidewalls made from calcium silicate board. Because a sidewall restricts air entrainment into the fire plume, it significantly reduced the flame pulsation frequency. Furthermore, the effect of the fuel exit velocity on the pulsation frequency became more pronounced as the aspect ratio of the rectangle was increased to 7.45. Based on a modified hydraulic diameter for a rectangular fire source with a sidewall and corner, a global model was developed for predicting the flame pulsation frequency of the rectangular fire source with free, sidewall, and corner boundaries. The coefficient of determination of this improved model is 0.9991, and its local errors are less than 15% across all of the experimental data from the present work and from the literature. This work provides a method for predicting flame pulsation frequency that accounts for the sidewall effect and aspect ratio.

    Antinociception induced by chronic glucocorticoid treatment is correlated to local modulation of spinal neurotransmitter content

    Background: While acute effects of stress on pain are well described, those produced by chronic stress are still a matter of dispute. Previously we demonstrated that chronic unpredictable stress results in antinociception in the tail-flick test, an effect that is mediated by increased levels of corticosteroids. In the present study, we evaluated nociception in rats after chronic treatment with corticosterone (CORT) and dexamethasone (DEX) in order to discriminate the role of each type of corticosteroid receptor in antinociception. Results: Both experimental groups exhibited a pronounced antinociceptive effect after three weeks of treatment when compared to controls (CONT); however, at four weeks the pain threshold in CORT-treated animals returned to basal levels, whereas in DEX-treated rats antinociception was maintained. To assess whether these differences are associated with altered expression of neuropeptides involved in nociceptive transmission, we evaluated the density of substance P (SP), calcitonin gene-related peptide (CGRP), somatostatin (SS) and γ-aminobutyric acid B2 receptor (GABAB2) expression in the spinal dorsal horn using light density measurements and stereological techniques. After three weeks of treatment, the expression of CGRP in the superficial dorsal horn was significantly decreased in both CORT and DEX groups, while GABAB2 was significantly increased; the levels of SP in both experimental groups remained unchanged at this point. At four weeks, CGRP and SP were reduced in DEX-treated animals and GABAB2 was unchanged, but all changes were restored to CONT levels in CORT-treated animals. The expression of SS remained unaltered throughout the experimental period. Conclusion: These data indicate that corticosteroids modulate nociception, since chronic corticosteroid treatment alters the expression of neuropeptides involved in nociceptive transmission at the spinal cord level. As previously observed in some supraspinal areas, exclusive GR activation resulted in more profound and sustained behavioural and neurochemical changes than those observed with a mixed ligand of corticosteroid receptors. These results might be of relevance for the pharmacological management of certain types of chronic pain, in which corticosteroids are used as adjuvant analgesics.

    LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

    We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with a strong performance for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our proposed method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task. Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202

    Sustaining effective COVID-19 control in Malaysia through large-scale vaccination

    Introduction: As of 3rd June 2021, Malaysia is experiencing a resurgence of COVID-19 cases. In response, the federal government has implemented various non-pharmaceutical interventions (NPIs) under a series of Movement Control Orders and, more recently, a vaccination campaign to regain epidemic control. In this study, we assessed the potential for the vaccination campaign to control the epidemic in Malaysia and four high-burden regions of interest, under various public health response scenarios. Methods: A modified susceptible-exposed-infectious-recovered compartmental model was developed that included two sequential incubation and infectious periods, with stratification by clinical state. The model was further stratified by age and incorporated population mobility to capture NPIs and micro-distancing (behaviour changes not captured through population mobility). Emerging variants of concern (VoC) were included as an additional strain competing with the existing wild-type strain. Several scenarios that included different vaccination strategies (i.e. vaccines that reduce disease severity and/or prevent infection, vaccination coverage) and mobility restrictions were implemented. Results: The national model and the regional models all fit well to notification data but underestimated ICU occupancy and deaths in recent weeks, which may be attributable to increased severity of VoC or saturation of case detection. However, the true case detection proportion showed wide credible intervals, highlighting incomplete understanding of the true epidemic size. The scenario projections suggested that under current vaccination rates complete relaxation of all NPIs would trigger a major epidemic. The results emphasise the importance of micro-distancing, maintaining mobility restrictions during vaccination roll-out and accelerating the pace of vaccination for future control. Malaysia is particularly susceptible to a major COVID-19 resurgence resulting from its limited population immunity due to the country's historical success in maintaining control throughout much of 2020.
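For readers unfamiliar with compartmental models, the sketch below shows a plain susceptible-exposed-infectious-recovered (SEIR) simulation with forward-Euler steps. It is a deliberately simplified stand-in: the study's model adds sequential incubation and infectious periods, age and clinical stratification, mobility, vaccination and a competing strain, none of which appear here. The function name and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate a basic SEIR model and return one (S, E, I, R) tuple per day.

    beta  : transmission rate per day (hypothetical value supplied by caller)
    sigma : rate of leaving the exposed compartment (1 / incubation period)
    gamma : rate of leaving the infectious compartment (1 / infectious period)
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    population = s + e + i + r
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative run with made-up parameters, not values from the study.
trajectory = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                           s0=9990, e0=0, i0=10, r0=0, days=60)
```

Scenario analysis of the kind the abstract describes would correspond, in this toy setting, to rerunning the simulation with a lower `beta` (restrictions) or with part of the population moved out of the susceptible compartment (vaccination).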