296 research outputs found

    Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

    We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference. It first learns two separate spaces of speaker-knowledge and speech-stimuli based on a shared feature space, where a new block structure is designed as the building block for all spaces, and then cooperatively solves different tasks. Between the two spaces, information is cast towards each other via a novel cross- and dual-attention mechanism, mimicking the bottom-up and top-down processes of a human's cocktail party effect. It turns out that substantially discriminative and generalizable speaker representations can be learnt in severely interfered conditions via our self-supervised training. The experimental results verify this seeming paradox. The learnt speaker embedding has greater discriminative power than a standard speaker verification method; meanwhile, Tune-In achieves remarkably better speech separation performance in terms of SI-SNRi and SDRi consistently in all test modes, and at lower memory and computational consumption, than state-of-the-art benchmark systems. Comment: Accepted in AAAI 202

    Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation

    One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers. In contrast, our key finding is that multi-granularity features are essential for enhancing contextual modeling and computational efficiency. We introduce a self-attentive network with a novel sandglass shape, namely Sandglasset, which advances the state-of-the-art (SOTA) SS performance at significantly smaller model size and computational cost. Moving forward through the blocks of Sandglasset, the temporal granularity of the features gradually becomes coarser until halfway through the network blocks, and then successively turns finer towards the raw signal level. We also show that residual connections between features with the same granularity are critical for preserving information after passing through the bottleneck layer. Experiments show that our Sandglasset, with only 2.3M parameters, achieves the best results on two benchmark SS datasets -- WSJ0-2mix and WSJ0-3mix, where the SI-SNRi scores are improved by an absolute 0.8 dB and 2.4 dB, respectively, compared to the prior SOTA results. Comment: Accepted in ICASSP 202
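
    The SI-SNRi figures quoted in the speech separation abstracts above refer to the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. The following is a minimal sketch of the commonly used definition, not code from any of the listed papers; the function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, and the plain-list signal representation are assumptions made for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Illustrative sketch only: signals are plain lists of floats, and eps
    is an assumed small constant that guards against division by zero.
    """
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r) + eps
    scale = dot / ref_energy
    target = [scale * b for b in r]

    # Everything left over after the projection is treated as noise.
    noise = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps

    return 10.0 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

    Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.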

    Contrastive Separative Coding for Self-supervised Representation Learning

    To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key finding is that such representations can be learned by separating the target signal from contrastive interfering signals. First, a multi-task separative encoder is built to extract shared separable and discriminative embeddings; second, we propose a powerful cross-attention mechanism performed over speaker representations across various interfering conditions, allowing the model to focus on and globally aggregate the most critical information to answer the "query" (current bottom-up embedding) while paying less attention to interfering, noisy, or irrelevant parts; lastly, we form a new probabilistic contrastive loss which estimates and maximizes the mutual information between the representations and the global speaker vector. While most prior unsupervised methods have focused on predicting the future, neighboring, or missing samples, we take a different perspective of predicting the interfered samples. Moreover, our contrastive separative loss is free from negative sampling. The experiment demonstrates that our approach can learn useful representations achieving a strong speaker verification performance in adverse conditions. Comment: Accepted in ICASSP 202

    FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

    Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the cost of their inherent iterative sampling process has hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate feature (e.g., Mel-spectrogram). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. Also, FastDiff enables a sampling speed 58x faster than real time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalized well to the mel-spectrogram inversion of unseen speakers, and FastDiff-TTS outperformed other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at https://FastDiff.github.io/. Comment: Accepted by IJCAI 202

    Factors that affect proliferation of Salmonella in tomatoes post-harvest: the roles of seasonal effects, irrigation regime, crop and pathogen genotype

    MAIN OBJECTIVES: Fresh fruits and vegetables are increasingly recognized as vehicles of human salmonellosis. Physiological, ecological, and environmental factors are all thought to contribute to the ability of Salmonella to colonize fruits and vegetables pre- and post-harvest. The goal of this study was to test how irrigation levels, fruit water congestion, and crop and pathogen genotypes affect the ability of Salmonella to multiply in tomatoes post-harvest. EXPERIMENTAL DESIGN: Fruits from three tomato varieties, grown over three production seasons in two Florida locations, were infected with seven strains of Salmonella, and the ability of the strains to multiply post-harvest in field-grown tomatoes was tested. The field experiments were set up as a two-factor factorial split-plot experiment, with the whole-plot treatments arranged in a randomized complete-block design. The irrigation treatment (at three levels) was the whole-plot factor, and the split-plot factor was tomato variety, with three levels. The significance of the main, two-way, and three-way interaction effects was tested using the (type III) F-tests for fixed effects. Mean separation for each significant fixed effect in the model was performed using Tukey's multiple comparison testing procedure. MOST IMPORTANT DISCOVERIES AND SIGNIFICANCE: The irrigation regime per se did not affect susceptibility of the crop to post-harvest proliferation of Salmonella. However, Salmonella grew significantly better in water-congested tissues of green tomatoes. Tomato maturity and genotype, Salmonella genotype, and inter-seasonal differences were the strongest factors affecting proliferation. Red ripe tomatoes were significantly and consistently more conducive to proliferation of Salmonella. Tomatoes harvested in the driest, sunniest season were the most conducive to post-harvest proliferation of the pathogen. Statistically significant interactions between production conditions affected post-harvest susceptibility of the crop to the pathogen. UV irradiation of tomatoes post-harvest promoted Salmonella growth.

    Specific Responses of Salmonella enterica to Tomato Varieties and Fruit Ripeness Identified by In Vivo Expression Technology

    Recent outbreaks of vegetable-associated gastroenteritis suggest that enteric pathogens colonize, multiply and persist in plants for extended periods of time, eventually infecting people. The genetic and physiological pathways by which enterics colonize plants are still poorly understood. To better understand interactions between Salmonella enterica sv. Typhimurium and tomatoes, a gfp-tagged Salmonella promoter library was screened inside red ripe fruits. Fifty-one unique constructs that were potentially differentially regulated in tomato relative to in vitro growth were identified. The expression of a subset of these promoters was tested in planta using recombinase-based in vivo expression technology (RIVET) and fitness of the corresponding mutants was tested. Gene expression in Salmonella was affected by fruit maturity and tomato cultivar. A putative fadH promoter was upregulated most strongly in immature tomatoes. Expression of the fadH construct depended on the presence of linoleic acid, which is consistent with the reduced accumulation of this compound in mature tomato fruits. The cysB construct was activated in the fruit of cv. Hawaii 7997 (resistant to a race of Ralstonia solanacearum) more strongly than in the universally susceptible tomato cv. Bonny Best. Known Salmonella motility and animal virulence genes (hilA, flhDC, fliF and those encoded on the pSLT virulence plasmid) did not contribute significantly to fitness of the bacteria inside tomatoes, even though deletions of sirA and motA modestly increased fitness of Salmonella inside tomatoes. This study reveals the genetic basis of the interactions of Salmonella with plant hosts. Salmonella relies on a distinct set of metabolic and regulatory genes, which are differentially regulated in planta in response to host genotype and fruit maturity. This enteric pathogen colonizes tissues of tomatoes differently than plant pathogens, and relies little on its animal virulence genes for persistence within the fruit.

    Search for dark matter produced in association with bottom or top quarks in √s = 13 TeV pp collisions with the ATLAS detector

    A search for weakly interacting massive particle dark matter produced in association with bottom or top quarks is presented. Final states containing third-generation quarks and missing transverse momentum are considered. The analysis uses 36.1 fb⁻¹ of proton–proton collision data recorded by the ATLAS experiment at √s = 13 TeV in 2015 and 2016. No significant excess of events above the estimated backgrounds is observed. The results are interpreted in the framework of simplified models of spin-0 dark-matter mediators. For colour-neutral spin-0 mediators produced in association with top quarks and decaying into a pair of dark-matter particles, mediator masses below 50 GeV are excluded assuming a dark-matter candidate mass of 1 GeV and unitary couplings. For scalar and pseudoscalar mediators produced in association with bottom quarks, the search sets limits on the production cross-section of 300 times the predicted rate for mediators with masses between 10 and 50 GeV and assuming a dark-matter mass of 1 GeV and unitary coupling. Constraints on colour-charged scalar simplified models are also presented. Assuming a dark-matter particle mass of 35 GeV, mediator particles with mass below 1.1 TeV are excluded for couplings yielding a dark-matter relic density consistent with measurements.

    Combined searches for the production of supersymmetric top quark partners in proton-proton collisions at √s = 13 TeV

    A combination of searches for top squark pair production using proton-proton collision data at a center-of-mass energy of 13 TeV at the CERN LHC, corresponding to an integrated luminosity of 137 fb⁻¹ collected by the CMS experiment, is presented. Signatures with at least 2 jets and large missing transverse momentum are categorized into events with 0, 1, or 2 leptons. New results for regions of parameter space where the kinematical properties of top squark pair production and top quark pair production are very similar are presented. Depending on the model, the combined result excludes a top squark mass up to 1325 GeV for a massless neutralino, and a neutralino mass up to 700 GeV for a top squark mass of 1150 GeV. Top squarks with masses from 145 to 295 GeV, for neutralino masses from 0 to 100 GeV, with a mass difference between the top squark and the neutralino in a window of 30 GeV around the mass of the top quark, are excluded for the first time with CMS data. The results of these searches are also interpreted in an alternative signal model of dark matter production via a spin-0 mediator in association with a top quark pair. Upper limits are set on the cross section for mediator particle masses of up to 420 GeV.