1,203 research outputs found

    Identification of T-cell receptor α-chain genes in the chicken

    Get PDF
    T-cell receptor (TCR) α-chain (TCRα) and β-chain (TCRβ) genes are well characterized in mammals, while only TCRβ genes have been identified in other vertebrates. To identify avian TCRα genes, we used monoclonal anti-CD3 antibodies to isolate chicken TCR for peptide sequence analysis. Degenerate oligonucleotide probes were then used to isolate a candidate TCRα cDNA clone that hybridized with a 1.7-kb mRNA species present only in αβ T cells and in tissues populated by these cells. Southern blot analysis revealed gene rearrangement in thymocytes and αβ T-cell lines. The TCRα cDNA candidate encoded an open reading frame of 275 amino acids, the predicted variable (Vα)-, joining (Jα)-, and constant (Cα)-region amino acid sequences of which shared 40%, 60%, and 25% homology with corresponding mammalian sequences. A single Cα gene and 25 Vα genes were identified by using region-specific probes. The Vα cDNA probe isolated from a Vβ1+ cell line reacted with transcripts from one of five Vβ2+ cell lines, suggesting shared use of Vα genes by Vβ1+ and Vβ2+ T cells and the existence of other Vα gene families. A genomic Vα sequence was flanked by classical recombination signal sequences but, unlike previously defined Vα genes, the leader and Vα region were encoded by a single exon. The data indicate evolutionary conservation of the basic TCRα gene structure in birds and mammals.

    Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

    Full text link
    We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference. It first learns two separate spaces of speaker-knowledge and speech-stimuli on top of a shared feature space, where a new block structure is designed as the building block for all spaces, and then solves different tasks cooperatively. Between the two spaces, information is passed in both directions via a novel cross- and dual-attention mechanism, mimicking the bottom-up and top-down processes of a human's cocktail party effect. It turns out that highly discriminative and generalizable speaker representations can be learnt in severely interfered conditions via our self-supervised training. The experimental results verify this seemingly paradoxical finding. The learnt speaker embedding has superior discriminative power to a standard speaker verification method; meanwhile, Tune-In achieves remarkably better speech separation performance in terms of SI-SNRi and SDRi consistently across all test modes than state-of-the-art benchmark systems, and at lower memory and computational consumption. Comment: Accepted in AAAI 2021
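
    The bidirectional exchange between the two spaces can be pictured with a small sketch. The block below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes speaker-knowledge and speech-stimuli features share one embedding size and uses standard multi-head attention in each direction; all names (DualCrossAttention, dim, heads) are hypothetical.

        import torch
        import torch.nn as nn

        class DualCrossAttention(nn.Module):
            """Hypothetical sketch of a cross- and dual-attention exchange
            between a speaker-knowledge space and a speech-stimuli space."""

            def __init__(self, dim: int = 256, heads: int = 4):
                super().__init__()
                # Two attention modules, one per direction of information flow.
                self.know_to_stim = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.stim_to_know = nn.MultiheadAttention(dim, heads, batch_first=True)

            def forward(self, speaker_knowledge, speech_stimuli):
                # Top-down: speech features query the speaker-knowledge space.
                stim_out, _ = self.know_to_stim(
                    query=speech_stimuli, key=speaker_knowledge, value=speaker_knowledge)
                # Bottom-up: speaker features query the speech-stimuli space.
                know_out, _ = self.stim_to_know(
                    query=speaker_knowledge, key=speech_stimuli, value=speech_stimuli)
                # Residual fusion keeps each space's own representation.
                return speaker_knowledge + know_out, speech_stimuli + stim_out

        # Toy usage: batch of 2, 50 time frames, 256-dim features.
        attn = DualCrossAttention()
        k, s = torch.randn(2, 50, 256), torch.randn(2, 50, 256)
        k2, s2 = attn(k, s)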

    Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation

    Get PDF
    One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers. In contrast, our key finding is that multi-granularity features are essential for enhancing contextual modeling and computational efficiency. We introduce a self-attentive network with a novel sandglass shape, namely Sandglasset, which advances the state-of-the-art (SOTA) SS performance at significantly smaller model size and computational cost. Moving forward through the blocks of Sandglasset, the temporal granularity of the features gradually becomes coarser until reaching half of the network blocks, and then successively turns finer towards the raw signal level. We also find that residual connections between features with the same granularity are critical for preserving information after passing through the bottleneck layer. Experiments show that our Sandglasset, with only 2.3M parameters, achieves the best results on two benchmark SS datasets, WSJ0-2mix and WSJ0-3mix, where the SI-SNRi scores are improved by an absolute 0.8 dB and 2.4 dB, respectively, compared with the prior SOTA results. Comment: Accepted in ICASSP 2021
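
    The sandglass-shaped granularity schedule is easy to see in code. The sketch below is illustrative only: it assumes segments coarsen by a constant factor up to the middle block and mirror back down, which conveys the shape but not Sandglasset's actual sizes; residual connections would then link each pair of blocks that share a granularity (block i with block N-1-i).

        def sandglass_segment_sizes(num_blocks: int, base: int = 64, factor: int = 4):
            """Hypothetical segment-size schedule for a sandglass-shaped network:
            granularity becomes coarser (larger segments) up to the middle block,
            then finer again towards the raw-signal level. The exact sizes and
            scaling factor in Sandglasset may differ; this only shows the shape."""
            half = num_blocks // 2
            sizes = [base * factor ** i for i in range(half)]   # coarsening half
            return sizes + sizes[::-1]                          # mirrored refining half

        print(sandglass_segment_sizes(6))   # [64, 256, 1024, 1024, 256, 64]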

    Contrastive Separative Coding for Self-supervised Representation Learning

    Get PDF
    To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key finding is to learn such representations by separating the target signal from contrastive interfering signals. First, a multi-task separative encoder is built to extract a shared separable and discriminative embedding; second, we propose a powerful cross-attention mechanism performed over speaker representations across various interfering conditions, allowing the model to focus on and globally aggregate the most critical information to answer the "query" (the current bottom-up embedding) while paying less attention to interfering, noisy, or irrelevant parts; finally, we form a new probabilistic contrastive loss which estimates and maximizes the mutual information between the representations and the global speaker vector. While most prior unsupervised methods have focused on predicting future, neighboring, or missing samples, we take a different perspective and predict the interfered samples. Moreover, our contrastive separative loss is free from negative sampling. Experiments demonstrate that our approach learns useful representations, achieving strong speaker verification performance in adverse conditions. Comment: Accepted in ICASSP 2021
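
    To make the mutual-information objective concrete, here is a generic InfoNCE-style stand-in. Note the strong caveat: the paper's loss is explicitly free from negative sampling, whereas this sketch contrasts against other items in the batch, so it is an analogy for "score each embedding against the matching global speaker vector", not the CSC loss itself. All names are hypothetical.

        import torch
        import torch.nn.functional as F

        def contrastive_speaker_loss(cond_embs: torch.Tensor,
                                     speaker_vecs: torch.Tensor) -> torch.Tensor:
            """Generic InfoNCE-style sketch (NOT the CSC loss, which avoids
            negative sampling). cond_embs: (B, D) embeddings extracted under
            interfering conditions; speaker_vecs: (B, D) global speaker
            vectors, row i matching row i. Maximizes a lower bound on the
            mutual information between the two."""
            logits = cond_embs @ speaker_vecs.t()        # (B, B) similarity scores
            targets = torch.arange(cond_embs.size(0))    # matching pairs on diagonal
            return F.cross_entropy(logits, targets)

        # Toy usage with random 128-dim embeddings for a batch of 8 utterances.
        loss = contrastive_speaker_loss(torch.randn(8, 128), torch.randn(8, 128))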

    Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)

    Get PDF
    Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
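
    The regression TempEst performs is root-to-tip regression: genetic distance from the root is regressed on sampling date, so the slope estimates the evolutionary rate, the x-intercept estimates the age of the root, and large residuals flag incongruent sequences. A minimal numpy sketch with made-up data:

        import numpy as np

        # Hypothetical data: sampling dates (decimal years) and root-to-tip
        # genetic distances (substitutions/site) read off a rooted phylogeny.
        dates = np.array([2000.1, 2001.3, 2002.8, 2004.2, 2005.9, 2007.4])
        dists = np.array([0.010, 0.013, 0.016, 0.019, 0.023, 0.026])

        # Least-squares fit: distance = rate * date + intercept.
        rate, intercept = np.polyfit(dates, dists, 1)
        root_date = -intercept / rate      # x-intercept: estimated root age
        residuals = dists - (rate * dates + intercept)

        print(f"rate = {rate:.2e} subs/site/year, root ~ {root_date:.1f}")
        # Large residuals flag sequences whose divergence and date disagree.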

    Drawing-Based Automatic Dementia Screening Using Gaussian Process Markov Chains

    Get PDF
    Screening tests play an important role in the early detection of dementia. Among the widely used screening tests, drawing tests have gained much attention in clinical psychology. Traditional evaluation of drawing tests relies entirely on the appearance of the drawn picture and does not consider any time-dependent behaviour. We demonstrate that processing speed and direction can reflect the decline of cognitive function, and thus may be useful for disease screening. We propose a model of Gaussian process Markov chains (GPMC) to study the complex associations within the drawing data. Specifically, we model the process of drawing in a state-space form, where a drawing state is composed of drawing direction and velocity, with consideration of the processing time. For temporal modeling, we focus on discrete-time Markov chains over a continuous state space. Because of the short processing time of picture drawing, we apply higher-order Markov chains to model long-term temporal correlation across drawing states. Gaussian process regression is used as a universal function approximator to flexibly infer the state transition function. With a Gaussian process prior over the function space, we can encode high-level function properties such as noisiness, smoothness and periodicity. We also derive an efficient training mechanism for complex Gaussian process regression on bivariate Markov chains. With GPMC, we present an optimal decision rule based on Bayesian decision theory. We apply our proposed method to a drawing test for dementia screening, i.e., the interlocking pentagon-drawing test, and evaluate our models on 256 subjects aged from 65 to 95. Compared with traditional methods, our models show remarkable improvement in drawing-based dementia screening.
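
    The core modeling idea, a GP-regressed transition function over a higher-order Markov chain of (direction, velocity) states, can be sketched with off-the-shelf tools. This is a toy illustration under stated assumptions (synthetic trace, second-order chain, RBF kernel), not the paper's derived training mechanism:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        # Hypothetical drawing trace: each state is (direction in radians, velocity).
        rng = np.random.default_rng(0)
        states = np.column_stack([np.cumsum(rng.normal(0, 0.1, 200)),
                                  np.abs(rng.normal(1.0, 0.2, 200))])

        # Second-order Markov chain: predict the next state from the two
        # previous states, so each input row concatenates states t-2 and t-1.
        X = np.hstack([states[:-2], states[1:-1]])   # (198, 4)
        y = states[2:]                               # (198, 2)

        # GP prior over the transition function; the RBF kernel encodes
        # smoothness and the white kernel models observation noise.
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(),
                                      normalize_y=True).fit(X, y)

        next_state, = gp.predict(X[-1:])             # predicted (direction, velocity)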

    Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model

    Full text link
    Expressive human speech generally abounds with rich and flexible prosody variations. The speech prosody predictors in existing expressive speech synthesis methods mostly produce deterministic predictions, which are learned by directly minimizing the norm of the prosody prediction error. This unimodal nature leads to a mismatch with the ground-truth distribution and harms the model's ability to make diverse predictions. We therefore propose a novel prosody predictor based on the denoising diffusion probabilistic model, taking advantage of its high-quality generative modeling and training stability. Experimental results confirm that the proposed prosody predictor outperforms the deterministic baseline on both the expressiveness and the diversity of prediction results, with even fewer network parameters. Comment: Proceedings of Interspeech 2023 (doi: 10.21437/Interspeech.2023-715); demo site at https://thuhcsi.github.io/interspeech2023-DiffVar
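
    For readers unfamiliar with DDPMs, a generic training step conveys how such a prosody predictor differs from a deterministic regressor: instead of minimizing the norm of the prediction error, the network learns to recover noise injected into the ground-truth prosody at a random diffusion step. A minimal sketch; `model` and all names are hypothetical stand-ins, not the paper's architecture:

        import torch
        import torch.nn.functional as F

        def ddpm_prosody_training_step(model, prosody, text_cond, alphas_cumprod):
            """Generic DDPM training step, sketched for prosody prediction.
            prosody:        (B, T) ground-truth prosody targets, e.g. F0.
            text_cond:      (B, T, D) conditioning features from a text encoder.
            alphas_cumprod: (num_steps,) cumulative product of the noise schedule.
            `model` is any hypothetical network predicting the added noise."""
            B = prosody.size(0)
            t = torch.randint(0, len(alphas_cumprod), (B,))    # random timestep
            a_bar = alphas_cumprod[t].unsqueeze(-1)            # (B, 1)
            noise = torch.randn_like(prosody)
            # Forward diffusion: corrupt the clean prosody target.
            noisy = a_bar.sqrt() * prosody + (1 - a_bar).sqrt() * noise
            # The predictor learns to recover the noise given text conditioning.
            return F.mse_loss(model(noisy, t, text_cond), noise)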

    FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

    Get PDF
    Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, their inherently iterative sampling process incurs costs that have hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions with diverse receptive field patterns to efficiently model long-term time dependencies under adaptive conditions. A noise schedule predictor is also adopted to reduce the number of sampling steps without sacrificing generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate features (e.g., mel-spectrograms). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. FastDiff also enables a sampling speed 58x faster than real time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalizes well to mel-spectrogram inversion of unseen speakers, and that FastDiff-TTS outperforms other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at \url{https://FastDiff.github.io/}. Comment: Accepted by IJCAI 2022
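
    The speedup comes from running the reverse diffusion over a short noise schedule rather than hundreds of steps. The sketch below is the textbook DDPM reverse-sampling loop, not FastDiff's code: a schedule predictor like FastDiff's would supply a short `betas` (e.g. a handful of steps), and `model` and `cond` are hypothetical stand-ins for the vocoder network and its conditioning.

        import torch

        @torch.no_grad()
        def sample_waveform(model, cond, betas, length):
            """Generic DDPM reverse sampling over a (possibly shortened) noise
            schedule `betas`. Fewer schedule entries mean fewer network calls,
            which is the source of the sampling speedup."""
            alphas = 1.0 - betas
            a_bar = torch.cumprod(alphas, dim=0)
            x = torch.randn(1, length)                    # start from pure noise
            for t in reversed(range(len(betas))):
                eps = model(x, torch.tensor([t]), cond)   # predicted noise
                # Posterior mean of the denoising step.
                x = (x - betas[t] / (1 - a_bar[t]).sqrt() * eps) / alphas[t].sqrt()
                if t > 0:                                 # add noise except last step
                    x = x + betas[t].sqrt() * torch.randn_like(x)
            return x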