Identification of T-cell receptor α-chain genes in the chicken
T-cell receptor (TCR) α-chain (TCRα) and β-chain (TCRβ) genes are well characterized in mammals, while only TCRβ genes have been identified in other vertebrates. To identify avian TCRα genes, we used monoclonal anti-CD3 antibodies to isolate chicken TCR for peptide sequence analysis. Degenerate oligonucleotide probes were then used to isolate a candidate TCRα cDNA clone that hybridized with a 1.7-kb mRNA species present only in αβ T cells and in tissues populated by these cells. Southern blot analysis revealed gene rearrangement in thymocytes and αβ T-cell lines. The TCRα cDNA candidate encoded an open reading frame of 275 amino acids, the predicted variable (V)-, joining (J)-, and constant (C)-region amino acid sequences of which shared 40%, 60%, and 25% homology with corresponding mammalian sequences. A single Cα gene and 25 Vα genes were identified by using region-specific probes. The Vα cDNA probe isolated from a Vβ1+ cell line reacted with transcripts from one of five Vβ2+ cell lines, suggesting shared use of Vα genes by Vβ1+ and Vβ2+ T cells and the existence of other Vα gene families. A genomic Vα sequence was flanked by classical recombination signal sequences but, unlike previously defined V genes, the leader and Vα region were encoded by a single exon. The data indicate evolutionary conservation of the basic TCR gene structure in birds and mammals.
Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect
We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference. It first learns two separate spaces of speaker knowledge and speech stimuli on top of a shared feature space, where a new block structure is designed as the building block for all spaces, and then solves different tasks cooperatively. Between the two spaces, information is exchanged via a novel cross- and dual-attention mechanism, mimicking the bottom-up and top-down processes of the human cocktail party effect. It turns out that highly discriminative and generalizable speaker representations can be learnt in severely interfered conditions via our self-supervised training. The experimental results verify this seeming paradox. The learnt speaker embedding has stronger discriminative power than a standard speaker verification method; meanwhile, Tune-In achieves remarkably better speech separation performance in terms of SI-SNRi and SDRi consistently across all test modes, and at lower memory and computational cost, than state-of-the-art benchmark systems.
Comment: Accepted in AAAI 202
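The cross-attention between the two spaces described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, token counts, and dimensions are illustrative assumptions, and the dual direction is shown simply by swapping which space supplies the queries.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention where queries come from one feature
    # space and keys/values come from the other space.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Tq, Tk) affinity matrix
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ values                 # (Tq, d) aggregated context

# Hypothetical tokens: 4 from a speaker-knowledge space, 10 from a
# speech-stimuli space, both projected to a shared dimension of 8.
rng = np.random.default_rng(0)
spk = rng.standard_normal((4, 8))
sig = rng.standard_normal((10, 8))
top_down = cross_attention(spk, sig, sig)   # speaker space queries speech
bottom_up = cross_attention(sig, spk, spk)  # speech space queries speaker
```

Casting information in both directions this way is one plausible reading of the "cross- and dual-attention" mechanism: each space refines itself with context aggregated from the other.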
Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation
One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers. In contrast, our key finding is that multi-granularity features are essential for enhancing contextual modeling and computational efficiency. We introduce a self-attentive network with a novel sandglass shape, namely Sandglasset, which advances the state-of-the-art (SOTA) SS performance at significantly smaller model size and computational cost. Moving forward through the blocks of Sandglasset, the temporal granularity of the features gradually becomes coarser until reaching half of the network blocks, and then successively turns finer towards the raw signal level. We also find that residual connections between features of the same granularity are critical for preserving information after passing through the bottleneck layer. Experiments show that our Sandglasset, with only 2.3M parameters, achieves the best results on two benchmark SS datasets, WSJ0-2mix and WSJ0-3mix, improving SI-SNRi scores by an absolute 0.8 dB and 2.4 dB, respectively, over the prior SOTA results.
Comment: Accepted in ICASSP 202
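The coarsen-then-refine granularity schedule with same-granularity residual links can be sketched in a few lines of NumPy. This is a structural caricature, not the Sandglasset architecture: real blocks contain self-attention and learned resampling, whereas here average pooling, repetition, and a fixed depth stand in for them.

```python
import numpy as np

def downsample(x, r=2):
    # Coarsen temporal granularity by averaging non-overlapping windows.
    T = (len(x) // r) * r
    return x[:T].reshape(-1, r).mean(axis=1)

def upsample(x, r=2):
    # Restore a finer granularity by repeating each frame.
    return np.repeat(x, r)

def sandglass(x, depth=3):
    # First half of the blocks: granularity becomes coarser.
    # Second half: granularity turns finer again, with a residual
    # connection joining features of matching granularity.
    skips, h = [], x
    for _ in range(depth):
        skips.append(h)
        h = downsample(h)
    for _ in range(depth):
        h = upsample(h)
        h = h + skips.pop()  # same-granularity residual link
    return h

out = sandglass(np.arange(8.0))  # toy 8-frame feature sequence
```

The residual additions only type-check because each upsampling stage lands back on a granularity seen on the way down, which is exactly why such links can carry fine-grained information past the bottleneck.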
Contrastive Separative Coding for Self-supervised Representation Learning
To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key idea is to learn such representations by separating the target signal from contrastive interfering signals. First, a multi-task separative encoder is built to extract shared separable and discriminative embeddings; second, we propose a powerful cross-attention mechanism performed over speaker representations across various interfering conditions, allowing the model to focus on and globally aggregate the most critical information to answer the "query" (the current bottom-up embedding) while paying less attention to interfering, noisy, or irrelevant parts; last, we form a new probabilistic contrastive loss which estimates and maximizes the mutual information between the representations and the global speaker vector. While most prior unsupervised methods have focused on predicting future, neighboring, or missing samples, we take a different perspective and predict the interfered samples. Moreover, our contrastive separative loss is free from negative sampling. Experiments demonstrate that our approach learns useful representations, achieving strong speaker verification performance in adverse conditions.
Comment: Accepted in ICASSP 202
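A probabilistic contrastive loss of the kind described above can be sketched as an InfoNCE-style objective in which the contrast set is the speakers already present in the interfered mixture, so no extra negatives need to be sampled. This is a schematic reading, not the paper's exact loss; the similarity function and vector shapes are assumptions.

```python
import numpy as np

def log_softmax(x):
    m = x.max()
    return x - (m + np.log(np.exp(x - m).sum()))

def separative_contrastive_loss(query, speaker_vecs, target_idx):
    # The bottom-up "query" embedding should score highest against its
    # own global speaker vector; the other rows of speaker_vecs are the
    # interfering speakers, which serve as built-in contrasts.
    sims = speaker_vecs @ query            # dot-product similarities
    return -log_softmax(sims)[target_idx]  # small when the target wins

# Toy example: 3 global speaker vectors, query aligned with speaker 0.
speaker_vecs = np.eye(3)
query = np.array([5.0, 0.0, 0.0])
loss_good = separative_contrastive_loss(query, speaker_vecs, 0)
loss_bad = separative_contrastive_loss(query, speaker_vecs, 1)
```

Minimizing such a loss pushes up a lower bound on the mutual information between the representation and the target speaker vector, which matches the stated goal of estimating and maximizing that mutual information.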
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
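The regression approach described above amounts to fitting root-to-tip genetic divergence against sampling date. The following NumPy sketch shows the core idea under toy, strictly clock-like data; it is not TempEst's code, and the variable names are illustrative.

```python
import numpy as np

def root_to_tip_regression(dates, divergences):
    # Least-squares fit of root-to-tip divergence against sampling date.
    # The slope estimates the evolutionary rate (substitutions/site/year),
    # the x-intercept estimates the date of the root, and large residuals
    # flag sequences whose divergence and date are incongruent.
    slope, intercept = np.polyfit(dates, divergences, 1)
    fitted = slope * np.asarray(dates) + intercept
    residuals = np.asarray(divergences) - fitted
    root_date = -intercept / slope  # where divergence extrapolates to zero
    return slope, root_date, residuals

# Toy heterochronous data: perfectly clock-like divergence accumulation.
dates = np.array([2000.0, 2002.0, 2004.0, 2006.0, 2008.0])
div = np.array([0.010, 0.012, 0.014, 0.016, 0.018])
rate, root_date, res = root_to_tip_regression(dates, div)
```

A clearly positive slope indicates sufficient temporal signal for molecular clock analysis, while an outlying residual is the kind of incongruence that points to annotation errors, contamination, recombination, or alignment problems.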
Drawing-Based Automatic Dementia Screening Using Gaussian Process Markov Chains
Screening tests play an important role in the early detection of dementia. Among widely used screening tests, drawing tests have gained much attention in clinical psychology. Traditional evaluation of drawing tests relies entirely on the appearance of the drawn picture and does not consider any time-dependent behaviour. We demonstrated that drawing speed and direction can reflect the decline of cognitive function, and thus may be useful for disease screening. We proposed a model of Gaussian process Markov chains (GPMC) to study the complex associations within the drawing data. Specifically, we modeled the process of drawing in a state-space form, where a drawing state is composed of drawing direction and velocity, with consideration of the processing time. For temporal modeling, we focused on discrete-time Markov chains over a continuous state space. Because of the short processing time of picture drawing, we applied higher-order Markov chains to model long-term temporal correlation across drawing states. Gaussian process regression was used as a universal function approximator to flexibly infer the state transition function. With a Gaussian process prior over the function space, we could encode high-level function properties such as noisiness, smoothness, and periodicity. We also derived an efficient training mechanism for complex Gaussian process regression on bivariate Markov chains. With GPMC, we present an optimal decision rule based on Bayesian decision theory. We applied the proposed method to a drawing test for dementia screening, i.e., the interlocking pentagon-drawing test, evaluating our models on 256 subjects aged from 65 to 95. Compared with traditional methods, our models showed remarkable improvement in the drawing test for dementia screening.
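The core modeling step, using Gaussian process regression to infer a Markov state-transition function nonparametrically, can be sketched as follows. This is a minimal 1-D sketch with an RBF kernel, not the paper's bivariate higher-order model; the damped toy transition and all hyperparameters are assumptions.

```python
import numpy as np

def rbf(X1, X2, ell=1.0):
    # Squared-exponential kernel encoding a smoothness prior.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def gp_transition(X_train, y_train, X_test, noise=1e-4):
    # GP-regression posterior mean for a state-transition function:
    # given observed (state, next-state) pairs, infer the transition
    # at new states without fixing a parametric form.
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    return Ks @ alpha

# Toy 1-D drawing-velocity states with a damped transition s' = 0.8 * s.
X = np.linspace(-1, 1, 20)[:, None]
y = 0.8 * X[:, 0]
pred = gp_transition(X, y, np.array([[0.5]]))
```

The kernel choice is where the high-level function properties mentioned above (noisiness, smoothness, periodicity) are encoded: swapping the RBF kernel for a periodic one would impose periodic transitions instead.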
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Expressive human speech generally abounds with rich and flexible prosody variations. The speech prosody predictors in existing expressive speech synthesis methods mostly produce deterministic predictions, learned by directly minimizing the norm of the prosody prediction error. Their unimodal nature leads to a mismatch with the ground-truth distribution and harms the model's ability to make diverse predictions. Thus, we propose a novel prosody predictor based on the denoising diffusion probabilistic model to take advantage of its high-quality generative modeling and training stability. Experimental results confirm that the proposed prosody predictor outperforms the deterministic baseline in both the expressiveness and diversity of its predictions, with even fewer network parameters.
Comment: Proceedings of Interspeech 2023 (doi: 10.21437/Interspeech.2023-715), demo site at https://thuhcsi.github.io/interspeech2023-DiffVar
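The denoising diffusion training objective behind such a predictor can be sketched with closed-form forward noising and a noise-prediction loss. This is the generic DDPM recipe, not the paper's network: the prosody features are random stand-ins, the linear beta schedule is an assumption, and a trivial lambda replaces the learned denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, betas):
    # Sample x_t ~ q(x_t | x_0) in closed form:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    abar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps, eps

def ddpm_loss(x0, t, betas, eps_model):
    # Simple denoising objective: the model predicts the injected noise.
    xt, eps = forward_diffuse(x0, t, betas)
    return np.mean((eps_model(xt, t) - eps) ** 2)

betas = np.linspace(1e-4, 0.02, 100)  # assumed linear noise schedule
x0 = rng.standard_normal(16)          # stand-in prosody feature vector
loss = ddpm_loss(x0, 50, betas, lambda xt, t: np.zeros_like(xt))
```

Because sampling starts from noise and denoises step by step, repeated runs yield different yet plausible prosody contours, which is precisely what restores diversity relative to a deterministic norm-minimizing predictor.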
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, their inherently iterative sampling process is costly and has hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions with diverse receptive field patterns to efficiently model long-term time dependencies under adaptive conditions. A noise schedule predictor is also adopted to reduce the number of sampling steps without sacrificing generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate features (e.g., Mel-spectrograms). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. FastDiff also enables a sampling speed 58x faster than real time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalizes well to mel-spectrogram inversion of unseen speakers, and that FastDiff-TTS outperforms other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at \url{https://FastDiff.github.io/}.
Comment: Accepted by IJCAI 202
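The idea of a location-variable convolution, where each output position applies its own kernel rather than one shared kernel, can be illustrated with a minimal 1-D NumPy sketch. This is a schematic reading only: in FastDiff the kernels are predicted from conditioning features and diffusion-step embeddings, whereas here they are random stand-ins.

```python
import numpy as np

def location_variable_conv(x, kernels):
    # Each output position t applies its own K-tap kernel (assumed to be
    # predicted from conditioning information), instead of sliding a
    # single shared kernel over the sequence.
    K = kernels.shape[1]
    pad = np.pad(x, (K // 2, K // 2))  # zero-pad for 'same' output length
    return np.array([pad[t:t + K] @ kernels[t] for t in range(len(x))])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)              # waveform-rate features
kernels = rng.standard_normal((16, 3))   # one 3-tap kernel per position
y = location_variable_conv(x, kernels)
```

If every position shared the identity kernel [0, 1, 0], the operation would reduce to a pass-through; letting the kernels vary with position and condition is what makes the convolution adaptive.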