262 research outputs found

    A general overview of cyclic transmultiplexers with cyclic modulation: Implementation and angular parametrization.

    31 pages. This preprint provides a general framework for cyclic transmultiplexers (TMUXs) with cyclic modulation. This TMUX also corresponds to a multicarrier modulation system of the Filtered MultiTone (FMT) type where the linear convolution is replaced by a cyclic one, hence the name Cyclic Block FMT (CB-FMT). In this preprint we present the Perfect Reconstruction (PR) conditions in the time and frequency domains. A duality theorem is proved showing that each PR solution in the frequency domain is connected to a dual PR solution in the time domain. Then, two decomposition theorems are established, leading to modular implementations of the cyclic TMUX. For one of these implementations we provide an angular parametrization that only involves angles corresponding to independent parameters. Finally, a procedure to reconstruct the prototype function from all the elementary blocks of the modular implementation is described step by step.
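The abstract above hinges on replacing linear convolution with a cyclic one. As a minimal sketch of that difference only (not of the CB-FMT system itself), the following compares the two operations on a short block; the function names and the example sequences are illustrative assumptions, not taken from the preprint.

```python
def linear_convolution(x, h):
    """Ordinary (linear) convolution; the output has len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def cyclic_convolution(x, h):
    """Cyclic convolution over a block of length N = len(x).

    Output indices wrap around modulo N, so the tail of the linear
    convolution is folded back onto the start of the block.
    """
    n = len(x)
    y = [0.0] * n
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[(i + j) % n] += xi * hj
    return y


# Hypothetical example block and two-tap filter.
x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 1.0]

lin = linear_convolution(x, h)  # five samples: the block grows by len(h) - 1
cyc = cyclic_convolution(x, h)  # four samples: the block length is preserved
```

In this example the last linear output sample wraps onto the first cyclic output sample, which is why the cyclic version keeps a fixed block length.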

    A simple commutativity condition for block decimators and expanders

    5 pages. Commutativity rules are essential for building multirate signal processing systems. In this short and self-contained paper, we focus on the interchangeability of block decimators and expanders. We formally prove that commutativity between these two operators is possible if and only if the data blocks are of an equal length corresponding to the greatest common divisor of the integer decimation and expansion factors.

    Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video

    Audio-visual automatic speech recognition (AV-ASR) extends speech recognition by introducing the video modality as an additional source of information. In this work, the information contained in the motion of the speaker's mouth is used to augment the audio features. The video modality is traditionally processed with a 3D convolutional neural network (e.g. a 3D version of VGG). Recently, image transformer networks (arXiv:2010.11929) demonstrated the ability to extract rich visual features for image classification tasks. Here, we propose to replace the 3D convolution with a video transformer to extract visual features. We train our baselines and the proposed model on a large-scale corpus of YouTube videos. The performance of our approach is evaluated on a labeled subset of YouTube videos as well as on the LRS3-TED public corpus. Our best video-only model obtains 31.4% WER on YTDEV18 and 17.0% on LRS3-TED, relative improvements of 10% and 15% over our convolutional baseline. We achieve state-of-the-art audio-visual recognition performance on LRS3-TED after fine-tuning our model (1.6% WER). In addition, in a series of experiments on multi-person AV-ASR, we obtained an average relative reduction of 2% over our convolutional video frontend. Comment: 5 pages, 3 figures, published at Interspeech 202
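The results above are reported as word error rate (WER) and as relative improvements. As a hedged illustration of how these two quantities are commonly computed (the paper's own scoring pipeline is not described here), the sketch below uses the standard word-level edit distance; the function names, sentences and numbers are made-up examples, and a non-empty reference is assumed.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein distance. Assumes a non-empty reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Relative improvement of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical transcripts: one deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Hypothetical error rates: going from 20% to 18% is a 10% relative reduction.
gain = relative_reduction(0.20, 0.18)
```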

    Cascaded encoders for fine-tuning ASR models on overlapped speech

    Multi-talker speech recognition (MT-ASR) has been shown to improve ASR performance on speech containing overlapping utterances from more than one speaker. Multi-talker models have typically been trained from scratch using simulated or actual overlapping speech datasets. On the other hand, the trend in ASR has been to train foundation models using massive datasets collected from a wide variety of task domains. Given the scale of these models and their ability to generalize well across a variety of domains, it makes sense to consider scenarios where a foundation model is augmented with multi-talker capability. This paper presents an MT-ASR model formed by combining a well-trained foundation model with a multi-talker mask model in a cascaded RNN-T encoder configuration. Experimental results show that the cascade configuration provides improved WER on overlapping speech utterances with respect to a baseline multi-talker model without sacrificing the performance achievable by the foundation model on non-overlapping utterances.

    The Alamouti Scheme with CDMA-OFDM/OQAM

    This paper deals with the combination of OFDM/OQAM with the Alamouti scheme. After a brief presentation of the OFDM/OQAM modulation scheme, we introduce the fact that the well-known Alamouti decoding scheme cannot be simply applied to this modulation. Indeed, the Alamouti coding scheme requires a complex orthogonality property, whereas OFDM/OQAM only provides real orthogonality. However, as we have recently shown, under some conditions, a transmission scheme combining CDMA and OFDM/OQAM can satisfy the complex orthogonality condition. Adding a CDMA component can thus be seen as a solution to apply the Alamouti scheme in combination with OFDM/OQAM. However, our analysis shows that the CDMA-OFDM/OQAM combination has to be built taking into account particular features of the transmission channel. Our simulation results illustrate the 2×1 Alamouti coding scheme for which CDMA-OFDM/OQAM and CP-OFDM are compared in two different scenarios: (i) CDMA is performed in the frequency domain, (ii) CDMA is performed in the time domain.
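For readers unfamiliar with the 2×1 Alamouti scheme mentioned above, here is a minimal sketch of the classical encoder and linear combiner over a noiseless flat-fading channel. It is not the CDMA-OFDM/OQAM construction of the paper; the function names, symbol values and channel coefficients are illustrative assumptions. The combiner relies on complex conjugation of the received samples, which is where the complex orthogonality requirement comes from.

```python
def alamouti_encode(s1, s2):
    """Map two complex symbols to two time slots on two transmit antennas.

    Each slot is a pair (antenna 1 sample, antenna 2 sample).
    """
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def alamouti_receive(slots, h1, h2):
    """Single receive antenna, flat channel gains h1 and h2, no noise (assumption)."""
    return [h1 * a + h2 * b for a, b in slots]


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that decouples s1 and s2 when the channel is known."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


# Hypothetical QPSK-like symbols and channel coefficients.
s1, s2 = 1 + 1j, -1 + 1j
h1, h2 = 0.8 - 0.3j, -0.2 + 0.9j

r1, r2 = alamouti_receive(alamouti_encode(s1, s2), h1, h2)
s1_hat, s2_hat = alamouti_combine(r1, r2, h1, h2)
```

Without noise, the combiner output equals the transmitted symbols up to floating-point rounding, because the cross terms between `s1` and `s2` cancel exactly.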

    On the study of faster-than-Nyquist multicarrier signaling based on frame theory

    Multicarrier transmissions are classically based on undercomplete or exact Weyl-Heisenberg Riesz (biorthogonal or orthogonal) bases implemented thanks to oversampled filter-banks. This can be seen as a transmission below the Nyquist rate. However, when overcomplete Weyl-Heisenberg frames are used, we obtain a “faster-than-Nyquist” (FTN) system and it is theoretically impossible to exactly recover the transmitted symbols using a linear receiver. Various studies have shown the interest of this high-density signaling scheme as well as practical implementations based on trellis and/or iterative decoding. Nevertheless, there is still a lack of theoretical justification with regard to pulse design in the FTN case. In this paper, we consider a linear transceiver operating over an additive white Gaussian noise channel. Using frame theory and simulation results, we show that the mean squared error (MSE) is minimized when tight frames are used.
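The notion of a tight frame used above can be illustrated in a small finite-dimensional setting. The sketch below is not a Weyl-Heisenberg (Gabor) frame and does not reproduce the paper's analysis; it only checks the defining property that the frame operator is a multiple of the identity, using three overcomplete vectors in the plane as an assumed toy example. All names are hypothetical.

```python
import math


def frame_operator(vectors):
    """Frame operator S = sum over v of (v v^T) for real vectors of equal dimension."""
    dim = len(vectors[0])
    s = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        for i in range(dim):
            for j in range(dim):
                s[i][j] += v[i] * v[j]
    return s


def is_tight(vectors, tol=1e-12):
    """Return (True, A) if S equals A times the identity within tol, else (False, S[0][0])."""
    s = frame_operator(vectors)
    dim = len(s)
    bound = s[0][0]
    tight = all(
        abs(s[i][j] - (bound if i == j else 0.0)) <= tol
        for i in range(dim)
        for j in range(dim)
    )
    return tight, bound


# Three unit vectors 120 degrees apart: an overcomplete family in the plane.
angles = (math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3)
three_vectors = [(math.cos(a), math.sin(a)) for a in angles]
tight, bound = is_tight(three_vectors)

# Repeating one basis vector gives an overcomplete family that is not tight.
non_tight, _ = is_tight([(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)])
```

Here the first family is tight with frame bound 3/2, while the second is not, since its frame operator weights the two axes differently.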

    FTN multicarrier transmission based on tight Gabor frames

    A multicarrier signal can be synthesized thanks to a symbol sequence and a Gabor family (i.e., a regularly time-frequency shifted version of a generator pulse). In this article, we consider the case where the signaling density is increased such that inter-pulse interference is unavoidable. Over an additive white Gaussian noise channel, we show that the signal-to-interference-plus-noise ratio is maximized when the transmitter and the receiver use the same tight Gabor frame. What is more, we give practical efficient realization schemes and show how to build tight frames based on usual generators. Theoretical and simulated bit-error probabilities are given for a non-coded system using quadrature amplitude modulations. Such a characterization is then used to predict the convergence of a coded system using low-density parity-check codes. We also study the robustness of such a system to errors on the received bits in an interference cancellation context.

    Analysis of a FTN Multicarrier System: Interference Mitigation Based on Tight Gabor Frames

    Cognitive radio applications require flexible waveforms to overcome several challenges such as opportunistic spectrum allocation and white spaces utilization. In this context, multicarrier modulations generalizing traditional cyclic-prefix orthogonal frequency-division multiplexing are particularly justified to fit time-frequency characteristics of the channel while improving spectral efficiency. In our theoretical framework, a multicarrier signal is described as a Gabor family the coefficients of which are the symbols to be transmitted and the generators are the time-frequency shifted pulse shapes to be used. In this article, we consider the case where non-rectangular pulse shapes are used with a signaling density increased such that inter-pulse interference is unavoidable. Such interference is minimized when the Gabor family used is a tight frame. We show that, in this case, interference can be approximated as an additive Gaussian noise. This allows us to compute theoretical and simulated bit-error probabilities for a non-coded system using a quadrature phase-shift keying constellation. Such a characterization is then used to predict the convergence of a coded system using low-density parity-check codes. We also study the robustness of such a system to errors on the received bits in an interference cancellation context.

    Guard Interval Adaptation for In-home Power Line Communication

    This paper aims to analyze the choice of the guard interval (GI) length in PLC systems to optimize the achievable throughput under power and symbol error-rate (SER) constraints. In general, the GI length is chosen so that there is no interference, i.e. the GI length is greater than or equal to the channel impulse response length. However, many previous works have shown that in PLC systems, this GI choice is inefficient in terms of achievable throughput. Indeed, a shorter GI evidently results in inter-symbol interference (ISI) and intercarrier interference (ICI), but the gain offered by a shortened GI may exceed the loss caused by interference. In this paper, we propose a simple solution for the GI length adaptation in PLC systems to optimize the achievable throughput.