
    Optimized EWT-Seq2Seq-LSTM with Attention Mechanism to Insulators Fault Prediction

    Insulators installed outdoors are vulnerable to the accumulation of contaminants on their surface, which raises their conductivity and increases leakage current until a flashover occurs. To improve the reliability of the electrical power system, the development of a fault can be evaluated in relation to the increase in leakage current, making it possible to predict that a shutdown might occur. This paper proposes the use of the empirical wavelet transform (EWT) to reduce the influence of non-representative variations and combines the attention mechanism with a long short-term memory (LSTM) recurrent network for prediction. The Optuna framework has been applied for hyperparameter optimization, resulting in a method called Optimized EWT-Seq2Seq-LSTM with Attention. The proposed model had a 10.17% lower mean square error (MSE) than the standard LSTM and a 5.36% lower MSE than the model without optimization, showing that the attention mechanism and hyperparameter optimization are a promising strategy.
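
    As a rough illustration of the optimization loop described above, the sketch below tunes an attention-augmented LSTM forecaster with Optuna on synthetic data. The EWT preprocessing, the full Seq2Seq decoder, the hyperparameter ranges, and the data are all assumptions for illustration, not the paper's actual configuration.

```python
# Hypothetical sketch: Optuna tuning of an LSTM forecaster with attention pooling,
# loosely following the abstract's idea of hyperparameter optimization for
# leakage-current prediction. Data is synthetic; names and ranges are illustrative.
import numpy as np
import optuna
import torch
import torch.nn as nn

# Synthetic "leakage current" series split into sliding windows.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0.01, 0.05, 500)).astype(np.float32)
window = 24
X = torch.from_numpy(np.stack([series[i:i + window] for i in range(len(series) - window)])).unsqueeze(-1)
y = torch.from_numpy(series[window:]).unsqueeze(-1)


class AttnLSTM(nn.Module):
    """LSTM encoder with additive attention pooling over time steps."""
    def __init__(self, hidden):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        h, _ = self.lstm(x)                      # (B, T, H)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        return self.out((w * h).sum(dim=1))      # weighted context -> prediction


def objective(trial):
    hidden = trial.suggest_int("hidden", 16, 128)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    model = AttnLSTM(hidden)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(30):                          # short training budget per trial
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()                           # Optuna minimizes the final MSE


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```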

    Streaming Audio-Visual Speech Recognition with Alignment Regularization

    Recognizing a word shortly after it is spoken is an important requirement for automatic speech recognition (ASR) systems in real-world scenarios. As a result, a large body of work on streaming audio-only ASR models has been presented in the literature. However, streaming audio-visual automatic speech recognition (AV-ASR) has received little attention in earlier works. In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture. The audio and visual encoder neural networks are both based on the conformer architecture, which is made streamable using chunk-wise self-attention (CSA) and causal convolution. Streaming recognition with a decoder neural network is realized by using the triggered attention technique, which performs time-synchronous decoding with joint CTC/attention scoring. For frame-level ASR criteria such as CTC, a synchronized response from the audio and visual encoders is critical for a joint AV decision-making process. In this work, we propose a novel alignment regularization technique that promotes synchronization of the audio and visual encoders, which in turn results in better word error rates (WERs) at all SNR levels for streaming and offline AV-ASR models. The proposed AV-ASR model achieves WERs of 2.0% and 2.6% on the Lip Reading Sentences 3 (LRS3) dataset in offline and online setups, respectively, both of which are state-of-the-art results when no external training data are used.
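
    Two of the streaming ingredients named in the abstract can be sketched compactly. The snippet below builds a chunk-wise causal attention mask (frames attend only within their own chunk and to past chunks) and a simple frame-wise alignment regularizer between audio and visual encoder outputs; the MSE form of the regularizer, the shapes, and the chunking rule are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of chunk-wise self-attention masking and an alignment regularizer
# between audio and visual encoder outputs. Both are simplified stand-ins.
import torch
import torch.nn.functional as F


def chunkwise_causal_mask(num_frames: int, chunk_size: int) -> torch.Tensor:
    """True where attention is allowed: a query frame may attend to key frames
    whose chunk index is <= its own (no future chunks, hence streamable)."""
    chunk_idx = torch.arange(num_frames) // chunk_size
    return chunk_idx.unsqueeze(1) >= chunk_idx.unsqueeze(0)   # (T, T) boolean mask


def alignment_regularizer(audio_feats: torch.Tensor,
                          visual_feats: torch.Tensor) -> torch.Tensor:
    """Penalize frame-wise disagreement between the two encoders (assumes both
    streams are already projected to the same dimension and frame rate)."""
    return F.mse_loss(audio_feats, visual_feats)


mask = chunkwise_causal_mask(num_frames=8, chunk_size=4)
audio = torch.randn(2, 8, 256)     # (batch, frames, feature dim)
visual = torch.randn(2, 8, 256)
print(mask.int())
print(alignment_regularizer(audio, visual).item())
```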

    Learning representations of multivariate time series with missing data

    This is the author accepted manuscript; the final version is available from Elsevier via the DOI in this record. Learning compressed representations of multivariate time series (MTS) facilitates data analysis in the presence of noise and redundant information, and for a large number of variates and time steps. However, classical dimensionality reduction approaches are designed for vectorial data and cannot deal explicitly with missing values. In this work, we propose a novel autoencoder architecture based on recurrent neural networks to generate compressed representations of MTS. The proposed model can process inputs of variable length and is specifically designed to handle missing data. Our autoencoder learns fixed-length vectorial representations, whose pairwise similarities are aligned to a kernel function that operates in input space and handles missing values. This allows good representations to be learned even in the presence of a significant amount of missing data. To show the effectiveness of the proposed approach, we evaluate the quality of the learned representations in several classification tasks, including those involving medical data, and we compare to other methods for dimensionality reduction. Subsequently, we design two frameworks based on the proposed architecture: one for imputing missing data and another for one-class classification. Finally, we analyze under what circumstances an autoencoder with recurrent layers can learn better compressed representations of MTS than feed-forward architectures.
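
    A minimal sketch of the kernel-alignment idea follows: a recurrent autoencoder whose code-space similarity matrix is pulled toward a kernel computed in input space. The paper relies on a kernel that natively handles missing values; here an RBF kernel on zero-imputed inputs stands in, purely as an assumption, and all sizes and loss weights are illustrative.

```python
# Illustrative sketch of a recurrent autoencoder with a kernel-alignment term.
# The input-space kernel used here (RBF on zero-imputed data) is a stand-in for
# a kernel that handles missing values; it is not the paper's actual choice.
import torch
import torch.nn as nn


class RecurrentAE(nn.Module):
    def __init__(self, n_vars, hidden, code):
        super().__init__()
        self.encoder = nn.GRU(n_vars, hidden, batch_first=True)
        self.to_code = nn.Linear(hidden, code)
        self.from_code = nn.Linear(code, hidden)
        self.decoder = nn.GRU(hidden, n_vars, batch_first=True)

    def forward(self, x):
        _, h = self.encoder(x)                 # final hidden state: (1, B, hidden)
        z = self.to_code(h.squeeze(0))         # fixed-length code per series
        dec_in = self.from_code(z).unsqueeze(1).repeat(1, x.size(1), 1)
        x_hat, _ = self.decoder(dec_in)
        return x_hat, z


def kernel_alignment_loss(z, K_input):
    """Match normalized pairwise code similarities to the input-space kernel."""
    K_code = z @ z.t()
    K_code = K_code / (K_code.norm() + 1e-8)
    K_input = K_input / (K_input.norm() + 1e-8)
    return ((K_code - K_input) ** 2).sum()


# Toy batch of MTS with missing values marked as NaN, then zero-imputed.
x = torch.randn(16, 30, 4)
x[torch.rand_like(x) < 0.2] = float("nan")
x_filled = torch.nan_to_num(x, nan=0.0)

# RBF kernel on flattened (imputed) inputs -- a stand-in for a missing-data kernel.
flat = x_filled.reshape(16, -1)
d2 = torch.cdist(flat, flat) ** 2
K = torch.exp(-d2 / d2.mean())

model = RecurrentAE(n_vars=4, hidden=32, code=8)
x_hat, z = model(x_filled)
loss = nn.functional.mse_loss(x_hat, x_filled) + 0.1 * kernel_alignment_loss(z, K)
loss.backward()
print(loss.item())
```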

    Comparing Neural Meaning-to-Text Approaches for Dutch

    The neural turn in computational linguistics has made it relatively easy to build systems for natural language generation, as long as suitable annotated corpora are available. But can such systems deliver the goods? Using Dutch data from the Parallel Meaning Bank, a corpus of (mostly short) texts annotated with language-neutral meaning representations, we investigate what challenges arise and what choices can be made when implementing sequence-to-sequence or graph-to-sequence transformer models for generating Dutch texts from formal meaning representations. We compare the performance of linearized input graphs with graphs encoded in various formats and find that stacking encoders obtains the best results for the standard metrics used in natural language generation. A key challenge is dealing with unknown tokens that occur in the input meaning representation. We introduce a new method based on WordNet similarity to deal with out-of-vocabulary concepts.
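
    The WordNet-based fallback can be sketched as follows: an out-of-vocabulary concept is replaced by the most similar in-vocabulary lemma according to a WordNet similarity measure. The toy vocabulary, the Wu-Palmer measure, and the lemma-level (rather than synset-level) lookup below are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch: replace an out-of-vocabulary concept with the most
# WordNet-similar in-vocabulary lemma. Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn


def most_similar_in_vocab(oov_concept, vocab):
    """Return the in-vocabulary word most similar to the OOV concept, or None."""
    oov_synsets = wn.synsets(oov_concept)
    if not oov_synsets:
        return None
    best, best_score = None, 0.0
    for word in vocab:
        for s1 in oov_synsets:
            for s2 in wn.synsets(word):
                score = s1.wup_similarity(s2) or 0.0   # Wu-Palmer similarity
                if score > best_score:
                    best, best_score = word, score
    return best


vocab = {"dog", "car", "house", "river"}
print(most_similar_in_vocab("poodle", vocab))   # likely "dog"
```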