166 research outputs found

    A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

    In this paper, we introduce our work on building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into text in the target language. The backbone of SM2 is a Transformer Transducer, which has high streaming capability. Instead of human-labeled speech translation (ST) data, SM2 models are trained on weakly supervised data generated by converting the transcriptions in speech recognition corpora with a machine translation service. With 351 thousand hours of anonymized speech training data from 25 languages, SM2 models achieve comparable or even better ST quality than some recent popular large-scale non-streaming speech models. More importantly, we show that SM2 has truly zero-shot capability when expanding to new target languages, yielding high-quality ST results for {source-speech, target-text} pairs that are not seen during training. Comment: submitted to ICASSP 202

    Gas-liquid Two-phase Flow Measurement Using Coriolis Flowmeters Incorporating Artificial Neural Network, Support Vector Machine and Genetic Programming Algorithms

    Coriolis flowmeters are well established for high-accuracy mass flow measurement of single-phase flow. In recent years attempts have been made to apply Coriolis flowmeters to two-phase flow. This paper presents data-driven models that are incorporated in Coriolis flowmeters to measure both the liquid mass flowrate and the gas volume fraction of a two-phase flow mixture. Experimental work was conducted on a purpose-built two-phase flow test rig on both horizontal and vertical pipelines for a liquid mass flowrate ranging from 700 kg/h to 14500 kg/h and a gas volume fraction between 0 and 30%. Artificial Neural Network (ANN), Support Vector Machine (SVM) and Genetic Programming (GP) models are established through training with experimental data. The performance of BP-ANN (Back Propagation - ANN), RBF-ANN (Radial Basis Function - ANN), SVM and GP models is assessed and compared. Experimental results suggest that the SVM models are superior to the BP-ANN, RBF-ANN and GP models for two-phase flow measurement in terms of robustness and accuracy. For liquid mass flowrate measurement with the SVM models, 93.49% of the experimental data yield a relative error of less than ±1% on the horizontal pipeline, whilst 96.17% of the results are within ±1% on the vertical installation. The SVM models predict gas volume fraction with a relative error of less than ±10% for 93.10% and 94.25% of the test conditions on horizontal and vertical installations, respectively.
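The accuracy figures in the abstract above are shares of test points whose relative error falls inside a tolerance band. A minimal sketch of that calculation follows; the function names and the sample readings are hypothetical and chosen only for illustration, not taken from the paper.

```python
def relative_errors(predicted, reference):
    """Relative error of each prediction, as a percentage of its reference value."""
    return [100.0 * (p - r) / r for p, r in zip(predicted, reference)]


def share_within(errors_percent, tolerance_percent):
    """Percentage of samples whose absolute relative error is below the tolerance."""
    hits = sum(1 for e in errors_percent if abs(e) < tolerance_percent)
    return 100.0 * hits / len(errors_percent)


# Hypothetical liquid mass flowrates in kg/h (made-up values, not experimental data).
reference = [700.0, 5000.0, 10000.0, 14500.0]
predicted = [705.0, 5020.0, 10150.0, 14490.0]

errors = relative_errors(predicted, reference)
print(share_within(errors, 1.0))  # share of points with |error| < 1%
```

With these made-up readings, three of the four points fall inside the ±1% band; the third point (1.5% error) does not.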

    Gas-liquid Two-phase Flow Measurement Using Coriolis Flowmeters Incorporating Neural Networks

    Coriolis flowmeters are commonly used to measure single-phase flow. In recent years attempts have been made to apply Coriolis flowmeters to two-phase flows. This paper presents a neural-network-based approach that has been applied to Coriolis flowmeters to measure both the liquid flow rate and the gas void fraction of a two-phase flow. Experimental tests were conducted on a purpose-built two-phase flow test rig on both horizontal and vertical pipelines. The mass flow rate ranges from 700 kg/h to 14500 kg/h, whilst the gas volume fraction is between 0 and 30%. A set of variables, including observed density, apparent mass flow, fluid pressure and the signals that maintain flow tube oscillation, is used as input to a neural network. Two neural networks are established through training with experimental data obtained from the flow rig on horizontal and vertical pipelines, respectively. The performance of both neural networks is assessed against the reference readings. Experimental results suggest that the relative errors of the corrected liquid mass flow rate for the vertical and horizontal installations are no greater than ±1.5% and ±2.5%, respectively. The gas volume fraction is predicted with relative errors of less than ±10% and ±20%, respectively, for vertical and horizontal installations in most cases.

    Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

    In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transformer-Transducer that jointly generates automatic speech recognition (ASR) and speech translation (ST) outputs using a single decoder. To produce ASR and ST content effectively with minimal latency, we propose a joint token-level serialized output training method that interleaves source and target words by leveraging an off-the-shelf textual aligner. Experiments in monolingual (it-en) and multilingual ({de,es,it}-en) settings demonstrate that our approach achieves the best quality-latency balance. With an average ASR latency of 1s and ST latency of 1.3s, our model shows no degradation or even improves output quality compared to separate ASR and ST models, yielding an average improvement of 1.1 WER and 0.4 BLEU in the multilingual case.
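The idea of interleaving source and target words from word alignments can be illustrated with a small sketch. This is one plausible interleaving rule under stated assumptions, not the paper's actual procedure: each target word is emitted as soon as every source word aligned to it has been emitted, and target order is preserved. The function name, the `"asr"`/`"st"` tags, and the example sentence and alignment are all hypothetical.

```python
def interleave(source_words, target_words, alignments):
    """Build one serialized sequence of (tag, word) pairs from a word alignment.

    alignments is a list of (source_index, target_index) pairs.
    """
    # ready_at[j]: index of the last source word that target word j waits for.
    # Unaligned target words start at -1 and inherit the wait of their predecessor.
    ready_at = [-1] * len(target_words)
    for src_idx, tgt_idx in alignments:
        ready_at[tgt_idx] = max(ready_at[tgt_idx], src_idx)
    # Enforce monotonic target order: a word cannot be emitted before the one preceding it.
    for j in range(1, len(ready_at)):
        ready_at[j] = max(ready_at[j], ready_at[j - 1])

    output = []
    next_target = 0
    for i, word in enumerate(source_words):
        output.append(("asr", word))
        # Emit every target word whose aligned source words have all been seen.
        while next_target < len(target_words) and ready_at[next_target] <= i:
            output.append(("st", target_words[next_target]))
            next_target += 1
    # Flush any target words left over (only possible when there are no source words).
    output.extend(("st", w) for w in target_words[next_target:])
    return output


# Hypothetical it-en example with a hand-written alignment.
source = ["il", "gatto", "nero", "dorme"]
target = ["the", "black", "cat", "sleeps"]
alignment = [(0, 0), (1, 2), (2, 1), (3, 3)]
serialized = interleave(source, target, alignment)
print(serialized)
```

In this example "black" has to wait for "nero", so "gatto" and "nero" are both emitted before "black" and "cat" appear. That delay is the kind of quality-latency trade-off the abstract refers to.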

    Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

    The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditional approaches to automatic speech recognition (ASR) and speech translation (ST) have often relied on separate systems, leading to inefficiencies in computational resources, and increased synchronization complexity in real time. In this paper, we propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder. We introduce a novel method for joint token-level serialized output training based on timestamp information to effectively produce ASR and ST outputs in the streaming setting. Experiments on {it,es,de}->en prove the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time. Comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work

    Flow-State Identification of Oil-Based Magnetic Fluid Seal Based on Acoustic Emission Technology

    At present, most studies of how a magnetic fluid seal changes over time rely on the pressure signal of each chamber or on camera photographs of the magnetic fluid flow, both of which require changes to the seal structure. Based on nondestructive acoustic emission technology, this study proposes a flow-state identification model for the oil-based magnetic fluid seal using the grey wolf optimizer and random forest. The acoustic emission signal and the pressure signal are collected simultaneously under static conditions in an experiment on an oil-based magnetic fluid seal with two-stage pole shoes. Through power spectrum analysis of the acoustic emission signal, aided by the pressure signal, the changing process before seal failure is divided into three states: no magnetic fluid flow, magnetic fluid flow at the first pole shoe, and magnetic fluid flow at both pole shoes together. Then, the time- and frequency-domain features of acoustic emission signal samples are extracted to form feature vectors as inputs, and the flow-state identification model is established based on the grey wolf optimizer and random forest. The experimental results show that the testing accuracy and the F1 scores (an index that weights precision and recall equally) of the three states are close to or higher than 90%. These results demonstrate the effectiveness of the flow-state identification model for oil-based magnetic fluid seals based on nondestructive acoustic emission technology.
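The F1 score mentioned in the abstract above is the harmonic mean of precision and recall, computed here per class. The following sketch shows the standard definition for a three-state problem. The labels 0, 1 and 2 stand in for the three flow states, and both label lists are made up for illustration, not taken from the study.

```python
def f1_per_class(true_labels, predicted_labels):
    """Return {class: F1} using F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)  # true positives
        fp = sum(1 for t, p in pairs if t != cls and p == cls)  # false positives
        fn = sum(1 for t, p in pairs if t == cls and p != cls)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[cls] = 2 * precision * recall / (precision + recall)
        else:
            scores[cls] = 0.0
    return scores


# Hypothetical labels: 0 = no flow, 1 = first pole shoe flow, 2 = both pole shoes flow.
true_states = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted_states = [0, 0, 1, 1, 1, 1, 2, 2, 2, 1]
f1_scores = f1_per_class(true_states, predicted_states)
print(f1_scores)
```

Because F1 is a harmonic mean, a class with perfect recall but weaker precision (state 1 in this made-up data) still scores noticeably below 1.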

    Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

    Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer output in written form. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous work, Weighted Finite State Transducers (WFSTs) have been employed for ITN. WFSTs are well suited to this task, but their size and run-time costs can make deployment in embedded applications challenging. In this paper, we describe the development of an on-device ITN system that is streaming, lightweight, and accurate. At the core of our system is a streaming transformer tagger that tags lexical tokens from ASR. The tag indicates which ITN category, if any, should be applied. Following that, we apply an ITN-category-specific WFST, only on the tagged text, to reliably perform the ITN conversion. We show that the proposed ITN solution performs on par with strong baselines, while being significantly smaller in size and retaining customization capabilities. Comment: 8 pages. 6 page paper 2 page reference

    LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

    End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it easy to use a single model for both multilingual ASR and many-to-many ST. In this paper, we propose streaming language-agnostic multilingual speech recognition and translation using neural transducers (LAMASSU). To enable multilingual text generation in LAMASSU, we conduct a systematic comparison between specified and unified prediction and joint networks. We leverage a language-agnostic multilingual encoder that substantially outperforms shared encoders. To enhance LAMASSU, we propose to feed target LID to encoders. We also apply connectionist temporal classification regularization to transducer training. Experimental results show that LAMASSU not only drastically reduces the model size but also outperforms monolingual ASR and bilingual ST models. Comment: Submitted to ICASSP 202

    Investigations into the behaviours of Coriolis flowmeters under air-water two-phase flow conditions on an optimized experimental platform

    Gas-liquid two-phase flow is commonly encountered in many industrial processes due to production requirements or inevitable gas entrainment from various sources. Accurate liquid phase measurement under two-phase conditions is challenging but important, as it is the key factor in reducing cost, improving safety or meeting legal requirements. Coriolis flowmeters, owing to their high accuracy in metering single-phase flow, direct mass flow measurement and multivariable sensing nature, are widely used in industry. Recently developed Coriolis flowmeters can work under multiphase conditions, making it possible to achieve accurate multiphase flow measurement through model-based error compensation or training-based soft computing correction. This paper assesses the behaviours of Coriolis flowmeters under various two-phase conditions for modelling and soft computing algorithm improvement, including previously investigated factors (flowrate, gas volume fraction, flow tube geometry, flow converter, and process pressure) and new factors (flow regimes in terms of bubble size and distribution). Experimental work was conducted on 25 mm and 50 mm bore air-water two-phase flow rigs for liquid mass flowrates between 2500 kg/h and 35000 kg/h with a gas volume fraction of 0-60%. With the influence of each factor identified through univariate analysis, comparisons between existing modelling theories and experimental error curves are established. In the meantime, the rig design and control are optimized to provide efficient and automated data acquisition in order to supply ample, high-quality data for the training of soft computing models as well as to enhance the understanding of theoretical modelling.

    Mass flow measurement of two-phase carbon dioxide using Coriolis flowmeters

    Carbon Capture and Storage (CCS) is considered an important technology for reducing CO2 emissions from electrical power generation and other industrial processes. In the CCS chain, i.e. from capture to storage via transportation, it is essential to measure CO2 flows accurately for the purposes of accounting and potential leakage detection. However, current flow metering technologies face significant challenges in achieving the 1.5% measurement uncertainty specified in the EU-ETS (European Union - Emissions Trading Scheme) for all expected flow conditions. Moreover, there are very few CO2 flow test and calibration facilities that can recreate CCS conditions, particularly two-phase CO2 flow in pipelines, together with accurate measurement standards. As one of the most promising flowmeters for use in the CCS chain, Coriolis flowmeters have the advantage of measuring the mass flow rate directly regardless of the state of the fluid (liquid, gas, gas/liquid two-phase or supercritical), in addition to measuring the temperature and density of CO2 for the characterization of flow conditions. This paper assesses the performance of Coriolis flowmeters incorporating a soft-computing correction method for gas-liquid two-phase CO2 flow measurement. The correction method includes a pre-trained backpropagation neural network. Experimental work was conducted on a purpose-built 25 mm bore two-phase CO2 flow test rig for liquid mass flowrates between 300 kg/h and 3050 kg/h and gas mass flowrates from 0 to 330 kg/h at a fluid temperature of 19-21 °C and a pressure of 54-58 bar. Experimental results suggest that the Coriolis flowmeters with the developed correction method are capable of providing the mass flow rate of gas-liquid CO2 flow with errors mostly within ±2% and ±1.5% on horizontal and vertical pipelines, respectively.