Consecutive Decoding for Speech-to-text Translation
Speech-to-text translation (ST), which directly translates the source
language speech to the target language text, has attracted intensive attention
recently. However, the combination of speech recognition and machine
translation in a single model poses a heavy burden on the direct cross-modal
cross-lingual mapping. To reduce the learning difficulty, we propose
COnSecutive Transcription and Translation (COSTT), an integral approach for
speech-to-text translation. The key idea is to generate source transcript and
target translation text with a single decoder. It benefits the model training
so that additional large parallel text corpus can be fully exploited to enhance
the speech translation training. Our method is verified on three mainstream
datasets, including Augmented LibriSpeech English-French dataset, TED
English-German dataset, and TED English-Chinese dataset. Experiments show that
our proposed COSTT outperforms the previous state-of-the-art methods. The code
is available at https://github.com/dqqcasia/st.
Comment: Accepted by AAAI 2021. arXiv admin note: text overlap with
arXiv:2009.0970
A Simple Graphene NH₃ Gas Sensor via Laser Direct Writing.
Ammonia gas sensors are essential in many industries and in everyday life. However, their complicated fabrication process, stringent environmental requirements during fabrication, and the difficulty of desorbing residual ammonia molecules result in high cost and hinder their market acceptance. Here, laser direct writing is used to fabricate three parallel porous 3D graphene lines on a polyimide (PI) tape, forming a simple ammonia gas sensor. The middle line works as the ammonia sensing element, and the two lines on either side work as heaters that improve the desorption of ammonia gas molecules from the sensing element. The graphene lines were characterized by scanning electron microscopy and Raman spectroscopy. Without heating, the response and recovery times of the sensor are 214 s and 222 s, respectively, with a sensitivity of 0.087% ppm⁻¹ for sensing 75 ppm ammonia gas. The experimental results show that, at the optimized heating temperature of about 70 °C, the heaters achieve complete desorption of residual NH₃, and the sensor shows good sensitivity and cyclic stability.
Automatic Speaker Identification System for Urdu Speech
Speaker recognition is the process of recognizing a speaker from a spoken phrase. Such systems generally operate in two ways: to identify a speaker or to verify a speaker’s claimed identity. The availability of valuable research material has encouraged work on Automatic Speaker Identification (ASI) in East Asian, English and European languages. Unfortunately, the languages of South Asia, especially Urdu, have received very little attention. This paper describes a new feature set for ASI in Urdu speech that achieves better performance than baseline systems. Classifiers such as Neural Net, Naïve Bayes and K-nearest neighbor (K-NN) have been used for modeling. Results are provided on a dataset of 40 speakers, with 82% correct identification. Lastly, an improvement in system performance is also reported when the number of recordings per speaker is changed.