SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Author: Meng Helen
Song Changhe
Wu Zhiyong
Zhu Jiaxu
Publication date: 04/09/2023

Recently, excellent progress has been made in speech recognition. However, purely data-driven approaches have struggled with domain mismatch and long-tailed data. Since knowledge-driven approaches can help data-driven approaches offset their weaknesses, we introduce sememe-based semantic knowledge into speech recognition (SememeASR). In linguistics, a sememe is the minimal semantic unit of a language, and sememes represent the implicit semantic information behind each word very well. Our experiments show that introducing sememe information can improve the effectiveness of speech recognition. Further experiments show that sememe knowledge can also improve the model's recognition of long-tailed data and enhance its domain generalization ability. Comment: Accepted by INTERSPEECH 202

arXiv.org e-Print Archive

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Author: Meng Helen
Song Changhe
Su Dan
Tong Weinan
Wu Zhiyong
Xu Yaoxun
You Zhao
Yu Dong
Zhu Jiaxu
Publication date: 04/09/2023

Mapping the two modalities, speech and text, into a shared representation space is one research direction for using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains. However, speech representations and text representations differ in length. Previous methods up-sample the text representation to align it with the acoustic modality, but the result may not match the actual expected duration. In this paper, we propose a novel representation-matching strategy that instead down-samples the acoustic representation to align it with the text modality. By introducing a continuous integrate-and-fire (CIF) module that generates acoustic representations whose length matches the token sequence, our ASR model can better learn unified representations from both modalities, enabling domain adaptation with text-only data from the target domain. Experimental results on new-domain data demonstrate the effectiveness of the proposed method. Comment: Accepted by INTERSPEECH 2023. arXiv admin note: text overlap with arXiv:2309.0143

arXiv.org e-Print Archive
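The abstract names a continuous integrate-and-fire (CIF) module as the mechanism that down-samples frame-level acoustic representations to token length. The sketch below is a minimal, single-utterance NumPy version of the general CIF idea, not the authors' implementation. It assumes the per-frame weights `alphas` have already been predicted (typically by a small network with a sigmoid output) and uses a hypothetical firing threshold of 1.0. Each frame's weight is accumulated; when the running total reaches the threshold, the weighted sum of frames is emitted as one token-level vector, and any weight left over from the boundary frame starts the next token.

```python
import numpy as np

def cif(frames, alphas, threshold=1.0):
    """Continuous integrate-and-fire over one utterance (illustrative sketch).

    frames: (T, D) acoustic encoder outputs.
    alphas: (T,) non-negative per-frame weights (assumed already predicted).
    Returns an (N, D) array with one integrated vector per fired token.
    """
    frames = np.asarray(frames, dtype=float)
    dim = frames.shape[1]
    outputs = []
    acc_weight = 0.0              # weight integrated so far for the current token
    acc_state = np.zeros(dim)     # weighted sum of frames for the current token
    for h, a in zip(frames, alphas):
        remaining = float(a)
        # Fire as many times as this frame's weight allows.
        while acc_weight + remaining >= threshold:
            used = threshold - acc_weight          # portion that completes the token
            outputs.append(acc_state + used * h)
            remaining = max(remaining - used, 0.0)
            acc_weight = 0.0
            acc_state = np.zeros(dim)
        # Leftover weight from this frame starts (or continues) the next token.
        acc_weight += remaining
        acc_state = acc_state + remaining * h
    # A trailing remainder below the threshold is dropped in this sketch.
    if not outputs:
        return np.zeros((0, dim))
    return np.stack(outputs)

# Hypothetical toy input: 4 frames whose weights sum to 2.0, so 2 tokens fire.
frames = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
alphas = np.array([0.5, 0.75, 0.5, 0.25])
tokens = cif(frames, alphas)
```

Because one vector fires per unit of accumulated weight, the output length is roughly the sum of the weights; CIF-based models usually scale the weights during training so this sum matches the target token count. Some implementations also fire a final token at inference when the leftover weight is large enough, rather than dropping it as done here.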

PREDICT URBAN AIR POLLUTION IN SURABAYA USING RECURRENT NEURAL NETWORK – LONG SHORT TERM MEMORY

Author: Endroyono Endroyono
Faishol Muh. Anas
Irfansyah Astria Nur
Publication venue: 'Lembaga Penelitian dan Pengabdian kepada Masyarakat ITS'
Publication date: 31/07/2020

Air is one of the primary needs of living things. If the air is polluted, the lives of humans and other living things are disrupted, so special measures are needed to maintain air quality. One way to support the prevention of air pollution is to forecast it from past data. Through the Environmental Office, the Surabaya City Government has monitored air quality in Surabaya every 30 minutes, covering air quality parameters including CO, NO, NO2, NOx, PM10, and SO2, together with meteorological data such as wind direction, wind speed, global radiation, humidity, and air temperature. These data are very useful for building a model that forecasts future air pollution. Given the volume and variance of the data generated by air quality monitoring in Surabaya, a capable algorithm is needed to process it. One such algorithm is the Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM), which is designed for sequential data such as time series. This study performs several analyses: a trend analysis, a correlation analysis of pollutant values against meteorological data, and prediction of carbon monoxide (CO) levels in Surabaya with RNN-LSTM, correlated with meteorological data. The results indicate that the best RNN-LSTM prediction model, with 2 hidden layers and 50 epochs, achieves a root mean square error (RMSE) of 1,880. The resulting predictions can serve as a reference when the city government sets policy to deal with air pollution going forward.

JUTI: Jurnal Ilmiah Teknologi Informasi
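The abstract describes an LSTM applied to time-series data and reports model quality as a root mean square error (RMSE). As a hedged illustration of those two ingredients only, not the study's model or data, the sketch below turns a univariate series into fixed-length input windows of the kind an LSTM consumes and computes RMSE = sqrt(mean((y - ŷ)²)). The function names, the `lookback` parameter, and the toy series are hypothetical. The LSTM itself is omitted, and a naive "repeat the last value" forecast stands in as the predictor.

```python
import numpy as np

def make_windows(series, lookback):
    """Split a 1-D series into (inputs, targets) for one-step-ahead forecasting.

    Each input row holds `lookback` consecutive values; its target is the
    value that immediately follows them.
    """
    series = np.asarray(series, dtype=float)
    if len(series) <= lookback:
        raise ValueError("series must be longer than lookback")
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical toy series; the naive forecast repeats the last value of each window.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
X, y = make_windows(series, lookback=2)
naive_pred = X[:, -1]
print(rmse(y, naive_pred))  # → 1.0
```

In a real pipeline the windows would also carry the meteorological variables as extra features and feed a trained LSTM; the naive baseline is only there to make the RMSE computation concrete.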