SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Recently, excellent progress has been made in speech recognition. However,
purely data-driven approaches still struggle with domain mismatch and
long-tailed data. Considering that knowledge-driven
approaches can help data-driven approaches alleviate their flaws, we introduce
sememe-based semantic knowledge information to speech recognition (SememeASR).
A sememe, according to its linguistic definition, is the minimum semantic unit
of a language and can represent the implicit semantic information behind
each word. Our experiments show that introducing sememe
information can improve the effectiveness of speech recognition. In addition,
further experiments show that sememe knowledge can improve the model's
recognition of long-tailed data and enhance its domain generalization
ability.
Comment: Accepted by INTERSPEECH 202
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Mapping two modalities, speech and text, into a shared representation space
is one way to use text-only data to improve end-to-end automatic
speech recognition (ASR) performance in new domains. However, the lengths of
the speech and text representations are inconsistent. Although a
previous method up-samples the text representation to align with the acoustic
modality, the up-sampled length may not match the actual duration. In this paper, we
propose a novel representation matching strategy that down-samples the acoustic
representation to align with the text modality. By introducing a continuous
integrate-and-fire (CIF) module that generates acoustic representations consistent
with token length, our ASR model can better learn unified representations from both
modalities, allowing for domain adaptation using text-only data of the
target domain. Experimental results on new-domain data demonstrate the
effectiveness of the proposed method.
Comment: Accepted by INTERSPEECH 2023. arXiv admin note: text overlap with
arXiv:2309.0143
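
The abstract above describes down-sampling acoustic frames to token length with a continuous integrate-and-fire (CIF) module. The following is a minimal NumPy sketch of the general CIF accumulation rule, not the authors' implementation: the function name `cif`, the firing threshold of 1.0, and the assumption that each per-frame weight lies in [0, 1] are all illustrative choices, and the toy inputs are made up.

```python
import numpy as np

def cif(frames, alphas, threshold=1.0):
    """Integrate frame-level vectors until the accumulated weight reaches
    `threshold`, then fire one token-level vector.

    frames: (T, D) acoustic representations (hypothetical input).
    alphas: (T,) per-frame weights, assumed to lie in [0, 1] so that
            at most one token fires per frame.
    """
    frames = np.asarray(frames, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    outputs = []
    acc_weight = 0.0                         # weight integrated so far
    acc_state = np.zeros(frames.shape[1])    # weighted sum of frames so far
    for h, a in zip(frames, alphas):
        if acc_weight + a < threshold:
            # Keep integrating: no token boundary inside this frame.
            acc_weight += a
            acc_state += a * h
        else:
            # Fire: use just enough of this frame's weight to reach the
            # threshold, and carry the remainder into the next token.
            used = threshold - acc_weight
            outputs.append(acc_state + used * h)
            remainder = a - used
            acc_weight = remainder
            acc_state = remainder * h
    if not outputs:
        return np.zeros((0, frames.shape[1]))
    return np.stack(outputs)

# Toy example: 4 frames whose weights sum to 2.5, so 2 tokens fire.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
alphas = [0.5, 0.75, 0.75, 0.5]
tokens = cif(frames, alphas)
print(tokens.shape)  # (2, 2)
```

Any weight left over after the last fired token (0.5 in the toy example) is simply dropped in this sketch; how a real system handles that tail is a design choice not covered by the abstract.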
PREDICT URBAN AIR POLLUTION IN SURABAYA USING RECURRENT NEURAL NETWORK – LONG SHORT TERM MEMORY
Air is one of the primary needs of living things. If the air is polluted, the lives of humans and other living things are disrupted, so special handling is needed to maintain air quality. One way to support the prevention of air pollution is to forecast it from past data. Through the Environmental Office, the Surabaya City Government has monitored air quality in Surabaya every 30 minutes for various air quality parameters, including CO, NO, NO2, NOx, PM10, and SO2, along with meteorological data such as wind direction, wind speed, global radiation, humidity, and air temperature. These data are very useful for building a model that forecasts future air pollution. Given the large volume and variance of the data generated by air quality monitoring in Surabaya, a capable algorithm is needed to process it. One such algorithm is the Recurrent Neural Network - Long Short Term Memory (RNN-LSTM), which is built for processing sequential data such as time series. This study performs several analyses: trend analysis, correlation analysis of pollutant values against meteorological data, and prediction of carbon monoxide pollutants in Surabaya using RNN-LSTM together with meteorological data. The results indicate that the best RNN-LSTM prediction model, evaluated by RMSE, achieves an error of 1,880 in the scenario with 2 hidden layers and 50 epochs. The predictions can serve as a reference for the city government in determining policy to deal with air pollution going forward.
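
The abstract above evaluates a sequence model on time-series data using RMSE. As a rough illustration only, the sketch below shows how a series might be cut into fixed-length input windows for a sequence model and how RMSE is computed. The helper names `make_windows` and `rmse`, the window length, and the sample values are hypothetical, and a simple persistence baseline (repeat the last observed value) stands in for the LSTM, which is not reproduced here.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (input window, next value) pairs."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical readings, not data from the study.
co = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_windows(co, window=3)

# Persistence baseline: predict that the next value equals the last one seen.
persistence = X[:, -1]
score = rmse(y, persistence)
print(X.shape, score)
```

A baseline like this gives a reference error that a trained model's RMSE can be compared against.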