Unsupervised Neural Machine Translation with SMT as Posterior Regularization
Without a real bilingual corpus available, unsupervised Neural Machine
Translation (NMT) typically requires pseudo-parallel data generated with the
back-translation method for model training. However, due to weak supervision,
the pseudo data inevitably contain noise and errors that accumulate and are
reinforced in the subsequent training process, leading to poor translation
performance. To address this issue, we introduce phrase-based Statistical
Machine Translation (SMT) models, which are robust to noisy data, as posterior
regularizations to guide the training of unsupervised NMT models in the
iterative back-translation process. Our method starts from SMT models built
with pre-trained language models and word-level translation tables inferred
from cross-lingual embeddings. The SMT and NMT models are then optimized
jointly and boost each other incrementally in a unified EM framework. In this
way, (1) the negative effect caused by errors in the iterative back-translation
process is alleviated in a timely manner, since SMT filters noise out through
its phrase tables; meanwhile, (2) NMT compensates for the lack of fluency
inherent in SMT. Experiments conducted on en-fr and en-de translation tasks
show that our method outperforms strong baselines and achieves new
state-of-the-art unsupervised machine translation performance.
Comment: To be presented at AAAI 2019; 9 pages, 4 figures
Measuring Rural Poverty in China: a Case Study Approach
This paper measures rural poverty in Hubei Province and Inner Mongolia in China. The poverty lines we derive by Ravallion's method differ from the official Chinese poverty lines: the official nationwide poverty line underestimates rural poverty in Hubei Province and overestimates it in Inner Mongolia. Poverty determinants are estimated with both Logit and Probit models. The study finds that factors such as living in a mountainous area, lack of good irrigation, a large family size, few fixed assets, little land owned, and sole dependence on agriculture as a livelihood source make a rural household more vulnerable to poverty. On the other hand, a rural household whose members are better educated or trained laborers is statistically less poor. The growth-redistribution decomposition reveals that for all three FGT indices in Hubei Province, income growth contributed much to the alleviation of poverty, while redistribution or inequality effects counteracted the growth effects and worsened poverty. The poverty-incidence decomposition shows that about one third of the growth effects were counteracted by the redistribution effects, implying that future anti-poverty programs should pay more attention to solving the inequality problem in China. Poverty dominance analysis also helps us better understand the poverty situation: it reveals that rural poverty in Inner Mongolia is more severe than in Hubei, and that poverty incidence in Hubei lessened from 1997 to 2003, findings consistent with those drawn from the derived poverty lines.
Keywords: Rural Poverty Line, Poverty Determinants, Growth Redistribution Decomposition, Poverty Dominance, China
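The three FGT indices mentioned in the decomposition all come from one formula, FGT_alpha = (1/n) * sum over the poor of ((z - y_i)/z)^alpha, where z is the poverty line and alpha = 0, 1, 2 give the headcount ratio, poverty gap, and poverty severity. A minimal sketch with made-up incomes and poverty line (not the paper's data):

```python
def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke poverty index.
    alpha=0: headcount ratio; alpha=1: poverty gap; alpha=2: severity."""
    n = len(incomes)
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / n

incomes = [400, 800, 1200, 2000]  # hypothetical annual incomes
z = 1000                          # hypothetical poverty line

print(fgt(incomes, z, 0))  # 0.5  (2 of 4 households below the line)
print(fgt(incomes, z, 1))  # 0.2  (average normalized shortfall)
```

Because alpha weights deeper shortfalls more heavily, the severity index (alpha = 2) is the one most sensitive to the redistribution effects the abstract discusses.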
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
How to boost speech pre-training with textual data is an unsolved problem,
because speech and text are very different modalities with distinct
characteristics. In this paper, we propose a cross-modal Speech and Language
Model (SpeechLM) to explicitly align speech and text pre-training with a
pre-defined unified discrete representation. Specifically, we introduce two
alternative discrete tokenizers to bridge the speech and text modalities,
including phoneme-unit and hidden-unit tokenizers, which can be trained using a
small amount of paired speech-text data. Based on the trained tokenizers, we
convert the unlabeled speech and text data into tokens of phoneme units or
hidden units. The pre-training objective is designed to unify the speech and
the text into the same discrete semantic space with a unified Transformer
network. Leveraging only 10K text sentences, our SpeechLM achieves a 16%
relative WER reduction over the best base model (from 6.8 to 5.7) on the
public LibriSpeech ASR benchmark. Moreover, SpeechLM with fewer parameters even
outperforms previous SOTA models on CoVoST-2 speech translation tasks. We also
evaluate our SpeechLM on various spoken language processing tasks under the
universal representation evaluation framework SUPERB, demonstrating significant
improvements on content-related tasks. Our code and models are available at
https://aka.ms/SpeechLM.
Comment: 14 pages
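The reported 16% figure is the relative drop in word error rate from 6.8 to 5.7; the quick arithmetic:

```python
def relative_reduction(baseline, new):
    """Relative improvement of `new` over `baseline` (both error rates)."""
    return (baseline - new) / baseline

# SpeechLM's LibriSpeech result quoted in the abstract: WER 6.8 -> 5.7.
print(round(relative_reduction(6.8, 5.7) * 100))  # 16 (percent)
```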
On decoder-only architecture for speech-to-text and large language model integration
Large language models (LLMs) have achieved remarkable success in the field of
natural language processing, enabling better human-computer interaction using
natural language. However, the seamless integration of speech signals into LLMs
has not been explored well. The "decoder-only" architecture has also not been
well studied for speech processing tasks. In this research, we introduce
Speech-LLaMA, a novel approach that effectively incorporates acoustic
information into text-based large language models. Our method leverages
Connectionist Temporal Classification and a simple audio encoder to map the
compressed acoustic features to the continuous semantic space of the LLM. In
addition, we further probe the decoder-only architecture for speech-to-text
tasks by training a smaller-scale, randomly initialized Speech-LLaMA model from
speech-text paired data alone. We conduct experiments on multilingual
speech-to-text translation tasks and demonstrate a significant improvement over
strong baselines, highlighting the potential advantages of decoder-only models
for speech-to-text conversion.
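The mapping the abstract describes, using CTC and a simple audio encoder to bring compressed acoustic features into the LLM's continuous space, can be illustrated with a toy sketch. The CTC-style merge-and-drop compression and the linear projection below are illustrative assumptions about the general technique, not the paper's implementation:

```python
import numpy as np

def ctc_compress(frame_labels, features):
    """CTC-style compression: merge consecutive repeated labels and drop
    blanks (label 0), averaging the features of frames that collapse
    together, so fewer 'tokens' reach the LLM."""
    groups, prev = [], None
    for lab, feat in zip(frame_labels, features):
        if lab == 0:              # CTC blank: reset and skip
            prev = None
            continue
        if lab == prev:           # repeat of previous label: merge
            groups[-1].append(feat)
        else:                     # new label: start a new token group
            groups.append([feat])
            prev = lab
    return np.array([np.mean(g, axis=0) for g in groups])

def project_to_llm(compressed, W):
    """Linear map from acoustic feature space into the LLM embedding space."""
    return compressed @ W

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))   # 6 frames of 4-dim acoustic features
labels = [0, 1, 1, 0, 2, 2]       # CTC frame labels with blanks
W = rng.normal(size=(4, 8))       # projection into an 8-dim embedding space
emb = project_to_llm(ctc_compress(labels, feats), W)
print(emb.shape)  # (2, 8): two compressed tokens enter the LLM's space
```

The point of the compression step is that the LLM then consumes a short sequence of speech "tokens" alongside its text tokens, rather than hundreds of raw acoustic frames.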
- …