Unsupervised Neural Machine Translation with SMT as Posterior Regularization
Without a real bilingual corpus available, unsupervised Neural Machine
Translation (NMT) typically requires pseudo-parallel data generated with the
back-translation method for model training. However, due to weak supervision,
the pseudo data inevitably contain noise and errors that accumulate and are
reinforced in the subsequent training process, leading to poor translation
performance. To address this issue, we introduce phrase-based Statistical
Machine Translation (SMT) models, which are robust to noisy data, as posterior
regularizations to guide the training of unsupervised NMT models in the
iterative back-translation process. Our method starts from SMT models built
with pre-trained language models and word-level translation tables inferred
from cross-lingual embeddings. The SMT and NMT models are then optimized
jointly and boost each other incrementally in a unified EM framework. In this
way, (1) the negative effect caused by errors in the iterative back-translation
process is alleviated in a timely manner, since SMT filters noise out through
its phrase tables; meanwhile, (2) NMT compensates for the lack of fluency
inherent in SMT. Experiments conducted on en-fr and en-de translation tasks
show that our method outperforms strong baselines and achieves new
state-of-the-art unsupervised machine translation performance.
Comment: To be presented at AAAI 2019; 9 pages, 4 figures
Measuring Rural Poverty in China: a Case Study Approach
This paper measures rural poverty in Hubei Province and Inner Mongolia in China. The poverty lines we derive by Ravallion's method differ from the official Chinese poverty lines: the official nationwide poverty line underestimates rural poverty in Hubei Province and overestimates it in Inner Mongolia. Poverty determinants are estimated with both Logit and Probit models. The study finds that factors such as living in a mountainous area, lack of good irrigation, a large family size, few fixed assets, little land owned, and sole dependence on agriculture as a livelihood source make a rural household more vulnerable to poverty. On the other hand, a rural household whose members are better educated or trained laborers is statistically less poor. The growth-redistribution decomposition reveals that for all three FGT indices in Hubei Province, income growth contributed much to the alleviation of poverty, while redistribution or inequality effects counteracted the growth effects and worsened poverty. The poverty-incidence decomposition shows that about one third of the growth effects were counteracted by the redistribution effects, implying that future anti-poverty programs should pay more attention to solving the inequality problem in China. Poverty dominance analysis also helps us better understand the poverty situation: it reveals that rural poverty in Inner Mongolia is more severe than in Hubei, and that poverty incidence in Hubei lessened from 1997 to 2003, findings consistent with those drawn from the derived poverty lines.
Keywords: Rural Poverty Line, Poverty Determinants, Growth Redistribution Decomposition, Poverty Dominance, China
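The three FGT indices mentioned in the decomposition all come from one formula, FGT_alpha = (1/n) * sum over the poor of ((z - y_i)/z)^alpha, where z is the poverty line and alpha = 0, 1, 2 give the headcount ratio, poverty gap, and poverty severity. A minimal sketch with made-up incomes and poverty line (not the paper's data):

```python
def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke poverty index.
    alpha=0: headcount ratio; alpha=1: poverty gap; alpha=2: severity."""
    n = len(incomes)
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / n

incomes = [400, 800, 1200, 2000]  # hypothetical annual incomes
z = 1000                          # hypothetical poverty line

print(fgt(incomes, z, 0))  # 0.5  (2 of 4 households below the line)
print(fgt(incomes, z, 1))  # 0.2  (average normalized shortfall)
```

Because alpha weights deeper shortfalls more heavily, the severity index (alpha = 2) is the one most sensitive to the redistribution effects the abstract discusses.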
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
How to boost speech pre-training with textual data is an unsolved problem,
because speech and text are very different modalities with distinct
characteristics. In this paper, we propose a cross-modal Speech and Language
Model (SpeechLM) to explicitly align speech and text pre-training with a
pre-defined unified discrete representation. Specifically, we introduce two
alternative discrete tokenizers to bridge the speech and text modalities,
including phoneme-unit and hidden-unit tokenizers, which can be trained using a
small amount of paired speech-text data. Based on the trained tokenizers, we
convert the unlabeled speech and text data into tokens of phoneme units or
hidden units. The pre-training objective is designed to unify the speech and
the text into the same discrete semantic space with a unified Transformer
network. Leveraging only 10K text sentences, our SpeechLM achieves a 16%
relative WER reduction over the best base model (from 6.8 to 5.7) on the
public LibriSpeech ASR benchmark. Moreover, SpeechLM with fewer parameters even
outperforms previous SOTA models on CoVoST-2 speech translation tasks. We also
evaluate our SpeechLM on various spoken language processing tasks under the
universal representation evaluation framework SUPERB, demonstrating significant
improvements on content-related tasks. Our code and models are available at
https://aka.ms/SpeechLM.
Comment: 14 pages
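The reported 16% figure is the relative drop in word error rate from 6.8 to 5.7; the quick arithmetic:

```python
def relative_reduction(baseline, new):
    """Relative improvement of `new` over `baseline` (both error rates)."""
    return (baseline - new) / baseline

# SpeechLM's LibriSpeech result quoted in the abstract: WER 6.8 -> 5.7.
print(round(relative_reduction(6.8, 5.7) * 100))  # 16 (percent)
```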
On decoder-only architecture for speech-to-text and large language model integration
Large language models (LLMs) have achieved remarkable success in the field of
natural language processing, enabling better human-computer interaction using
natural language. However, the seamless integration of speech signals into LLMs
has not been explored well. The "decoder-only" architecture has also not been
well studied for speech processing tasks. In this research, we introduce
Speech-LLaMA, a novel approach that effectively incorporates acoustic
information into text-based large language models. Our method leverages
Connectionist Temporal Classification and a simple audio encoder to map the
compressed acoustic features to the continuous semantic space of the LLM. In
addition, we further probe the decoder-only architecture for speech-to-text
tasks by training a smaller-scale, randomly initialized Speech-LLaMA model from
speech-text paired data alone. We conduct experiments on multilingual
speech-to-text translation tasks and demonstrate a significant improvement over
strong baselines, highlighting the potential advantages of decoder-only models
for speech-to-text conversion.
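The mapping the abstract describes, using CTC and a simple audio encoder to bring compressed acoustic features into the LLM's continuous space, can be illustrated with a toy sketch. The CTC-style merge-and-drop compression and the linear projection below are illustrative assumptions about the general technique, not the paper's implementation:

```python
import numpy as np

def ctc_compress(frame_labels, features):
    """CTC-style compression: merge consecutive repeated labels and drop
    blanks (label 0), averaging the features of frames that collapse
    together, so fewer 'tokens' reach the LLM."""
    groups, prev = [], None
    for lab, feat in zip(frame_labels, features):
        if lab == 0:              # CTC blank: reset and skip
            prev = None
            continue
        if lab == prev:           # repeat of previous label: merge
            groups[-1].append(feat)
        else:                     # new label: start a new token group
            groups.append([feat])
            prev = lab
    return np.array([np.mean(g, axis=0) for g in groups])

def project_to_llm(compressed, W):
    """Linear map from acoustic feature space into the LLM embedding space."""
    return compressed @ W

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))   # 6 frames of 4-dim acoustic features
labels = [0, 1, 1, 0, 2, 2]       # CTC frame labels with blanks
W = rng.normal(size=(4, 8))       # projection into an 8-dim embedding space
emb = project_to_llm(ctc_compress(labels, feats), W)
print(emb.shape)  # (2, 8): two compressed tokens enter the LLM's space
```

The point of the compression step is that the LLM then consumes a short sequence of speech "tokens" alongside its text tokens, rather than hundreds of raw acoustic frames.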
- …