
    Unsupervised Neural Machine Translation with SMT as Posterior Regularization

    Without a real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically requires pseudo-parallel data generated with the back-translation method for model training. However, owing to weak supervision, the pseudo data inevitably contain noise and errors that are accumulated and reinforced in subsequent training, degrading translation performance. To address this issue, we introduce phrase-based Statistical Machine Translation (SMT) models, which are robust to noisy data, as posterior regularization to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. SMT and NMT models are then optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect of errors in the iterative back-translation process is promptly alleviated by the SMT models filtering noise out of their phrase tables; meanwhile, (2) NMT compensates for the lack of fluency inherent in SMT. Experiments on en-fr and en-de translation tasks show that our method outperforms strong baselines and achieves new state-of-the-art unsupervised machine translation performance.
    Comment: To be presented at AAAI 2019; 9 pages, 4 figures
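    The noise-filtering idea behind the SMT regularization can be illustrated with a toy sketch: build a word-level translation table from pseudo-parallel pairs by co-occurrence counting and keep only entries whose conditional probability clears a threshold. This is not the paper's actual phrase-table construction (the toy assumes monotone word alignment via `zip`, and the threshold is arbitrary); it only shows how frequency-based filtering discards rare, noisy back-translations.

```python
from collections import Counter

def build_filtered_phrase_table(pseudo_pairs, min_prob=0.5):
    """Count word co-occurrences in pseudo-parallel data and keep only
    translation pairs whose conditional probability p(t|s) >= min_prob,
    a toy stand-in for SMT's noise-robust phrase-table extraction.
    Assumes monotone word alignment (same positions line up)."""
    cooc = Counter()        # counts of (source word, target word) pairs
    src_counts = Counter()  # counts of each source word
    for src_sent, tgt_sent in pseudo_pairs:
        for s, t in zip(src_sent.split(), tgt_sent.split()):
            cooc[(s, t)] += 1
            src_counts[s] += 1
    return {
        (s, t): n / src_counts[s]
        for (s, t), n in cooc.items()
        if n / src_counts[s] >= min_prob
    }

pairs = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("the cat", "le chat"),
    ("the cat", "la pomme"),  # a noisy back-translation
]
table = build_filtered_phrase_table(pairs, min_prob=0.5)
# ("cat", "chat") survives (2/3); the noisy ("cat", "pomme") (1/3) is filtered out
```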

    Measuring Rural Poverty in China: a Case Study Approach

    This paper measures rural poverty in Hubei Province and Inner Mongolia in China. The poverty lines we derive using Ravallion's method differ from the official Chinese poverty lines: the official pan-country poverty line underestimates rural poverty in Hubei Province and overestimates it in Inner Mongolia. Poverty determinants are estimated with Logit and Probit models. The study finds that factors such as living in a mountainous area, poor irrigation conditions, a large family size, few fixed assets, little land owned, and sole dependence on agriculture as a livelihood source make a rural household more vulnerable to poverty. Conversely, a rural household whose members are better educated or trained laborers is statistically less likely to be poor. The growth-redistribution decomposition reveals that, for all three FGT indices in Hubei Province, income growth contributed much to the alleviation of poverty, while redistribution (inequality) effects counteracted the growth effects and worsened poverty. The poverty-incidence decomposition shows that about one third of the growth effects were counteracted by the redistribution effects, implying that future anti-poverty programs should pay more attention to inequality in China. Poverty dominance analysis further shows that rural poverty in Inner Mongolia is more severe than in Hubei, and that poverty incidence in Hubei lessened from 1997 to 2003, matching the conclusions drawn from the derived poverty lines.
    Keywords: Rural Poverty Line, Poverty Determinants, Growth-Redistribution Decomposition, Poverty Dominance, China
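    The FGT (Foster-Greer-Thorbecke) indices mentioned above share one formula, FGT_a = (1/n) * sum(((z - y_i)/z)^a) over households with income y_i below the poverty line z. A minimal sketch (with made-up incomes, not the paper's data):

```python
def fgt_index(incomes, z, alpha):
    """Foster-Greer-Thorbecke poverty index.
    alpha=0 gives the headcount ratio, alpha=1 the poverty gap,
    alpha=2 the poverty severity index."""
    n = len(incomes)
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / n

# Hypothetical household incomes against a poverty line of 1000
incomes = [400, 600, 900, 1200, 2000]
z = 1000
headcount = fgt_index(incomes, z, 0)  # 3 of 5 households below the line -> 0.6
gap = fgt_index(incomes, z, 1)        # mean normalized shortfall -> 0.22
```

    Raising alpha weights the poorest households more heavily, which is why decompositions are often reported for all three indices, as in this paper.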

    SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

    How to boost speech pre-training with textual data is an unsolved problem because speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training within a pre-defined unified discrete representation. Specifically, we introduce two alternative discrete tokenizers, a phoneme-unit and a hidden-unit tokenizer, to bridge the speech and text modalities; both can be trained with a small amount of paired speech-text data. With the trained tokenizers, we convert unlabeled speech and text data into tokens of phoneme units or hidden units. The pre-training objective is designed to unify speech and text in the same discrete semantic space with a single Transformer network. Leveraging only 10K text sentences, our SpeechLM obtains a 16% relative WER reduction over the best base-model performance (from 6.8 to 5.7) on the public LibriSpeech ASR benchmark. Moreover, SpeechLM with fewer parameters even outperforms previous SOTA models on CoVoST-2 speech translation tasks. We also evaluate SpeechLM on various spoken language processing tasks under the universal representation evaluation framework SUPERB, demonstrating significant improvements on content-related tasks. Our code and models are available at https://aka.ms/SpeechLM.
    Comment: 14 pages
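    The "16% relative WER reduction" figure follows the standard convention of dividing the absolute improvement by the baseline error rate. A quick check against the numbers quoted in the abstract:

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative word-error-rate reduction, as conventionally
    reported in ASR papers: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

r = relative_wer_reduction(6.8, 5.7)
# r is about 0.162, i.e. the ~16% relative reduction the abstract reports
```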

    On decoder-only architecture for speech-to-text and large language model integration

    Large language models (LLMs) have achieved remarkable success in natural language processing, enabling better human-computer interaction through natural language. However, the seamless integration of speech signals into LLMs has not been well explored, nor has the "decoder-only" architecture been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification (CTC) and a simple audio encoder to map the compressed acoustic features into the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller-scale, randomly initialized Speech-LLaMA model on speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
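    CTC, which the method above uses to compress acoustic features, maps long per-frame outputs to a much shorter token sequence by merging repeated labels and dropping a designated blank symbol. A minimal sketch of that collapse rule only (not the Speech-LLaMA encoder itself, which operates on continuous features):

```python
def ctc_collapse(frame_labels, blank=0):
    """Collapse a frame-level CTC label sequence: merge consecutive
    repeats, then drop blanks. Blanks separate genuine repetitions,
    so [3, 3] with no blank in between collapses to a single 3,
    while [3, blank, 3] yields two 3s."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

ctc_collapse([0, 3, 3, 0, 3, 7, 7, 0])  # -> [3, 3, 7]
```

    This many-frames-to-few-tokens property is what makes CTC a natural tool for shortening acoustic sequences before feeding them to a decoder-only LLM.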