
    Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation

    For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task covers a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, while downstream cross-lingual tasks generally involve only a bilingual subset, e.g., English-German. This creates a cross-lingual data discrepancy, namely the "domain discrepancy", and a cross-lingual learning-objective discrepancy, namely the "task discrepancy", between the pretraining and fine-tuning stages. To bridge these cross-lingual domain and task gaps, we extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task. Specifically, the first stage employs the self-supervised code-switching restore task as a pretext task, allowing the multilingual Seq2Seq PLM to acquire in-domain alignment information. In the second stage, we fine-tune the model on labeled data as usual. Experiments on a variety of cross-lingual NLG tasks, including 12 bilingual translation tasks, 36 zero-shot translation tasks, and cross-lingual summarization tasks, show that our model consistently outperforms the strong mBART baseline. Comprehensive analyses indicate that our approach narrows the cross-lingual sentence-representation distance and improves low-frequency word translation at trivial computational cost.
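
    A minimal sketch of what such a code-switching restore pretext task could look like: a fraction of source tokens is swapped with bilingual-dictionary translations, and the Seq2Seq model is trained to restore the original sentence. The dictionary entries, swap probability, and function names below are illustrative assumptions, not the authors' released code.

        import random

        def code_switch_corrupt(src_tokens, bilingual_dict, swap_prob=0.15, seed=0):
            """Replace a fraction of source tokens with dictionary translations,
            producing a code-switched input; the pretext task is to map this
            corrupted input back to the original monolingual sentence."""
            rng = random.Random(seed)
            corrupted = []
            for tok in src_tokens:
                if tok in bilingual_dict and rng.random() < swap_prob:
                    corrupted.append(rng.choice(bilingual_dict[tok]))
                else:
                    corrupted.append(tok)
            return corrupted

        # Hypothetical English->German dictionary entries for illustration.
        bi_dict = {"house": ["Haus"], "water": ["Wasser"], "good": ["gut"]}
        sentence = ["the", "house", "has", "good", "water"]
        noisy = code_switch_corrupt(sentence, bi_dict, swap_prob=0.5)
        print(noisy, "->", sentence)  # training pair: restore noisy -> original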

    Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search

    Given a descriptive text query, text-based person search (TBPS) aims to retrieve the best-matched target person from an image gallery. This cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the insufficiency of annotated data. To better align the two modalities, most existing works focus on introducing sophisticated network structures and auxiliary tasks, which are complex and hard to implement. In this paper, we propose a simple yet effective dual-Transformer model for text-based person search. By exploiting a hardness-aware contrastive learning strategy, our model achieves state-of-the-art performance without any special design for local feature alignment or side information. Moreover, we propose a proximity data generation (PDG) module to automatically produce more diverse data for cross-modal training. The PDG module first introduces an automatic generation algorithm based on a text-to-image diffusion model, which generates new text-image pair samples in the proximity space of the original ones. It then combines approximate text generation and feature-level mixup during training to further strengthen data diversity. The PDG module largely guarantees the reasonability of the generated samples, which are used directly for training without any human inspection for noise rejection. It improves the performance of our model significantly, providing a feasible solution to the data-insufficiency problem faced by such fine-grained visual-linguistic tasks. Extensive experiments on two popular TBPS datasets (i.e., CUHK-PEDES and ICFG-PEDES) show that the proposed approach clearly outperforms state-of-the-art approaches, e.g., improving Top-1, Top-5, and Top-10 by 3.88%, 4.02%, and 2.92% on CUHK-PEDES. The code will be available at https://github.com/HCPLab-SYSU/PersonSearch-CTLG
    Comment: Accepted by IEEE T-CSVT
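
    As a rough illustration of a hardness-aware contrastive objective for image-text matching, the sketch below up-weights negatives in proportion to their similarity, so harder negatives contribute more to the loss; the weighting scheme and the parameter beta are assumptions for illustration, not the paper's exact formulation.

        import torch
        import torch.nn.functional as F

        def hardness_weighted_infonce(img_emb, txt_emb, tau=0.07, beta=1.0):
            # Normalize so dot products are cosine similarities.
            img = F.normalize(img_emb, dim=-1)
            txt = F.normalize(txt_emb, dim=-1)
            logits = img @ txt.t() / tau                 # (B, B) similarity matrix
            b = logits.size(0)
            eye = torch.eye(b, dtype=torch.bool)
            # Hardness weights: more-similar negatives count more.
            w = torch.exp(beta * logits.detach()).masked_fill(eye, 1.0)
            weighted = logits + torch.log(w)             # positives unchanged (log 1 = 0)
            targets = torch.arange(b)
            loss_i2t = F.cross_entropy(weighted, targets)
            loss_t2i = F.cross_entropy(weighted.t(), targets)
            return 0.5 * (loss_i2t + loss_t2i)

        # Toy usage with random 256-d embeddings for a batch of 8 pairs.
        print(hardness_weighted_infonce(torch.randn(8, 256), torch.randn(8, 256)))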

    Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

    Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs unseen in the training data. The common practice for guiding the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, e.g., an ID token for English and one for German. Recent studies have shown that language IDs sometimes fail to navigate the ZST task, causing it to suffer from the off-target problem (non-target-language words appear in the generated translation) and therefore making it difficult to apply current multilingual translation models to a broad range of zero-shot language scenarios. To understand when and why the navigation capability of language IDs is weakened, we compare two extreme decoder-input cases in the ZST directions: the Off-Target (OFF) and On-Target (ON) cases. By contrastively visualizing the contextual word representations (CWRs) of these cases with teacher forcing, we show that 1) the CWRs of different languages are effectively distributed in separate regions when the sentence and ID are matched (ON setting), and 2) when the sentence and ID are unmatched (OFF setting), the CWRs of different languages are chaotically distributed. Our analyses suggest that although language IDs work well in ideal ON settings, they become fragile and lose their navigation ability when faced with off-target tokens, which are common during inference but rare in training. In response, we apply unlikelihood tuning on the negative (OFF) samples to minimize their probability, so that the language IDs can discriminate between on- and off-target tokens during training. Experiments spanning 40 ZST directions show that our method reduces the off-target ratio by 48.0% on average, yielding a +9.1 BLEU improvement with only +0.3% extra tuning cost.
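
    The core of unlikelihood tuning is a loss that pushes down the probability of tokens from negative samples, -log(1 - p(token)); below is a minimal sketch in the style of Welleck et al.'s unlikelihood training, applied here in spirit to OFF-target samples (tensor shapes and names are assumptions):

        import torch
        import torch.nn.functional as F

        def unlikelihood_loss(logits, negative_targets):
            """logits: (batch, seq, vocab); negative_targets: (batch, seq) token ids
            drawn from OFF-target samples whose probability should be minimized."""
            probs = F.softmax(logits, dim=-1)
            p_neg = probs.gather(-1, negative_targets.unsqueeze(-1)).squeeze(-1)
            # -log(1 - p): small when the model already avoids the token,
            # large when it assigns the off-target token high probability.
            return -torch.log((1.0 - p_neg).clamp_min(1e-6)).mean()

        # Toy usage: vocabulary of 100, batch of 2 sequences of length 5.
        logits = torch.randn(2, 5, 100)
        neg_ids = torch.randint(0, 100, (2, 5))
        print(unlikelihood_loss(logits, neg_ids))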

    Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models

    This paper presents a controllable text-to-video (T2V) diffusion model, named Video-ControlNet, that generates videos conditioned on a sequence of control signals, such as edge or depth maps. Video-ControlNet is built on a pre-trained conditional text-to-image (T2I) diffusion model, incorporating a spatio-temporal self-attention mechanism and trainable temporal layers for efficient cross-frame modeling. A first-frame conditioning strategy is proposed to enable the model to generate videos transferred from the image domain as well as arbitrary-length videos in an auto-regressive manner. Moreover, Video-ControlNet employs a novel residual-based noise initialization strategy to introduce a motion prior from an input video, producing more coherent videos. With the proposed architecture and strategies, Video-ControlNet achieves resource-efficient convergence and generates high-quality, consistent videos with fine-grained control. Extensive experiments demonstrate its success in various video generation tasks, such as video editing and video style transfer, outperforming previous methods in consistency and quality. Project page: https://controlavideo.github.io
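
    One plausible reading of a residual-based noise initialization, as a hypothetical sketch only (the paper's actual scheme may differ): each frame's starting noise shares a common base and adds a scaled residual between consecutive reference frames, so the denoising process inherits a motion prior.

        import torch

        def residual_noise_init(ref_video, alpha=0.5, seed=0):
            """ref_video: (T, C, H, W) tensor in [-1, 1]. Returns per-frame
            initial noise = shared base noise + alpha * frame-to-frame residual.
            A real implementation would likely renormalize to unit variance."""
            g = torch.Generator().manual_seed(seed)
            base = torch.randn(ref_video.shape[1:], generator=g)  # shared (C, H, W)
            noises, prev = [], ref_video[0]
            for frame in ref_video:
                noises.append(base + alpha * (frame - prev))      # motion residual
                prev = frame
            return torch.stack(noises)                            # (T, C, H, W)

        # Toy usage: 4 frames of 3x8x8 video in [-1, 1].
        print(residual_noise_init(torch.rand(4, 3, 8, 8) * 2 - 1).shape)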

    MSRL: Distributed Reinforcement Learning with Dataflow Fragments

    Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation, yet current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g., policy-network updates) on GPU workers. Fundamentally, current systems lack abstractions that decouple RL algorithms from their execution. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies governing how RL training computation is parallelised and distributed across cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps Python functions from an RL algorithm's training loop to parallel computational fragments. Fragments are executed on different devices by translating them to low-level dataflow representations, e.g., computational graphs as supported by deep-learning engines, CUDA implementations, or multi-threaded CPU processes. We show that MSRL subsumes the distribution strategies of existing systems while scaling RL training to 64 GPUs.
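
    A toy model of the fragmented-dataflow idea, to make the decoupling concrete: the algorithm is written as plain Python functions, and a separate distribution policy (pure data) decides where each fragment runs. All names and the API shape here are illustrative, not MSRL's actual interface.

        from dataclasses import dataclass, field
        from typing import Callable, Dict, List

        @dataclass
        class Fragment:
            """One parallelisable unit of an RL training loop,
            e.g. experience collection or a policy-network update."""
            name: str
            fn: Callable
            device: str = "CPU"

        @dataclass
        class FragmentedDataflowGraph:
            fragments: List[Fragment] = field(default_factory=list)

            def add(self, name, fn):
                self.fragments.append(Fragment(name, fn))

            def apply_policy(self, policy: Dict[str, str]):
                # Placement is configuration, not code: the same algorithm
                # runs unchanged under different distribution policies.
                for frag in self.fragments:
                    frag.device = policy.get(frag.name, "CPU")

        graph = FragmentedDataflowGraph()
        graph.add("collect_experience", lambda env: ...)
        graph.add("update_policy", lambda batch: ...)
        graph.apply_policy({"update_policy": "GPU:0"})  # only placement changes
        for f in graph.fragments:
            print(f.name, "->", f.device)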

    An analysis of China's grain production: Looking back and looking forward

    Ensuring food security is the foundation of economic development and social stability. China has historically been a country dominated by agriculture. Over the past 60 years, China's total grain output increased fivefold, from 113 million tons (MT) in 1949 to 571 MT in 2011, an achievement that provides inspiration to producers in other parts of the world. Grain production per capita doubled, from 209 to 425 kg, over the same period. At the national scale, China has maintained basic self-sufficiency in grain for the past three decades. However, with increasing population pressure and a growing appetite for animal products, China will need 776 MT of grain by 2030 to feed its own people, a net increase of 35.9% over its best year on record. China's drive for future food security is challenged by low resource-use efficiency and resource limitations, diminishing returns in yield response, competition from nonagricultural land uses, and environmental degradation. In this article, we analyze historical, temporal, and spatial variation in total grain production, as well as the overall trends of current and future grain production, and discuss options to overcome production constraints and further promote agricultural production.

    An analysis of microbiota-targeted therapies in patients with avian influenza virus subtype H7N9 infection

    BACKGROUND: Selective prophylactic decontamination of the digestive tract is a strategy for preventing secondary nosocomial infection in patients with avian influenza virus subtype H7N9 infection. Our aim was to summarize the effectiveness of these therapies in re-establishing a stable and diverse microbial community and in reducing secondary infections. METHODS: Comprehensive therapies depended on the individual clinical situation of the subjects and were divided into antiviral treatment, microbiota-targeted therapies (including probiotics or prebiotics and antibiotic usage), and immunotherapy. Quantitative polymerase chain reaction and denaturing gradient gel electrophoresis (DGGE) were used for real-time monitoring of the predominant intestinal microbiome during treatment. Clinical information about secondary infection was confirmed by analyzing pathogens isolated from clinical specimens. RESULTS: Different antibiotics had similar effects on the gut microbiome, with a marked decrease and slow recovery of the Bifidobacterium population. Interestingly, most fecal microbial DGGE profiles showed relative stability of the communities under continual suppression by the same antibiotics, and significant changes when new antibiotics were introduced. Moreover, we found no marked increase in C-reactive protein and no cases of bacteremia or pneumonia caused by probiotic use in the patients, confirming that the probiotics used in this study were safe for patients with H7N9 infection. Approximately 72% of those who subsequently suffered exogenous respiratory infection by Candida species or multidrug-resistant Acinetobacter baumannii and Klebsiella pneumoniae were older than 60 years. The combination of probiotics and prebiotics with antibiotics seemed to fail in these patients. CONCLUSIONS: Elderly patients infected with the influenza A (H7N9) virus are considered a high-risk group for developing secondary bacterial infection. Microbiota restoration treatment reduced the incidence of enterogenous secondary infection, but not of exogenous respiratory infection. The prophylactic effects of microbiota restoration strategies against secondary infection were unsatisfactory in elderly and critically ill patients.