288 research outputs found
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation
For multilingual sequence-to-sequence pretrained language models
(multilingual Seq2Seq PLMs), e.g. mBART, the self-supervised pretraining task
is trained on a wide range of monolingual languages, e.g. 25 languages from
commoncrawl, while the downstream cross-lingual tasks are generally carried out on a
bilingual language subset, e.g. English-German, which creates both a
cross-lingual data discrepancy, namely \textit{domain discrepancy}, and a
cross-lingual learning objective discrepancy, namely \textit{task discrepancy},
between the pretraining and finetuning stages. To bridge these cross-lingual
domain and task gaps, we extend the vanilla pretrain-finetune pipeline with an
extra code-switching restore task. Specifically, the first stage employs the
self-supervised code-switching restore task as a pretext task, allowing the
multilingual Seq2Seq PLM to acquire in-domain alignment information. In the
second stage, we fine-tune the model on labeled data as usual. Experiments on a
variety of cross-lingual NLG tasks, including 12 bilingual translation tasks,
36 zero-shot translation tasks, and cross-lingual summarization tasks, show
that our model consistently outperforms the strong mBART baseline.
Comprehensive analyses indicate that our approach narrows the cross-lingual
sentence representation distance and improves low-frequency word translation at
trivial computational cost.
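A minimal sketch of how such a code-switching restoration pretext task could construct training pairs, assuming a toy bilingual lexicon (the function names and the lexicon are illustrative; the abstract does not specify the implementation):

```python
import random

def code_switch(tokens, lexicon, ratio=0.3, rng=None):
    """Replace a fraction of source tokens with translations from a
    bilingual lexicon; the model is then trained to restore the
    original monolingual sentence from the code-switched input."""
    rng = rng or random.Random(0)
    switched = []
    for tok in tokens:
        if tok in lexicon and rng.random() < ratio:
            switched.append(lexicon[tok])  # swap in the other language
        else:
            switched.append(tok)
    return switched

# Toy English->German lexicon (illustrative only).
lexicon = {"cat": "Katze", "drinks": "trinkt"}
src = ["the", "cat", "drinks", "milk"]
corrupted = code_switch(src, lexicon, ratio=1.0)  # every known word switched
# Training pair for the pretext task: (corrupted, src)
print(corrupted)  # ['the', 'Katze', 'trinkt', 'milk']
```

The restoration objective then trains the Seq2Seq model to map `corrupted` back to `src`, which is where the cross-lingual alignment signal comes from.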
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Given a descriptive text query, text-based person search (TBPS) aims to
retrieve the best-matched target person from an image gallery. Such a
cross-modal retrieval task is quite challenging due to significant modality
gap, fine-grained differences and insufficiency of annotated data. To better
align the two modalities, most existing works focus on introducing
sophisticated network structures and auxiliary tasks, which are complex and
hard to implement. In this paper, we propose a simple yet effective dual
Transformer model for text-based person search. By exploiting a hardness-aware
contrastive learning strategy, our model achieves state-of-the-art performance
without any special design for local feature alignment or side information.
Moreover, we propose a proximity data generation (PDG) module to automatically
produce more diverse data for cross-modal training. The PDG module first
introduces an automatic generation algorithm based on a text-to-image diffusion
model, which generates new text-image pair samples in the proximity space of
original ones. Then it combines approximate text generation and feature-level
mixup during training to further strengthen data diversity. The PDG module
largely ensures the plausibility of the generated samples, which are used
directly for training without any human inspection for noise rejection. It
improves the performance of our model significantly, providing a feasible
solution to the data insufficiency problem faced by such fine-grained
visual-linguistic tasks. Extensive experiments on two popular datasets of the
TBPS task (i.e., CUHK-PEDES and ICFG-PEDES) show that the proposed approach
outperforms state-of-the-art approaches clearly, e.g., improving by 3.88%,
4.02%, and 2.92% in terms of Top1, Top5, and Top10 on CUHK-PEDES. The code will be
available at https://github.com/HCPLab-SYSU/PersonSearch-CTLG
Comment: Accepted by IEEE T-CSV
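One common way to realize a hardness-aware contrastive loss of the kind the abstract describes is to re-weight each negative pair by its own similarity to the anchor, so hard negatives dominate the denominator. A minimal sketch (the weighting form and parameter names are assumptions, not the paper's exact formulation):

```python
import math

def hardness_weighted_nce(sim_pos, sim_negs, tau=0.07, beta=1.0):
    """InfoNCE-style contrastive loss with hardness weighting.
    sim_pos: similarity of the matched text-image pair.
    sim_negs: similarities of unmatched pairs in the batch.
    Negatives closer to the anchor get larger weights, so the
    model is pushed hardest away from the most confusable pairs."""
    pos = math.exp(sim_pos / tau)
    neg = 0.0
    for s in sim_negs:
        w = math.exp(beta * s)        # hardness weight (assumed form)
        neg += w * math.exp(s / tau)
    return -math.log(pos / (pos + neg))

# Harder negatives (more similar to the anchor) raise the loss.
easy = hardness_weighted_nce(0.8, [0.1, 0.0])
hard = hardness_weighted_nce(0.8, [0.7, 0.6])
assert hard > easy
```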
Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Zero-shot translation (ZST), which is generally based on a multilingual
neural machine translation model, aims to translate between unseen language
pairs unseen in the training data. The common practice for guiding the zero-shot
language mapping during inference is to deliberately insert source and target
language ID tokens, e.g., one for English and one for German. Recent studies
have shown that language IDs sometimes fail to navigate the ZST task, leading
to the off-target problem (non-target-language words appear in the generated
translation) and therefore making it difficult to apply current multilingual
translation models to a broad range of zero-shot language scenarios. To
understand when and why the navigation capabilities of language
IDs are weakened, we compare two extreme decoder input cases in the ZST
directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively
visualizing the contextual word representations (CWRs) of these cases with
teacher forcing, we show that 1) the CWRs of different languages are
effectively distributed in separate regions when the sentence and ID are
matched (ON setting), and 2) if the sentence and ID are unmatched (OFF
setting), the CWRs of different languages are chaotically distributed. Our
analyses suggest that although they work well in ideal ON settings, language
IDs become fragile and lose their navigation ability when faced with off-target
tokens, which commonly exist during inference but are rare in training
scenarios. In response, we employ unlikelihood tuning on the negative (OFF)
samples to minimize their probability such that the language IDs can
discriminate between the on- and off-target tokens during training. Experiments
spanning 40 ZST directions show that our method reduces the off-target ratio by
48.0% on average, leading to a +9.1 BLEU improvement with only an extra +0.3%
tuning cost.
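The unlikelihood term the abstract refers to penalizes probability mass assigned to negative tokens: instead of maximizing log p, it minimizes -log(1 - p). A minimal sketch (the `alpha` balance term is an assumption, not taken from the abstract):

```python
import math

def unlikelihood_loss(p_off_target):
    """Unlikelihood term for a negative (OFF) token: minimizing
    -log(1 - p) pushes the off-target token's probability to zero."""
    eps = 1e-8  # numerical safety near p = 1
    return -math.log(max(1.0 - p_off_target, eps))

def mixed_objective(p_on_target, p_off_target, alpha=1.0):
    """Likelihood on positive (ON) samples plus unlikelihood on
    negative (OFF) samples; alpha balances the two (assumed)."""
    nll = -math.log(max(p_on_target, 1e-8))
    return nll + alpha * unlikelihood_loss(p_off_target)

# The more probable the off-target token, the larger the penalty,
# which is what lets the language ID discriminate ON vs OFF tokens.
assert unlikelihood_loss(0.9) > unlikelihood_loss(0.1)
```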
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
This paper presents a controllable text-to-video (T2V) diffusion model, named
Video-ControlNet, that generates videos conditioned on a sequence of control
signals, such as edge or depth maps. Video-ControlNet is built on a pre-trained
conditional text-to-image (T2I) diffusion model by incorporating a
spatial-temporal self-attention mechanism and trainable temporal layers for
efficient cross-frame modeling. A first-frame conditioning strategy is proposed
to enable the model to generate videos transferred from the image domain as
well as arbitrary-length videos in an auto-regressive manner. Moreover,
Video-ControlNet employs a novel residual-based noise initialization strategy
to introduce motion prior from an input video, producing more coherent videos.
With the proposed architecture and strategies, Video-ControlNet can achieve
resource-efficient convergence and generate superior quality and consistent
videos with fine-grained control. Extensive experiments demonstrate its success
in various video generative tasks such as video editing and video style
transfer, outperforming previous methods in terms of consistency and quality.
Project Page: https://controlavideo.github.io
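The abstract does not give the exact formula for the residual-based noise initialization; one way such a scheme could look is to mix per-frame Gaussian noise with the residual between consecutive input-video frames, so the initial noise already carries the motion prior. A hedged sketch with toy per-frame feature vectors (all names and the `gamma` mixing weight are assumptions):

```python
import random

def residual_noise_init(frames, gamma=0.5, rng=None):
    """Initialize per-frame diffusion noise by blending i.i.d.
    Gaussian noise with the residual of consecutive input frames.
    frames: list of per-frame feature vectors (lists of floats).
    gamma: mixing weight for the residual term (assumed)."""
    rng = rng or random.Random(0)
    noises = []
    prev = frames[0]
    for frame in frames:
        residual = [f - p for f, p in zip(frame, prev)]
        base = [rng.gauss(0.0, 1.0) for _ in frame]
        noises.append([(1 - gamma) * b + gamma * r
                       for b, r in zip(base, residual)])
        prev = frame
    return noises

video = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]  # toy 3-frame "video"
init = residual_noise_init(video, gamma=1.0)
# With gamma = 1 the init is purely the motion residual.
```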
MSRL: Distributed Reinforcement Learning with Dataflow Fragments
Training reinforcement learning (RL) agents is resource-intensive and must
scale to large GPU clusters. Different RL training algorithms offer
different opportunities for distributing and parallelising the computation.
Yet, current distributed RL systems tie the definition of RL algorithms to
their distributed execution: they hard-code particular distribution strategies
and only accelerate specific parts of the computation (e.g. policy network
updates) on GPU workers. Fundamentally, current systems lack abstractions that
decouple RL algorithms from their execution.
We describe MindSpore Reinforcement Learning (MSRL), a distributed RL
training system that supports distribution policies that govern how RL training
computation is parallelised and distributed on cluster resources, without
requiring changes to the algorithm implementation. MSRL introduces the new
abstraction of a fragmented dataflow graph, which maps Python functions from an
RL algorithm's training loop to parallel computational fragments. Fragments are
executed on different devices by translating them to low-level dataflow
representations, e.g. computational graphs as supported by deep learning
engines, CUDA implementations or multi-threaded CPU processes. We show that
MSRL subsumes the distribution strategies of existing systems, while scaling RL
training to 64 GPUs.
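The core idea of the fragmented dataflow graph can be sketched in a few lines: functions from the training loop are registered as fragments, and a separate distribution policy decides where each fragment runs, so the algorithm code never mentions devices. All names below are illustrative, not MSRL's actual API:

```python
# Registry mapping fragment names to Python functions from the loop.
FRAGMENTS = {}

def fragment(name):
    """Decorator that registers a training-loop function as a fragment."""
    def register(fn):
        FRAGMENTS[name] = fn
        return fn
    return register

@fragment("act")
def act(state):
    return state * 2                 # stand-in for policy inference

@fragment("learn")
def learn(batch):
    return sum(batch) / len(batch)   # stand-in for a gradient update

# A distribution policy maps fragments to executors (here: labels).
# Changing this mapping redistributes the computation without
# touching act() or learn().
policy = {"act": "gpu:0", "learn": "gpu:1"}

def run(name, *args):
    device = policy[name]            # placement chosen by the policy,
    return FRAGMENTS[name](*args)    # not by the algorithm code

experience = [run("act", s) for s in [1, 2, 3]]
loss = run("learn", experience)
print(experience, loss)  # [2, 4, 6] 4.0
```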
An analysis of China's grain production: Looking back and looking forward
Ensuring food security is the foundation of economic development and social stability. China is historically a country dominated by agriculture. In the past 60 years, China's total grain output increased fivefold, from 113 million tons (MT) in 1949 to 571 MT in 2011, a record that provides inspiration to producers in other parts of the world. Grain production per capita doubled, from 209 to 425 kg, during the same period. At the national scale, China has succeeded in maintaining basic self-sufficiency in grain for the past three decades. However, with increasing population pressure and a growing appetite for animal products, China will need 776 MT of grain by 2030 to feed its own people, a net increase of 35.9% from its best year on record. China's drive for future food security is challenged by problems such as low efficiency of resource use, resource limitations, diminishing returns in yield response, competition from nonagricultural land uses, and environmental degradation. In this article, we analyze historical, temporal, and spatial variation in total grain production as well as the overall trends of current and future grain production, and discuss relevant options to overcome production constraints and further promote agricultural production.
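The abstract's headline figures are internally consistent, as a quick arithmetic check shows (all numbers taken from the abstract itself):

```python
# Cross-checking the abstract's figures (all in million tons, MT).
total_1949, total_2011 = 113, 571
target_2030 = 776

growth_factor = total_2011 / total_1949        # ~5.05, i.e. "fivefold"
needed_increase = target_2030 / total_2011 - 1 # ~0.359, i.e. +35.9%
per_capita_factor = 425 / 209                  # ~2.03, i.e. "doubled"

print(round(growth_factor, 2),
      round(needed_increase, 3),
      round(per_capita_factor, 2))
# 5.05 0.359 2.03
```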
An analysis of microbiota-targeted therapies in patients with avian influenza virus subtype H7N9 infection
BACKGROUND: Selective prophylactic decontamination of the digestive tract is a strategy for the prevention of secondary nosocomial infection in patients with avian influenza virus subtype H7N9 infection. Our aim was to summarize the effectiveness of these therapies in re-establishing a stable and diverse microbial community, and reducing secondary infections. METHODS: Comprehensive therapies were dependent on the individual clinical situation of subjects, and were divided into antiviral treatment, microbiota-targeted therapies, including pro- or pre-biotics and antibiotic usage, and immunotherapy. Quantitative polymerase chain reaction and denaturing gradient gel electrophoresis (DGGE) were used for real-time monitoring of the predominant intestinal microbiome during treatment. Clinical information about secondary infection was confirmed by analyzing pathogens isolated from clinical specimens. RESULTS: Different antibiotics had similar effects on the gut microbiome, with a marked decrease and slow recovery of the Bifidobacterium population. Interestingly, most fecal microbial DGGE profiles showed the relative stability of communities under the continual suppression of the same antibiotics, and significant changes when new antibiotics were introduced. Moreover, we found no marked increase in C-reactive protein, and no cases of bacteremia or pneumonia, caused by probiotic use in the patients, which confirmed that the probiotics used in this study were safe for use in patients with H7N9 infection. Approximately 72% of those who subsequently suffered exogenous respiratory infection by Candida species or multidrug-resistant Acinetobacter baumannii and Klebsiella pneumoniae were older than 60 years. The combination of probiotics and prebiotics with antibiotics seemed to fail in these patients. CONCLUSIONS: Elderly patients infected with the influenza A (H7N9) virus are considered a high-risk group for developing secondary bacterial infection. 
Microbiota restoration treatment reduced the incidence of enterogenous secondary infection, but not of exogenous respiratory infection. The prophylactic effects of microbiota restoration strategies against secondary infection were unsatisfactory in elderly and critically ill patients.
- …