27 research outputs found

    Neural Machine Translation with Word Predictions

    In the encoder-decoder architecture for neural machine translation (NMT), the hidden states of the recurrent structures in the encoder and decoder carry the crucial information about the sentence. These vectors are generated by parameters which are updated by back-propagation of translation errors through time. We argue that propagating errors through the end-to-end recurrent structures is not a direct way to control the hidden vectors. In this paper, we propose to use word predictions as a mechanism for direct supervision. More specifically, we require these vectors to be able to predict the vocabulary in the target sentence. Our simple mechanism ensures better representations in the encoder and decoder without using any extra data or annotation. It is also helpful in reducing the target-side vocabulary and improving the decoding efficiency. Experiments on Chinese-English and German-English machine translation tasks show BLEU improvements of 4.53 and 1.3, respectively. Comment: Accepted at EMNLP201

    Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

    Pre-training and fine-tuning have achieved great success in the natural language processing field. The standard paradigm for exploiting them includes two steps: first, pre-training a model, e.g. BERT, on large-scale unlabeled monolingual data; then, fine-tuning the pre-trained model with labeled data from downstream tasks. However, in neural machine translation (NMT), the training objective of the bilingual task is far different from that of the monolingual pre-trained model. Because of this gap, fine-tuning alone cannot fully utilize prior language knowledge in NMT. In this paper, we propose an APT framework for acquiring knowledge from the pre-trained model to NMT. The proposed approach includes two modules: (1) a dynamic fusion mechanism to fuse task-specific features adapted from general knowledge into the NMT network, and (2) a knowledge distillation paradigm to learn language knowledge continuously during the NMT training process. The proposed approach can integrate suitable knowledge from pre-trained models to improve NMT. Experimental results on WMT English to German, German to English and Chinese to English machine translation tasks show that our model outperforms strong baselines and the fine-tuning counterparts.

    Histone/Protein Deacetylase 11 Targeting Promotes Foxp3+ Treg Function.

    Current interest in Foxp3+ T-regulatory (Treg) cells as therapeutic targets in transplantation is largely focused on their harvesting pre-transplant, expansion and infusion post-transplantation. An alternate strategy of pharmacologic modulation of Treg function using histone/protein deacetylase inhibitors (HDACi) may allow more titratable and longer-term dosing. However, the effects of broadly acting HDACi vary, such that HDAC isoform-selective targeting is likely required. We report data from mice with constitutive or conditional deletion of HDAC11 within Foxp3+ Treg cells, and their use, along with small molecule HDAC11 inhibitors, in allograft models. Global HDAC11 deletion had no effect on health or development, and compared to WT controls, Foxp3+ Tregs lacking HDAC11 showed increased suppressive function, and increased expression of Foxp3 and TGF-β. Likewise, compared to WT recipients, conditional deletion of HDAC11 within Tregs led to long-term survival of fully MHC-mismatched cardiac allografts, and prevented development of transplant arteriosclerosis in an MHC class II-mismatched allograft model. The translational significance of HDAC11 targeting was shown by the ability of an HDAC11i to promote long-term allograft survival in fully MHC-disparate strains. These data are powerful stimuli for the further development and testing of HDAC11-selective pharmacologic inhibitors, and may ultimately provide new therapies for transplantation and autoimmune diseases.

    Secrets of RLHF in Large Language Models Part I: PPO

    Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF remains a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models and PPO code.

    25th annual computational neuroscience meeting: CNS-2016

    The same neuron may play different functional roles in the neural circuits to which it belongs. For example, neurons in the Tritonia pedal ganglia may participate in variable phases of the swim motor rhythms [1]. While such neuronal functional variability is likely to play a major role in the delivery of the functionality of neural systems, it is difficult to study in most nervous systems. We work on the pyloric rhythm network of the crustacean stomatogastric ganglion (STG) [2]. Typically, network models of the STG treat neurons of the same functional type as a single model neuron (e.g. PD neurons), assuming the same conductance parameters for these neurons and implying their synchronous firing [3, 4]. However, simultaneous recording of PD neurons shows differences between the timings of spikes of these neurons. This may indicate functional variability of these neurons. Here we modelled separately the two PD neurons of the STG in a multi-neuron model of the pyloric network. Our neuron models comply with known correlations between conductance parameters of ionic currents. Our results reproduce the experimental finding of increasing spike time distance between spikes originating from the two model PD neurons during their synchronised burst phase. The PD neuron with the larger calcium conductance generates its spikes before the other PD neuron. Larger potassium conductance values in the follower neuron imply longer delays between spikes, see Fig. 17. Neuromodulators change the conductance parameters of neurons and maintain the ratios of these parameters [5]. Our results show that such changes may shift the individual contribution of the two PD neurons to the PD-phase of the pyloric rhythm, altering their functionality within this rhythm. Our work paves the way towards an accessible experimental and computational framework for the analysis of the mechanisms and impact of functional variability of neurons within the neural circuits to which they belong.

    26th Annual Computational Neuroscience Meeting (CNS*2017): Part 3 - Meeting Abstracts - Antwerp, Belgium. 15–20 July 2017

    This work was produced as part of the activities of FAPESP Research, Disseminations and Innovation Center for Neuromathematics (grant 2013/07699-0, S. Paulo Research Foundation). NLK is supported by a FAPESP postdoctoral fellowship (grant 2016/03855-5). ACR is partially supported by a CNPq fellowship (grant 306251/2014-0).

    Understanding the Effect of an E-Hailing App Subsidy War on Taxicab Operation Zones

    Understanding taxicab operation behaviors under various management or market policies (e.g., subsidies) is critical to making informed operating decisions for e-hailing companies and for government surveillance. This paper investigates how taxicab operation zones changed during an e-hailing app subsidy war in China, an important perspective on changes in taxicab behavior, including how those changes affect trip distance and cruising time. To investigate this issue, this paper utilizes three indexes to elucidate the change of taxicab operation zones, namely, the repetition ratio of operation zone pairs, the area, and the degree of dispersion in the spatial distribution. A case study using taxicab trajectories during all of the important periods of the e-hailing app subsidy war in Shenzhen, China, was conducted and produced several valuable findings; for example, with respect to taxicabs as a whole, the proportion of habitual operation zone pairs among operation zone pairs in neighboring periods is relatively stable under any subsidy policy, and changes in the operation zones have little effect on changes in the average daily trip distance and average daily cruising time. Four groups of taxicabs, divided according to initial change patterns in the operation zones, present different change patterns during the subsidy war. By comparing these changes before and after the subsidy war, this paper finds that the subsidy war influences the taxicabs in groups I and II, while it has little influence on the taxicabs in groups III and IV, although all groups were affected during the subsidy war. For the taxicab groups in the period with the highest subsidy, the average daily trip distance and average daily cruising time decreased, whereas, in other periods, they presented different patterns.