
    Universal Language Model Fine-tuning for Text Classification

    Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.
    Comment: ACL 2018, fixed denominator in Equation 3, line
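
    A minimal sketch of the two fine-tuning techniques most associated with this approach, discriminative fine-tuning (smaller learning rates for earlier layers) and gradual unfreezing, in plain PyTorch. The three-layer LSTM stand-in, the layer grouping, and the learning rates are illustrative assumptions, not the authors' released code.

# Illustrative sketch of ULMFiT-style discriminative fine-tuning and gradual
# unfreezing. The model and hyperparameters below are placeholder assumptions.
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self, vocab_size=30000, emb=400, hid=1150, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb)
        self.rnn1 = nn.LSTM(emb, hid, batch_first=True)
        self.rnn2 = nn.LSTM(hid, hid, batch_first=True)
        self.rnn3 = nn.LSTM(hid, emb, batch_first=True)
        self.head = nn.Linear(emb, n_classes)

    def layer_groups(self):
        # Ordered from earliest (most general) to latest (most task-specific).
        return [self.embedding, self.rnn1, self.rnn2, self.rnn3, self.head]

    def forward(self, x):
        h = self.embedding(x)
        for rnn in (self.rnn1, self.rnn2, self.rnn3):
            h, _ = rnn(h)
        return self.head(h[:, -1])          # last time step -> class logits


model = Classifier()

# Discriminative fine-tuning: each earlier layer group gets its learning rate
# divided by 2.6, the heuristic factor suggested in the paper.
base_lr, groups = 1e-3, model.layer_groups()
param_groups = [
    {"params": g.parameters(), "lr": base_lr / (2.6 ** (len(groups) - 1 - i))}
    for i, g in enumerate(groups)
]
optimizer = torch.optim.Adam(param_groups)

# Gradual unfreezing: start with only the classifier head trainable,
# then unfreeze one additional layer group per epoch.
for g in groups[:-1]:
    for p in g.parameters():
        p.requires_grad_(False)

for epoch in range(len(groups)):
    for p in groups[-(epoch + 1)].parameters():
        p.requires_grad_(True)
    # ... run one epoch of training here with `optimizer` ...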

    Phase Transition in Dimer Liquids

    We study the phase transition in a system composed of dimers interacting with each other via a nearest-neighbor (NN) exchange $J$ and competing interactions taken from a truncated dipolar coupling. Each dimer occupies a link between two nearest sites of a simple cubic lattice. We suppose that dimers are self-avoiding and can have only three orientations, which coincide with the $x$, $y$ or $z$ direction. The interaction $J$ is attractive if the two dimers are parallel with each other at the NN distance, and zero otherwise. The truncated dipolar interaction is characterized by two parameters: its amplitude $D$ and the cutoff distance $r_c$. Using the steepest-descent method, we determine the ground-state (GS) configuration as a function of $D$ and $r_c$. We then use Monte Carlo simulations to investigate the nature of the low-temperature phase and to determine the characteristics of the phase transition from the ordered phase to the disordered phase at high temperatures at a given dimer concentration. We show that as the temperature increases, dimers remain in the compact state, and the transition from the low-$T$ compact phase to the disordered phase, where dimers occupy the whole space, is of second order when $D$ is small but becomes first order for large enough $D$, for both polarized and non-polarized dimers. This transition resembles the polymer unfolding transition. The effect of $r_c$ is discussed.
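
    As a toy illustration of the Monte Carlo part of this study, the sketch below performs Metropolis updates for self-avoiding dimers on a periodic cubic lattice. It keeps only a nearest-neighbor attraction $J$ between parallel dimers (the truncated dipolar term with amplitude $D$ and cutoff $r_c$ is omitted), and the lattice size, dimer number, temperature, and simplified "parallel anchors at NN distance" energy rule are placeholder assumptions, not the authors' simulation code.

# Toy Metropolis Monte Carlo for self-avoiding dimers on a simple cubic lattice.
# Only the NN attraction J between parallel dimers is kept, and the interaction
# is simplified to parallel dimers whose anchor sites are nearest neighbors.
import math
import random

L, J, T, N_DIMERS, N_STEPS = 8, 1.0, 1.5, 60, 20000
AXES = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]            # x, y, z orientations


def sites(anchor, axis):
    """The two lattice sites covered by a dimer (periodic boundaries)."""
    d = AXES[axis]
    return (anchor, tuple((anchor[i] + d[i]) % L for i in range(3)))


def local_energy(anchor, axis, dimers, skip=None):
    """-J per parallel dimer whose anchor is a nearest neighbor of `anchor`."""
    e = 0.0
    for i, (a2, ax2) in enumerate(dimers):
        if i == skip or ax2 != axis:
            continue
        dist = sum(min(abs(anchor[k] - a2[k]), L - abs(anchor[k] - a2[k]))
                   for k in range(3))
        if dist == 1:
            e -= J
    return e


# Random self-avoiding initial configuration.
dimers, occupied = [], set()
while len(dimers) < N_DIMERS:
    anchor = tuple(random.randrange(L) for _ in range(3))
    axis = random.randrange(3)
    s = sites(anchor, axis)
    if not (set(s) & occupied):
        dimers.append((anchor, axis))
        occupied.update(s)

# Metropolis sweep: propose moving one dimer to a random position/orientation.
for step in range(N_STEPS):
    i = random.randrange(N_DIMERS)
    old_anchor, old_axis = dimers[i]
    new_anchor = tuple(random.randrange(L) for _ in range(3))
    new_axis = random.randrange(3)
    old_sites = set(sites(old_anchor, old_axis))
    new_sites = set(sites(new_anchor, new_axis))
    if (new_sites - old_sites) & (occupied - old_sites):
        continue                                    # violates self-avoidance
    dE = (local_energy(new_anchor, new_axis, dimers, skip=i)
          - local_energy(old_anchor, old_axis, dimers, skip=i))
    if dE <= 0 or random.random() < math.exp(-dE / T):
        dimers[i] = (new_anchor, new_axis)
        occupied = (occupied - old_sites) | new_sites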

    Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

    User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Models (ELMo) to take advantage of unlabeled data by learning contextualized word representations. Additionally, we propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance than training from scratch, and that it can even outperform conventional supervised transfer. Additionally, we show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low-resource settings: using only 1,000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data.
    Comment: To appear at AAAI 2019
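
    The recipe described here, unsupervised pre-training on unlabeled utterances followed by supervised SLU fine-tuning, can be sketched in plain PyTorch as below. The small bidirectional LSTM language model, the layer sizes, and the frozen-encoder intent classifier are simplifying assumptions, not the paper's exact ELMo or ELMoL configuration.

# Sketch of unsupervised pre-training of a small bidirectional LSTM language
# model on unlabeled utterances, with the encoder then reused for a supervised
# intent-classification (SLU) head. All sizes here are placeholder assumptions.
import torch
import torch.nn as nn


class BiLMEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb=128, hid=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)

    def forward(self, tokens):                      # (batch, seq) -> (batch, seq, 2*hid)
        out, _ = self.lstm(self.embedding(tokens))
        return out


class LMHead(nn.Module):
    """Next/previous-token prediction head used only during pre-training."""
    def __init__(self, encoder, vocab_size=10000, hid=256):
        super().__init__()
        self.encoder = encoder
        self.fwd_out = nn.Linear(hid, vocab_size)   # forward direction
        self.bwd_out = nn.Linear(hid, vocab_size)   # backward direction

    def forward(self, tokens):
        h = self.encoder(tokens)
        h_fwd, h_bwd = h.chunk(2, dim=-1)
        # Forward states predict the next token; backward states the previous one.
        return self.fwd_out(h_fwd[:, :-1]), self.bwd_out(h_bwd[:, 1:])


class IntentClassifier(nn.Module):
    """Supervised SLU head on top of the (optionally frozen) pre-trained encoder."""
    def __init__(self, encoder, hid=256, n_intents=10, freeze_encoder=True):
        super().__init__()
        self.encoder = encoder
        if freeze_encoder:
            for p in self.encoder.parameters():
                p.requires_grad_(False)
        self.head = nn.Linear(2 * hid, n_intents)

    def forward(self, tokens):
        h = self.encoder(tokens)                    # (batch, seq, 2*hid)
        return self.head(h.mean(dim=1))             # mean-pool then classify


encoder = BiLMEncoder()
lm = LMHead(encoder)             # step 1: train on unlabeled utterances (LM loss)
clf = IntentClassifier(encoder)  # step 2: fine-tune on the small labeled SLU set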

    Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network

    In this paper, we present a two-stage model for multi-hop question answering. The first stage is a hierarchical graph network, which is used to reason over the multi-hop question and is capable of capturing different levels of granularity using the natural structure of documents (i.e., paragraphs, questions, sentences, and entities). The reasoning process is converted into a node classification task (i.e., over paragraph nodes and sentence nodes). The second stage is a language model fine-tuning task. In short, stage one uses a graph neural network to select supporting sentences and concatenate them into one paragraph, and stage two finds the answer span in a language model fine-tuning paradigm.
    Comment: the experimental results are not as good as I expected
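
    A rough sketch of the two-stage pipeline follows: stage one is shown as node classification over sentence nodes with a single dense graph-convolution layer, after which the selected sentences would be concatenated and handed to a fine-tuned language model reader (stage two, not shown). The feature dimension, the adjacency construction, and the 0.5 selection threshold are assumptions for illustration, not the paper's architecture.

# Sketch of stage one: score sentence nodes of a document graph as supporting
# evidence with a single dense GCN-style layer. Sizes and the adjacency below
# are toy placeholders.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One dense GCN-style layer: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        adj = adj + torch.eye(adj.size(0))                  # add self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin((adj / deg) @ h))        # row-normalized propagation


class SentenceSelector(nn.Module):
    """Stage one: score each sentence node as supporting evidence or not."""
    def __init__(self, in_dim=768, hid=256):
        super().__init__()
        self.gcn = GraphConv(in_dim, hid)
        self.score = nn.Linear(hid, 1)

    def forward(self, node_feats, adj):
        return torch.sigmoid(self.score(self.gcn(node_feats, adj))).squeeze(-1)


# Toy usage: 6 sentence nodes with 768-d encoder features and a chain adjacency.
feats = torch.randn(6, 768)
adj = torch.zeros(6, 6)
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

selector = SentenceSelector()
probs = selector(feats, adj)
selected = [i for i, p in enumerate(probs.tolist()) if p > 0.5]
# Stage two (not shown): concatenate the selected sentences into one context
# paragraph and run a span-prediction head on a fine-tuned language model.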

    Vortical and Wave Modes in 3D Rotating Stratified Flows: Random Large Scale Forcing

    Utilizing an eigenfunction decomposition, we study the growth and spectra of energy in the vortical and wave modes of a 3D rotating stratified fluid as a function of $\epsilon = f/N$. Working in regimes characterized by moderate Burger numbers, i.e. $Bu = 1/\epsilon^2 < 1$ or $Bu \ge 1$, our results indicate profound change in the character of vortical and wave mode interactions with respect to $Bu = 1$. As with the reference state of $\epsilon = 1$, for $\epsilon < 1$ the wave mode energy saturates quite quickly and the ensuing forward cascade continues to act as an efficient means of dissipating ageostrophic energy. Further, these saturated spectra steepen as $\epsilon$ decreases: we see a shift from $k^{-1}$ to $k^{-5/3}$ scaling for $k_f < k < k_d$ (where $k_f$ and $k_d$ are the forcing and dissipation scales, respectively). On the other hand, when $\epsilon > 1$ the wave mode energy never saturates and comes to dominate the total energy in the system. In fact, in a sense the wave modes behave in an asymmetric manner about $\epsilon = 1$. With regard to the vortical modes, for $\epsilon \le 1$, the signatures of 3D quasigeostrophy are clearly evident. Specifically, we see a $k^{-3}$ scaling for $k_f < k < k_d$ and, in accord with an inverse transfer of energy, the vortical mode energy never saturates but rather increases for all $k < k_f$. In contrast, for $\epsilon > 1$ and increasing, the vortical modes contain a progressively smaller fraction of the total energy indicating that the 3D quasigeostrophic subsystem plays an energetically smaller role in the overall dynamics.
    Comment: 18 pages, 6 figs. (abbreviated abstract)
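
    A small NumPy sketch of the diagnostic underlying spectra such as $k^{-1}$, $k^{-5/3}$ or $k^{-3}$: binning the spectral kinetic energy of a periodic 3D field into spherical wavenumber shells to obtain $E(k)$. The random stand-in velocity field and grid size are placeholders, and the paper's eigenfunction (vortical/wave) decomposition is not reproduced here.

# Bin the spectral kinetic energy of a periodic 3D velocity field into
# spherical wavenumber shells to obtain E(k). The velocity field is a random
# stand-in for illustration only.
import numpy as np

n = 64                                              # grid points per dimension
u = [np.random.randn(n, n, n) for _ in range(3)]    # stand-in velocity components

# Fourier transform each component and form the spectral energy density.
uk = [np.fft.fftn(c) / n**3 for c in u]
energy_density = 0.5 * sum(np.abs(c) ** 2 for c in uk)

# Isotropic wavenumber magnitude on the same grid (integer wavenumbers).
k1d = np.fft.fftfreq(n, d=1.0 / n)
kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
kmag = np.sqrt(kx**2 + ky**2 + kz**2)

# Sum energy into unit-width spherical shells: E(k), k = 1, 2, ...
shells = np.arange(1, n // 2)
E_k = np.array([energy_density[(kmag >= k - 0.5) & (kmag < k + 0.5)].sum()
                for k in shells])

# A log-log fit of E_k versus shells over an inertial range k_f < k < k_d
# would give the spectral slope (e.g. -5/3 or -3 in the regimes discussed above).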