Universal Language Model Fine-tuning for Text Classification
Inductive transfer learning has greatly impacted computer vision, but
existing approaches in NLP still require task-specific modifications and
training from scratch. We propose Universal Language Model Fine-tuning
(ULMFiT), an effective transfer learning method that can be applied to any task
in NLP, and introduce techniques that are key for fine-tuning a language model.
Our method significantly outperforms the state-of-the-art on six text
classification tasks, reducing the error by 18-24% on the majority of datasets.
Furthermore, with only 100 labeled examples, it matches the performance of
training from scratch on 100x more data. We open-source our pretrained models
and code.
Comment: ACL 2018; fixed denominator in Equation 3.
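One of the fine-tuning techniques the ULMFiT paper introduces is the slanted triangular learning rate, a schedule that first increases and then slowly decays the learning rate. A minimal sketch of that schedule, using the default hyperparameters reported in the paper (`cut_frac=0.1`, `ratio=32`); the function name is illustrative:

```python
import math

def slanted_triangular_lr(t, T, cut_frac=0.1, ratio=32, lr_max=0.01):
    """Slanted triangular learning rate at iteration t of T total.

    The rate ramps up linearly for the first cut_frac fraction of
    training, then decays linearly; ratio bounds how much smaller
    the lowest rate is relative to lr_max.
    """
    cut = math.floor(T * cut_frac)  # iteration at which the peak occurs
    if t < cut:
        p = t / cut  # linear warm-up fraction
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio
```

In practice this would be plugged into an optimizer as a per-iteration schedule, e.g. via a lambda-based scheduler in a deep-learning framework.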
Phase Transition in Dimer Liquids
We study the phase transition in a system composed of dimers interacting with
each other via a nearest-neighbor (NN) exchange and competing interactions
taken from a truncated dipolar coupling. Each dimer occupies a link between two
nearest sites of a simple cubic lattice. We suppose that dimers are
self-avoiding and can have only three orientations, which coincide with the
x, y, or z direction. The interaction is attractive if the two dimers are
parallel to each other at the NN distance, and zero otherwise. The truncated
dipolar interaction is characterized by two parameters: its amplitude and its
cutoff distance. Using the steepest-descent method, we determine the
ground-state (GS) configuration as a function of these two parameters. We then
use Monte Carlo simulations to investigate the nature of the low-temperature
phase and to determine the characteristics of the phase transition from the
ordered phase to the disordered phase at high temperatures at a given dimer
concentration. We show that as the temperature increases, dimers remain in the
compact state, and the transition from the low-temperature compact phase to the
disordered phase, where dimers occupy the whole space, is of second order when
the cutoff distance is small, but becomes of first order when the cutoff
distance is large enough, for both polarized and non-polarized dimers. This
transition resembles the polymer unfolding transition. The effect of the
dipolar amplitude is also discussed.
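The Monte Carlo procedure described above can be illustrated with a heavily simplified Metropolis sweep. This toy sketch keeps only the parallel nearest-neighbour attraction on a periodic cubic lattice of orientation variables, and omits the self-avoidance constraint, the dipolar tail, and the link geometry of real dimers; all names below are illustrative:

```python
import math
import random

def metropolis_sweep(spins, L, J, T, rng=random.Random(0)):
    """One Metropolis sweep over a toy L^3 periodic lattice.

    spins[i] in {0, 1, 2} encodes the x/y/z axis of the object at
    site i; parallel nearest neighbours gain energy -J (attractive).
    """
    def neighbours(i):
        # six nearest neighbours on a periodic simple cubic lattice
        x, y, z = i % L, (i // L) % L, i // (L * L)
        for d in (1, -1):
            yield ((x + d) % L) + y * L + z * L * L
            yield x + ((y + d) % L) * L + z * L * L
            yield x + y * L + ((z + d) % L) * L * L

    def local_energy(i, s):
        # energy of site i if it held orientation s
        return -J * sum(1 for j in neighbours(i) if spins[j] == s)

    for i in range(len(spins)):
        proposed = rng.randrange(3)
        dE = local_energy(i, proposed) - local_energy(i, spins[i])
        # Metropolis criterion: accept with probability min(1, exp(-dE/T))
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i] = proposed
    return spins
```

At low temperature an aligned configuration is essentially frozen, while at high temperature the sweep randomizes the orientations, mimicking the ordered-to-disordered transition the abstract investigates.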
Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
User interaction with voice-powered agents generates large amounts of
unlabeled utterances. In this paper, we explore techniques to efficiently
transfer the knowledge from these unlabeled utterances to improve model
performance on Spoken Language Understanding (SLU) tasks. We use Embeddings
from Language Model (ELMo) to take advantage of unlabeled data by learning
contextualized word representations. Additionally, we propose ELMo-Light
(ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our
findings suggest that unsupervised pre-training on a large corpus of unlabeled
utterances leads to significantly better SLU performance than training
from scratch, and that it can even outperform conventional supervised transfer.
Additionally, we show that the gains from unsupervised transfer techniques can
be further improved by supervised transfer. The improvements are more
pronounced in low-resource settings: using only 1000 labeled in-domain
samples, our techniques match the performance of training from scratch on
10-15x more labeled in-domain data.
Comment: To appear at AAAI 2019
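ELMo builds a task-specific word representation by combining the hidden states of its bidirectional language model layers with a softmax-normalised weighted sum scaled by a learned factor. A minimal sketch of that scalar mixing; the function name is illustrative, not the actual allennlp API:

```python
import math

def elmo_mix(layer_states, scalars, gamma=1.0):
    """ELMo-style scalar mix for one word position.

    layer_states: one hidden-state vector per biLM layer.
    scalars:      learned per-layer weights (softmax-normalised here).
    gamma:        learned overall scale factor.
    """
    exps = [math.exp(s) for s in scalars]
    Z = sum(exps)
    weights = [e / Z for e in exps]  # softmax over layers
    dim = len(layer_states[0])
    # weighted sum across layers, dimension by dimension
    return [gamma * sum(w * h[k] for w, h in zip(weights, layer_states))
            for k in range(dim)]
```

In a downstream SLU model the scalars and gamma would be trained jointly with the task, letting the classifier choose how much of each language-model layer to use.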
Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network
In this paper, we present a two-stage model for multi-hop question answering.
The first stage is a hierarchical graph network, which is used to reason over
the multi-hop question and can capture different levels of granularity using
the natural structure of documents (i.e., paragraphs, questions, sentences,
and entities). The reasoning process is converted into a node classification
task (i.e., over paragraph nodes and sentence nodes). The second stage is a
language model fine-tuning task. In short, stage one uses a graph neural
network to select and concatenate supporting sentences into one paragraph,
and stage two finds the answer span in the language model fine-tuning paradigm.
Comment: the experimental results are not as good as I expected
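Stage one's output, as described, is just the supporting sentences chosen by the graph network, concatenated into one paragraph for the fine-tuned reader. A minimal sketch of that selection step, with the graph scoring network elided and per-sentence node scores assumed given (the function name is illustrative):

```python
def select_and_concat(sentences, node_scores, k=3):
    """Keep the top-k sentences by graph-node score, in document order,
    and join them into a single support paragraph for the reader."""
    top = sorted(range(len(sentences)), key=lambda i: -node_scores[i])[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

Preserving document order when concatenating matters in practice: the span-extraction model in stage two sees text that still reads like a coherent paragraph.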
Vortical and Wave Modes in 3D Rotating Stratified Flows: Random Large Scale Forcing
Utilizing an eigenfunction decomposition, we study the growth and spectra of
energy in the vortical and wave modes of a 3D rotating stratified fluid as a
function of the ratio of the rotation and stratification frequencies, f/N.
Working in regimes characterized by moderate Burger numbers, our results
indicate a profound change in the character of vortical- and wave-mode
interactions with respect to f/N. As with the reference state of f/N = 1, for
f/N < 1 the wave-mode energy saturates quite quickly, and the ensuing forward
cascade continues to act as an efficient means of dissipating ageostrophic
energy. Further, these saturated spectra steepen as f/N decreases, shifting to
a steeper power law between the forcing and dissipation wavenumbers k_f and
k_d. On the other hand, when f/N > 1 the wave-mode energy never saturates and
comes to dominate the total energy in the system. In fact, in a sense the wave
modes behave in an asymmetric manner about f/N = 1. With regard to the
vortical modes, for f/N <= 1 the signatures of 3D quasigeostrophy are clearly
evident: we see the expected steep spectral scaling above the forcing
wavenumber and, in accord with an inverse transfer of energy, the
vortical-mode energy never saturates but rather increases at wavenumbers below
k_f. In contrast, for f/N > 1 and increasing, the vortical modes contain a
progressively smaller fraction of the total energy, indicating that the 3D
quasigeostrophic subsystem plays an energetically smaller role in the overall
dynamics.
Comment: 18 pages, 6 figs. (abbreviated abstract)
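Measuring modal energy spectra such as those discussed above typically means binning Fourier-mode energies into integer wavenumber shells. A minimal sketch under that assumption, with the mode amplitudes taken as given and the vortical/wave eigenmode projection elided (the function name is illustrative):

```python
import math

def shell_spectrum(modes):
    """Bin modal energies |u_k|^2 / 2 into integer wavenumber shells.

    modes: dict mapping an integer wavevector (kx, ky, kz) to the
    squared amplitude |u_k|^2 of that Fourier mode.
    Returns a dict E[k] giving the energy in each shell k ~ |k|.
    """
    E = {}
    for (kx, ky, kz), amp2 in modes.items():
        k = round(math.sqrt(kx * kx + ky * ky + kz * kz))  # shell index
        E[k] = E.get(k, 0.0) + 0.5 * amp2
    return E
```

Plotting E(k) against k on log-log axes is then how power-law scalings between the forcing and dissipation scales are diagnosed.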
