Segatron: Segment-Aware Transformer for Language Modeling and Understanding
Transformers are powerful for sequence modeling. Nearly all state-of-the-art
language models and pre-trained language models are based on the Transformer
architecture. However, the Transformer distinguishes sequential tokens only by
their token position index. We hypothesize that better contextual representations can be
generated from the Transformer with richer positional information. To verify
this, we propose a segment-aware Transformer (Segatron), by replacing the
original token position encoding with a combined position encoding of
paragraph, sentence, and token. We first introduce the segment-aware mechanism
to Transformer-XL, which is a popular Transformer-based language model with
memory extension and relative position encoding. We find that our method can
further improve the Transformer-XL base model and large model, achieving 17.1
perplexity on the WikiText-103 dataset. We further investigate the pre-training
masked language modeling task with Segatron. Experimental results show that
BERT pre-trained with Segatron (SegaBERT) can outperform BERT with vanilla
Transformer on various NLP tasks and outperform RoBERTa on zero-shot sentence
representation learning.
Comment: Accepted by AAAI 202
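As a minimal sketch of the combined position encoding described above, the Python below assigns each token a (paragraph, sentence, token) index triple and sums three embedding lookups in place of a single token-position embedding. It assumes BERT-style absolute learned embeddings (not the relative encoding used with Transformer-XL), and it assumes that sentence indices restart in every paragraph and token indices restart in every sentence. The random tables stand in for learned parameters, and all function names are hypothetical rather than taken from the Segatron code.

```python
import random
from typing import List, Tuple

# Hypothetical input format: a document is a list of paragraphs,
# each paragraph a list of sentences, each sentence a list of tokens.
Document = List[List[List[str]]]


def segment_positions(doc: Document) -> List[Tuple[int, int, int]]:
    """Return one (paragraph, sentence, token) index triple per token.

    Assumption: sentence indices restart in each paragraph and token
    indices restart in each sentence.
    """
    positions = []
    for p_idx, paragraph in enumerate(doc):
        for s_idx, sentence in enumerate(paragraph):
            for t_idx, _token in enumerate(sentence):
                positions.append((p_idx, s_idx, t_idx))
    return positions


def random_table(rows: int, dim: int, seed: int) -> List[List[float]]:
    """Stand-in for a learned embedding table (random values, illustration only)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def segment_position_encoding(positions, para_table, sent_table, tok_table):
    """Sum paragraph, sentence and token embeddings for every token."""
    return [
        [a + b + c for a, b, c in zip(para_table[p], sent_table[s], tok_table[t])]
        for p, s, t in positions
    ]


doc = [
    [["Transformers", "are", "powerful", "."], ["They", "use", "positions", "."]],
    [["Segments", "help", "."]],
]
positions = segment_positions(doc)
dim = 4
para_t, sent_t, tok_t = (random_table(16, dim, seed) for seed in (0, 1, 2))
encodings = segment_position_encoding(positions, para_t, sent_t, tok_t)
print(positions[:5])
print(len(encodings), len(encodings[0]))
```

The first token of the second paragraph gets the triple (1, 0, 0), so it shares its token and sentence embeddings with the very first token of the document but not its paragraph embedding; that extra paragraph and sentence signal is the kind of positional information the abstract refers to.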
Distributional Drift Adaptation with Temporal Conditional Variational Autoencoder for Multivariate Time Series Forecasting
Due to their nonstationary nature, the distributions of real-world multivariate
time series (MTS) change over time, a phenomenon known as distribution drift.
Most existing MTS forecasting models suffer greatly from distribution drift, and
their forecasting performance degrades over time. Existing methods address
distribution drift via adapting to the latest arrived data or self-correcting
per the meta knowledge derived from future data. Despite their great success in
MTS forecasting, these methods rarely capture the intrinsic distribution
changes, especially from a distributional perspective. Accordingly, we propose
a novel framework, the temporal conditional variational autoencoder (TCVAE),
which models the dynamic distributional dependencies over time between
historical observations and future data in MTS and infers these dependencies as
a temporal conditional distribution by leveraging latent variables.
Specifically, a novel temporal Hawkes attention mechanism represents temporal
factors, which are then fed into feed-forward networks to estimate the prior
Gaussian distribution of the latent variables. The representation of temporal
factors further dynamically adapts the structures of the Transformer-based
encoder and decoder to distribution changes through a gated attention
mechanism. Moreover, we introduce
a conditional continuous normalizing flow to transform the prior Gaussian into a
complex, form-free distribution, enabling flexible inference of the temporal
conditional distribution. Extensive experiments on six real-world MTS datasets
demonstrate TCVAE's superior robustness and effectiveness over
state-of-the-art MTS forecasting baselines. We further illustrate TCVAE's
applicability through multifaceted case studies and visualizations in
real-world scenarios.
Comment: 13 pages, 6 figures, submitted to IEEE Transactions on Neural
Networks and Learning Systems (TNNLS)
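The abstract does not specify how the temporal Hawkes attention mechanism is computed. Purely as an illustration, the sketch below assumes a generic form: scaled dot-product attention over past time steps whose logits receive an exponential time-decay penalty, so that older observations are down-weighted in the way the exponential kernel of a Hawkes process decays past excitations. The function name `hawkes_decay_attention` and the decay rate `beta` are hypothetical, and this is not TCVAE's actual implementation.

```python
import math
from typing import List, Sequence, Tuple


def hawkes_decay_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    times: Sequence[float],
    t_now: float,
    beta: float,
) -> Tuple[List[float], List[float]]:
    """Hypothetical attention with a Hawkes-style exponential time decay.

    The unnormalised weight of step i is exp(q.k_i / sqrt(d)) * exp(-beta * (t_now - t_i)).
    Returns (attention weights, attended context vector).
    """
    d = len(query)
    logits = []
    for key, t_i in zip(keys, times):
        score = sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        # Adding -beta * dt to the logit multiplies the weight by exp(-beta * dt).
        logits.append(score - beta * (t_now - t_i))
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))
    ]
    return weights, context


# Identical keys, so only the time decay distinguishes the three past steps.
query = [1.0, 0.0]
keys = [[1.0, 0.0]] * 3
values = [[1.0], [2.0], [3.0]]
times = [0.0, 1.0, 2.0]
decayed_w, decayed_ctx = hawkes_decay_attention(query, keys, values, times, t_now=3.0, beta=1.0)
flat_w, flat_ctx = hawkes_decay_attention(query, keys, values, times, t_now=3.0, beta=0.0)
print([round(w, 3) for w in decayed_w], [round(c, 3) for c in decayed_ctx])
```

With `beta=0` the decay vanishes and the example collapses to ordinary attention (equal weights here); a positive `beta` shifts weight toward the most recent observations, which is one simple way a model could emphasise recent data under drift.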
Abnormal magnetoresistance behavior in Nb thin film with rectangular antidot lattice
Abnormal magnetoresistance behavior is found in superconducting Nb films
perforated with rectangular arrays of antidots (holes). Generally,
magnetoresistance is found to increase with increasing magnetic field.
Here we observe a reversal of this behavior, particularly at low temperature
or low current density. This phenomenon is due to a strong 'caging effect', in
which interstitial vortices are strongly trapped among pinned multivortices.
Comment: 4 pages, 2 figures
Numerical simulation of thermal stratification in Lake Qiandaohu using an improved WRF-Lake model
Lake thermal stratification is important for regulating lake environments and ecosystems and is sensitive to climate change and human activity. However, numerical simulation of coupled hydrodynamics and heat transfer processes in deep lakes using one-dimensional lake models remains challenging because of the insufficient representation of key parameters. In this study, Lake Qiandaohu, a deep and warm monomictic reservoir, was used as an example to investigate thermal stratification via an improved parameterization scheme of the Weather Research and Forecasting (WRF)-Lake model. A comparison with in situ observations demonstrated that the default WRF-Lake model was able to simulate the seasonal variation of the lake thermal structure well. However, the simulations exhibited cold biases in lake surface water temperature (LSWT) throughout the year and generated weaker stratification in summer, thereby leading to an earlier cooling period in autumn. With an improved parameterization (i.e., by determining initial lake water temperature profiles, light extinction coefficients, eddy diffusion coefficients, and surface roughness lengths), the modified WRF-Lake model was able to better simulate LSWT and thermal stratification. Critically, employing realistic initial conditions for lake water temperature is essential for producing realistic hypolimnetic water temperatures. The use of time-dependent light extinction coefficients resulted in a deep thermocline and warm LSWT. Enlarging eddy diffusivity led to stronger mixing in summer and further influenced autumn cooling. The parameterized surface roughness lengths mitigated the excessive turbulent heat loss at the lake surface, improved the model performance in simulating LSWT, and generated a warm mixed layer.
This study provides guidance on model parameterization for simulating the thermal structure of deep lakes and advances our understanding of the strength and evolution of lake thermal stratification under seasonal changes.
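To make two of the tuned parameters above more concrete, here is a toy one-dimensional sketch, not WRF-Lake's actual scheme. It assumes simple Beer-Lambert attenuation, I(z) = I0 exp(-k z), to show how the light extinction coefficient k controls where shortwave energy is absorbed, and an explicit finite-difference vertical diffusion step to show how a larger eddy diffusivity erodes a two-layer temperature profile. All function names, layer depths, temperatures, and diffusivity values are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def shortwave_absorption(edges: List[float], k: float) -> List[float]:
    """Fraction of the surface shortwave flux absorbed in each layer.

    Beer-Lambert attenuation I(z) = I0 * exp(-k * z); k is the light
    extinction coefficient (1/m) and edges are layer boundaries (m).
    """
    return [math.exp(-k * top) - math.exp(-k * bottom)
            for top, bottom in zip(edges[:-1], edges[1:])]


def diffuse(temps: List[float], kappa: float, dz: float, dt: float, steps: int) -> List[float]:
    """Vertical eddy diffusion of a temperature profile (surface layer first).

    Explicit finite differences with zero-flux top and bottom boundaries;
    kappa is the eddy diffusivity (m^2/s), dz the layer thickness (m), dt the step (s).
    """
    r = kappa * dt / dz ** 2
    if r > 0.5:
        raise ValueError("unstable: kappa * dt / dz**2 must not exceed 0.5")
    t = list(temps)
    n = len(t)
    for _ in range(steps):
        new = t[:]
        for i in range(n):
            above = t[i - 1] if i > 0 else t[i]      # mirror value: no flux through surface
            below = t[i + 1] if i < n - 1 else t[i]  # mirror value: no flux through bottom
            new[i] = t[i] + r * (above - 2 * t[i] + below)
        t = new
    return t


# Hypothetical numbers for illustration only (not from the study).
edges = [0.0, 1.0, 2.0, 5.0, 10.0]
clear = shortwave_absorption(edges, k=0.2)   # clearer water: light penetrates deeper
turbid = shortwave_absorption(edges, k=0.8)  # more turbid water: absorbed near the surface

profile = [28.0] * 5 + [12.0] * 5            # warm upper layer over cold lower layer, 1 m each
weak = diffuse(profile, kappa=1e-5, dz=1.0, dt=3600.0, steps=720)
strong = diffuse(profile, kappa=1e-4, dz=1.0, dt=3600.0, steps=720)
print("deepest-layer share, clear vs turbid:", round(clear[-1], 3), round(turbid[-1], 3))
print("top-bottom difference, weak vs strong mixing:",
      round(weak[0] - weak[-1], 2), round(strong[0] - strong[-1], 2))
```

The zero-flux boundaries conserve total heat, so any change in the profile comes from redistribution alone. A real lake model would add surface heat fluxes, convective mixing, and depth-dependent diffusivity on top of this.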
Uplift, Climate and Biotic Changes at the Eocene-Oligocene Transition in Southeast Tibet
The uplift history of southeastern Tibet is crucial to understanding processes driving the tectonic evolution of the Tibetan Plateau and surrounding areas. Underpinning existing palaeoaltimetric studies has been regional mapping based in large part on biostratigraphy that assumes a Neogene modernisation of the highly diverse, but threatened, Asian biota. Here, with new radiometric dating and newly collected plant fossil archives, we quantify the surface height of part of Tibet's southeastern margin in the latest Eocene (~34 Ma) as ~3 km and rising, possibly attaining its present elevation (3.9 km) in the early Oligocene. We also find that the Eocene-Oligocene transition in southeastern Tibet witnessed leaf size diminution and a floral composition change from sub-tropical/warm temperate to cool temperate, likely reflecting both uplift and secular climate change, and that by the latest Eocene floral modernization on Tibet had already taken place, implying that modernization was deeply rooted in the Paleogene.
- …