Semi-supervised multiscale dual-encoding method for faulty traffic data detection
Inspired by the recent success of deep learning in multiscale information
encoding, we introduce a variational autoencoder (VAE)-based semi-supervised
method for detecting faulty traffic data, which is cast as a classification
problem. Continuous wavelet transform (CWT) is applied to the time series of
traffic volume data to obtain rich features embodied in time-frequency
representation, followed by twin VAE models that separately encode normal
data and faulty data. The resulting multiscale dual encodings are concatenated
and fed to an attention-based classifier, consisting of a self-attention module
and a multilayer perceptron. For comparison, the proposed architecture is
evaluated against five different encoding schemes, including (1) VAE with only
normal data encoding, (2) VAE with only faulty data encoding, (3) VAE with both
normal and faulty data encodings but without the attention module in the
classifier, (4) siamese encoding, and (5) cross-vision transformer (CViT)
encoding. The first four encoding schemes adopt the same convolutional neural
network (CNN) architecture, while the fifth follows the
transformer architecture of CViT. Our experiments show that the proposed
architecture with the dual encoding scheme, coupled with the attention module,
outperforms other encoding schemes and results in classification accuracy of
96.4%, precision of 95.5%, and recall of 97.7%.
Comment: 16 pages, 8 figures
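The pipeline described above (CWT scalogram of the volume series, twin VAE encoders for normal and faulty data, concatenated latents fed to a self-attention classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the wavelet choice, layer sizes, latent dimension, and class names are all assumptions.

```python
# Illustrative sketch of the dual-encoding pipeline; hyperparameters are assumptions.
import numpy as np
import pywt
import torch
import torch.nn as nn

def cwt_scalogram(volume_series, scales=np.arange(1, 65), wavelet="morl"):
    """Time-frequency representation of a traffic-volume time series."""
    coeffs, _ = pywt.cwt(volume_series, scales, wavelet)
    return torch.tensor(coeffs, dtype=torch.float32).unsqueeze(0)  # (1, scales, time)

class ConvVAEEncoder(nn.Module):
    """CNN encoder of one VAE branch; the latent mean serves as the encoding."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.mu = nn.Linear(32 * 4 * 4, latent_dim)
        self.logvar = nn.Linear(32 * 4 * 4, latent_dim)

    def forward(self, x):
        h = self.features(x)
        return self.mu(h), self.logvar(h)

class AttentionClassifier(nn.Module):
    """Self-attention over the two latent codes, followed by an MLP."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * latent_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, z_normal, z_faulty):
        tokens = torch.stack([z_normal, z_faulty], dim=1)   # (batch, 2, latent_dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.mlp(attended.flatten(1))                 # logits: normal vs. faulty

# Toy usage on one day of 5-minute counts (288 samples); encoders are untrained here,
# whereas in the paper one branch is trained on normal data and the other on faulty data.
series = np.random.rand(288)
x = cwt_scalogram(series).unsqueeze(0)                        # (batch=1, 1, scales, time)
enc_normal, enc_faulty = ConvVAEEncoder(), ConvVAEEncoder()
z_n, _ = enc_normal(x)
z_f, _ = enc_faulty(x)
logits = AttentionClassifier()(z_n, z_f)
```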
SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training
In an era where symbolic mathematical equations are indispensable for
modeling complex natural phenomena, scientific inquiry often involves
collecting observations and translating them into mathematical expressions.
Recently, deep learning has emerged as a powerful tool for extracting insights
from data. However, existing models typically specialize in either numeric or
symbolic domains, and are usually trained in a supervised manner tailored to
specific tasks. This approach neglects the substantial benefits that could
arise from a task-agnostic unified understanding between symbolic equations and
their numeric counterparts. To bridge this gap, we introduce SNIP, a
Symbolic-Numeric Integrated Pre-training framework, which employs joint contrastive
learning between symbolic and numeric domains, enhancing their mutual
similarities in the pre-trained embeddings. By performing latent space
analysis, we observe that SNIP provides cross-domain insights into the
representations, revealing that symbolic supervision enhances the embeddings of
numeric data and vice versa. We evaluate SNIP across diverse tasks, including
symbolic-to-numeric mathematical property prediction and numeric-to-symbolic
equation discovery, commonly known as symbolic regression. Results show that
SNIP effectively transfers to various tasks, consistently outperforming fully
supervised baselines and competing strongly with established task-specific
methods, especially in few-shot learning scenarios where available data is
limited.
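The following is a minimal sketch of the kind of joint contrastive pre-training the abstract describes, assuming paired symbolic/numeric batches and a symmetric InfoNCE-style loss. The encoder architectures, pooling, and temperature are illustrative assumptions rather than SNIP's actual design.

```python
# Hedged sketch of symbolic-numeric contrastive pre-training; only the
# cross-domain contrastive objective follows the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SymbolicEncoder(nn.Module):
    """Embeds a tokenized symbolic equation (e.g. prefix notation) into a vector."""
    def __init__(self, vocab_size=128, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        return self.encoder(self.embed(tokens)).mean(dim=1)

class NumericEncoder(nn.Module):
    """Embeds a set of (x, y) observations sampled from the same equation."""
    def __init__(self, in_dim=2, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, points):                      # points: (batch, n_points, 2)
        return self.mlp(points).mean(dim=1)         # permutation-invariant pooling

def contrastive_loss(z_sym, z_num, temperature=0.07):
    """Symmetric InfoNCE: matched symbolic/numeric pairs attract, others repel."""
    z_sym, z_num = F.normalize(z_sym, dim=-1), F.normalize(z_num, dim=-1)
    logits = z_sym @ z_num.t() / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(z_sym.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# One pre-training step on a toy batch of 8 equation/observation pairs.
tokens = torch.randint(0, 128, (8, 16))
points = torch.randn(8, 200, 2)
loss = contrastive_loss(SymbolicEncoder()(tokens), NumericEncoder()(points))
loss.backward()
```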
Your Transformer May Not be as Powerful as You Expect
Relative Positional Encoding (RPE), which encodes the relative distance
between any pair of tokens, is one of the most successful modifications to the
original Transformer. As far as we know, the theoretical understanding of
RPE-based Transformers remains largely unexplored. In this work, we mathematically
analyze the power of RPE-based Transformers regarding whether the model is
capable of approximating any continuous sequence-to-sequence function. One may
naturally assume the answer is in the affirmative -- RPE-based Transformers are
universal function approximators. However, we present a negative result by
showing there exist continuous sequence-to-sequence functions that RPE-based
Transformers cannot approximate no matter how deep and wide the neural network
is. One key reason is that most RPEs are placed inside the softmax attention,
which always produces a right stochastic matrix. This restricts the network
from capturing positional information in the RPEs and limits its capacity. To
overcome the problem and make the model more powerful, we first present
sufficient conditions for RPE-based Transformers to achieve universal function
approximation. With the theoretical guidance, we develop a novel attention
module, called Universal RPE-based (URPE) Attention, which satisfies the
conditions. Therefore, the corresponding URPE-based Transformers become
universal function approximators. Extensive experiments covering typical
architectures and tasks demonstrate that our model is parameter-efficient and
can achieve superior performance to strong baselines in a wide range of
applications. The code will be made publicly available at
https://github.com/lsj2408/URPE.
Comment: 22 pages; NeurIPS 2022, Camera Ready Version
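The abstract does not spell out the construction, so the following is only a rough sketch of one way to realize the stated idea: modulating the post-softmax attention with a learnable relative-position (Toeplitz) matrix, so that attention rows are no longer forced to be right stochastic. The exact URPE parameterization may differ (see the linked repository); the class name and sizes here are assumptions.

```python
# Illustrative URPE-style attention head: softmax attention multiplied elementwise
# by a learnable matrix of relative-position coefficients. Not the official code.
import torch
import torch.nn as nn

class URPEAttentionHead(nn.Module):
    def __init__(self, dim, max_len=512):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # One coefficient per relative distance in [-(max_len-1), max_len-1].
        self.rel_coef = nn.Parameter(torch.ones(2 * max_len - 1))
        self.max_len = max_len

    def forward(self, x):                            # x: (batch, seq_len, dim), seq_len <= max_len
        b, n, d = x.shape
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(-2, -1) / d**0.5, dim=-1)
        # Build the Toeplitz modulation matrix C with C[i, j] = rel_coef[i - j].
        idx = torch.arange(n)
        c = self.rel_coef[idx[:, None] - idx[None, :] + self.max_len - 1]
        return (attn * c) @ self.v(x)                # modulated attention output

# Quick shape check on a toy batch.
out = URPEAttentionHead(dim=64)(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```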
Dynamic Position Encoding for Transformers
Recurrent models have been dominating the field of neural machine translation
(NMT) for the past few years. Transformers \citep{vaswani2017attention} have
radically changed it by proposing a novel architecture that relies on a
feed-forward backbone and self-attention mechanism. Although Transformers are
powerful, they could fail to properly encode sequential/positional information
due to their non-recurrent nature. To solve this problem, position embeddings
are defined exclusively for each time step to enrich word information. However,
such embeddings are fixed after training regardless of the task and the word
ordering system of the source or target language.
In this paper, we propose a novel architecture with position embeddings that
depend on the input text, addressing this shortcoming by taking the order of
target words into consideration. Instead of using predefined position
embeddings, our solution \textit{generates} new embeddings to refine each
word's position information. Since we do not dictate the positions of source
tokens but learn them in an end-to-end fashion, we refer to our method as
\textit{dynamic} position encoding (DPE). We evaluated the impact of our model
on multiple datasets to translate from English into German, French, and Italian
and observed meaningful improvements in comparison to the original Transformer
- …
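The abstract only states that position embeddings are generated from the input text rather than predefined. As a hedged sketch of what such input-dependent encoding could look like, the snippet below uses a small generator network that refines a positional signal conditioned on each word's embedding before adding it to the input. The layout, sizes, and the sinusoidal base are illustrative assumptions, not the paper's DPE specification.

```python
# Hypothetical sketch of input-dependent ("dynamic") position encoding.
import math
import torch
import torch.nn as nn

class DynamicPositionEncoding(nn.Module):
    def __init__(self, dim=512, max_len=256):
        super().__init__()
        # Standard sinusoidal table, used here only as a starting positional signal.
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        table = torch.zeros(max_len, dim)
        table[:, 0::2], table[:, 1::2] = torch.sin(pos * div), torch.cos(pos * div)
        self.register_buffer("sinusoid", table)
        # Generator that refines the position signal conditioned on the token embedding.
        self.generator = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, word_emb):                     # word_emb: (batch, seq_len, dim)
        n = word_emb.size(1)
        base = self.sinusoid[:n].expand_as(word_emb)
        dynamic = self.generator(torch.cat([word_emb, base], dim=-1))
        return word_emb + dynamic                    # position-enriched embeddings

# Toy usage before feeding a Transformer encoder.
emb = torch.randn(4, 20, 512)
enriched = DynamicPositionEncoding()(emb)
```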