Your Transformer May Not be as Powerful as You Expect
Relative Positional Encoding (RPE), which encodes the relative distance
between any pair of tokens, is one of the most successful modifications to the
original Transformer. As far as we know, the theoretical understanding of RPE-based Transformers remains largely unexplored. In this work, we mathematically
analyze the power of RPE-based Transformers regarding whether the model is
capable of approximating any continuous sequence-to-sequence functions. One may
naturally assume the answer is in the affirmative -- RPE-based Transformers are
universal function approximators. However, we present a negative result by
showing there exist continuous sequence-to-sequence functions that RPE-based
Transformers cannot approximate no matter how deep and wide the neural network
is. One key reason is that most RPEs are placed inside the softmax attention, which always generates a right stochastic matrix. This restricts the network from fully capturing the positional information in the RPEs and limits its capacity. To
overcome the problem and make the model more powerful, we first present
sufficient conditions for RPE-based Transformers to achieve universal function
approximation. With the theoretical guidance, we develop a novel attention
module, called Universal RPE-based (URPE) Attention, which satisfies the
conditions. Therefore, the corresponding URPE-based Transformers become
universal function approximators. Extensive experiments covering typical
architectures and tasks demonstrate that our model is parameter-efficient and
can achieve superior performance to strong baselines in a wide range of
applications. The code will be made publicly available at
https://github.com/lsj2408/URPE.
Comment: 22 pages; NeurIPS 2022, Camera Ready Version
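The abstract attributes the expressiveness gap to RPEs living inside the softmax, whose output rows always sum to one. One way to escape that constraint, consistent with the sufficient conditions described above, is to rescale the softmax output with a learnable matrix whose entries depend only on relative position. The PyTorch sketch below is a minimal, hypothetical illustration of that idea, assuming a Toeplitz parameterization; the module name, shapes, and single-head setup are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class URPEStyleAttention(nn.Module):
    """Single-head attention whose softmax output is rescaled by a learnable
    Toeplitz matrix C with C[i, j] = c[i - j], so the resulting attention
    matrix is no longer forced to be right stochastic."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One learnable scalar per relative offset in [-(max_len - 1), max_len - 1].
        self.rel_scale = nn.Parameter(torch.ones(2 * max_len - 1))
        self.max_len = max_len

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        n, d = x.size(1), x.size(2)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        # Gather the learnable scales into a Toeplitz matrix of shape (n, n).
        idx = torch.arange(n, device=x.device)
        offsets = idx[:, None] - idx[None, :] + self.max_len - 1
        c = self.rel_scale[offsets]
        # Element-wise rescaling after the softmax: rows need not sum to 1.
        return (attn * c) @ v
```

Because the element-wise product with the Toeplitz matrix need not preserve row sums, the combined attention matrix is no longer right stochastic, which is exactly the restriction the abstract identifies as limiting capacity.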
How to Fine-Tune BERT for Text Classification?
Language model pre-training has proven to be useful in learning universal
language representations. As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved impressive results on many language understanding tasks. In this paper,
we conduct exhaustive experiments to investigate different fine-tuning methods
of BERT on the text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely studied text classification datasets.
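For a concrete point of reference, the block below sketches the standard fine-tuning recipe the abstract investigates, written with the Hugging Face transformers library; the toy data, hyperparameters, and the choice of library are illustrative assumptions rather than the paper's exact experimental setup.

```python
import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, BertTokenizerFast

# Illustrative labeled data; any text classification dataset works the same way.
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

# A small learning rate and a few epochs are the usual starting point for BERT fine-tuning.
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):
    out = model(**enc, labels=labels)  # classification head on [CLS], cross-entropy loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```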
Small Transformers Compute Universal Metric Embeddings
We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and
Desolneux 2020). We derive embedding guarantees for feature maps implemented by
small neural networks called \emph{probabilistic transformers}. Our guarantees
are of memorization type: we prove that a probabilistic transformer of depth
about $n \log(n)$ and width about $n^2$ can bi-H\"{o}lder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse
of dimensionality. We further derive probabilistic bi-Lipschitz guarantees,
which trade off the amount of distortion and the probability that a randomly
chosen pair of points embeds with that distortion. If $\mathcal{X}$'s geometry
is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all
points in the dataset. As applications, we derive neural embedding guarantees
for datasets from Riemannian manifolds, metric trees, and certain types of
combinatorial graphs. When instead embedding into multivariate Gaussian
mixtures, we show that probabilistic transformers can compute bi-H\"{o}lder
embeddings with arbitrarily small distortion.
Comment: 42 pages, 10 Figures, 3 Tables
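For readers unfamiliar with the terminology, the snippet below spells out one standard reading of a bi-H\"{o}lder embedding with distortion constant $C$; the exponents and constants actually proved in the paper are not reproduced here.

```latex
% One standard bi-H\"{o}lder condition for a feature map f from the metric
% space (\mathcal{X}, d_{\mathcal{X}}) into the mixture space with transport
% metric d_{\mathcal{GM}}: there are exponents 0 < \beta \le 1 \le \alpha and
% a distortion constant C \ge 1 such that, for all dataset points x, y,
\[
  \tfrac{1}{C}\, d_{\mathcal{X}}(x,y)^{\alpha}
  \;\le\;
  d_{\mathcal{GM}}\bigl(f(x), f(y)\bigr)
  \;\le\;
  C\, d_{\mathcal{X}}(x,y)^{\beta}.
\]
% The bi-Lipschitz guarantees correspond to \alpha = \beta = 1; "low distortion"
% means C (and the exponents) stay controlled independently of the ambient dimension.
```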
A Universal Latent Fingerprint Enhancer Using Transformers
Forensic science heavily relies on analyzing latent fingerprints, which are
crucial for criminal investigations. However, various challenges, such as
background noise, overlapping prints, and contamination, make the
identification process difficult. Moreover, limited access to real crime scene
and laboratory-generated databases hinders the development of efficient
recognition algorithms. This study aims to develop a fast method, which we call
ULPrint, to enhance various latent fingerprint types, including those obtained
from real crime scenes and laboratory-created samples, to boost fingerprint
recognition system performance. In closed-set identification accuracy
experiments, the enhanced images improved the performance of the
MSU-AFIS from 61.56\% to 75.19\% in the NIST SD27 database, from 67.63\% to
77.02\% in the MSP Latent database, and from 46.90\% to 52.12\% in the NIST
SD302 database. Our contributions include (1) the development of a two-step
latent fingerprint enhancement method that combines Ridge Segmentation with
UNet and Mix Visual Transformer (MiT) SegFormer-B5 encoder architecture, (2)
the implementation of multiple dilated convolutions in the UNet architecture to
capture intricate, non-local patterns better and enhance ridge segmentation,
and (3) the guided blending of the predicted ridge mask with the latent
fingerprint. This novel approach, ULPrint, streamlines the enhancement process,
addressing challenges across diverse latent fingerprint types to improve
forensic investigations and criminal justice outcomes.
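Two of the listed contributions, the multi-dilation convolutions (2) and the guided blending of the ridge mask with the latent print (3), can be made concrete with a short sketch. The PyTorch block below is only an illustrative reading of the abstract; the dilation rates, the blending rule, and all names are assumptions, since the exact configuration is not given here.

```python
import torch
import torch.nn as nn


class MultiDilationBlock(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates, a common way
    to enlarge the receptive field for non-local ridge patterns (the rates
    here are illustrative, not the paper's configuration)."""

    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate all dilation branches and fuse back to `channels` maps.
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


def guided_blend(latent: torch.Tensor, ridge_mask: torch.Tensor, alpha: float = 0.7) -> torch.Tensor:
    """Blend the predicted ridge mask back into the latent print.

    Both inputs are grayscale maps in [0, 1]; `alpha` is a hypothetical
    blending weight, as the abstract does not specify the rule.
    """
    return (alpha * ridge_mask + (1.0 - alpha) * latent).clamp(0.0, 1.0)
```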
On a New Notion of Partial Refinement
Formal specification techniques allow expressing idealized specifications,
which abstract from restrictions that may arise in implementations. However,
partial implementations are universal in software development due to practical
limitations. Our goal is to contribute to a method of program refinement that
allows for partial implementations. For programs with a normal and an
exceptional exit, we propose a new notion of partial refinement which allows an
implementation to terminate exceptionally if the desired results cannot be
achieved, provided the initial state is maintained. Partial refinement leads to
a systematic method of developing programs with exception handling.
Comment: In Proceedings Refine 2013, arXiv:1305.563
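To make the idea concrete, the snippet below gives one illustrative, Hoare-style reading of the notion described above; it is an assumption drawn only from the abstract's wording, not the authors' formal definition.

```latex
% Illustrative reading: an implementation T partially refines a specification
% with precondition P and desired postcondition Q if, starting from an initial
% state \sigma_0 satisfying P,
\[
  \{\, P \wedge \sigma = \sigma_0 \,\}\;\; T \;\;
  \{\, Q \ \text{on the normal exit} \quad\mid\quad \sigma = \sigma_0 \ \text{on the exceptional exit} \,\}
\]
% i.e. normal termination establishes the desired result Q, while exceptional
% termination is permitted but must leave the initial state \sigma_0 unchanged.
```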