Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks in an encoder-decoder configuration. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer, based
solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to be
superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014
English-to-German translation task, improving over the existing best results,
including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French
translation task, our model establishes a new single-model state-of-the-art
BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction
of the training costs of the best models from the literature. We show that the
Transformer generalizes well to other tasks by applying it successfully to
English constituency parsing both with large and limited training data.
Comment: 15 pages, 5 figures
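As a minimal illustration of the mechanism the Transformer is built on, the
sketch below implements the paper's scaled dot-product attention,
softmax(QK^T / sqrt(d_k))V, in PyTorch; the tensor shapes and the optional
mask argument are illustrative conventions, not the authors' reference code.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (..., seq_len, d_k); mask: 0 where attention is disallowed
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (..., seq_len, seq_len)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        return F.softmax(scores, dim=-1) @ v            # weighted sum of values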
Masked Attention is All You Need for Graphs
Graph neural networks (GNNs) and variations of the message passing algorithm
are the predominant means for learning on graphs, largely due to their
flexibility, speed, and satisfactory performance. The design of powerful and
general-purpose GNNs, however, requires significant research effort and often
relies on handcrafted, carefully chosen message passing operators. Motivated by
this, we propose a remarkably simple alternative for learning on graphs that
relies exclusively on attention. Graphs are represented as node or edge sets
and their connectivity is enforced by masking the attention weight matrix,
effectively creating custom attention patterns for each graph. Despite its
simplicity, masked attention for graphs (MAG) has state-of-the-art performance
on long-range tasks and outperforms strong message passing baselines and much
more involved attention-based methods on over 55 node and graph-level tasks. We
also show significantly better transfer learning capabilities compared to GNNs
and comparable or better time and memory scaling. MAG has sub-linear memory
scaling in the number of nodes or edges, enabling learning on dense graphs and
future-proofing the approach.
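The abstract does not spell out the exact masking scheme, but the core idea
admits a short sketch: ordinary self-attention over the node set, with the
attention weight matrix masked by graph connectivity. The PyTorch sketch below
assumes the mask is the adjacency matrix plus self-loops; MAG's actual
attention patterns may differ.

    import torch

    def masked_graph_attention(x, adj):
        # x: (n, d) node features; adj: (n, n) adjacency matrix, 1 where an edge exists
        n, d = x.shape
        scores = x @ x.T / d ** 0.5                    # plain self-attention scores
        mask = adj.float() + torch.eye(n)              # assumption: nodes attend to themselves
        scores = scores.masked_fill(mask == 0, float("-inf"))
        return torch.softmax(scores, dim=-1) @ x       # attention restricted to neighbors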
RITA: Group Attention is All You Need for Timeseries Analytics
Timeseries analytics is of great importance in many real-world applications.
Recently, the Transformer model, popular in natural language processing, has
been leveraged to learn high quality feature embeddings from timeseries, core
to the performance of various timeseries analytics tasks. However, the
quadratic time and space complexities limit Transformers' scalability,
especially for long timeseries. To address this, we develop a timeseries
analytics tool, RITA, which uses a novel attention mechanism, named group
attention, to improve scalability. Group attention dynamically
clusters the objects based on their similarity into a small number of groups
and approximately computes the attention at the coarse group granularity. It
thus significantly reduces the time and space complexity, yet provides a
theoretical guarantee on the quality of the computed attention. The dynamic
scheduler of RITA continuously adapts the number of groups and the batch size
in the training process, ensuring group attention always uses the fewest groups
needed to meet the approximation quality requirement. Extensive experiments on
various timeseries datasets and analytics tasks demonstrate that RITA
outperforms the state-of-the-art in accuracy and is significantly faster,
with speedups of up to 63X.
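RITA's dynamic clustering, scheduler, and approximation guarantee are beyond a
short excerpt, but the coarse-granularity idea can be sketched: keys and
values are pooled into group centroids, queries attend to the centroids, and
each group's score is offset by the log of its size so a group of g similar
keys contributes roughly g times one key's attention mass. The group
assignment is assumed given here; RITA derives it by similarity-based
clustering during training.

    import torch

    def group_attention(q, k, v, assign, num_groups):
        # q: (n, d) queries; k, v: (m, d) keys/values; assign: (m,) group id per key
        d = q.size(-1)
        counts = torch.bincount(assign, minlength=num_groups).clamp(min=1).float()
        # group centroids: mean of the keys/values assigned to each group
        k_g = torch.zeros(num_groups, d).index_add_(0, assign, k) / counts[:, None]
        v_g = torch.zeros(num_groups, d).index_add_(0, assign, v) / counts[:, None]
        scores = q @ k_g.T / d ** 0.5 + counts.log()   # log-size offset per group
        return torch.softmax(scores, dim=-1) @ v_g     # attention at group granularity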
Attention Is All You Need For Blind Room Volume Estimation
In recent years, dynamic parameterization of acoustic environments has
attracted increasing attention in the field of audio processing. One of the key
parameters that characterize the local room acoustics in isolation from
orientation and directivity of sources and receivers is the geometric room
volume. Convolutional neural networks (CNNs) have been the dominant models for
blind room acoustic parameter estimation, which aims
to learn a direct mapping from audio spectrograms to corresponding labels. With
the recent trend of self-attention mechanisms, this paper introduces a purely
attention-based model to blindly estimate room volumes based on single-channel
noisy speech signals. We demonstrate the feasibility of eliminating the
reliance on CNNs for this task; the proposed Transformer architecture takes
Gammatone magnitude spectral coefficients and phase spectrograms as inputs. To
enhance the model performance given the task-specific dataset, cross-modality
transfer learning is also applied. Experimental results demonstrate that the
proposed model outperforms traditional CNN models across a wide range of
real-world acoustic spaces, especially with the help of the dedicated
pretraining and data augmentation schemes.
Comment: 5 pages, 4 figures, submitted to ICASSP 202
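The abstract leaves the architecture details to the paper; as a rough sketch
of the kind of model described (a purely attention-based regressor over
per-frame spectral features), the following PyTorch module is illustrative,
with all dimensions and the mean-pooling head chosen here for brevity rather
than taken from the paper.

    import torch
    import torch.nn as nn

    class VolumeEstimator(nn.Module):
        def __init__(self, n_feats=128, d_model=256, n_heads=4, n_layers=4):
            super().__init__()
            self.proj = nn.Linear(n_feats, d_model)   # per-frame spectral features
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, 1)         # regress a scalar room volume

        def forward(self, x):                         # x: (batch, frames, n_feats)
            h = self.encoder(self.proj(x))
            return self.head(h.mean(dim=1))           # pool over time, one value per clip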
Gaussian Adaptive Attention is All You Need: Robust Contextual Representations Across Multiple Modalities
We propose the Multi-Head Gaussian Adaptive Attention Mechanism (GAAM), a
novel probabilistic attention framework, and the Gaussian Adaptive Transformer
(GAT), designed to enhance information aggregation across multiple modalities,
including Speech, Text and Vision. GAAM integrates learnable mean and variance
into its attention mechanism, implemented in a Multi-Headed framework enabling
it to collectively model any probability distribution for dynamic recalibration
of feature significance. This method demonstrates significant improvements,
especially with highly non-stationary data, surpassing the state-of-the-art
attention techniques in model performance (up to approximately +20% in
accuracy) by identifying key elements within the feature space. GAAM's
compatibility with dot-product-based attention models and relatively small
parameter count showcase its adaptability and potential to boost existing
attention frameworks. Empirically, GAAM exhibits superior adaptability and
efficacy across a diverse range of tasks, including emotion recognition in
speech, image classification, and text classification, thereby establishing its
robustness and versatility in handling multi-modal data. Furthermore, we
introduce the Importance Factor (IF), a new learning-based metric that enhances
the explainability of models trained with GAAM-based methods. Overall, GAAM
represents an advance toward the development of better-performing and more
explainable attention models across multiple modalities.
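The published GAAM is multi-headed and designed to integrate with dot-product
attention; as a loose single-head sketch of the stated idea only (learnable
mean and variance recalibrating feature significance), the module below scores
each position by a Gaussian of its distance from a learned mean. It
illustrates the concept, not the paper's implementation.

    import torch
    import torch.nn as nn

    class GaussianAdaptiveAttention(nn.Module):
        def __init__(self, d):
            super().__init__()
            self.mu = nn.Parameter(torch.zeros(d))        # learnable mean
            self.log_var = nn.Parameter(torch.zeros(d))   # learnable log-variance

        def forward(self, x):                             # x: (batch, seq, d)
            var = self.log_var.exp()
            # Gaussian log-score of each position under the learned distribution
            score = -0.5 * ((x - self.mu) ** 2 / var).sum(dim=-1)  # (batch, seq)
            w = torch.softmax(score, dim=-1)              # attention over positions
            return w.unsqueeze(-1) * x                    # recalibrated features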
"Attention is all you need". Arquitectura Transformers: descripción y aplicaciones
Natural language processing, better known by its English acronym NLP, has evolved steadily over the years and is now present in tools the average user relies on daily, such as Google Translate. This branch of machine learning has been embraced enthusiastically by the scientific community and by industry, which is enabling rapid development.
Some of the most common applications of these NLP algorithms are text classification, language translation, and text generation. Owing to their great versatility, they are already being used to solve real-world problems.
In this search for the most efficient solutions to the problems of an increasingly digitalized world, research has advanced on new algorithms for understanding and generating text, such as the Transformer, the most widely adopted neural network in this field to date thanks to the great potential it has demonstrated in large language models such as GPT-4 and LaMDA.
The goal of this project is to carry out an in-depth study of the neural network known as the Transformer: its origins, the neural networks that preceded it, its structure and operation, and its practical application in current models. Finally, we solve a problem by building, training, and testing the network, allowing a complete analysis of the results obtained.