Convexifying Transformers: Improving optimization and understanding of transformer networks
Understanding the fundamental mechanism behind the success of transformer
networks is still an open problem in the deep learning literature. Although
their remarkable performance has been mostly attributed to the self-attention
mechanism, the literature still lacks a solid analysis of these networks and
interpretation of the functions learned by them. To this end, we study the
training problem of attention/transformer networks and introduce a novel convex
analytic approach to improve the understanding and optimization of these
networks. In particular, we first introduce a convex alternative to the
self-attention mechanism and reformulate the regularized training problem of
transformer networks with our alternative convex attention. Then, we cast the
reformulation as a convex optimization problem that is interpretable and easier
to optimize. Moreover, as a byproduct of our convex analysis, we reveal an
implicit regularization mechanism, which promotes sparsity across tokens.
Therefore, we not only improve the optimization of attention/transformer
networks but also provide a solid theoretical understanding of the functions
learned by them. We also demonstrate the effectiveness of our theory through
several numerical experiments.
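The implicit token-sparsity effect described above can be illustrated generically: a group-lasso (row-wise L2) penalty on a token-by-feature matrix zeroes out entire tokens at once. The sketch below shows the proximal operator of such a penalty; it is a generic illustration under our own naming, not the paper's actual convex program.

```python
import numpy as np

def group_soft_threshold(X, tau):
    """Proximal step for tau * sum_i ||X[i, :]||_2 (one group per token/row).

    Rows whose norm is below tau are zeroed out entirely (the token is
    dropped); the remaining rows are shrunk toward zero.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

# two "tokens": a weak one (norm 0.2) and a strong one (norm 3.0)
X = np.array([[0.1, 0.1, 0.1, 0.1],
              [3.0, 0.0, 0.0, 0.0]])
Z = group_soft_threshold(X, tau=1.0)
# the weak token is removed entirely; the strong one is only shrunk
```

The group structure (one group per row) is what makes the sparsity land on whole tokens rather than on individual entries.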
Mechanic: A Learning Rate Tuner
We introduce a technique for tuning the learning rate scale factor of any
base optimization algorithm and schedule automatically, which we call
\textsc{mechanic}. Our method provides a practical realization of recent
theoretical reductions for accomplishing a similar goal in online convex
optimization. We rigorously evaluate \textsc{mechanic} on a range of large
scale deep learning tasks with varying batch sizes, schedules, and base
optimization algorithms. These experiments demonstrate that depending on the
problem, \textsc{mechanic} either comes very close to, matches, or even improves
upon manual tuning of learning rates.
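To make the interface concrete, here is a toy sketch of what "tuning the scale factor of a base optimizer's update" means. The rule for adapting the scale below is a deliberately naive placeholder of our own (grow while the update still points downhill, halve it otherwise, capped at 1), not \textsc{mechanic}'s actual online-learning update; all names are illustrative.

```python
import numpy as np

def scaled_descent(grad_fn, w0, base_lr=1.0, steps=120):
    """Apply a base update direction scaled by an online-tuned factor s.

    The adaptation rule for s is a naive placeholder: grow s while the
    base update still points downhill, halve it otherwise.
    """
    w = np.asarray(w0, dtype=float).copy()
    s = 1e-3  # scale factor on the base update, tuned online
    for _ in range(steps):
        g = grad_fn(w)
        delta = -base_lr * g           # base optimizer's proposed update
        if g @ delta < 0:              # update still points downhill
            s = min(s * 1.1, 1.0)      # grow the scale (capped for this toy)
        else:
            s *= 0.5
        w = w + s * delta
    return w

# minimize f(w) = ||w||^2 / 2, whose gradient is w itself
w_star = scaled_descent(lambda w: w, np.ones(5))
```

The point of the interface is that `delta` can come from any base optimizer and schedule; only the scalar `s` is tuned.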
Simplifying and Understanding State Space Models with Diagonal Linear RNNs
Sequence models based on linear state spaces (SSMs) have recently emerged as
a promising choice of architecture for modeling long range dependencies across
various modalities. However, they invariably rely on discretization of a
continuous state space, which complicates their presentation and understanding.
In this work, we dispose of the discretization step and propose a model based
on vanilla Diagonal Linear RNNs (DLR). We empirically show that, despite being
conceptually much simpler, DLR is as performant as previously-proposed SSMs on
a variety of tasks and benchmarks including Long
Range Arena and raw speech classification. Moreover, we characterize the
expressivity of SSMs (including DLR) and attention-based models via
a suite of synthetic sequence-to-sequence tasks involving interactions
over tens of thousands of tokens, ranging from simple operations, such as
shifting an input sequence, to detecting co-dependent visual features over long
spatial ranges in flattened images. We find that while SSMs report near-perfect
performance on tasks that can be modeled via a few convolutional kernels, they
struggle on tasks requiring many such kernels, especially when the desired
sequence manipulation is context-dependent. Despite these limitations, DLR
reaches high performance on two higher-order reasoning tasks with inputs tens
of thousands of tokens long, and gives encouraging performance on an even
longer task for which attention is not a viable choice.
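As a generic illustration of the recurrence such diagonal linear RNNs compute (a sketch under our own naming, not the paper's code): each hidden channel evolves independently under a complex scalar, so the recurrence unrolls into a per-channel 1-D convolution, which is what ties these models to convolutional kernels.

```python
import numpy as np

def diag_linear_rnn(lam, u):
    """Run x_t = lam * x_{t-1} + u_t with a diagonal (elementwise) transition.

    lam: (d,) complex per-channel transition, |lam| <= 1 for stability
    u:   (T, d) input sequence
    returns: (T, d) hidden states
    """
    T, d = u.shape
    x = np.zeros(d, dtype=complex)
    xs = np.empty((T, d), dtype=complex)
    for t in range(T):
        x = lam * x + u[t]  # each channel evolves independently
        xs[t] = x
    return xs

rng = np.random.default_rng(0)
lam = 0.9 * np.exp(1j * rng.uniform(0.0, np.pi, size=4))  # stable modes
u = rng.standard_normal((16, 4)).astype(complex)
xs = diag_linear_rnn(lam, u)

# Diagonal structure means each channel is a 1-D convolution with
# kernel k_s = lam**s:  x_{T-1} = sum_s lam**(T-1-s) * u_s.
T = u.shape[0]
powers = lam[None, :] ** np.arange(T - 1, -1, -1)[:, None]
assert np.allclose(xs[-1], (powers * u).sum(axis=0))
```

In practice such models compute the unrolled convolution directly (e.g. via FFT) rather than the sequential loop shown here.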
MedicHub – Disease Detection Using Deep Learning
The integration of technology in healthcare is rapidly revolutionizing the sector, transforming the traditional modus operandi into a more efficient and accurate automated system. Machine learning is a sophisticated technology that can analyze clinical symptoms to predict diseases and deliver accurate, evidence-based diagnoses. A major advantage of using technology to assist in diagnosis is uncovering underlying illnesses that are often overlooked while searching for a more severe disease, or when the patient is not in imminent danger. This offers patients a reliable and accessible alternative for immediate results and minimizes the risk of errors. Another valuable application of technology is in the field of medical image analysis: convolutional neural networks (CNNs) can recognize patterns in images and are therefore included in the system to increase its accuracy and efficacy.
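The pattern-recognition ability of CNNs mentioned above rests on a simple core operation: sliding a small filter over an image. The sketch below shows valid-mode 2-D cross-correlation with a hand-made vertical-edge filter; it is a minimal illustration of the building block, not the system's actual model.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation, the core op of a CNN layer."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# a vertical-edge filter fires on the boundary of a half-dark image
image = np.zeros((5, 6))
image[:, 3:] = 1.0                 # left half dark, right half bright
edge_kernel = np.array([[-1.0, 1.0]])
response = conv2d(image, edge_kernel)
# the response is nonzero only along the dark-to-bright boundary
```

In a real CNN the filter weights are learned from data rather than hand-set, and many filters are stacked across layers.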