97 research outputs found

    Convexifying Transformers: Improving optimization and understanding of transformer networks

    Understanding the fundamental mechanism behind the success of transformer networks is still an open problem in the deep learning literature. Although their remarkable performance has been mostly attributed to the self-attention mechanism, the literature still lacks a solid analysis of these networks and an interpretation of the functions learned by them. To this end, we study the training problem of attention/transformer networks and introduce a novel convex analytic approach to improve the understanding and optimization of these networks. In particular, we first introduce a convex alternative to the self-attention mechanism and reformulate the regularized training problem of transformer networks with our alternative convex attention. Then, we cast the reformulation as a convex optimization problem that is interpretable and easier to optimize. Moreover, as a byproduct of our convex analysis, we reveal an implicit regularization mechanism, which promotes sparsity across tokens. Therefore, we not only improve the optimization of attention/transformer networks but also provide a solid theoretical understanding of the functions learned by them. We also demonstrate the effectiveness of our theory through several numerical experiments.
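
    To make the contrast concrete, here is a minimal Python sketch of a token-mixing layer that is linear in its trainable weights, so that a convex loss plus a convex regularizer yields a convex training problem. It only illustrates the general idea of a convex alternative to attention with a sparsity-promoting penalty across tokens; the mixing map, the squared loss, and the row-wise penalty are assumptions, not the paper's actual reformulation.

    import numpy as np

    def convex_token_mixing(X, W):
        """X: (T, d) token embeddings; W: (T, T) trainable mixing weights.
        The output is linear in W, unlike softmax self-attention."""
        return W @ X  # each output token is a learned combination of input tokens

    def objective(W, X, Y, lam=0.1):
        """Squared loss plus a group penalty over token rows: convex in W."""
        residual = convex_token_mixing(X, W) - Y
        data_term = 0.5 * np.sum(residual ** 2)
        token_sparsity = lam * np.sum(np.linalg.norm(W, axis=1))  # row-wise l2 norms
        return data_term + token_sparsity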

    Mechanic: A Learning Rate Tuner

    We introduce a technique for tuning the learning rate scale factor of any base optimization algorithm and schedule automatically, which we call Mechanic. Our method provides a practical realization of recent theoretical reductions for accomplishing a similar goal in online convex optimization. We rigorously evaluate Mechanic on a range of large scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms. These experiments demonstrate that depending on the problem, Mechanic either comes very close to, matches, or even improves upon manual tuning of learning rates.
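
    As a rough illustration of the interface, the Python sketch below wraps an arbitrary base update rule and applies a single learned scale factor to its proposed update. The toy sign-based adaptation of the scale is a placeholder assumption; Mechanic's actual update comes from an online convex optimization reduction that is not reproduced here.

    import numpy as np

    class ScaledOptimizer:
        """Wraps a base optimizer step and learns a global learning-rate scale."""

        def __init__(self, base_step, s_init=1e-3, meta_lr=1e-2):
            self.base_step = base_step  # (params, grads) -> proposed update direction
            self.s = s_init             # global scale factor applied to every update
            self.meta_lr = meta_lr

        def step(self, params, grads):
            update = self.base_step(params, grads)
            # Placeholder rule: grow the scale while the scaled update still
            # descends along the gradient, shrink it otherwise.
            alignment = -np.sum(grads * update)
            self.s *= np.exp(self.meta_lr * np.sign(alignment))
            return params + self.s * update

    # Example: scaling plain SGD update directions.
    sgd = lambda params, grads: -grads
    opt = ScaledOptimizer(sgd)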

    Simplifying and Understanding State Space Models with Diagonal Linear RNNs

    Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step and propose a model based on vanilla Diagonal Linear RNNs (DLR). We empirically show that, despite being conceptually much simpler, DLR is as performant as previously proposed SSMs on a variety of tasks and benchmarks, including Long Range Arena and raw speech classification. Moreover, we characterize the expressivity of SSMs (including DLR) and attention-based models via a suite of 13 synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs report near-perfect performance on tasks that can be modeled via few convolutional kernels, they struggle on tasks requiring many such kernels, especially when the desired sequence manipulation is context-dependent. Despite these limitations, DLR reaches high performance on two higher-order reasoning tasks, ListOpsSubTrees and PathfinderSegmentation-256, with input lengths 8K and 65K respectively, and gives encouraging performance on PathfinderSegmentation-512 with input length 262K, for which attention is not a viable choice. Comment: added Long Range Arena, language modeling with mixture of experts.
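
    The recurrence of a diagonal linear RNN is simple enough to state directly: with a complex diagonal state transition, the update decouples across state dimensions. The Python sketch below shows this recurrence; the input/output projections and any initialization or parameterization details are assumptions rather than the paper's exact setup.

    import numpy as np

    def dlr_layer(u, lam, B, C):
        """u: (L, d_in) input sequence; lam: (n,) complex diagonal transition;
        B: (n, d_in) input projection; C: (d_out, n) readout.
        Returns the (L, d_out) output sequence."""
        x = np.zeros(lam.shape[0], dtype=np.complex128)
        outputs = []
        for u_t in u:
            x = lam * x + B @ u_t          # elementwise (diagonal) state update
            outputs.append((C @ x).real)   # real part of the linear readout
        return np.stack(outputs)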

    MedicHub – Disease Detection Using Deep Learning

    The integration of technology in healthcare is rapidly revolutionizing the sector, transforming the traditional modus operandi into a more efficient and accurate automated system. Machine learning is a sophisticated technology used to analyze clinical symptoms to predict diseases and deliver accurate diagnoses based on strong evidence. A major advantage of using technology to assist in diagnosis is to learn more about underlying illnesses that are often overlooked while searching for a more severe disease, or when the patient is not in imminent danger. This offers patients a reliable and accessible alternative for immediate results and also minimizes the risk of errors. Another valuable use of technology is in the field of medical image analysis. CNNs are neural networks capable of recognizing patterns in images and hence should be included in the system to increase its accuracy and efficacy.
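
    As a minimal illustration of the kind of CNN classifier the abstract refers to, here is a small PyTorch sketch. The architecture, the assumed 64x64 single-channel input, and the number of disease classes are illustrative assumptions, not the MedicHub system itself.

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        """Tiny CNN for classifying 64x64 grayscale medical images."""

        def __init__(self, num_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # 64x64 input -> 16x16 feature maps

        def forward(self, x):
            x = self.features(x)
            return self.classifier(torch.flatten(x, 1))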