11 research outputs found

    Input Combination Strategies for Multi-Source Transformer Decoder

    In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied on recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of multimodal translation and translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single-source baselines.
    Comment: Published at WMT1
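    The combination strategies named in this abstract can be illustrated with a small sketch. Below is a minimal numpy illustration of two of them, the parallel and flat combinations, for single-head attention. The function names and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Scaled dot-product attention: one context vector per query."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores) @ values

def parallel_combination(queries, sources):
    """Parallel strategy: attend to each source independently,
    then combine the per-source context vectors (here by summing)."""
    return sum(attend(k_v[0], k_v[1], k_v[1]) * 0 + attend(queries, k_v[0], k_v[1])
               for k_v in sources) if False else \
           sum(attend(queries, k, v) for k, v in sources)

def flat_combination(queries, sources):
    """Flat strategy: concatenate all source states along the time
    axis and run a single attention over the joint sequence."""
    keys = np.concatenate([k for k, _ in sources], axis=0)
    values = np.concatenate([v for _, v in sources], axis=0)
    return attend(queries, keys, values)
```

    In the serial strategy, by contrast, the decoder would attend to the sources one after another in stacked sub-layers, and in the hierarchical strategy a second attention would weight the per-source contexts.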

    From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers

    Since its introduction, the transformer model has demonstrated outstanding performance across various tasks. However, there are still unresolved issues regarding length generalization, particularly in algorithmic tasks. In this paper, we investigate the inherent capabilities of transformer models in learning arithmetic algorithms, such as addition and multiplication. Through experiments and attention analysis, we identify a number of crucial factors for achieving optimal length generalization. We show that transformer models are able to generalize to long lengths with the help of targeted attention biasing. We then introduce Attention Bias Calibration (ABC), a calibration stage that enables the model to automatically learn the proper attention biases, which we link to mechanisms in relative position encoding. We demonstrate that using ABC, the transformer model can achieve unprecedented perfect length generalization on certain arithmetic tasks.
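    As a rough illustration of attention biasing (a sketch of the general technique, not the paper's ABC procedure), one can add a distance-dependent bias matrix to the attention logits so that a learned attention pattern extrapolates to inputs longer than those seen in training. The `slope` parameter and function names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_position_bias(n, slope=1.0):
    """Bias matrix that penalizes attention by query-key distance.
    Because it depends only on relative offsets, the same pattern
    can be built for any sequence length n."""
    idx = np.arange(n)
    return -slope * np.abs(idx[:, None] - idx[None, :])

def biased_attention(q, k, v, bias):
    """Self-attention with an additive bias on the logits."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + bias
    return softmax(scores) @ v
```

    A linear-in-distance bias like this concentrates attention near each query position; ABC, as described in the abstract, learns the appropriate bias pattern from the model itself rather than fixing it by hand.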

    Word-Region Alignment-Guided Multimodal Neural Machine Translation

    We propose word-region alignment-guided multimodal neural machine translation (MNMT), a novel model for MNMT that captures the semantic correlation between the textual and visual modalities using word-region alignment (WRA). Existing studies on MNMT have mainly focused on the effect of integrating visual and textual modalities, but they do not exploit the semantic relevance between the two. We strengthen the semantic correlation between the textual and visual modalities in MNMT by incorporating WRA as a bridge. The proposal is implemented on two mainstream neural machine translation (NMT) architectures: the recurrent neural network (RNN) and the transformer. Experiments on two public benchmarks, English-German and English-French translation using the Multi30k dataset and English-Japanese translation using the Flickr30kEnt-JP dataset, show that our model improves significantly over competitive baselines across different evaluation metrics and outperforms most existing MNMT models. For example, BLEU improves by 1.0 for the English-German task and by 1.1 for the English-French task on the Multi30k test2016 set, and by 0.7 for the English-Japanese task on the Flickr30kEnt-JP test set. Further analysis demonstrates that integrating WRA lets the model exploit visual information more effectively, leading to better translation performance.
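    To make the word-region alignment idea concrete, here is a minimal numpy sketch of one common way to score WRA: cosine similarity between word embeddings and image-region features, keeping each word's best-matching region. This is an illustrative sketch of the general mechanism, not the model's actual implementation.

```python
import numpy as np

def word_region_alignment(word_emb, region_feat):
    """Score how well each word is grounded in the image.
    word_emb:    (num_words, d) word embeddings
    region_feat: (num_regions, d) visual region features
    Returns the mean best-match cosine similarity and, for each
    word, the index of its best-matching region."""
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    r = region_feat / np.linalg.norm(region_feat, axis=1, keepdims=True)
    sim = w @ r.T                      # (num_words, num_regions)
    return sim.max(axis=1).mean(), sim.argmax(axis=1)
```

    A score like this can act as a bridge between modalities, for example as an auxiliary training signal encouraging the translation model to attend to the regions that each source word refers to.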

    Multi-stream Longitudinal Data Analysis using Deep Learning

    Longitudinal healthcare data covers settings in which patient information is collected at multiple follow-up times. Analyzing this data is critical to addressing many real-world problems in healthcare, such as disease prediction and prevention. In this thesis, technical challenges in analyzing longitudinal administrative claims data are addressed, and novel deep-learning-based models are proposed for multi-stream data analysis and disease prediction tasks. These algorithms and frameworks are assessed mainly on substance use disorder prediction tasks and are specifically designed to tackle these disorders. Substance use disorder is a public health crisis costing the US an estimated $740 billion annually in healthcare, lost workplace productivity, and crime. Early identification and engagement of individuals at risk of developing a substance use disorder is a critical unmet need in healthcare, one that can be met by building automatic artificial-intelligence-based tools trained on big healthcare data. Healthcare data can be harnessed together with artificial intelligence and machine learning to advance our understanding of the factors that increase the propensity for developing different diseases, as well as those that aid in their treatment. Herein, a disease prediction framework is first proposed based on recurrent neural networks. This framework includes three components: 1) data pre-processing, 2) disease prediction using long short-term memory models, and 3) hypothesis exploration by varying the models and the inputs. The framework is assessed on two use cases: substance use disorder prediction and mild cognitive impairment prediction. Experimental results show that the proposed model can efficiently analyze patients' data and yields effective disease prediction tools.
    Second, the limitations of current deep learning models, including long short-term memory models, in claims data analysis are identified and addressed, and a novel model based on the transformer architecture is proposed. Leveraging real-world longitudinal claims data, a novel multi-stream transformer model is proposed for predicting opioid use disorder, an important case of substance use disorder. This model is designed to simultaneously analyze multiple types of data streams, such as medications, diagnoses, procedures, and demographics, by attending to segments within and across these data streams. The proposed model, tested on the IBM MarketScan data, showed significantly better performance than traditional models and recently developed deep learning models.
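    The multi-stream input described above, in which a transformer attends within and across streams such as medications, diagnoses, and procedures, can be sketched as follows. The stream names, shapes, and additive stream-type embedding are assumptions for illustration, not the thesis's actual code.

```python
import numpy as np

def build_multistream_input(streams, stream_embs):
    """Concatenate code embeddings from several claim streams along
    the time axis, adding a per-stream type embedding so a downstream
    transformer can tell the streams apart while still attending
    within and across them in a single self-attention pass.
    streams:     dict of name -> (stream_len, d) code embeddings
    stream_embs: dict of name -> (d,) stream-type embedding"""
    parts = [codes + stream_embs[name] for name, codes in streams.items()]
    return np.concatenate(parts, axis=0)
```

    Running ordinary self-attention over the concatenated sequence then realizes the "within and across streams" attention in one step, with the stream-type embeddings playing a role analogous to segment embeddings.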