359 research outputs found

    Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, and achieve significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. The project page of this work is: www.stat.ucla.edu/~junhua.mao/m-RNN.html. Comment: Add a simple strategy to significantly boost the performance of the image captioning task. More details are shown in Section 8 of the paper. The code and related data are available at https://github.com/mjhucla/mRNN-CR. arXiv admin note: substantial text overlap with arXiv:1410.109
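    The generation step described in this abstract (draw each word from a distribution conditioned on the previous words and the image, until the caption ends) can be illustrated with a minimal sketch. It is a toy stand-in, not the m-RNN network: the vocabulary, the `next_word_probs` function, the scalar `image_feature`, and the `<end>` token are all hypothetical names introduced here for illustration.

```python
import random

# Hypothetical toy vocabulary; "<end>" marks the end of a caption.
VOCAB = ["a", "dog", "runs", "<end>"]


def next_word_probs(previous_words, image_feature):
    """Toy stand-in for P(w_t | w_1..w_{t-1}, image).

    This is NOT the m-RNN model; it only mimics the interface of a
    conditional next-word distribution over VOCAB.
    """
    weights = [
        1.0,                        # "a"
        1.0 + image_feature,        # "dog": boosted by the (assumed) image feature
        1.0,                        # "runs"
        0.5 * len(previous_words),  # "<end>": more likely as the caption grows
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_caption(image_feature, rng, max_len=10):
    """Sample words one at a time until "<end>" is drawn or max_len is reached."""
    words = []
    while len(words) < max_len:
        probs = next_word_probs(words, image_feature)
        word = rng.choices(VOCAB, weights=probs, k=1)[0]
        if word == "<end>":
            break
        words.append(word)
    return words


caption = sample_caption(image_feature=2.0, rng=random.Random(0))
print(" ".join(caption))
```

    In a real captioning system the toy weights would be replaced by the output of a trained model, but the sampling loop keeps the same shape.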

    Processing-structure-protrusion relationship of 3D Cu TSVs: control at the atomic scale

    A phase-field-crystal model is used to investigate the processing-structure-protrusion relationship of blind Cu through-silicon vias (TSVs) at the atomic scale. A higher temperature results in a larger TSV protrusion. Deformation via dislocation motion dominates at temperatures lower than around 300 °C, while both diffusional and dislocation creep occur at temperatures greater than around 300 °C. TSVs with smaller sidewall roughness Ra and wavelength λa exhibit larger protrusions. Moreover, different protrusion profiles are observed for TSVs with different grain structures. Both protrusions and intrusions are observed when a single grain is placed near the TSV top end, while the top surface protrudes near both edges when it contains more grains. Under symmetric loading, coalescence of the grains occurs near the top end, and a symmetric grain structure can accelerate this process. The strain distributions in TSVs are calculated, and the eigenstrain projection along the vertical direction can be considered an index to predict the TSV protrusion tendency.

    Protrusion of Cu-TSV under different strain states

    A phase-field-crystal (PFC) model is used to investigate the protrusion of blind TSVs under different strain states. The direction of loading applied to the TSVs affects the protrusion, which is closely related to the copper grains and their orientations at the TSV edges. A nonlinear relation between protrusion and strain rate has been found, which can be explained by different mechanisms of deformation. A higher strain occurring near the top end of the TSVs leads to a larger protrusion of the blind TSVs.

    Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.