359 research outputs found
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model
for generating novel image captions. It directly models the probability
distribution of generating a word given previous words and an image. Image
captions are generated by sampling from this distribution. The model consists
of two sub-networks: a deep recurrent neural network for sentences and a deep
convolutional network for images. These two sub-networks interact with each
other in a multimodal layer to form the whole m-RNN model. The effectiveness of
our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K,
Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In
addition, we apply the m-RNN model to retrieval tasks for retrieving images or
sentences, and achieves significant performance improvement over the
state-of-the-art methods which directly optimize the ranking objective function
for retrieval. The project page of this work is:
www.stat.ucla.edu/~junhua.mao/m-RNN.html .Comment: Add a simple strategy to boost the performance of image captioning
task significantly. More details are shown in Section 8 of the paper. The
code and related data are available at https://github.com/mjhucla/mRNN-CR ;.
arXiv admin note: substantial text overlap with arXiv:1410.109
Gastroesophageal Reflux Disease: Molecular Predictors in Neoplastic Progression of Barrett\u27s Esophagus
Processing-structure-protrusion relationship of 3D Cu TSVs: control at the atomic scale
A phase-field-crystal model is used to investigate the processing-structure-protrusion relationship of blind Cu through-silicon vias (TSVs) at the atomic scale. A higher temperature results in a larger TSV protrusion. Deformation via dislocation motion dominates at temperatures lower than around 300∘C, while both diffusional and dislocation creep occur at temperatures greater than around 300∘C. TSVs with smaller sidewall roughness Ra and wavelength λa exhibit larger protrusions. Moreover, different protrusion profiles are observed for TSVs with different grain structures. Both protrusions and intrusions are observed when a single grain is placed near the TSV top end, while the top surface protrudes near both edges when it contains more grains. Under symmetric loading, coalescence of the grains occurs near the top end, and a symmetric grain structure can accelerate this process. The strain distributions in TSVs are calculated, and the eigenstrain projection along the vertical direction can be considered an index to predict the TSV protrusion tendency
Protrusion of Cu-TSV under different strain states
A phase-field-crystal (PFC) model is used to investigate the protrusion of blind TSVs under different strain states. The direction of loading applied to the TSVs has an effect on the protrusion, which is closely related to the copper grains and their orientations at the TSV edges. A nonlinear relation between protrusion and strain rate has been found, which can be explained by different mechanisms of deformation. A higher strain occurring near the top end of the TSVs leads to a larger protrusion of the bind TSVs
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF - 1231216
- …