566 research outputs found

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

Author: Huang Zhiheng
Mao Junhua
Wang Jiang
Xu Wei
Yang Yi
Yuille Alan
Publication date: 01/01/2015

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, and achieve significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. The project page of this work is: www.stat.ucla.edu/~junhua.mao/m-RNN.html. Comment: Add a simple strategy to boost the performance of the image captioning task significantly. More details are shown in Section 8 of the paper. The code and related data are available at https://github.com/mjhucla/mRNN-CR. arXiv admin note: substantial text overlap with arXiv:1410.109
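As a reading aid for the abstract above: a model that gives the probability of each word conditioned on the previous words and the image induces, by the chain rule, a distribution over whole captions. The notation below is assumed for illustration and is not taken from the paper: $I$ is the image, $w_t$ the $t$-th word, and $T$ the caption length.

```latex
P(w_{1:T} \mid I) \;=\; \prod_{t=1}^{T} P\left(w_t \mid w_{1:t-1},\, I\right)
```

Sampling a caption then amounts to drawing $w_1, w_2, \dots$ in turn from the per-word conditionals.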

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Direction-of-Arrival Estimation Based on Joint Sparsity

Author: Huang Zhitao
Wang Junhua
Zhou Yiyu
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/09/2011

We present a Direction-of-Arrival (DOA) estimation algorithm for sensor arrays, called Joint-Sparse DOA. First, DOA estimation is cast as a joint-sparse recovery problem. Then, the norm is approximated by an arctan function to represent joint sparsity, and the DOA estimate is obtained by minimizing the approximate norm. Finally, the minimization problem is solved by a quasi-Newton method. Simulation results show that our algorithm has some advantages over most existing methods: it needs only a small number of snapshots to estimate DOA, and the number of sources need not be known a priori. It also improves the resolution and handles coherent sources well
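The idea of replacing a joint-sparsity measure with an arctan function can be illustrated with a small sketch. The abstract does not give the exact formula, so everything here is an assumption: the surrogate sums `(2/pi) * atan(row_norm / sigma)` over the rows of a matrix, the function name `arctan_joint_sparsity` and the smoothing parameter `sigma` are hypothetical, and the paper's actual approximation may differ.

```python
import math

def arctan_joint_sparsity(rows, sigma=1e-3):
    """Smooth surrogate for the number of nonzero rows (assumed form).

    Each row contributes (2/pi) * atan(||row||_2 / sigma), which is 0 for an
    all-zero row and approaches 1 for a nonzero row as sigma shrinks.
    """
    total = 0.0
    for row in rows:
        row_norm = math.sqrt(sum(abs(v) ** 2 for v in row))
        total += (2.0 / math.pi) * math.atan(row_norm / sigma)
    return total

# Toy matrix: rows are candidate directions, columns are snapshots.
# Only two rows are nonzero, i.e. two hypothetical sources.
X = [
    [0.0, 0.0, 0.0],
    [1.0 + 0.5j, 0.8 - 0.2j, 1.1 + 0.1j],
    [0.0, 0.0, 0.0],
    [0.3j, 0.4, 0.2],
    [0.0, 0.0, 0.0],
]

surrogate = arctan_joint_sparsity(X, sigma=1e-3)
print(surrogate)  # close to 2, the number of nonzero rows
```

Unlike a hard count of nonzero rows, this surrogate is differentiable, which is what makes a gradient-based solver such as a quasi-Newton method applicable.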

Directory of Open Access Journals

PubMed Central

Making the Invisible Visible: Documentaries chronicling the lives of Palestinians in and outside the Middle East

Author: HUANG Junhua
Jørholt Eva
Publication date: 01/03/2017

Copenhagen University Research Information System

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

Author: Huang Zhiheng
Mao Junhua
Wang Jiang
Xu Wei
Yang Yi
Yuille Alan L.
Publication venue: Center for Brains, Minds and Machines (CBMM), arXiv
Publication date: 07/05/2015

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216

DSpace@MIT

Defect-assisted conductivity in organic ionic plastic crystals

Author: Forsyth Maria
Hill Anita J.
Huang Junhua
MacFarlane Douglas R.
Pas Steven J.
Publication venue: AIP Publishing
Publication date: 08/02/2005

Deakin Research Online

SVDinsTN: An Integrated Method for Tensor Network Representation with Efficient Structure Search

Author: Huang Ting-Zhu
Li Chao
Li Heng-Chao
Zeng Junhua
Zhao Qibin
Zhao Xi-Le
Zheng Yu-Bang
Publication date: 24/05/2023

Tensor network (TN) representation is a powerful technique for data analysis and machine learning. It practically involves a challenging TN structure search (TN-SS) problem, which aims to search for the optimal structure to achieve a compact representation. Existing TN-SS methods mainly adopt a bi-level optimization method that leads to excessive computational costs due to repeated structure evaluations. To address this issue, we propose an efficient integrated (single-level) method named SVD-inspired TN decomposition (SVDinsTN), eliminating the need for repeated tedious structure evaluation. By inserting a diagonal factor for each edge of the fully-connected TN, we calculate TN cores and diagonal factors simultaneously, with factor sparsity revealing the most compact TN structure. Experimental results on real-world data demonstrate that SVDinsTN achieves approximately $10^2\sim 10^3$ times acceleration in runtime compared to the existing TN-SS methods while maintaining a comparable level of representation ability
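To make "factor sparsity revealing the most compact TN structure" concrete, here is a minimal sketch of how a structure could be read off once the diagonal factors are known. It does not implement the SVDinsTN optimization itself. The dictionary layout, the function name `read_structure`, the threshold `tol`, and the rule that an edge with at most one surviving diagonal entry can be dropped are all assumptions made for illustration.

```python
def read_structure(diag_factors, tol=1e-8):
    """Derive edge ranks from diagonal factors (illustrative assumption).

    diag_factors maps an edge (i, j) between two TN cores to the diagonal
    entries of the factor placed on that edge. The rank of an edge is taken
    to be the number of entries whose magnitude exceeds tol.
    """
    ranks = {}
    for edge, diagonal in diag_factors.items():
        ranks[edge] = sum(1 for value in diagonal if abs(value) > tol)
    # An edge of rank 1 (or 0) carries no coupling between its two cores,
    # so this sketch treats it as absent from the final structure.
    kept_edges = sorted(edge for edge, rank in ranks.items() if rank > 1)
    return ranks, kept_edges

# Hypothetical diagonal factors for a fully-connected TN with three cores.
diag_factors = {
    (0, 1): [0.9, 0.4, 0.0],
    (0, 2): [1.0, 0.0, 0.0],
    (1, 2): [0.7, 0.2, 0.1],
}

ranks, kept_edges = read_structure(diag_factors)
print(ranks)
print(kept_edges)
```

In this toy case the edge between cores 0 and 2 has a single nonzero entry, so the fully-connected triangle reduces to a chain 0-1-2.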

arXiv.org e-Print Archive