
109 research outputs found

Unpaired Image Captioning via Scene Graph Alignments

Authors: Cai Jianfei; Gu Jiuxiang; Joty Shafiq; Wang Gang; Yang Xu; Zhao Handong
Publication date: 01/01/2019

Most current image captioning models rely heavily on paired image-caption datasets. However, collecting large-scale paired image-caption data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps the scene graph features from the image to the sentence modality. Experimental results show that our proposed model can generate quite promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.

Comment: Accepted in ICCV 201

arXiv.org e-Print Archive

Crossref

Monash University Research Portal

Learned Attention in Language Acquisition: Blocking, Salience, and Cue Competition.

Author: Ellis Nick C.
Publication venue: Cognitive Science Society
Publication date: 01/01/2007

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/139844/1/EuroCogSciEllis.pd

Deep Blue Documents at the University of Michigan


Blocking and Learned Attention in Language Acquisition.

Author: Ellis Nick C.
Publication venue: Cognitive Science Society
Publication date: 01/01/2007

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/139792/1/pp400-ellis.pd

eScholarship - University of California

Deep Blue Documents at the University of Michigan

A Hierarchical Neural Autoencoder for Paragraphs and Documents

Authors: Jurafsky Dan; Li Jiwei; Luong Minh-Thang
Publication date: 01/01/2015

Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent network models. In this paper, we explore an important step toward this generation task: training an LSTM (Long Short-Term Memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserves syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization. Code for the three models described in this paper can be found at www.stanford.edu/~jiweil/

arXiv.org e-Print Archive

CiteSeerX

Phrase-based Image Captioning

Authors: Collobert Ronan; Lebret Rémi; Pinheiro Pedro O.
Publication date: 09/04/2015

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representation (generated from a previously trained Convolutional Neural Network) and the phrases that are used to describe it. The system is then able to infer phrases from a given image sample. Based on caption syntax statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the inferred phrases. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results on two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Sequential and unsupervised document authorial clustering based on hidden Markov model

Authors: Aldebei K; Farhood H; He X; Jia W; Nanda P
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/09/2017

© 2017 IEEE. Document clustering groups documents with similar characteristics into one cluster. It has shown advantages for the organization, retrieval, navigation, and summarization of the huge number of text documents on the Internet. This paper presents a novel, unsupervised approach for clustering single-author documents into groups based on authorship. The key novelty is that we propose to extract contextual correlations that depict the writing style hidden among the sentences of each document, and to use them for clustering the documents. For this purpose, we build a Hidden Markov Model (HMM) to represent the relations between sequential sentences, and we construct a two-level, unsupervised framework. Our proposed approach is evaluated on four benchmark datasets widely used for document authorship analysis. A scientific paper is also used to demonstrate the performance of the approach on clustering short segments of a text into authorial components. Experimental results show that the proposed approach outperforms the state-of-the-art approaches.

OPUS - University of Technology Sydney