Unpaired Image Captioning via Scene Graph Alignments
Most current image captioning models rely heavily on paired image-caption
datasets. However, collecting large-scale image-caption paired data is
labor-intensive and time-consuming. In this paper, we present a scene
graph-based approach for unpaired image captioning. Our framework comprises an
image scene graph generator, a sentence scene graph generator, a scene graph
encoder, and a sentence decoder. Specifically, we first train the scene graph
encoder and the sentence decoder on the text modality. To align the scene
graphs between images and sentences, we propose an unsupervised feature
alignment method that maps the scene graph features from the image to the
sentence modality. Experimental results show that our proposed model can
generate quite promising results without using any image-caption training
pairs, outperforming existing methods by a wide margin.
Comment: Accepted in ICCV 201
Learned Attention in Language Acquisition: Blocking, Salience, and Cue Competition.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/139844/1/EuroCogSciEllis.pd
Blocking and Learned Attention in Language Acquisition.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/139792/1/pp400-ellis.pd
A Hierarchical Neural Autoencoder for Paragraphs and Documents
Natural language generation of coherent long texts like paragraphs or longer
documents is a challenging problem for recurrent network models. In this
paper, we explore an important step toward this generation task: training an
LSTM (Long Short-Term Memory) auto-encoder to preserve and reconstruct
multi-sentence paragraphs. We introduce an LSTM model that hierarchically
builds an embedding for a paragraph from embeddings for sentences and words,
then decodes this embedding to reconstruct the original paragraph. We evaluate
the reconstructed paragraph using standard metrics like ROUGE and Entity Grid,
showing that neural models are able to encode texts in a way that preserves
syntactic, semantic, and discourse coherence. While only a first step toward
generating coherent text units from neural models, our work has the potential
to significantly impact natural language generation and
summarization. Code for the three models described in this paper can be
found at www.stanford.edu/~jiweil/
Phrase-based Image Captioning
Generating a novel textual description of an image is an interesting problem
that connects computer vision and natural language processing. In this paper,
we present a simple model that is able to generate descriptive sentences given
a sample image. This model has a strong focus on the syntax of the
descriptions. We train a purely bilinear model that learns a metric between an
image representation (generated from a previously trained Convolutional Neural
Network) and phrases that are used to describe them. The system is then able
to infer phrases from a given image sample. Based on caption syntax statistics,
we propose a simple language model that can produce relevant descriptions for a
given test image using the phrases inferred. Our approach, which is
considerably simpler than state-of-the-art models, achieves comparable results
on two popular datasets for the task: Flickr30k and the recently proposed
Microsoft COCO.
Sequential and unsupervised document authorial clustering based on Hidden Markov Model
© 2017 IEEE. Document clustering groups documents with similar characteristics into one cluster. It has shown advantages for the organization, retrieval, navigation, and summarization of the huge number of text documents on the Internet. This paper presents a novel, unsupervised approach for clustering single-author documents into groups based on authorship. The key novelty is that we extract contextual correlations that capture the writing style hidden among the sentences of each document and use them to cluster the documents. For this purpose, we build a Hidden Markov Model (HMM) to represent the relations between sequential sentences and construct a two-level, unsupervised framework. Our proposed approach is evaluated on four benchmark datasets widely used for document authorship analysis. A scientific paper is also used to demonstrate the performance of the approach on clustering short segments of a text into authorial components. Experimental results show that the proposed approach outperforms the state-of-the-art approaches.