End-to-end Learning for Short Text Expansion
Effectively making sense of short texts is a critical task for many real-world
applications such as search engines, social media services, and
recommender systems. The task is particularly challenging as a short text
contains very sparse information, often too sparse for a machine learning
algorithm to pick up useful signals. A common practice for analyzing short text
is to first expand it with external information, which is usually harvested
from a large collection of longer texts. In the literature, short text expansion
has been done with a wide variety of heuristics. We propose an end-to-end solution
that automatically learns how to expand short text to optimize a given learning
task. A novel deep memory network is proposed to automatically find relevant
information from a collection of longer documents and reformulate the short
text through a gating mechanism. Using short text classification as a
demonstration task, we show that the deep memory network significantly
outperforms classical text expansion methods in comprehensive experiments on
real-world data sets.
Comment: KDD'201
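The retrieve-then-gate idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' network: the single attention hop, the scalar gate, and the names `expand_short_text`, `gate_weights`, and `gate_bias` are all assumptions made for illustration, and the vectors stand in for learned embeddings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def expand_short_text(query, memory, gate_weights, gate_bias=0.0):
    """One attention hop over long-document vectors, then a scalar gate.

    query        -- embedding of the short text (list of floats)
    memory       -- embeddings of the longer documents, same dimension
    gate_weights -- hypothetical gate parameters, length 2 * len(query)
    """
    # Attention: score each long document against the short text.
    attention = softmax([dot(query, doc) for doc in memory])
    # Retrieved information: attention-weighted sum of document vectors.
    dim = len(query)
    retrieved = [sum(a * doc[i] for a, doc in zip(attention, memory))
                 for i in range(dim)]
    # Gate: how much retrieved content to mix into the short text.
    gate = sigmoid(dot(gate_weights, query + retrieved) + gate_bias)
    expanded = [gate * r + (1.0 - gate) * q
                for q, r in zip(query, retrieved)]
    return expanded, attention, gate

expanded, attention, gate = expand_short_text(
    query=[1.0, 0.0],
    memory=[[1.0, 0.0], [0.0, 1.0]],
    gate_weights=[0.0, 0.0, 0.0, 0.0],
)
```

With all-zero gate parameters the gate sits at 0.5, so the expanded vector is an even mix of the short text and the retrieved content. In an end-to-end setting the gate and the embeddings would be trained jointly with the downstream classifier.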
Generating Text Sequence Images for Recognition
Recently, methods based on deep learning have dominated the field of text
recognition. Given a large amount of training data, most of them can achieve
state-of-the-art performance. However, it is hard to collect and label
sufficient text sequence images from real scenes. To mitigate this issue,
several methods to synthesize text sequence images were proposed, yet they
usually need complicated preceding or follow-up steps. In this work, we present
a method that can generate unlimited training data without any auxiliary
pre- or post-processing. We treat the generation task as an image-to-image
translation problem and use conditional adversarial networks to produce
realistic text sequence images conditioned on their semantic counterparts. We
assess the method with several evaluation metrics, and the results
demonstrate that the quality of the data is satisfactory. The code and dataset
will be made publicly available soon.
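For readers unfamiliar with conditional adversarial training, the standard objective can be sketched as follows. The abstract does not state the exact loss, so this shows the textbook conditional GAN formulation with a non-saturating generator loss, which is an assumption; the paper may add further terms. The names `bce` and `cgan_losses` are hypothetical, and the discriminator outputs are passed in as plain probabilities rather than computed by a network.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob)
             + (1.0 - target) * math.log(1.0 - prob))

def cgan_losses(d_real, d_fake):
    """Losses for one (semantic image, text image) training pair.

    d_real -- D(semantic, real image): probability the pair is real
    d_fake -- D(semantic, G(semantic)): same, for the generated image
    """
    # The discriminator should label real pairs 1 and generated pairs 0.
    d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)
    # The generator wants its output to be labeled real (non-saturating form).
    g_loss = bce(d_fake, 1.0)
    return d_loss, g_loss

d_loss, g_loss = cgan_losses(d_real=0.9, d_fake=0.1)
```

Because the discriminator sees the semantic image alongside the candidate output, the generator is pushed to produce images that are both realistic and consistent with the condition.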