Sherlock: Scalable Fact Learning in Images
We study scalable and uniform understanding of facts in images. Existing
visual recognition systems are typically modeled differently for each fact type
such as objects, actions, and interactions. We propose a setting where all
these facts can be modeled simultaneously, with the capacity to understand an
unbounded number of facts in a structured way. The training data comes as
structured facts in images, including (1) objects (e.g., ), (3) actions (e.g., ). Each fact has a semantic
language view (e.g., ) and a visual view (an image with this
fact). We show that learning visual facts in a structured way enables not only
a uniform but also generalizable visual understanding. We propose and
investigate recent and strong approaches from the multiview learning literature
and also introduce two learning representation models as potential baselines.
We applied the investigated methods on several datasets that we augmented with
structured facts and a large scale dataset of more than 202,000 facts and
814,000 images. Our experiments show that the proposed models, which relate
facts through their structure, outperform the designed baselines on
bidirectional fact retrieval.
Comment: Jan 7 Updat
Offline Handwritten Signature Verification - Literature Review
The area of Handwritten Signature Verification has been broadly researched in
the last decades, but remains an open research problem. The objective of
signature verification systems is to discriminate if a given signature is
genuine (produced by the claimed individual), or a forgery (produced by an
impostor). This has proven to be a challenging task, in particular in the
offline (static) scenario, that uses images of scanned signatures, where the
dynamic information about the signing process is not available. Many
advancements have been proposed in the literature in the last 5-10 years, most
notably the application of Deep Learning methods to learn feature
representations from signature images. In this paper, we present how the
problem has been handled in the past few decades, analyze the recent
advancements in the field, and discuss potential directions for future research.
Comment: Accepted to the International Conference on Image Processing Theory,
Tools and Applications (IPTA 2017)
Parsing Thai Social Data: A New Challenge for Thai NLP
Dependency parsing (DP) is a task that analyzes text for syntactic structure
and relationships between words. DP is widely used to improve natural language
processing (NLP) applications in many languages such as English. Previous works
on DP are generally applicable to formally written languages. However, they do
not apply to informal languages such as the ones used in social networks.
Therefore, DP has to be researched and explored with such social network data.
In this paper, we explore and identify a DP model that is suitable for Thai
social network data. We then identify the appropriate linguistic
unit to use as input. The results show that the transition-based model called the
improved Elkared dependency parser outperforms the others, with a UAS of 81.42%.
Comment: 7 pages, 8 figures, to be published in The 14th International Joint
Symposium on Artificial Intelligence and Natural Language Processing
(iSAI-NLP 2019)
Cold Fusion: Training Seq2Seq Models Together with Language Models
Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks
which involve generating natural language sentences such as machine
translation, image captioning and speech recognition. Performance has further
been improved by leveraging unlabeled data, often in the form of a language
model. In this work, we present the Cold Fusion method, which leverages a
pre-trained language model during training, and show its effectiveness on the
speech recognition task. We show that Seq2Seq models with Cold Fusion are able
to better utilize language information, enjoying i) faster convergence and
better generalization, and ii) almost complete transfer to a new domain while
using less than 10% of the labeled training data.