45,779 research outputs found
DeepStory: Video Story QA by Deep Embedded Memory Networks
Question-answering (QA) on video contents is a significant challenge for
achieving human-level intelligence as it involves both vision and language in
real-world settings. Here we demonstrate the possibility of an AI agent
performing video story QA by learning from a large amount of cartoon videos. We
develop a video-story learning model, i.e. Deep Embedded Memory Networks
(DEMN), to reconstruct stories from a joint scene-dialogue video stream using a
latent embedding space of observed data. The video stories are stored in a
long-term memory component. For a given question, an LSTM-based attention model
uses the long-term memory to recall the best question-story-answer triplet by
focusing on specific words containing key information. We trained the DEMN on a
novel QA dataset of children's cartoon video series, Pororo. The dataset
contains 16,066 scene-dialogue pairs of 20.5-hour videos, 27,328 fine-grained
sentences for scene description, and 8,913 story-related QA pairs. Our
experimental results show that the DEMN outperforms other QA models. This is
mainly due to 1) the reconstruction of video stories in a scene-dialogue
combined form that utilize the latent embedding and 2) attention. DEMN also
achieved state-of-the-art results on the MovieQA benchmark.Comment: 7 pages, accepted for IJCAI 201
NLSC: Unrestricted Natural Language-based Service Composition through Sentence Embeddings
Current approaches for service composition (assemblies of atomic services)
require developers to use: (a) domain-specific semantics to formalize services
that restrict the vocabulary for their descriptions, and (b) translation
mechanisms for service retrieval to convert unstructured user requests to
strongly-typed semantic representations. In our work, we argue that effort to
developing service descriptions, request translations, and matching mechanisms
could be reduced using unrestricted natural language; allowing both: (1)
end-users to intuitively express their needs using natural language, and (2)
service developers to develop services without relying on syntactic/semantic
description languages. Although there are some natural language-based service
composition approaches, they restrict service retrieval to syntactic/semantic
matching. With recent developments in Machine learning and Natural Language
Processing, we motivate the use of Sentence Embeddings by leveraging richer
semantic representations of sentences for service description, matching and
retrieval. Experimental results show that service composition development
effort may be reduced by more than 44\% while keeping a high precision/recall
when matching high-level user requests with low-level service method
invocations.Comment: This paper will appear on SCC'19 (IEEE International Conference on
Services Computing) on July 1
Optical tomography: Image improvement using mixed projection of parallel and fan beam modes
Mixed parallel and fan beam projection is a technique used to increase the quality images. This research focuses on enhancing the image quality in optical tomography. Image quality can be deļ¬ned by measuring the Peak Signal to Noise Ratio (PSNR) and Normalized Mean Square Error (NMSE) parameters. The ļ¬ndings of this research prove that by combining parallel and fan beam projection, the image quality can be increased by more than 10%in terms of its PSNR value and more than 100% in terms of its NMSE value compared to a single parallel beam
- ā¦