15,783 research outputs found
Joint Video and Text Parsing for Understanding Events and Answering Queries
We propose a framework for parsing video and text jointly for understanding
events and answering user queries. Our framework produces a parse graph that
represents the compositional structures of spatial information (objects and
scenes), temporal information (actions and events) and causal information
(causalities between events and fluents) in the video and text. The knowledge
representation of our framework is based on a spatial-temporal-causal And-Or
graph (S/T/C-AOG), which jointly models possible hierarchical compositions of
objects, scenes and events as well as their interactions and mutual contexts,
and specifies the prior probabilistic distribution of the parse graphs. We
present a probabilistic generative model for joint parsing that captures the
relations between the input video/text, their corresponding parse graphs and
the joint parse graph. Based on the probabilistic model, we propose a joint
parsing system consisting of three modules: video parsing, text parsing and
joint inference. Video parsing and text parsing produce two parse graphs from
the input video and text respectively. The joint inference module produces a
joint parse graph by performing matching, deduction and revision on the video
and text parse graphs. The proposed framework has the following objectives:
Firstly, we aim at deep semantic parsing of video and text that goes beyond the
traditional bag-of-words approaches; Secondly, we perform parsing and reasoning
across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG
representation; Thirdly, we show that deep joint parsing facilitates subsequent
applications such as generating narrative text descriptions and answering
queries in the forms of who, what, when, where and why. We empirically
evaluated our system based on comparison against ground-truth as well as
accuracy of query answering and obtained satisfactory results
Statistical analysis of the owl:sameAs network for aligning concepts in the linking open data cloud
The massively distributed publication of linked data has brought to the attention of scientific community the limitations of classic methods for achieving data integration and the opportunities of pushing the boundaries of the field by experimenting this collective enterprise that is the linking open data cloud. While reusing existing ontologies is the choice of preference, the exploitation of ontology alignments still is a required step for easing the burden of integrating heterogeneous data sets. Alignments, even between the most used vocabularies, is still poorly supported in systems nowadays whereas links between instances are the most widely used means for bridging the gap between different data sets. We provide in this paper an account of our statistical and qualitative analysis of the network of instance level equivalences in the Linking Open Data Cloud (i.e. the sameAs network) in order to automatically compute alignments at the conceptual level. Moreover, we explore the effect of ontological information when adopting classical Jaccard methods to the ontology alignment task. Automating such task will allow in fact to achieve a clearer conceptual description of the data at the cloud level, while improving the level of integration between datasets. <br/
CBR and MBR techniques: review for an application in the emergencies domain
The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and the integration strategies of Case Based Reasoning and Model Based Reasoning that will be used in the design and development of the RIMSAT system.
RIMSAT (Remote Intelligent Management Support and Training) is a European Commission funded project designed to:
a.. Provide an innovative, 'intelligent', knowledge based solution aimed at improving the quality of critical decisions
b.. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety critical incidents - irrespective of their location.
In other words, RIMSAT aims to design and implement a decision support system that using Case Base Reasoning as well as Model Base Reasoning technology is applied in the management of emergency situations.
This document is part of a deliverable for RIMSAT project, and although it has been done in close contact with the requirements of the project, it provides an overview wide enough for providing a state of the art in integration strategies between CBR and MBR technologies.Postprint (published version
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
We introduce a model for bidirectional retrieval of images and sentences
through a multi-modal embedding of visual and natural language data. Unlike
previous models that directly map images or sentences into a common embedding
space, our model works on a finer level and embeds fragments of images
(objects) and fragments of sentences (typed dependency tree relations) into a
common space. In addition to a ranking objective seen in previous work, this
allows us to add a new fragment alignment objective that learns to directly
associate these fragments across modalities. Extensive experimental evaluation
shows that reasoning on both the global level of images and sentences and the
finer level of their respective fragments significantly improves performance on
image-sentence retrieval tasks. Additionally, our model provides interpretable
predictions since the inferred inter-modal fragment alignment is explicit
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
The task of video grounding, which temporally localizes a natural language
description in a video, plays an important role in understanding videos.
Existing studies have adopted strategies of sliding window over the entire
video or exhaustively ranking all possible clip-sentence pairs in a
pre-segmented video, which inevitably suffer from exhaustively enumerated
candidates. To alleviate this problem, we formulate this task as a problem of
sequential decision making by learning an agent which regulates the temporal
grounding boundaries progressively based on its policy. Specifically, we
propose a reinforcement learning based framework improved by multi-task
learning and it shows steady performance gains by considering additional
supervised boundary information during training. Our proposed framework
achieves state-of-the-art performance on ActivityNet'18 DenseCaption dataset
and Charades-STA dataset while observing only 10 or less clips per video.Comment: AAAI 201
Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
We evaluate the impact of probabilistically-constructed digital identity data
collected from Sep. to Dec. 2017 (approx.), in the context of
Lookalike-targeted campaigns. The backbone of this study is a large set of
probabilistically-constructed "identities", represented as small bags of
cookies and mobile ad identifiers with associated metadata, that are likely all
owned by the same underlying user. The identity data allows to generate
"identity-based", rather than "identifier-based", user models, giving a fuller
picture of the interests of the users underlying the identifiers. We employ
off-policy techniques to evaluate the potential of identity-powered lookalike
models without incurring the risk of allowing untested models to direct large
amounts of ad spend or the large cost of performing A/B tests. We add to
historical work on off-policy evaluation by noting a significant type of
"finite-sample bias" that occurs for studies combining modestly-sized datasets
and evaluation metrics involving rare events (e.g., conversions). We illustrate
this bias using a simulation study that later informs the handling of inverse
propensity weights in our analyses on real data. We demonstrate significant
lift in identity-powered lookalikes versus an identity-ignorant baseline: on
average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for
identifiers having little data themselves, but that can be inferred to belong
to users with substantial data to aggregate across identifiers. This implies
that identity-powered user modeling is especially important in the context of
identifiers having very short lifespans (i.e., frequently churned cookies). Our
work motivates and informs the use of probabilistically-constructed identities
in marketing. It also deepens the canon of examples in which off-policy
learning has been employed to evaluate the complex systems of the internet
economy.Comment: Accepted by WSDM 201
- …