Multilevel Language and Vision Integration for Text-to-Clip Retrieval
We address the problem of text-based activity retrieval in video. Given a
sentence describing an activity, our task is to retrieve matching clips from an
untrimmed video. To capture the inherent structures present in both text and
video, we introduce a multilevel model that integrates vision and language
features earlier and more tightly than prior work. First, we inject text
features early on when generating clip proposals, to help eliminate unlikely
clips and thus speed up processing and boost performance. Second, to learn a
fine-grained similarity metric for retrieval, we use visual features to
modulate the processing of query sentences at the word level in a recurrent
neural network. A multi-task loss is also employed by adding query
re-generation as an auxiliary task. Our approach significantly outperforms
prior work on two challenging benchmarks: Charades-STA and ActivityNet
Captions. Comment: AAAI 201
Designing Fair Ranking Schemes
Items from a database are often ranked based on a combination of multiple
criteria. A user may have the flexibility to accept combinations that weigh
these criteria differently, within limits. On the other hand, this choice of
weights can greatly affect the fairness of the produced ranking. In this paper,
we develop a system that helps users choose criterion weights that lead to
greater fairness.
We consider ranking functions that compute the score of each item as a
weighted sum of (numeric) attribute values, and then sort items on their score.
Each ranking function can be expressed as a vector of weights, or as a point in
a multi-dimensional space. For a broad range of fairness criteria, we show how
to efficiently identify regions in this space that satisfy these criteria.
Using this identification method, our system is able to tell users whether
their proposed ranking function satisfies the desired fairness criteria and, if
it does not, to suggest the smallest modification that does. We develop
user-controllable approximation and indexing techniques that are applied
during preprocessing and support sub-second response times during the online
phase. Our extensive experiments on real datasets demonstrate that our methods
find solutions that satisfy fairness criteria effectively and
efficiently.
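As a rough illustration of the setting described in this abstract, the sketch below scores items by a weighted sum of numeric attributes, sorts them, and evaluates one simple fairness measure. The function names, the toy data, and the "share of a protected group in the top k" criterion are assumptions made for this example; the abstract does not specify which criteria or algorithms the system uses.

```python
from typing import List, Sequence


def rank_items(items: Sequence[Sequence[float]], weights: Sequence[float]) -> List[int]:
    """Return item indices sorted by descending weighted-sum score."""
    scores = [sum(w * x for w, x in zip(weights, item)) for item in items]
    return sorted(range(len(items)), key=lambda i: -scores[i])


def top_k_share(ranking: Sequence[int], protected: Sequence[bool], k: int) -> float:
    """Fraction of the top-k items that belong to the protected group.

    This is a hypothetical fairness criterion chosen for illustration only.
    """
    return sum(1 for i in ranking[:k] if protected[i]) / k


# Toy data (assumed): each item has two numeric attributes.
items = [(0.9, 0.1), (0.8, 0.3), (0.2, 0.9), (0.1, 0.8)]
protected = [False, False, True, True]

# Two ranking functions, each expressed as a weight vector.
ranking_a = rank_items(items, (1.0, 0.0))
ranking_b = rank_items(items, (0.5, 0.5))

share_a = top_k_share(ranking_a, protected, k=2)
share_b = top_k_share(ranking_b, protected, k=2)
```

In this toy case, shifting weight from the first attribute to the second changes which items reach the top two, which is the kind of sensitivity to weights the abstract refers to.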
A New Lower Bound for Semigroup Orthogonal Range Searching
We report the first improvement in the space-time trade-off of lower bounds
for the orthogonal range searching problem in the semigroup model since
Chazelle's result from 1990. This is one of the most fundamental problems in
range searching, with a long history. Previously, Andrew Yao's influential
result had shown that the problem is already non-trivial in one
dimension~\cite{Yao-1Dlb}: using units of space, the query time must
be where is the
inverse Ackermann's function, a very slowly growing function.
In dimensions, Bernard Chazelle~\cite{Chazelle.LB.II} proved that the
query time must be where .
Chazelle's lower bound is known to be tight when space consumption is
`high' i.e., . We have two main results.
The first is a lower bound that shows Chazelle's lower bound was not tight for
`low space': we prove that we must have . Our lower bound does not close the gap to the existing data
structures; however, our second result is that our analysis is tight. Thus, we
believe the gap is in fact natural, since lower bounds are proven for idempotent
semigroups while the data structures are built for general semigroups and thus
cannot assume (and use) the properties of an idempotent semigroup. As a
result, we believe that to close the gap one must either study lower bounds for
non-idempotent semigroups or build data structures for idempotent
semigroups. We develop significantly new ideas for both of our results that
could be useful in pursuing either of these directions.
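To make the idempotent versus general semigroup distinction concrete, here is a minimal one-dimensional sketch. It uses a standard sparse table, which is not the construction from this paper: a range is answered by combining two overlapping precomputed blocks, which is only valid when the operation satisfies op(x, x) = x (for example max). All names and data are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def build_sparse_table(values: Sequence[int], op: Callable[[int, int], int]) -> List[List[int]]:
    """table[j][i] holds op applied over values[i : i + 2**j]."""
    table = [list(values)]
    n = len(values)
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        table.append([op(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)])
        j += 1
    return table


def query_overlapping(table: List[List[int]], op: Callable[[int, int], int], lo: int, hi: int) -> int:
    """Combine values[lo..hi] (inclusive) from two blocks that may overlap.

    Elements in the overlap are combined twice, so the answer is
    correct only for idempotent operations.
    """
    j = (hi - lo + 1).bit_length() - 1
    return op(table[j][lo], table[j][hi - (1 << j) + 1])


values = [3, 1, 4, 1, 5, 9, 2, 6]

# Idempotent semigroup (max): overlapping blocks give the right answer.
max_table = build_sparse_table(values, max)
max_result = query_overlapping(max_table, max, 1, 5)

# Non-idempotent semigroup (+): the overlap is double counted.
add = lambda a, b: a + b
sum_table = build_sparse_table(values, add)
sum_result = query_overlapping(sum_table, add, 1, 5)
true_sum = sum(values[1:6])
```

A structure built for general semigroups has to cover each query with disjoint pieces, so it cannot use the shortcut shown for max.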
AMC: Attention guided Multi-modal Correlation Learning for Image Search
Given a user's query, traditional image search systems rank images according
to their relevance to a single modality (e.g., image content or surrounding
text). Nowadays, an increasing number of images on the Internet are available
with associated metadata in rich modalities (e.g., titles, keywords, tags),
which can be exploited to better measure similarity with queries. In
this paper, we leverage visual and textual modalities for image search by
learning their correlation with the input query. An attention mechanism can be
introduced to adaptively balance the importance of different modalities
according to the intent of the query. We propose a novel Attention guided Multi-modal
Correlation (AMC) learning method which consists of a jointly learned hierarchy
of intra- and inter-attention networks. Conditioned on the query's intent,
intra-attention networks (i.e., a visual intra-attention network and a language
intra-attention network) attend to informative parts within each modality; a
multi-modal inter-attention network promotes the importance of the most
query-relevant modalities. In experiments, we evaluate AMC models on the search
logs from two real-world image search engines and show a significant boost in
the ranking of user-clicked images in search results. Additionally, we extend
AMC models to the caption ranking task on the COCO dataset and achieve competitive
results compared with recent state-of-the-art methods. Comment: CVPR 201
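The idea of promoting the most query-relevant modality can be sketched generically as a softmax-weighted fusion of per-modality embeddings. This is an assumption-laden toy, not the AMC architecture: the dot-product scoring, the function names, and the two-dimensional vectors are all hypothetical, and the learned intra-attention networks are omitted.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse_modalities(
    query: Sequence[float], modalities: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Weight each modality embedding by its softmax-normalised relevance
    to the query (dot product, assumed here), then sum the weighted
    embeddings. Returns (weights, fused_embedding)."""
    weights = softmax([dot(query, m) for m in modalities])
    fused = [
        sum(w * m[d] for w, m in zip(weights, modalities))
        for d in range(len(query))
    ]
    return weights, fused


# Toy embeddings (assumed): the query aligns with the visual modality.
query = [1.0, 0.0]
visual = [2.0, 0.0]
textual = [0.0, 2.0]

weights, fused = fuse_modalities(query, [visual, textual])
```

Here the visual modality receives the larger weight because its embedding aligns with the query; in a learned model the scoring function would be trained rather than fixed.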