39,488 research outputs found

    Multilevel Language and Vision Integration for Text-to-Clip Retrieval

    Full text link
    We address the problem of text-based activity retrieval in video. Given a sentence describing an activity, our task is to retrieve matching clips from an untrimmed video. To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work. First, we inject text features early on when generating clip proposals, to help eliminate unlikely clips and thus speed up processing and boost performance. Second, to learn a fine-grained similarity metric for retrieval, we use visual features to modulate the processing of query sentences at the word level in a recurrent neural network. A multi-task loss is also employed by adding query re-generation as an auxiliary task. Our approach significantly outperforms prior work on two challenging benchmarks: Charades-STA and ActivityNet Captions.Comment: AAAI 201

    Designing Fair Ranking Schemes

    Full text link
    Items from a database are often ranked based on a combination of multiple criteria. A user may have the flexibility to accept combinations that weigh these criteria differently, within limits. On the other hand, this choice of weights can greatly affect the fairness of the produced ranking. In this paper, we develop a system that helps users choose criterion weights that lead to greater fairness. We consider ranking functions that compute the score of each item as a weighted sum of (numeric) attribute values, and then sort items on their score. Each ranking function can be expressed as a vector of weights, or as a point in a multi-dimensional space. For a broad range of fairness criteria, we show how to efficiently identify regions in this space that satisfy these criteria. Using this identification method, our system is able to tell users whether their proposed ranking function satisfies the desired fairness criteria and, if it does not, to suggest the smallest modification that does. We develop user-controllable approximation that and indexing techniques that are applied during preprocessing, and support sub-second response times during the online phase. Our extensive experiments on real datasets demonstrate that our methods are able to find solutions that satisfy fairness criteria effectively and efficiently

    A New Lower Bound for Semigroup Orthogonal Range Searching

    Get PDF
    We report the first improvement in the space-time trade-off of lower bounds for the orthogonal range searching problem in the semigroup model, since Chazelle's result from 1990. This is one of the very fundamental problems in range searching with a long history. Previously, Andrew Yao's influential result had shown that the problem is already non-trivial in one dimension~\cite{Yao-1Dlb}: using mm units of space, the query time Q(n)Q(n) must be Ω(α(m,n)+nmn+1)\Omega( \alpha(m,n) + \frac{n}{m-n+1}) where α(,)\alpha(\cdot,\cdot) is the inverse Ackermann's function, a very slowly growing function. In dd dimensions, Bernard Chazelle~\cite{Chazelle.LB.II} proved that the query time must be Q(n)=Ω((logβn)d1)Q(n) = \Omega( (\log_\beta n)^{d-1}) where β=2m/n\beta = 2m/n. Chazelle's lower bound is known to be tight for when space consumption is `high' i.e., m=Ω(nlogd+εn)m = \Omega(n \log^{d+\varepsilon}n). We have two main results. The first is a lower bound that shows Chazelle's lower bound was not tight for `low space': we prove that we must have m(n)=Ω(n(lognloglogn)d1)m (n) = \Omega(n (\log n \log\log n)^{d-1}). Our lower bound does not close the gap to the existing data structures, however, our second result is that our analysis is tight. Thus, we believe the gap is in fact natural since lower bounds are proven for idempotent semigroups while the data structures are built for general semigroups and thus they cannot assume (and use) the properties of an idempotent semigroup. As a result, we believe to close the gap one must study lower bounds for non-idempotent semigroups or building data structures for idempotent semigroups. We develope significantly new ideas for both of our results that could be useful in pursuing either of these directions

    AMC: Attention guided Multi-modal Correlation Learning for Image Search

    Full text link
    Given a user's query, traditional image search systems rank images according to its relevance to a single modality (e.g., image content or surrounding text). Nowadays, an increasing number of images on the Internet are available with associated meta data in rich modalities (e.g., titles, keywords, tags, etc.), which can be exploited for better similarity measure with queries. In this paper, we leverage visual and textual modalities for image search by learning their correlation with input query. According to the intent of query, attention mechanism can be introduced to adaptively balance the importance of different modalities. We propose a novel Attention guided Multi-modal Correlation (AMC) learning method which consists of a jointly learned hierarchy of intra and inter-attention networks. Conditioned on query's intent, intra-attention networks (i.e., visual intra-attention network and language intra-attention network) attend on informative parts within each modality; a multi-modal inter-attention network promotes the importance of the most query-relevant modalities. In experiments, we evaluate AMC models on the search logs from two real world image search engines and show a significant boost on the ranking of user-clicked images in search results. Additionally, we extend AMC models to caption ranking task on COCO dataset and achieve competitive results compared with recent state-of-the-arts.Comment: CVPR 201
    corecore