MIHash: Online Hashing with Mutual Information
Learning-based hashing methods are widely used for nearest neighbor
retrieval, and recently, online hashing methods have demonstrated good
performance-complexity trade-offs by learning hash functions from streaming
data. In this paper, we first address a key challenge for online hashing: the
binary codes for indexed data must be recomputed to keep pace with updates to
the hash functions. We propose an efficient quality measure for hash functions,
based on an information-theoretic quantity, mutual information, and use it
successfully as a criterion to eliminate unnecessary hash table updates. Next,
we also show how to optimize the mutual information objective using stochastic
gradient descent. We thus develop a novel hashing method, MIHash, that can be
used in both online and batch settings. Experiments on image retrieval
benchmarks (including a 2.5M image dataset) confirm the effectiveness of our
formulation, both in reducing hash table recomputations and in learning
high-quality hash functions.Comment: International Conference on Computer Vision (ICCV), 201
Table Search Using a Deep Contextualized Language Model
Pretrained contextualized language models such as BERT have achieved
impressive results on various natural language processing benchmarks.
Benefiting from multiple pretraining tasks and large scale training corpora,
pretrained models can capture complex syntactic word relations. In this paper,
we use the deep contextualized language model BERT for the task of ad hoc table
retrieval. We investigate how to encode table content considering the table
structure and input length limit of BERT. We also propose an approach that
incorporates features from prior literature on table retrieval and jointly
trains them with BERT. In experiments on public datasets, we show that our best
approach outperforms the previous state-of-the-art method and BERT baselines
by a large margin under different evaluation metrics.
Comment: Accepted at SIGIR 2020 (Long
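One practical question the abstract raises is how to linearize a table under BERT's input length limit. The following is a hypothetical flattening scheme of our own (caption and headers first, then rows until the budget is spent), not the paper's exact encoding:

```python
def linearize_table(caption, headers, rows, max_tokens=512):
    """Flatten a table into one token sequence for a BERT-style encoder.

    Caption and headers go first, since they usually carry the most
    relevance signal; rows are appended until the encoder's input
    length limit would be exceeded. Illustrative scheme only.
    """
    tokens = ["[CLS]"] + caption.split() + ["[SEP]"] + list(headers) + ["[SEP]"]
    for row in rows:
        cell_tokens = [t for cell in row for t in str(cell).split()]
        if len(tokens) + len(cell_tokens) + 1 > max_tokens:
            break  # respect the model's input length limit
        tokens += cell_tokens + ["[SEP]"]  # separator marks row boundary
    return tokens
```

In a real system the whitespace `split()` would be replaced by the model's own subword tokenizer, which shrinks the row budget further.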
On the Impact of Entity Linking in Microblog Real-Time Filtering
Microblogging is a model of content sharing in which the temporal locality of
posts with respect to important events, either of foreseeable or unforeseeable
nature, makes applications of real-time filtering of great practical
interest. We propose the use of Entity Linking (EL) in order to improve the
retrieval effectiveness, by enriching the representation of microblog posts and
filtering queries. EL is the process of recognizing in an unstructured text the
mention of relevant entities described in a knowledge base. EL of short pieces
of text is a difficult task, but it is also a scenario in which the information
EL adds to the text can have a substantial impact on the retrieval process. We
implement a state-of-the-art filtering method, based on the best systems from
the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and
extend it with a Wikipedia-based EL method. Results show that the use of EL
significantly improves over non-EL-based versions of the filtering methods.
Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain, April 13-17, 201
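The enrichment step this abstract describes can be sketched very simply: linked entity titles are appended to the post's term representation so that query and post can match on entities as well as surface words. The function and the toy `entity_index` below are our illustration, standing in for a full Wikipedia-based EL system:

```python
def enrich_with_entities(post, entity_index):
    """Append the titles of linked entities to a post's token list.

    post:         raw microblog post text.
    entity_index: maps surface forms to knowledge-base entity titles;
                  a toy stand-in for an actual entity linker.
    """
    tokens = post.lower().split()
    enriched = list(tokens)
    for t in tokens:
        if t in entity_index:
            enriched.append(entity_index[t])  # add the linked entity title
    return enriched
```

The same enrichment would be applied to the filtering queries, so both sides of the match are expressed over the shared entity vocabulary.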
The effect of WWW document structure on students' information retrieval
This experiment investigated the effect the structure of a WWW document has on the amount of information retained by a reader. Three structures common on the Internet were tested: one long page; a table of contents leading to individual sections; and short sections of text on separate pages with revision questions. Participants read information structured in one of these ways and were then tested on recall of that information. A further experiment investigated the effect that 'browsing' - moving between pages - has on retrieval. There was no difference between the structures for overall amount of information retained. The single page version was best for recall of facts, while the short sections of text with revision questions led to the most accurate inferences from the material. Browsing on its own had no significant impact on information retrieval. Revision questions rather than structure per se were therefore the key factor.
TRECVID 2008 - goals, tasks, data, evaluation mechanisms and metrics
The TREC Video Retrieval Evaluation (TRECVID) 2008 is a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last 7 years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. In 2008, 77 teams (see Table 1) from various research organizations --- 24 from Asia, 39 from Europe, 13 from North America, and 1 from Australia --- participated in one or more of five tasks: high-level feature extraction, search (fully automatic, manually assisted, or interactive), pre-production video (rushes) summarization, copy detection, or surveillance event detection. The copy detection and surveillance event detection tasks are being run for the first time in TRECVID. This paper presents an overview of TRECVID in 2008.
