22,854 research outputs found
The Reuters collection
This short paper presents the little known Reuters 22,173 test collection, which is significantly
larger than most traditional test collections. In addition, Reuters has none of the recall calculation
problems normally associated with some of the larger test collections now available. This paper
explains the method (derived from Lewis [Lewis 91]) used to perform retrieval experiments on the
Reuters collection. Then, to illustrate the use of Reuters, some simple retrieval experiments are
also presented that compare the performance of stemming algorithms
Machine Learning of User Profiles: Representational Issues
As more information becomes available electronically, tools for finding
information of interest to users becomes increasingly important. The goal of
the research described here is to build a system for generating comprehensible
user profiles that accurately capture user interest with minimum user
interaction. The research described here focuses on the importance of a
suitable generalization hierarchy and representation for learning profiles
which are predictively accurate and comprehensible. In our experiments we
evaluated both traditional features based on weighted term vectors as well as
subject features corresponding to categories which could be drawn from a
thesaurus. Our experiments, conducted in the context of a content-based
profiling system for on-line newspapers on the World Wide Web (the IDD News
Browser), demonstrate the importance of a generalization hierarchy and the
promise of combining natural language processing techniques with machine
learning (ML) to address an information retrieval (IR) problem.Comment: 6 page
Evaluating the retrieval effectiveness of Web search engines using a representative query sample
Search engine retrieval effectiveness studies are usually small-scale, using
only limited query samples. Furthermore, queries are selected by the
researchers. We address these issues by taking a random representative sample
of 1,000 informational and 1,000 navigational queries from a major German
search engine and comparing Google's and Bing's results based on this sample.
Jurors were found through crowdsourcing, data was collected using specialised
software, the Relevance Assessment Tool (RAT). We found that while Google
outperforms Bing in both query types, the difference in the performance for
informational queries was rather low. However, for navigational queries, Google
found the correct answer in 95.3 per cent of cases whereas Bing only found the
correct answer 76.6 per cent of the time. We conclude that search engine
performance on navigational queries is of great importance, as users in this
case can clearly identify queries that have returned correct results. So,
performance on this query type may contribute to explaining user satisfaction
with search engines
Word sense disambiguation and information retrieval
It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval
(IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will
increase. However, recent research into the application of a word sense disambiguator to an IR system failed
to show any performance increase. From these results it has become clear that more basic research is needed
to investigate the relationship between sense ambiguity, disambiguation, and IR.
Using a technique that introduces additional sense ambiguity into a collection, this paper presents research
that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have
on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system
when it is retrieving from very short queries. In addition we argue that if a word sense disambiguator is to
be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of
accuracy
Word sense disambiguation and information retrieval
It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval
(IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will
increase. However, recent research into the application of a word sense disambiguator to an IR system failed
to show any performance increase. From these results it has become clear that more basic research is needed
to investigate the relationship between sense ambiguity, disambiguation, and IR.
Using a technique that introduces additional sense ambiguity into a collection, this paper presents research
that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have
on a probabilistic IR system. We conclude that word sense ambiguity is only problematic to an IR system
when it is retrieving from very short queries. In addition we argue that if a word sense disambiguator is to
be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of
accuracy
Region-Based Image Retrieval Revisited
Region-based image retrieval (RBIR) technique is revisited. In early attempts
at RBIR in the late 90s, researchers found many ways to specify region-based
queries and spatial relationships; however, the way to characterize the
regions, such as by using color histograms, were very poor at that time. Here,
we revisit RBIR by incorporating semantic specification of objects and
intuitive specification of spatial relationships. Our contributions are the
following. First, to support multiple aspects of semantic object specification
(category, instance, and attribute), we propose a multitask CNN feature that
allows us to use deep learning technique and to jointly handle multi-aspect
object specification. Second, to help users specify spatial relationships among
objects in an intuitive way, we propose recommendation techniques of spatial
relationships. In particular, by mining the search results, a system can
recommend feasible spatial relationships among the objects. The system also can
recommend likely spatial relationships by assigned object category names based
on language prior. Moreover, object-level inverted indexing supports very fast
shortlist generation, and re-ranking based on spatial constraints provides
users with instant RBIR experiences.Comment: To appear in ACM Multimedia 2017 (Oral
Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval
Humans use context and scene knowledge to easily localize moving objects in
conditions of complex illumination changes, scene clutter and occlusions. In
this paper, we present a method to leverage human knowledge in the form of
annotated video libraries in a novel search and retrieval based setting to
track objects in unseen video sequences. For every video sequence, a document
that represents motion information is generated. Documents of the unseen video
are queried against the library at multiple scales to find videos with similar
motion characteristics. This provides us with coarse localization of objects in
the unseen video. We further adapt these retrieved object locations to the new
video using an efficient warping scheme. The proposed method is validated on
in-the-wild video surveillance datasets where we outperform state-of-the-art
appearance-based trackers. We also introduce a new challenging dataset with
complex object appearance changes.Comment: Under review with the IEEE Transactions on Circuits and Systems for
Video Technolog
- …