Brute Force Information Retrieval Experiments using MapReduce

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: European Research Consortium for Informatics and Mathematics
Publication date: 01/01/2012

MIREX (MapReduce Information Retrieval Experiments) is a software library initially developed by the Database Group of the University of Twente for running large-scale information retrieval experiments on clusters of machines. MIREX has been tested on web crawls of up to half a billion web pages, totalling about 12.5 TB of uncompressed data. MIREX shows that executing test queries by a brute-force linear scan of pages is a viable alternative to running them on a search engine’s inverted index. MIREX is open source and available for others.

Radboud Repository

University of Twente Research Information

MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data"

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: Springer
Publication date: 01/01/2010

We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net

CiteSeerX

Crossref

Radboud Repository

University of Twente Research Information
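
The sequential-scan idea described in the abstract above can be sketched in a few lines. This is a minimal single-process illustration, not MIREX's actual code: the function names (`map_document`, `reduce_query`, `run_scan`) are hypothetical, and the term-frequency scoring is an assumption, since the abstract does not name a retrieval model.

```python
from collections import defaultdict
import heapq


def map_document(doc_id, text, queries):
    """Map step: score one document against every test query.

    Scoring is plain term-frequency overlap, an assumption made for
    illustration only.
    """
    tokens = text.lower().split()
    for query_id, query in queries.items():
        score = sum(tokens.count(term) for term in query.lower().split())
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(query_id, scored_docs, k=10):
    """Reduce step: keep the k best-scoring documents for one query."""
    return query_id, heapq.nlargest(k, scored_docs)


def run_scan(documents, queries, k=10):
    """Simulate the MapReduce job in a single process."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():  # sequential scan of all documents
        for query_id, pair in map_document(doc_id, text, queries):
            grouped[query_id].append(pair)  # shuffle: group pairs by query id
    return dict(reduce_query(q, pairs, k) for q, pairs in grouped.items())


documents = {
    "d1": "mapreduce on a cluster of machines",
    "d2": "inverted index search engine",
    "d3": "cluster cluster scan",
}
queries = {"q1": "cluster scan"}
print(run_scan(documents, queries))  # {'q1': [(3, 'd3'), (1, 'd1')]}
```

Because every document is scored against all queries in one pass, no index has to be built before a new scoring function can be tried; on a real cluster the map calls would run in parallel over splits of the crawl.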

University of Twente @ TREC 2009: Indexing half a billion web pages

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: National Institute of Standards and Technology (NIST)
Publication date: 01/01/2009

This report presents results for the TREC 2009 adhoc task, the diversity task, and the relevance feedback task. We present ideas for unsupervised tuning of search systems, an approach for spam removal, and the use of categories and query log information for diversifying search results.

CiteSeerX

Radboud Repository

University of Twente Research Information

Utilizing scale-free networks to support the search for scientific publications

Author: Hauff C.
Nürnberger A.
Publication venue: Neslia Paniculata
Publication date: 01/01/2006

When searching for scientific publications, users today often rely on search engines such as Yahoo.com. Whereas searching for publications whose titles are known is considered an easy task, users looking for important publications in research fields they are unfamiliar with face greater difficulties, since few or no indications of a publication’s importance to the respective fields are given. In this paper we investigate the application of the theory of scale-free networks to derive importance indicators for a collection of publications. A tool was developed to support the user in his publication search by visualizing the publications’ importance indicators, derived from the number of citations received and the publication’s age, as well as visualizing part of the citation network structure. A preliminary user study indicates the utility of our approach and warrants further research in that direction.

University of Twente Research Information

MIREX: MapReduce Information Retrieval Experiments

Author: Hauff Claudia
Hiemstra Djoerd
Publication date: 01/01/2010

We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.ne

arXiv.org e-Print Archive

CiteSeerX

University of Twente Research Information

Learning to Singulate Objects using a Push Proposal Network

Author: Burgard Wolfram
Eitel Andreas
Hauff Nico
Publication date: 05/02/2018

Learning to act in unstructured environments, such as cluttered piles of objects, poses a substantial challenge for manipulation robots. We present a novel neural network-based approach that separates unknown objects in clutter by selecting favourable push actions. Our network is trained from data collected through autonomous interaction of a PR2 robot with randomly organized tabletop scenes. The model is designed to propose meaningful push actions based on over-segmented RGB-D images. We evaluate our approach by singulating up to 8 unknown objects in clutter. We demonstrate that our method enables the robot to perform the task with a high success rate and a low number of required push actions. Our results based on real-world experiments show that our network is able to generalize to novel objects of various sizes and shapes, as well as to arbitrary object configurations. Videos of our experiments can be viewed at http://robotpush.cs.uni-freiburg.de
Comment: International Symposium on Robotics Research (ISRR) 2017, videos: http://robotpush.cs.uni-freiburg.d

arXiv.org e-Print Archive

Crossref

The Effectiveness of Concept Based Search for Video Retrieval

Author: Aly Robin
Hauff Claudia
Hiemstra Djoerd
Publication venue: Gesellschaft fuer Informatik
Publication date: 01/01/2007

In this paper we investigate how a small number of high-level concepts derived for video shots, such as Sport, Face, Indoor, etc., can be used effectively for ad hoc search in video material. We answer the following questions: 1) Can we automatically construct concept queries from ordinary text queries? 2) What is the best way to combine evidence from single concept detectors into final search results? We evaluated algorithms for automatic concept query formulation using WordNet-based concept extraction, and we evaluated algorithms for fast, on-line combination of concepts. Experimental results on data from the TREC Video 2005 workshop and 25 test users show the following: 1) automatic query formulation through WordNet-based concept extraction can achieve results comparable to user-created query concepts, and 2) combination methods that take neighboring shots into account outperform simpler combination methods.

CiteSeerX

Radboud Repository

University of Twente Research Information

University of Twente at GeoCLEF 2006: geofiltered document retrieval

Author: Hauff Claudia
Rode Henning
Trieschnigg Dolf
Publication date: 01/01/2006

In this report we describe the approach of the University of Twente to the 2006 GeoCLEF task. It is based on retrieval by content and subsequent filtering by geographical relevance using a gazetteer. The results do not show an improvement in retrieval performance when taking geographical information into account.

Crossref

University of Twente Research Information
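
The two-stage approach in the GeoCLEF abstract above (retrieve by content, then filter by geographical relevance using a gazetteer) might look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the term-overlap scoring, the exact-token place matching, and the toy gazetteer are hypothetical and are not taken from the report.

```python
def content_score(query, text):
    """Count how often the query terms occur in the text (toy content model)."""
    tokens = text.lower().split()
    return sum(tokens.count(term) for term in query.lower().split())


def geofiltered_search(query, region, documents, gazetteer):
    """Rank documents by content, then keep only those naming a place in `region`.

    `gazetteer` maps a region name to place names inside it; this lookup
    structure is an assumption, not the format used in the report.
    """
    places = {place.lower() for place in gazetteer.get(region, ())}
    ranked = sorted(
        ((content_score(query, text), doc_id) for doc_id, text in documents.items()),
        reverse=True,
    )
    return [
        doc_id
        for score, doc_id in ranked
        if score > 0 and places & set(documents[doc_id].lower().split())
    ]


gazetteer = {"netherlands": ["Enschede", "Amsterdam"]}
documents = {
    "a": "flood damage in Enschede",
    "b": "flood damage in Paris",
    "c": "football in Amsterdam",
}
print(geofiltered_search("flood damage", "netherlands", documents, gazetteer))  # ['a']
```

The filter only removes documents from the content ranking and never reorders them, so it can help only if the content ranking already places geographically relevant documents well.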