When to cross Over? Cross-language linking using Wikipedia for VideoCLEF 2009
We describe Dublin City University (DCU)'s participation in the VideoCLEF 2009 Linking Task. Two approaches were implemented using the Lemur information retrieval toolkit. Both approaches first extracted a search query from the transcriptions of the Dutch TV broadcasts. One method first performed search on a Dutch Wikipedia archive, then followed links to corresponding pages in the English Wikipedia. The other method first translated the extracted query using machine translation and then searched the English Wikipedia collection directly. We found that using the original Dutch transcription query for searching the Dutch Wikipedia yielded better results.
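The two linking strategies above can be sketched as follows. This is a toy illustration only: the collections, the interlanguage-link table, and the term-overlap scorer are stand-ins for the actual Wikipedia dumps and Lemur retrieval models used in the paper.

```python
# Toy stand-ins for the real collections; not the actual DCU/Lemur setup.
DUTCH_WIKI = {"Rembrandt van Rijn": "schilder uit de Gouden Eeuw",
              "Amsterdam": "hoofdstad van Nederland"}
ENGLISH_WIKI = {"Rembrandt": "Dutch Golden Age painter",
                "Amsterdam": "capital of the Netherlands"}
# Interlanguage links: Dutch page title -> English page title.
NL_TO_EN = {"Rembrandt van Rijn": "Rembrandt", "Amsterdam": "Amsterdam"}

def search(index, query_terms):
    """Rank pages by naive term overlap (a stand-in for Lemur's retrieval models)."""
    def score(title):
        text = (title + " " + index[title]).lower()
        return sum(t.lower() in text for t in query_terms)
    return max(index, key=score)

def link_via_dutch(query_terms):
    """Strategy 1: search the Dutch Wikipedia, then follow the interlanguage link."""
    return NL_TO_EN[search(DUTCH_WIKI, query_terms)]

def link_via_translation(query_terms, translate):
    """Strategy 2: machine-translate the query, then search English Wikipedia directly."""
    return search(ENGLISH_WIKI, [translate(t) for t in query_terms])

print(link_via_dutch(["schilder", "Rembrandt"]))  # best Dutch page's English link
```

Strategy 1 avoids translation noise in the query, which may explain why searching with the original Dutch transcription performed better in the reported experiments.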
ICT-DCU question answering task at NTCIR-6
This paper describes details of our participation in the NTCIR-6 Chinese-to-Chinese Question Answering task. We use the "retrieval plus extraction" approach to get answers for questions. We first split the documents into short passages, then retrieve potentially relevant passages for a question, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which cover most questions. The Lemur toolkit was used with the Okapi model for document retrieval. Results of our task submission are given and some preliminary conclusions drawn.
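The retrieval-plus-extraction pipeline can be sketched in miniature. The passage splitter, the overlap scorer, and the capitalised-token "NER" below are illustrative stand-ins, not the heuristics or the Okapi/Lemur components used in the paper.

```python
import re

# Toy document collection for the sketch.
DOCS = ["Beijing hosted the games . The capital of China is Beijing .",
        "Paris is the capital of France . Paris lies on the Seine ."]

def split_passages(doc):
    """Split a document into short passages (here: single sentences)."""
    return [s.strip() for s in doc.split(".") if s.strip()]

def rank_passages(passages, question_terms):
    """Score passages by term overlap (a stand-in for Okapi retrieval in Lemur)."""
    def score(p):
        return sum(t.lower() in p.lower() for t in question_terms)
    return sorted(passages, key=score, reverse=True)

def extract_entities(passage):
    """Extract capitalised tokens as candidate named-entity answers (toy NER)."""
    return re.findall(r"\b[A-Z][a-z]+\b", passage)

question = ["capital", "France"]
passages = [p for d in DOCS for p in split_passages(d)]
best = rank_passages(passages, question)[0]
answers = extract_entities(best)
print(best, "->", answers)
```

A real system would additionally filter the candidates by the expected answer type identified from the question (e.g. PERSON vs LOCATION), which is what the heuristic question-type rules are for.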
MIREX: MapReduce Information Retrieval Experiments
We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net
MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data"
We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net
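The sequential-scanning idea can be simulated in a few lines: every mapper scores every document it reads against all queries, and reducers keep the best hits per query. The names and data below are illustrative, not the MIREX API.

```python
from collections import defaultdict

# Toy collection and query set for the simulation.
DOCS = {"d1": "mapreduce on a cluster of machines",
        "d2": "information retrieval with low cost hardware"}
QUERIES = {"q1": ["cluster", "machines"], "q2": ["retrieval"]}

def mapper(doc_id, text):
    """Scan one document; emit (query_id, (score, doc_id)) for every match."""
    for qid, terms in QUERIES.items():
        score = sum(text.count(t) for t in terms)
        if score:
            yield qid, (score, doc_id)

def reducer(qid, scored_docs, k=10):
    """Keep the top-k documents for one query."""
    return qid, sorted(scored_docs, reverse=True)[:k]

# "Shuffle" phase: group mapper output by query id.
grouped = defaultdict(list)
for doc_id, text in DOCS.items():
    for qid, hit in mapper(doc_id, text):
        grouped[qid].append(hit)

results = dict(reducer(qid, hits) for qid, hits in grouped.items())
print(results)  # {'q1': [(2, 'd1')], 'q2': [(1, 'd2')]}
```

The appeal of this design is that there is no index to build or maintain: adding a new retrieval model means changing only the scoring inside the mapper, and the framework handles distribution across the cluster.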
LCC-DCU C-C question answering task at NTCIR-5
This paper describes the work for our participation in the NTCIR-5 Chinese-to-Chinese Question Answering task. Our strategy is based on the "Retrieval plus Extraction" approach. We first retrieve relevant documents, then retrieve short passages from these documents, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which cover most questions. The Lemur toolkit with the Okapi model is used for document retrieval. Results of our task submission are given and some preliminary conclusions drawn.
Building a domain-specific document collection for evaluating metadata effects on information retrieval
This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring the exploitation of metadata in information retrieval (IR). The collection consists of more than 61,000 documents extracted from YouTube video pages on basketball in general and the NBA (National Basketball Association) in particular, together with a set of 40 topics and their relevance judgements. In addition, a collection of nearly 250,000 user profiles related to the NBA collection is available. Several baseline IR experiments report the effect of using video-associated metadata on retrieval effectiveness. The results surprisingly show that searching the video titles only performs significantly better than also searching additional metadata text fields of the videos, such as the tags or the description.
University of Strathclyde at TREC HARD
The motivation behind the University of Strathclyde's approach to this year's HARD track was inspired by previous experiences of other participants, in particular research by [1], [3] and [4]. A running theme throughout these papers was the underlying hypothesis that a user's familiarity with a topic (i.e. their previous experience searching a subject) will form the basis for what type or style of document they will perceive as relevant. In other words, the user's context with regard to their previous search experience will determine what type of document(s) they wish to retrieve.
Finding Relevant Answers in Software Forums
Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containing
Throughput analysis for a high-performance FPGA-accelerated real-time search application
We propose an FPGA design for the relevancy computation part of a high-throughput real-time search application. The application matches terms in a stream of documents against a static profile, held in off-chip memory. We present a mathematical analysis of the throughput of the application and apply it to the problem of scaling the Bloom filter used to discard nonmatches
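Scaling a Bloom filter of the kind used to discard non-matches comes down to a few standard formulas: for n stored items and a target false-positive rate p, the required size is m = -n ln p / (ln 2)^2 bits and the optimal hash count is k = (m/n) ln 2. A back-of-the-envelope sizing sketch, with purely illustrative parameter values (not figures from the paper):

```python
import math

def optimal_bits(n_items, target_fp):
    """Bits needed to store n items at a target false-positive rate."""
    return math.ceil(-n_items * math.log(target_fp) / (math.log(2) ** 2))

def optimal_hashes(m_bits, n_items):
    """Number of hash functions minimising the false-positive rate."""
    return max(1, round(m_bits / n_items * math.log(2)))

def false_positive_rate(m_bits, n_items, k_hashes):
    """Classic approximation: p = (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

n = 1_000_000              # profile terms stored in the filter (illustrative)
m = optimal_bits(n, 1e-4)  # roughly 19.2 bits per stored item at p = 1e-4
k = optimal_hashes(m, n)
print(m, k, false_positive_rate(m, n, k))
```

On an FPGA the trade-off is between on-chip memory spent on the filter and the rate of false positives, each of which costs an off-chip memory lookup to resolve, so this sizing directly feeds the throughput analysis.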