MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information
This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, should there be any. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential process in three steps: geo-entity recognition, geo-entity selection and query tagging. Then, the Query Analyzer parses this tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, then determines the query type according to the type of information that the user is supposed to be looking for: map, yellow page or information. Under a strict evaluation criterion, where a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that some extra effort must be invested in “user-oriented” disambiguation techniques to improve the first-level binary classifier for detecting geographical queries, as it is a key component for eliminating many false positives.
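The strict criterion mentioned above can be illustrated with a minimal sketch. The field names, the `ParsedQuery` type and the exact definitions of precision and recall used here (correct parses over queries tagged as geographical, and over queries that truly are geographical) are assumptions made for illustration, not the official GeoCLEF scoring code.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ParsedQuery(NamedTuple):
    """Hypothetical record for one parsed query (field names are assumed)."""
    is_geo: bool
    what: Optional[str] = None
    geo_relation: Optional[str] = None
    where: Optional[str] = None
    query_type: Optional[str] = None  # "map", "yellow page" or "information"


def strict_precision_recall(
    predicted: Dict[str, ParsedQuery], gold: Dict[str, ParsedQuery]
) -> Tuple[float, float]:
    """Count a query as correct only if every field equals the gold parse."""
    returned = [q for q, p in predicted.items() if p.is_geo]
    relevant = [q for q, g in gold.items() if g.is_geo]
    correct = [
        q for q in returned
        if q in gold and gold[q].is_geo and predicted[q] == gold[q]
    ]
    precision = len(correct) / len(returned) if returned else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


gold = {
    "q1": ParsedQuery(True, "hotels", "in", "Madrid", "yellow page"),
    "q2": ParsedQuery(True, "map", "of", "Lisbon", "map"),
    "q3": ParsedQuery(False),
}
predicted = {
    "q1": ParsedQuery(True, "hotels", "in", "Madrid", "yellow page"),
    "q2": ParsedQuery(True, "map", "of", "Lisbon", "information"),  # wrong type
    "q3": ParsedQuery(True, "java", "in", "Java", "information"),   # false positive
}
precision, recall = strict_precision_recall(predicted, gold)
```

In this toy example a single wrong field (`q2`) and a non-geographical query tagged as geographical (`q3`) both lower precision, which is consistent with the abstract's remark about false positives from the first-level classifier.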
Building simulated queries for known-item topics: an analysis using six european languages
There has been increased interest in the use of simulated queries for evaluation and estimation purposes in Information Retrieval. However, there are still many unaddressed issues regarding their usage and impact on evaluation because their quality, in terms of retrieval performance, is unlike that of real queries. In this paper, we focus on methods for building simulated known-item topics and explore their quality against real known-item topics. Using existing generation models as our starting point, we explore factors which may influence the generation of the known-item topic. Informed by this detailed analysis (on six European languages), we propose a model with improved document and term selection properties, showing that simulated known-item topics can be generated that are comparable to real known-item topics. This is a significant step towards validating the potential usefulness of simulated queries, both for evaluation purposes and because building models of querying behavior provides deeper insight into the querying process, so that better retrieval mechanisms can be developed to support the user.
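As a rough illustration of what a generation model for simulated known-item topics can look like, the sketch below picks a target document and then draws query terms from a mixture of that document's term distribution and a collection-wide "noise" distribution. The function name, the uniform document choice, the fixed query length and the `noise` parameter are all assumptions for illustration; the abstract does not specify the models the paper actually studies or proposes.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


def simulate_known_item_query(
    docs: Dict[str, List[str]],
    noise: float = 0.2,
    length: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[str]]:
    """Return (target document id, simulated query terms)."""
    rng = rng or random.Random()
    # Step 1: choose the known item (uniformly here; an assumption).
    doc_id = rng.choice(sorted(docs))
    doc_terms = Counter(docs[doc_id])
    collection_terms = Counter(t for terms in docs.values() for t in terms)
    query = []
    for _ in range(length):
        # Step 2: with probability `noise`, draw from the whole collection,
        # otherwise from the target document, weighted by term frequency.
        source = collection_terms if rng.random() < noise else doc_terms
        terms = sorted(source)
        weights = [source[t] for t in terms]
        query.append(rng.choices(terms, weights=weights, k=1)[0])
    return doc_id, query


docs = {
    "d1": ["geographic", "query", "parsing", "gazetteer"],
    "d2": ["astronomy", "abstract", "search", "engine", "search"],
}
target, query = simulate_known_item_query(docs, noise=0.0, rng=random.Random(7))
```

With `noise=0.0` every term comes from the target document; raising it mimics users who misremember the item they are looking for.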
The NASA Astrophysics Data System: The Search Engine and its User Interface
The ADS Abstract and Article Services provide access to the astronomical
literature through the World Wide Web (WWW). The forms-based user interface
provides access to sophisticated searching capabilities that allow our users to
find references in the fields of Astronomy, Physics/Geophysics, and
astronomical Instrumentation and Engineering. The returned information includes
links to other on-line information sources, creating an extensive astronomical
digital library. Other interfaces to the ADS databases provide direct access to
the ADS data to allow developers of other data systems to integrate our data
into their system.
The search engine is a custom-built software system that is specifically
tailored to search astronomical references. It includes an extensive synonym
list that contains discipline-specific knowledge about search term
equivalences.
Search request logs show the usage pattern of the various search system
capabilities. Access logs show the world-wide distribution of ADS users.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 23 pages, 18 figures, 11 tables
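The idea of a synonym list encoding search term equivalences, as described in the ADS abstract above, can be sketched as follows. This is only a toy illustration with hypothetical names and made-up synonym groups; it does not reflect the actual ADS implementation or its data.

```python
from typing import Dict, List


def expand_with_synonyms(
    query_terms: List[str], synonym_groups: List[List[str]]
) -> List[List[str]]:
    """Turn each query term into a sorted group of equivalent terms."""
    lookup: Dict[str, List[str]] = {}
    for group in synonym_groups:
        normalized = sorted({term.lower() for term in group})
        for term in normalized:
            lookup[term] = normalized
    # Terms without a known group are kept as single-element groups.
    return [lookup.get(term.lower(), [term.lower()]) for term in query_terms]


groups = [["galaxy", "galaxies", "extragalactic"]]  # hypothetical entries
expanded = expand_with_synonyms(["Galaxy", "dust"], groups)
```

Each inner list can then be treated as an OR-group by the search engine, so a document matching any equivalent term counts as matching the original query term.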
Query expansion with naive bayes for searching distributed collections
The proliferation of online information resources increases the importance of effective and efficient distributed searching. However, the problem of word mismatch seriously hurts the effectiveness of distributed information retrieval. Automatic query expansion has been suggested as a technique for dealing with the fundamental issue of word mismatch. In this paper, we propose a method, query expansion with Naive Bayes, to address the problem, discuss its implementation in the IISS system, and present experimental results demonstrating its effectiveness. The technique not only enhances the discriminatory power of typical queries for choosing the right collections but also significantly improves retrieval results.
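The abstract does not give the formulation, so the following is only one plausible reading, sketched under stated assumptions: a multinomial Naive Bayes model with add-one smoothing is trained with collections as classes, the query is assigned to its most probable collection, and the terms most likely under that collection are appended to the query. The function names and toy data are hypothetical and this is not the IISS implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train(collections: Dict[str, List[List[str]]]) -> Dict[str, dict]:
    """Fit per-collection log priors and add-one smoothed term log-likelihoods."""
    vocab = {t for docs in collections.values() for d in docs for t in d}
    total_docs = sum(len(docs) for docs in collections.values())
    model = {}
    for name, docs in collections.items():
        counts = Counter(t for d in docs for t in d)
        denom = sum(counts.values()) + len(vocab)
        model[name] = {
            "prior": math.log(len(docs) / total_docs),
            "loglik": {t: math.log((counts[t] + 1) / denom) for t in vocab},
        }
    return model


def expand(query: List[str], model: Dict[str, dict], k: int = 2) -> Tuple[str, List[str]]:
    """Pick the most probable collection, then add its k likeliest new terms."""
    def score(name: str) -> float:
        m = model[name]
        return m["prior"] + sum(m["loglik"][t] for t in query if t in m["loglik"])

    best = max(sorted(model), key=score)
    loglik = model[best]["loglik"]
    candidates = sorted(
        (t for t in loglik if t not in query), key=lambda t: (-loglik[t], t)
    )
    return best, list(query) + candidates[:k]


collections = {  # toy data, for illustration only
    "astro": [["star", "galaxy", "telescope"], ["star", "orbit", "galaxy"]],
    "med": [["drug", "trial", "patient"], ["patient", "dose", "drug"]],
}
model = train(collections)
best, expanded_query = expand(["star"], model, k=2)
```

The expanded query carries terms that are characteristic of the selected collection, which is one way the added terms could sharpen collection selection in a distributed setting.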