    Report of MIRACLE team for the Ad-Hoc track in CLEF 2006

    This paper presents the MIRACLE team's 2006 approach to the Ad-Hoc Information Retrieval track. The experiments for this campaign continue to test our IR approach. First, a baseline set of runs is obtained using standard components: stemming, transformation, filtering, entity detection and extraction, and others. Then, an extended set of runs is obtained using several types of combinations of these baseline runs. Only a few improvements were introduced for this campaign: we integrated a prototype entity recognition and indexing tool into our tokenizing scheme, and we ran more combination experiments for the robust multilingual case than in previous campaigns. However, no significant improvements were achieved. For this campaign, runs were submitted for the following languages and tracks:
    - Monolingual: Bulgarian, French, Hungarian, and Portuguese.
    - Bilingual: English to Bulgarian, French, Hungarian, and Portuguese; Spanish to French and Portuguese; and French to Portuguese.
    - Robust monolingual: German, English, Spanish, French, Italian, and Dutch.
    - Robust bilingual: English to German, Italian to Spanish, and French to Dutch.
    - Robust multilingual: English to the robust monolingual languages.
    We still need to improve some aspects of our processing scheme, the most important being, to our knowledge, entity recognition and normalization.

    MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information

    This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, should there be any. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential process in three steps: geo-entity recognition, geo-entity selection, and query tagging. Then, the Query Analyzer parses this tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, determines the query type according to the type of information that the user is presumably looking for: map, yellow page, or information. Under a strict evaluation criterion in which a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that extra effort must be invested in “user-oriented” disambiguation techniques to improve the first-level binary classifier for detecting geographical queries, as it is a key component for eliminating many false positives.
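
The strict criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each parsed query is a tuple of four fields (what, geo-relation, where, query type), that a non-geographical query is represented as `None`, and that precision and recall are computed over the queries labelled as geographical. The official GeoCLEF scoring may differ in detail; the function name and data layout are illustrative only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical field layout: (what, geo_relation, where, query_type).
Parse = Tuple[str, str, str, str]


def strict_precision_recall(
    gold: Dict[str, Optional[Parse]],
    predicted: Dict[str, Optional[Parse]],
) -> Tuple[float, float]:
    """Score parses so that a match counts only if every field is identical.

    A value of None means the query is treated as non-geographical.
    """
    n_predicted = sum(1 for p in predicted.values() if p is not None)
    n_gold = sum(1 for g in gold.values() if g is not None)
    n_correct = sum(
        1
        for qid, p in predicted.items()
        if p is not None and gold.get(qid) == p
    )
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return precision, recall
```

Under this scheme a query wrongly flagged as geographical lowers precision without affecting recall, which is consistent with the abstract's remark that the first-level binary classifier is key to eliminating false positives.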

    Report of MIRACLE team for the Ad-Hoc track in CLEF 2007

    This paper presents the MIRACLE team's 2007 approach to the Ad-Hoc Information Retrieval track. The work carried out for this campaign was limited to monolingual experiments, in the standard and in the robust tracks. No new approaches were attempted in this campaign; we followed the procedures established in our participation in previous campaigns. For this campaign, runs were submitted for the following languages and tracks:
    - Monolingual: Bulgarian, Hungarian, and Czech.
    - Robust monolingual: French, English, and Portuguese.
    There is still some room for improvement in multilingual named entity recognition.

    Report of MIRACLE team for Geographical IR in CLEF 2006

    The main objective of the designed experiments is to test the effects of geographical information retrieval on documents that contain geographical tags. In these experiments we try to isolate geographical retrieval from textual retrieval by replacing all textual geo-entity references in the topics with their associated tags and by splitting the retrieval process into two phases: textual retrieval from the textual part of the topic, without geo-entity references, and geographical retrieval from the tagged text generated by the topic tagger. Textual and geographical results are combined using different techniques: union, intersection, difference, and external join. Our geographic information retrieval system consists of a set of basic components organized in two categories: (i) linguistic tools oriented to textual analysis and retrieval and (ii) resources and tools oriented to geographical analysis. These tools are combined to carry out the different phases of the system: (i) document and topic analysis, (ii) retrieval of relevant documents, and (iii) result combination. Comparing the results with those of the last campaign, we can assert that mean average precision gets worse when the textual geo-entity references are replaced with geographical tags. Part of this degradation occurs because our experiments return zero relevant documents if no documents satisfy the geographical sub-query. But if we analyze only the results of queries that satisfied both textual and geographical terms, we observe that the designed experiments retrieve relevant documents quickly, improving R-Precision values. We conclude that the developed geographical information retrieval system is very sensitive to textual georeferences, and it is therefore necessary to improve the named entity recognition module.
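
The set-based combination of textual and geographical result lists described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the representation of a result list as a mapping from document id to score, and the choice to sum scores are all assumptions, and the external-join technique is not covered.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> retrieval score


def combine(textual: Scores, geographic: Scores, mode: str) -> Scores:
    """Combine two scored result lists with a set operation.

    Summing the two scores for the surviving documents is an assumption
    made for this sketch; the source does not state how scores are merged.
    """
    if mode == "union":
        ids = set(textual) | set(geographic)
    elif mode == "intersection":
        ids = set(textual) & set(geographic)
    elif mode == "difference":
        ids = set(textual) - set(geographic)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {d: textual.get(d, 0.0) + geographic.get(d, 0.0) for d in ids}


def ranked(scores: Scores) -> List[str]:
    """Return document ids by descending score, ties broken by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With the intersection mode, an empty geographical result list yields an empty combined list, which mirrors the abstract's observation that no documents are returned when the geographical sub-query is not satisfied.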

    DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval

    This paper describes the first participation of DCU in the TREC Medical Records Track (TRECMed). We performed some initial experiments on the 2011 TRECMed data based on the BM25 retrieval model. Surprisingly, we found that the standard BM25 model with default parameters performs comparably to the best automatic runs submitted to TRECMed 2011 and would have ranked fourth among the 29 participating groups. We expected that some form of domain adaptation would increase performance. However, results on the 2011 data proved otherwise: concept-based query expansion decreased performance, and filtering and reranking by term proximity also decreased performance slightly. We submitted four runs based on the BM25 retrieval model to TRECMed 2012, using standard BM25, standard query expansion, result filtering, and concept-based query expansion. Official results for 2012 confirm that domain-specific knowledge does not increase performance compared to the BM25 baseline as applied by us.
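
For readers unfamiliar with the baseline, the following is a minimal sketch of Okapi BM25 scoring over pre-tokenized documents. The abstract does not say which parameter values or IDF variant the authors used; `k1 = 1.2` and `b = 0.75` are commonly used defaults and, like the non-negative IDF form below, are assumptions of this sketch. The function name is illustrative, and the document collection is assumed to be non-empty.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.2,   # assumed default: term-frequency saturation
    b: float = 0.75,   # assumed default: document-length normalization
) -> List[float]:
    """Score every tokenized document in `docs` against `query` with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

Documents that share no term with the query receive a score of zero, and longer documents are penalized through the length-normalization term controlled by `b`.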

    Challenges to evaluation of multilingual geographic information retrieval in GeoCLEF

    This is the third year of the evaluation of geographic information retrieval (GeoCLEF) within the Cross-Language Evaluation Forum (CLEF). GeoCLEF 2006 presented topics and documents in four languages (English, German, Portuguese and Spanish). After two years of evaluation we are beginning to understand the challenges both of geographic information retrieval from text and of evaluating its results. This poster enumerates some of these challenges to evaluation and comments on the limitations encountered in the first two evaluations.

    MIRACLE-FI at ImageCLEFphoto 2008: Experiences in merging text-based and content-based retrievals

    This paper describes the participation of the MIRACLE consortium in the ImageCLEF Photographic Retrieval task of ImageCLEF 2008. In this new participation of the group, our first purpose is to evaluate our own tools for text-based retrieval and for content-based retrieval, using different similarity metrics and the OWA aggregation operator to fuse the three topic images. Building on MIRACLE's experience from last year, we implemented a new merging module that combines the text-based and the content-based information in three different ways: FILTER-N, ENRICH and TEXT-FILTER. The first two approaches try to improve the text-based baseline results using the content-based result lists. The last one was used to select the relevant images passed to the content-based module. No clustering strategies were analyzed. Finally, 41 runs were submitted: 1 for the text-based baseline, 10 content-based runs, and 30 mixed experiments merging text-based and content-based results. Results in general can be considered nearly acceptable compared with the best results of other groups. Results obtained from text-based retrieval are better than those from content-based retrieval. By merging textual and visual retrieval we improve the text-based baseline when applying the ENRICH merging algorithm, although visual results are lower than textual ones. Based on these results, we plan to try to improve the merged results by applying clustering methods to this image collection.
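
The OWA (ordered weighted averaging) operator mentioned in this abstract can be sketched as below. In this illustration the three input values stand for the similarity of one candidate image to each of the three topic images; that reading, the function name, and the example weights are assumptions, since the abstract does not give the weights or the similarity metrics actually used.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to values sorted descending.

    The weights must sum to 1. The weight vector (1, 0, ..., 0) gives the
    maximum, (0, ..., 0, 1) the minimum, and uniform weights the mean.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```

Because the weights attach to rank positions rather than to particular topic images, the operator can be tuned between an optimistic fusion (best-matching topic image dominates) and a pessimistic one (worst-matching topic image dominates).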

    Dynamic relevance feedback in the patent retrieval system PatentAide (Dynamisches Relevanz-Feedback im Patent-Retrievalsystem PatentAide)

    In patent retrieval, ranking methods and techniques such as relevance feedback have not yet become established. Ranking systems are criticized above all for their lack of transparency for the user. Building on an analysis of the search processes in patent retrieval, the PatentAide system attempts to implement a ranking system. PatentAide supports important techniques in the patent retrieval process such as term expansion, provides a ranked result list, and additionally allows dynamic relevance feedback.