117 research outputs found

    Report of MIRACLE team for the Ad-Hoc track in CLEF 2007

    Get PDF
    This paper presents the 2007 MIRACLE’s team approach to the AdHoc Information Retrieval track. The work carried out for this campaign has been reduced to monolingual experiments, in the standard and in the robust tracks. No new approaches have been attempted in this campaign, following the procedures established in our participation in previous campaigns. For this campaign, runs were submitted for the following languages and tracks: - Monolingual: Bulgarian, Hungarian, and Czech. - Robust monolingual: French, English and Portuguese. There is still some room for improvement around multilingual named entities recognition

    Assessed Relevance and Stylistic Variation

    Get PDF
    Texts exhibit considerable stylistic variation. This paper reports an experiment where a large corpus of documents is analyzed using various simple stylistic metrics. A subset of the corpus has been previously assessed to be relevant for answering given information retrieval queries. The experiment shows that this subset differs significantly from the rest of the corpus in terms of the stylistic metrics studied

    Report of MIRACLE team for Geographical IR in CLEF 2006

    Full text link
    The main objective of the designed experiments is testing the effects of geographical information retrieval from documents that contain geographical tags. In the designed experiments we try to isolate geographical retrieval from textual retrieval replacing all geo-entity textual references from topics with associated tags and splitting the retrieval process in two phases: textual retrieval from the textual part of the topic without geo-entity references and geographical retrieval from the tagged text generated by the topic tagger. Textual and geographical results are combined applying different techniques: union, intersection, difference, and external join based. Our geographic information retrieval system consists of a set of basics components organized in two categories: (i) linguistic tools oriented to textual analysis and retrieval and (ii) resources and tools oriented to geographical analysis. These tools are combined to carry out the different phases of the system: (i) documents and topics analysis, (ii) relevant documents retrieval and (iii) result combination. If we compare the results achieved to the last campaign’s results, we can assert that mean average precision gets worse when the textual geo-entity references are replaced with geographical tags. Part of this worsening is due to our experiments return cero pertinent documents if no documents satisfy de geographical sub-query. But if we only analyze the results of queries that satisfied both textual and geographical terms, we observe that the designed experiments recover pertinent documents quickly, improving R-Precision values. We conclude that the developed geographical information retrieval system is very sensible to textual georeference and therefore it is necessary to improve the name entity recognition module

    DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval

    Get PDF
    This paper describes the first participation of DCU in the TREC Medical Records Track (TRECMed). We performed some initial experiments on the 2011 TRECMed data based on the BM25 retrieval model. Surprisingly, we found that the standard BM25 model with default parameters, performs comparable to the best automatic runs submitted to TRECMed 2011 and would have resulted in rank four out of 29 participating groups. We expected that some form of domain adaptation would increase performance. However, results on the 2011 data proved otherwise: concept-based query expansion decreased performance, and filtering and reranking by term proximity also decreased performance slightly. We submitted four runs based on the BM25 retrieval model to TRECMed 2012 using standard BM25, standard query expansion, result filtering, and concept-based query expansion. Official results for 2012 confirm that domain-specific knowledge does not increase performance compared to the BM25 baseline as applied by us

    Stylistic Variation in an Information Retrieval Experiment

    Full text link
    Texts exhibit considerable stylistic variation. This paper reports an experiment where a corpus of documents (N= 75 000) is analyzed using various simple stylistic metrics. A subset (n = 1000) of the corpus has been previously assessed to be relevant for answering given information retrieval queries. The experiment shows that this subset differs significantly from the rest of the corpus in terms of the stylistic metrics studied.Comment: Proceedings of NEMLAP-

    Miracle’s 2005 Approach to Monolingual Information Retrieval

    Full text link
    This paper presents the 2005 Miracle’s team approach to Monolingual Information Retrieval. The goal for the experiments in this year was twofold: continue testing the effect of combination approaches on information retrieval tasks, and improving our basic processing and indexing tools, adapting them to new languages with strange encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper nouns extracting, paragraph extracting, and pseudo-relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. Second order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query

    Unités d'indexation et taille des requêtes pour la recherche d'information en français

    Get PDF
    International audienceThis paper analyses different indexing method for French (lemmas, stems and truncated terms) as well as their fusing. We also examine the influence of the different section of a topic on precision. Our study uses the collections from CLEF – French monolingual from 2000 to 2005. We show that the best method is the one based on lemmas and that fuse the results obtained with the different sections of a topic.MOTS-CLÉS :recherche d'information, fusion, indexation, influence de l'indexation, recherche d'information en français.Dans cet article, nous nous intéressons à la recherche d'information en Français. Nous analysons différentes techniques d'indexation (basées sur des lemmes, des radicaux ou des termes) et leur fusion. Nous analysons également l'influence de la prise en compte des différentes parties d'une requête. Notre étude porte sur 6 campagnes d'évaluation de CLEF Français. Nous montrons que l'utilisation des lemmes et la combinaison des différentes variantes d'une requête sont les plus efficaces pour améliorer la précision moyenne et la haute précisio

    Towards evaluation of personalized and collaborative information retrieval

    Get PDF
    We propose to extend standard information retrieval (IR) ad-hoc test collection design to facilitate research on personalized and collaborative IR by gathering additional meta-information during the topic (query) development process. We propose a controlled query generation process with activity logging for each topic developer. The standard ad-hoc collection will thus be accompanied by a new set of thematically related topics and the associated log information, and has the potential to simulate a real-world search scenario to encourage retrieval systems to mine user information from the logs to improve IR effectiveness. The proposed methodology described in this paper will be applied in a pilot task which is scheduled to run in the FIRE 2011 evaluation campaign. The task aims at investigating the research question of whether personalized and collaborative IR retrieval experiments and evaluation can be pursued by enriching a standard ad-hoc collection with such meta-information

    Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

    Full text link
    While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data set that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use them to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.Comment: ECIR 2020 (short

    Machine Learning of User Profiles: Representational Issues

    Full text link
    As more information becomes available electronically, tools for finding information of interest to users becomes increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research described here focuses on the importance of a suitable generalization hierarchy and representation for learning profiles which are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors as well as subject features corresponding to categories which could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem.Comment: 6 page
    corecore