289 research outputs found

    Analyse des paramètres de recherche d'information: Etude de l'influence des paramètres sur les résultats

    International audience. This article presents a detailed analysis of a set of 2 million information retrieval results obtained with different parameter settings of information retrieval systems. More specifically, we used the Terrier platform and the RunGeneration interface to create different runs by varying the indexing and retrieval models. We then evaluated each of the results obtained according to different information retrieval performance measures. A systematic analysis was carried out on these data to determine, on the one hand, which parameters have the most influence and, on the other hand, which values of these parameters are most likely to lead to good system performance.

    Epistemic vs non-epistemic criteria to assess Wikipedia articles: evolution of young people perceptions

    International audience. This paper tackles the problem of how users assess information credibility, focusing on Wikipedia articles. We consider both epistemic and non-epistemic criteria. We conducted a questionnaire study in which 841 French young people aged 11 to 25 participated, and we analysed the results with level of education as a variable. We found that the higher the level of education, the more often young people mention epistemic credibility criteria and the less often they cite non-epistemic criteria. We draw some recommendations for information literacy.

    IRIT at INEX 2014 : Tweet Contextualization Track

    National audience. The paper presents IRIT's approach at the INEX 2014 Tweet Contextualization Track. Systems had to provide a context for a tweet from the perspective of the entity. This year we further modified our approach presented at INEX 2011, 2012 and 2013, which is underlain by the product of different measures based on smoothing from local context, named entity recognition, part-of-speech weighting and sentence quality analysis. We introduced two ways to link an entity and a tweet, namely (1) concatenation of the entity and the tweet and (2) usage of the results obtained for the entity as a restriction to filter results retrieved for the tweet. In addition, we examined the influence of the topic-comment relationship on contextualization.

    Unités d'indexation et taille des requêtes pour la recherche d'information en français

    International audience. This paper analyses different indexing methods for French (lemmas, stems and truncated terms) as well as their fusion. We also examine the influence of the different sections of a topic on precision. Our study uses the CLEF French monolingual collections from 2000 to 2005. We show that the best method is the one that is based on lemmas and that fuses the results obtained with the different sections of a topic. Keywords: information retrieval, fusion, indexing, influence of indexing, information retrieval in French. In this article, we focus on information retrieval in French. We analyse different indexing techniques (based on lemmas, stems or terms) and their fusion. We also analyse the influence of taking the different parts of a query into account. Our study covers 6 CLEF French evaluation campaigns. We show that using lemmas and combining the different variants of a query are the most effective ways to improve mean precision and high precision.

    Linguistic Analysis of Users' Queries: towards an adaptive Information Retrieval System

    International audience. Most information retrieval systems transform users' natural language queries into bags of words that are matched to documents also represented as bags of words. Through such a process, the richness of the query is lost. In this paper we show that linguistic features of a query are good indicators for predicting a system's failure to answer it. The experiments are based on 42 systems or system variants and 50 TREC topics that consist of a descriptive part expressed in natural language.

    An energy-based model to optimize cluster visualization

    National audience. Graphs are mathematical structures that provide natural means for complex-data representation. Graphs capture the structure and thus help in modeling a wide range of complex real-life data in various domains. Moreover, graphs are especially suitable for information visualization. Indeed, the intuitive visual abstraction (dots and lines) they provide is intimately associated with graphs. Visualization paves the way to interactive exploratory data analysis and to important goals such as identifying groups and subgroups among data and helping to understand how these groups interact with each other. In this paper, we present a graph drawing approach that helps to better appreciate the cluster structure in data and the interactions that may exist between clusters. In this work, we assume that the clusters are already extracted and focus rather on the visualization aspects. We propose an energy-based model for graph drawing that produces an esthetic drawing in which each cluster occupies a separate zone within the visualization layout. This method emphasizes the inter-group interactions and still shows the inter-node interactions. The drawing areas assigned to the clusters can be user-specified (prefixed areas) or automatically crafted (free areas). The approach we suggest also enables handling geographically-based clustering. In the case of free areas, we illustrate the use of our drawing method through an example. In the case of prefixed areas, we first use an example from citation networks and then use another example to compare the results of our method to those of the divide and conquer approach. In the latter case, we show that while the two methods successfully point out the cluster structure, our method better visualizes the global structure.

    Linguistic features to predict query difficulty: a case study on previous TREC campaigns

    International audience. Query difficulty can be linked to a number of causes. Some of these causes can be related to the query expression itself, and can therefore be detected through a linguistic analysis of the query text. Using 16 different linguistic features, automatically computed on TREC queries, we looked for significant correlations between these features and the average recall and precision scores obtained by systems. Three of these features are shown to have a significant impact on either recall or precision scores for previous ad hoc TREC campaigns. Each of these features can be viewed as a clue to a linguistically specific characteristic, either morphological, syntactic or semantic. These results also open the way for a more enlightened use of linguistic processing in IR systems.

    Trustworthy Personal Assistance: A Design Objective for Interactive Agents


    How trust in Wikipedia evolves: a survey of students aged 11 to 25

    Introduction. Whether Wikipedia is to be considered a trusted source is frequently questioned in France. This paper reports the results of a survey examining the levels of trust shown by young people aged 11 to 25. Method. We analyse the answers given by 841 young people, aged 11 to 25, to a questionnaire. To our knowledge, this is the largest study ever published on the topic. It focuses on (1) the perception young people have of Wikipedia; (2) the influence teachers and peers have on the young person's own opinions; and (3) the variation of trends according to the education level. Analysis. All the analyses are based on ANOVA to compare the various groups of participants. We detail the results by comparing the various groups of respondents and discuss these results in relation to previous studies. Results. Trust in Wikipedia depends on the type of information-seeking task and on the education level. There are contrasting social judgments of Wikipedia. Students build a representation of a teacher's expectations on the nature of the sources that they can use and hence the documentary acceptability of Wikipedia. The average trust attributed to Wikipedia for academic tasks could be induced by the tension between the negative academic reputation of the encyclopaedia and the mostly positive experience of its credibility. Conclusion. Our survey demonstrates significant differences between the levels of education, both for Wikipedia use and its representation. This variable should be included in studies related to information behaviour by the young to avoid generalizations that deny the disparities between the ages.

    Selective Query Processing: a Risk-Sensitive Selection of System Configurations

    In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches, and these optimized parameters are then used as the system configuration for all subsequent queries. A better approach, however, would be to adapt the parameters to fit the query at hand. Selective query expansion is one such approach, in which the system decides automatically whether or not to expand the query, resulting in two possible system configurations. This approach was extended recently to include many other parameters, leading to many possible system configurations where the system automatically selects the best configuration on a per-query basis. To determine the ideal configurations to use on a per-query basis in real-world systems, we developed a method in which a restricted number of possible configurations is pre-selected and then used in a meta-search engine that decides the best search configuration on a per-query basis. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward trade-off between the number of configurations kept and system effectiveness. For final configuration selection, the decision is based on query feature similarities. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to increase effectiveness by about 15% (according to P@10 and nDCG@10) when compared to traditional grid search using a single configuration, and by about 20% when compared to learning to rank documents. Our risk-sensitive approach works for both diversity- and ad hoc-oriented searches. Moreover, the similarity-based selection method outperforms the more sophisticated approaches. Thus, we demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction. Comment: 30 pages, 5 figures, 8 tables; submitted to TOIS ACM journal.
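The abstract above says only that the final configuration is chosen from query feature similarities. As a rough illustration of that idea, and not the authors' method, the sketch below picks a configuration for a new query by looking at its most similar training queries. The cosine similarity, the nearest-neighbour vote, the function names, the feature vectors and the configuration names (`bm25`, `bm25_qe`) are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_configuration(query_features, training, k=3):
    """Pick the configuration with the highest summed effectiveness
    over the k training queries most similar to the new query.

    training: list of (feature_vector, {config_name: effectiveness}).
    The pre-selected configuration set is assumed to be given.
    """
    neighbours = sorted(
        training,
        key=lambda item: cosine(query_features, item[0]),
        reverse=True,
    )[:k]
    totals = {}
    for _, scores in neighbours:
        for config, score in scores.items():
            totals[config] = totals.get(config, 0.0) + score
    return max(totals, key=totals.get)


# Hypothetical training data: two query features, two configurations.
training = [
    ([1.0, 0.0], {"bm25": 0.6, "bm25_qe": 0.3}),
    ([0.9, 0.1], {"bm25": 0.5, "bm25_qe": 0.4}),
    ([0.0, 1.0], {"bm25": 0.2, "bm25_qe": 0.7}),
    ([0.1, 0.9], {"bm25": 0.3, "bm25_qe": 0.8}),
]

choice_a = select_configuration([1.0, 0.05], training, k=2)
choice_b = select_configuration([0.05, 1.0], training, k=2)
```

With this toy data, a query resembling the first two training queries is routed to the configuration that worked best for them, and likewise for the last two. The risk-sensitive pre-selection step described in the abstract is not modelled here.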