7,831 research outputs found

    Keep It Simple Sheffield – a KISS approach to the Arabic track

    Sheffield’s participation in the inaugural Arabic cross language track is described here. Our goal was to examine how well one could achieve retrieval of Arabic text with the minimum of resources and adaptation of existing retrieval systems. To this end, the public translators used for query translation and the minimal changes to our retrieval system are described. While the effectiveness of our resulting system is not as high as one might desire, it nevertheless provides reasonable performance, particularly in the monolingual track: on average, just under four relevant documents were found in the 10 top ranked documents.

    Morphological variation of Arabic queries

    Although test collection based studies have shown that stemming improves retrieval effectiveness in an information retrieval system, morphological variations of queries searching on the same topic are less well understood. This work examines the broad morphological variation that searchers of an Arabic retrieval system put into their queries. In this study, 15 native Arabic speakers were asked to generate queries, and morphological variants of query words were collated across users. Queries composed of either the commonest or rarest variants of each word were submitted to a retrieval system and the effectiveness of the searches was measured. It was found that queries composed of the more popular morphological variants were more likely to retrieve relevant documents than those composed of less popular ones.

    An analysis of machine translation errors on the effectiveness of an Arabic-English QA system

    The aim of this paper is to investigate how much the effectiveness of a Question Answering (QA) system was affected by the performance of Machine Translation (MT) based question translation. Nearly 200 questions were selected from TREC QA tracks and run through a question answering system. It was able to answer 42.6% of the questions correctly in a monolingual run. These questions were then translated manually from English into Arabic and back into English using an MT system, and then re-applied to the QA system. The system was able to answer 10.2% of the translated questions. An analysis of which sorts of translation error affected which questions was conducted, concluding that factoid type questions are less prone to translation error than others.

    The effects of topic familiarity on user search behavior in question answering systems

    This paper reports on experiments that attempt to characterize the relationship between users and their knowledge of the search topic in a Question Answering (QA) system. It also investigates user search behavior with respect to the length of answers presented by a QA system. Two lengths of answers were compared: snippets (one to two sentences of text) and exact answers. A user test was conducted, in which 92 factoid questions were judged by 44 participants, to explore the participants’ preferences, feelings and opinions about QA system tasks. The conclusions drawn from the results were that participants preferred the snippets set and obtained higher accuracy in finding answers from it. However, accuracy varied according to users’ topic familiarity: users were only substantially helped by the wider context of a snippet if they were already familiar with the topic of the question. Without such familiarity, users were about as accurate at locating answers from the snippets as they were from the exact answers set.

    The exotic invasive plant Vincetoxicum rossicum is a strong competitor even outside its current realized climatic temperature range

    Dog-strangling vine (Vincetoxicum rossicum) is an exotic plant originating from Central and Eastern Europe that is becoming increasingly invasive in southern Ontario, Canada. Once established, it successfully displaces local native plant species, but the mechanisms behind this plant’s high competitive ability are not fully understood. It is unknown whether cooler temperatures will limit the range expansion of V. rossicum, which has demonstrated high tolerance for other environmental variables such as light and soil moisture. Furthermore, if V. rossicum can establish outside its current climatic limit, it is unknown whether competition with native species can contribute significantly to reducing its fitness and slowing its invasion. We conducted an experiment to test the potential of V. rossicum to spread into northern areas of Ontario using a set of growth chambers to simulate southern and northern Ontario climatic temperature regimes. We also tested plant-plant competition by growing V. rossicum in pots with a highly abundant native species, Solidago canadensis, and comparing growth responses to those of plants grown alone. We found that the fitness of V. rossicum was not affected by the cooler climate despite a delay in reproductive phenology. Growing V. rossicum with S. canadensis caused a significant reduction in seedpod biomass of V. rossicum. However, we did not detect a temperature × competition interaction, in spite of evidence for adaptation of S. canadensis to cooler temperature conditions. We conclude that the spread of V. rossicum north within the tested range is unlikely to be limited by climatic temperature, but competition with an abundant native species may help to slow it down.

    Information extraction from template-generated hidden web documents

    The larger part of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (such as Google and Yahoo). These databases, which are referred to as Hidden Web databases, dynamically generate a list of documents in response to a user query. Such documents are typically presented to users as template-generated Web pages. This paper presents a new approach that identifies Web page templates in order to extract query-related information from documents. We propose two forms of representation to analyse the content of a document: Text with Immediate Adjacent Tag Segments (TIATS) and Text with Neighbouring Adjacent Tag Segments (TNATS). Our techniques exploit tag structures that surround the textual contents of documents in order to detect Web page templates, thereby extracting query-related information. Experimental results demonstrate that TNATS detects Web page templates most effectively and extracts information with high recall and precision.

    Query-related data extraction of hidden web documents

    The larger part of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dynamically generated by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.

    Relevance Judgments between TREC and Non-TREC Assessors

    This paper investigates the agreement of relevance assessments between official TREC judgments and those generated from an interactive IR experiment. Results show that 63% of documents judged relevant by our users matched official TREC judgments. Several factors contributed to differences in agreement: the number of retrieved relevant documents, the number of relevant documents judged, system effectiveness per topic, and the ranking of relevant documents.
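A figure such as the 63% above can be read as an overlap rate between two sets of judgments. The following is a minimal sketch under that assumption: it takes agreement to be the fraction of user-relevant documents that the official assessors also marked relevant. The function name, the document identifiers, and this exact definition are illustrative and are not taken from the paper.

```python
from typing import AbstractSet


def agreement_rate(user_relevant: AbstractSet[str],
                   official_relevant: AbstractSet[str]) -> float:
    """Fraction of user-relevant documents also judged relevant officially.

    Assumed definition for illustration; the paper may measure agreement
    differently. Returns 0.0 when the users judged nothing relevant.
    """
    if not user_relevant:
        return 0.0
    return len(user_relevant & official_relevant) / len(user_relevant)


# Hypothetical judgments for a single topic.
users = {"d1", "d2", "d3", "d4"}
official = {"d1", "d3", "d4", "d9"}
rate = agreement_rate(users, official)  # 3 of the 4 user-relevant documents match
```

Note that this measure is asymmetric: swapping the two arguments gives the fraction of officially relevant documents that users also found relevant, which can differ.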

    The Relationship between IR Effectiveness Measures and User Satisfaction

    This paper presents an experimental study of users assessing the quality of Google web search results. In particular, we look at how users' satisfaction correlates with the effectiveness of Google as quantified by IR measures such as precision and the suite of Cumulative Gain measures (CG, DCG, NDCG). Results indicate strong correlation between users' satisfaction and both CG and precision, moderate correlation with DCG and, perhaps surprisingly, negligible correlation with NDCG. The reasons for the low correlation with NDCG are examined.
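For readers unfamiliar with the measures named above, the sketch below computes precision, CG, DCG and NDCG for one ranked result list. It assumes graded relevance scores listed in rank order, uses the common log2(rank + 1) discount, and normalizes against the ideal reordering of the same list. These choices and all function names are illustrative assumptions; the paper may use different variants.

```python
import math
from typing import Sequence


def precision_at_k(gains: Sequence[float], k: int) -> float:
    """Share of the top k results with a non-zero relevance grade."""
    return sum(1 for g in gains[:k] if g > 0) / k


def cumulative_gain(gains: Sequence[float], k: int) -> float:
    """CG: plain sum of the relevance grades in the top k."""
    return sum(gains[:k])


def discounted_cumulative_gain(gains: Sequence[float], k: int) -> float:
    """DCG: each grade is divided by log2(rank + 1), so lower ranks count less."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def normalized_dcg(gains: Sequence[float], k: int) -> float:
    """NDCG: DCG divided by the DCG of the best possible ordering.

    The ideal ordering here is built from the judged list itself,
    which is an assumption made to keep the sketch self-contained.
    """
    ideal = sorted(gains, reverse=True)
    ideal_dcg = discounted_cumulative_gain(ideal, k)
    if ideal_dcg == 0:
        return 0.0
    return discounted_cumulative_gain(gains, k) / ideal_dcg


# Hypothetical grades for the top four results of one query (3 = best).
grades = [3, 2, 0, 1]
p = precision_at_k(grades, 4)
cg = cumulative_gain(grades, 4)
dcg = discounted_cumulative_gain(grades, 4)
ndcg = normalized_dcg(grades, 4)
```

Because NDCG is scaled per query by its own ideal ranking, a list with few relevant results can still score close to 1, whereas CG and precision fall with the amount of relevant material returned.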

    Users' effectiveness and satisfaction for image retrieval

    This paper presents results from an initial user study exploring the relationship between system effectiveness, as quantified by traditional measures such as precision and recall, and users’ effectiveness and satisfaction with the results. The tasks involve finding images for recall-based tasks. It was concluded that no direct relationship between system effectiveness and users’ performance could be proven (as shown by previous research). People learn to adapt to a system regardless of its effectiveness. This study recommends that a combination of attributes (e.g. system effectiveness, user performance and satisfaction) is a more effective way to evaluate interactive retrieval systems. Results of this study also reveal that users are more concerned with accuracy than coverage of the search results.