14 research outputs found

    Building High-Quality Datasets for Information Retrieval Evaluation at a Reduced Cost

    [Abstract] Information Retrieval is no longer exclusively about document ranking: new tasks are continuously proposed in this and sibling fields. With this proliferation of tasks, it becomes crucial to have a cheap way of constructing test collections to evaluate the new developments. Building test collections is time- and resource-consuming: it takes time to obtain the documents and to define the user needs, and it requires assessors to judge a large number of documents. To reduce the latter, pooling strategies aim to decrease the assessment effort by presenting to the assessors a sample of the corpus containing as many relevant documents as possible. In this paper, we propose the preliminary design of different techniques to easily and cheaply build high-quality test collections without the need for participant systems. Ministerio de Ciencia, Innovación y Universidades; RTI2018-093336-B-C22. Xunta de Galicia; GPC ED431B 2019/03. Xunta de Galicia; ED431G/0
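
    For context, the pooling the abstract builds on is conventionally implemented as depth-k pooling: the set sent to assessors is the union of the top-k documents from each contributing system. The sketch below illustrates only that baseline, under assumed run data and depth; it is not the paper's proposed technique, which specifically avoids relying on participant systems.

    # Minimal sketch of depth-k pooling (illustrative baseline, not the paper's proposal).
    # Each "run" is a ranked list of document ids produced by one retrieval system.
    from typing import Dict, List, Set

    def depth_k_pool(runs: Dict[str, List[str]], k: int = 10) -> Set[str]:
        """Union of the top-k documents from every contributing run."""
        pool: Set[str] = set()
        for system, ranking in runs.items():
            pool.update(ranking[:k])  # only these documents are sent to the assessors
        return pool

    # Hypothetical example: two systems, pool depth 2 -> three documents to judge.
    runs = {"bm25": ["d1", "d2", "d3"], "lm": ["d2", "d4", "d1"]}
    print(sorted(depth_k_pool(runs, k=2)))  # ['d1', 'd2', 'd4']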

    Plataforma para la etiquetación asistida de casos de riesgo temprano en internet

    [Abstract] Since the invention of the World Wide Web, users' need to search for information on the Internet has not stopped growing. This has led to continuous growth in information retrieval systems and in the areas where this discipline finds application. It is therefore necessary to develop methodologies and tools that allow a correct evaluation of these new systems. The aim of this project is to design and build a platform that lets assessors efficiently label documents associated with cases of psychological and mental disorders. The platform will be used to build the test collection for the 2020 CLEF eRisk competition, which is held to evaluate the effectiveness of methodologies and metrics for the early detection of risk cases on the Internet, especially those related to health, such as anorexia or depression. The platform is also intended to be flexible when adding new retrieval models or new pooling strategies. To achieve these objectives, an agile methodology with iterative and incremental cycles was adopted, allowing the project to adapt to changing circumstances. Following this process has resulted in a quality application that meets the established objectives. Final degree project (UDC.FIC). Computer engineering. Academic year 2018/201

    Building Cultural Heritage Reference Collections from Social Media through Pooling Strategies: The Case of 2020’s Tensions Over Race and Heritage

    Article preprint. [Abstract] Social networks constitute a valuable source for documenting heritage constitution processes or obtaining a real-time snapshot of a cultural heritage research topic. Many heritage researchers use social networks as a social thermometer to study these processes, creating, for this purpose, collections that constitute born-digital archives that are potentially reusable, searchable, and of interest to other researchers or citizens. However, the retrieval and archiving techniques used on social networks within heritage studies are still semi-manual, making them time-consuming and hindering the reproducibility, evaluation, and opening up of the collections created. By combining Information Retrieval strategies with emerging archival techniques, some of these weaknesses can be overcome. Specifically, pooling is a well-known Information Retrieval method to extract a sample of documents from an entire document set (posts, in the case of social networks), obtaining the most complete and unbiased set of relevant documents on a given topic. Using this approach, researchers can create a reference collection while avoiding annotating the entire corpus of documents or posts retrieved. This is especially useful in social media due to the large number of topics treated by the same user or in the same thread or post. We present a platform for applying pooling strategies combined with expert judgment to create cultural heritage reference collections from social networks in a customisable, reproducible, documented, and shareable way. The platform is validated by building a reference collection from a social network about the recent attacks on patrimonial entities motivated by anti-racist protests. This reference collection and the results obtained from its preliminary study are available for use. This real application has allowed us to validate the platform and the pooling strategies for creating reference collections in heritage studies from social networks. This research has received financial support from: (i) Saving European Archaeology from the Digital Dark Age (SEADDA) 2019-2023 COST ACTION CA 18128; (ii) “Ministerio de Ciencia, Innovación y Universidades” of the Government of Spain and the ERDF (projects RTI2018-093336-B-C21 and RTI2018-093336-B-C22); (iii) Xunta de Galicia - “Consellería de Cultura, Educación e Universidade” (project GPC ED431B 2019/03); (iv) Xunta de Galicia - “Consellería de Cultura, Educación e Universidade” and the ERDF (“Centro Singular de Investigación de Galicia” accreditation ED431G 2019/01). European Cooperation in Science and Technology; CA18128. Xunta de Galicia; ED431B 2019/03. Xunta de Galicia; ED431G 2019/0

    When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections

    This is the peer reviewed version of the following article: David E. Losada, Javier Parapar and Alvaro Barreiro (2019) When to Stop Making Relevance Judgments? A Study of Stopping Methods for Building Information Retrieval Test Collections. Journal of the Association for Information Science and Technology, 70 (1), 49-60, which has been published in final form at https://doi.org/10.1002/asi.24077. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. In information retrieval evaluation, pooling is a well-known technique to extract a sample of documents to be assessed for relevance. Given the pooled documents, a number of studies have proposed different prioritization methods to adjudicate documents for judgment. These methods follow different strategies to reduce the assessment effort. However, there is no clear guidance on how many relevance judgments are required for creating a reliable test collection. In this article we investigate and further develop methods to determine when to stop making relevance judgments. We propose a highly diversified set of stopping methods and provide a comprehensive analysis of the usefulness of the resulting test collections. Some of the stopping methods introduced here combine innovative estimates of recall with time series models used in financial trading. Experimental results on several representative collections show that some stopping methods can reduce the assessment effort by up to 95% and still produce a robust test collection. We demonstrate that the reduced set of judgments can be reliably employed to compare search systems using disparate effectiveness metrics such as Average Precision, NDCG, P@100, and Rank Biased Precision. With all these measures, the correlations found between full-pool rankings and reduced-pool rankings are very high. This work received financial support from the (i) “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R, (ii) Xunta de Galicia (project GPC 2016/035), and (iii) Xunta de Galicia “Consellería de Cultura, Educación e Ordenación Universitaria” and the European Regional Development Fund (ERDF) through the following 2016–2019 accreditations: ED431G/01 (“Centro singular de investigación de Galicia”) and ED431G/08S
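
    As a concrete illustration of the stopping-method idea, one very simple rule is to stop judging a prioritized pool once an estimate of recall reaches a target. The sketch below assumes an externally supplied estimate of the total number of relevant documents and a 95% target; it is only in the spirit of the paper and does not reproduce its recall estimators or time series models.

    # Illustrative stopping rule (not the paper's method): judge a prioritized list and
    # stop once estimated recall over the judged prefix reaches a target.
    from typing import Callable, Iterable, List

    def judge_until_recall(ranked_docs: Iterable[str],
                           is_relevant: Callable[[str], bool],  # the human assessor
                           estimated_total_relevant: int,       # assumed external estimate
                           target_recall: float = 0.95) -> List[str]:
        """Return the judged prefix, stopping when estimated recall >= target_recall."""
        judged: List[str] = []
        found = 0
        for doc in ranked_docs:
            judged.append(doc)
            if is_relevant(doc):
                found += 1
            if found / max(estimated_total_relevant, 1) >= target_recall:
                break  # enough of the estimated relevant material has been recovered
        return judged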

    A day at the races: using best arm identification algorithms to reduce the cost of information retrieval user studies

    Two major barriers to conducting user studies are the costs involved in recruiting participants and the researcher time spent performing studies. Typical solutions are to study convenience samples or to design studies that can be deployed on crowd-sourcing platforms. Both solutions have benefits but also drawbacks. Even in cases where these approaches make sense, it is still reasonable to ask whether we are using our resources – participants’ and our time – efficiently and whether we can do better. Typically, user studies compare randomly assigned experimental conditions, such that a uniform number of opportunities is assigned to each condition. This sampling approach, as has been demonstrated in clinical trials, is sub-optimal. The goal of many Information Retrieval (IR) user studies is to determine which strategy (e.g., behaviour or system) performs best. In such a setup, it is not wise to waste participant and researcher time and money on conditions that are obviously inferior. In this work we explore whether Best Arm Identification (BAI) algorithms provide a natural solution to this problem. BAI methods are a class of Multi-armed Bandits (MABs) where the only goal is to output a recommended arm, and the algorithms are evaluated by the average payoff of the recommended arm. Using three datasets associated with previously published IR-related user studies and a series of simulations, we test the extent to which the cost required to run user studies can be reduced by employing BAI methods. Our results suggest that some BAI instances (racing algorithms) are promising devices for reducing the cost of user studies. One of the racing algorithms studied, Hoeffding, holds particular promise: it offered consistent savings across both the real and simulated datasets and only extremely rarely returned a result inconsistent with the result of the full trial. We believe these results can have an important impact on the way research is performed in this field. They show that the conditions assigned to participants could be changed dynamically and automatically to make efficient use of participant and experimenter time.
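
    The abstract singles out the Hoeffding racing algorithm as particularly promising. Below is a minimal sketch of a generic Hoeffding race, assuming payoffs bounded in [0, 1]; the condition names, confidence level, and sampling budget are illustrative assumptions and do not reproduce the study's experimental setup.

    # Illustrative Hoeffding race (a Best Arm Identification method): sample the surviving
    # conditions round-robin and drop any whose upper confidence bound falls below the
    # best lower bound. The arms, delta, and max_pulls below are hypothetical.
    import math, random
    from typing import Callable, Dict

    def hoeffding_race(arms: Dict[str, Callable[[], float]],
                       delta: float = 0.05, max_pulls: int = 2000) -> str:
        stats = {a: [0, 0.0] for a in arms}  # arm -> [number of pulls, sum of payoffs]
        alive = list(arms)
        for t in range(max_pulls):
            if len(alive) == 1:
                break
            arm = alive[t % len(alive)]
            n, s = stats[arm]
            stats[arm] = [n + 1, s + arms[arm]()]  # run one more participant on this arm
            bounds = {}
            for a in alive:
                n, s = stats[a]
                if n == 0:
                    bounds[a] = (float("-inf"), float("inf"))
                    continue
                radius = math.sqrt(math.log(2 * len(arms) * max_pulls / delta) / (2 * n))
                bounds[a] = (s / n - radius, s / n + radius)
            best_lower = max(lo for lo, _ in bounds.values())
            alive = [a for a in alive if bounds[a][1] >= best_lower]
        return max(alive, key=lambda a: stats[a][1] / max(stats[a][0], 1))

    # Hypothetical use: three interface conditions with Bernoulli "success" payoffs.
    conditions = {"A": lambda: float(random.random() < 0.55),
                  "B": lambda: float(random.random() < 0.60),
                  "C": lambda: float(random.random() < 0.40)}
    print(hoeffding_race(conditions))  # usually recommends "B"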

    Visual Pool: A tool to visualize and interact with the pooling method

    Every year more than 25 test collections are built across the main Information Retrieval (IR) evaluation campaigns. They are extremely important in IR because they become the evaluation praxis for the forthcoming years. Test collections are built mostly using the pooling method. The main advantage of this method is that it drastically reduces the number of documents to be judged, but it does so at the cost of introducing biases, which are sometimes aggravated by non-optimal configurations. In this paper we develop a novel visualization technique for the pooling method and integrate it in a demo application named Visual Pool. This demo application enables the user to interact with the pooling method with ease, and provides visual hints to analyze existing test collections and build better ones.

    Building query-based relevance sets without human intervention

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Test collections are the standard framework used in the evaluation of an information retrieval system and the comparison between different systems. A text test collection consists of a set of documents, a set of topics, and a set of relevance assessments, which is a list indicating the relevance of each document to each topic. Traditionally, forming the relevance assessments is done manually by human judges. But in large-scale environments, such as the web, examining each retrieved document to determine its relevance is not feasible. In the past there have been several studies aiming to reduce the human effort required to build these assessments, which are referred to as qrels (query-based relevance sets). Some research has also been done to completely automate the process of generating the qrels. In this thesis, we present different methodologies that produce the qrels automatically without any human intervention. A first method is based on keyphrase (KP) extraction from documents presumed relevant; a second method uses Machine Learning classifiers, Naïve Bayes and Support Vector Machines. The experiments were conducted on the TREC-6, TREC-7 and TREC-8 test collections. The use of machine learning classifiers produced qrels resulting in information retrieval system rankings that were better correlated with those produced by TREC human assessments than any of the automatic techniques proposed in the literature. In order to produce a test collection that could discriminate between the best performing systems, the machine learning technique was enhanced by using a small number of real, or actual, qrels as training sets for the classifiers. These actual relevant documents were selected by Losada et al.’s (2016) pooling technique. This modification led to an improvement in the overall system rankings and enabled discrimination between the best systems with only a little human effort. We also used the bpref-10 and infAP measures for evaluating the systems and comparing the rankings, since they are more robust in incomplete judgment environments. We applied our new techniques to the French and Finnish test collections from CLEF2003 in order to confirm their reproducibility on non-English languages, and we achieved correlations as high as those seen for English.
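
    The machine-learning route described above amounts to training, per topic, a text classifier that labels pooled documents as relevant or not. The sketch below uses a scikit-learn TF-IDF plus linear SVM pipeline with a few seed judgments; the pipeline, seed data, and document pool are illustrative assumptions and do not reproduce the thesis's exact features or setup.

    # Illustrative sketch: per-topic automatic qrels from a small set of seed judgments.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def auto_qrels_for_topic(seed_relevant, seed_nonrelevant, pooled_docs):
        """Train a per-topic classifier on seed judgments, then label the pooled documents."""
        texts = seed_relevant + seed_nonrelevant
        labels = [1] * len(seed_relevant) + [0] * len(seed_nonrelevant)
        clf = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
        clf.fit(texts, labels)
        preds = clf.predict(list(pooled_docs.values()))
        return {doc_id: int(pred) for doc_id, pred in zip(pooled_docs, preds)}

    # Hypothetical topic with two seed judgments per class and a tiny unjudged pool.
    qrels = auto_qrels_for_topic(
        seed_relevant=["coral bleaching is accelerating", "reef bleaching events rise"],
        seed_nonrelevant=["quarterly earnings beat forecasts", "the team won the final"],
        pooled_docs={"d1": "new bleaching reported on the reef",
                     "d2": "quarterly earnings rally continues"})
    print(qrels)  # e.g. {'d1': 1, 'd2': 0}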

    Tren, metode, dan tantangan dalam penelitian di bidang sistem informasi: systematic literature review

    [Abstract] Research in the field of Information Systems has been carried out all over the world. One form or outcome of such research is the scientific article, and the large number of Information Systems articles means the topics covered are quite diverse. Identifying trends in the methods used in these articles is also of interest. Naturally, across so many studies a variety of challenges arise during the work, and research sometimes leaves questions unanswered. This study therefore aims to identify trends in topics, methods, and challenges, and to collect the open questions that remain unanswered in Information Systems research from 2017-2021. The trend analysis was carried out as a systematic literature review using ScienceDirect as the data source. The results show that the trending research topic in Information Systems is "Data/Information Management", accounting for 31.73% of the 104 articles found. The trending method is "Survey/Interview", at 22.37%. Not all articles state the challenges they faced, so 37 challenges were obtained from the 104 articles; the most common challenge concerns human resource problems, at 21.62%. Not all research articles leave open questions, so only 4 remaining questions were found among the 104 articles analysed.