Building Cultural Heritage Reference Collections from Social Media through Pooling Strategies: The Case of 2020’s Tensions Over Race and Heritage
Preprint of the article. [Abstract] Social networks constitute a valuable source for documenting heritage constitution processes or for obtaining a real-time snapshot of a cultural heritage research topic. Many heritage researchers use social networks as a social thermometer to study these processes, creating for this purpose collections that constitute born-digital archives which are potentially reusable, searchable, and of interest to other researchers or citizens. However, the retrieval and archiving techniques used on social networks within heritage studies are still semi-manual, which makes collection building time-consuming and hinders the reproducibility, evaluation, and opening up of the resulting collections. By combining Information Retrieval strategies with emerging archival techniques, some of these weaknesses can be overcome. Specifically, pooling is a well-known Information Retrieval method for extracting a sample of documents from an entire document set (posts, in the case of social networks), obtaining the most complete and unbiased set of relevant documents on a given topic. Using this approach, researchers can create a reference collection without annotating the entire corpus of documents or posts retrieved. This is especially useful in social media due to the large number of topics treated by the same user or in the same thread or post. We present a platform for applying pooling strategies combined with expert judgment to create cultural heritage reference collections from social networks in a customisable, reproducible, documented, and shareable way. The platform is validated by building a reference collection from a social network about the recent attacks on heritage entities motivated by anti-racist protests. This reference collection and the results obtained from its preliminary study are available for use.
This real application has allowed us to validate the platform and the pooling strategies for creating reference collections in heritage studies from social networks. This research has received financial support from: (i) Saving European Archaeology from the Digital Dark Age (SEADDA) 2019-2023, COST ACTION CA18128; (ii) the "Ministerio de Ciencia, Innovación y Universidades" of the Government of Spain and the ERDF (projects RTI2018-093336-B-C21 and RTI2018-093336-B-C22); (iii) Xunta de Galicia, "Consellería de Cultura, Educación e Universidade" (project GPC ED431B 2019/03); and (iv) Xunta de Galicia, "Consellería de Cultura, Educación e Universidade" and the ERDF ("Centro Singular de Investigación de Galicia" accreditation ED431G 2019/01).
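As a rough sketch of the pooling idea described in the abstract (the function, run names, and pool depth are illustrative, not part of the platform), a depth-k pool merges the top-k posts from each retrieval strategy into a single deduplicated set for expert judgment:

```python
# Depth-k pooling sketch: merge the top-k results of several retrieval
# runs into one deduplicated pool for manual relevance assessment.
def build_pool(runs, k=100):
    """runs: dict run_name -> list of doc ids ranked by score."""
    pool = []
    seen = set()
    for run in runs.values():
        for doc in run[:k]:
            if doc not in seen:
                seen.add(doc)
                pool.append(doc)
    return pool

runs = {
    "bm25":  ["p3", "p1", "p7", "p2"],
    "tfidf": ["p1", "p4", "p3", "p9"],
}
print(build_pool(runs, k=3))  # ['p3', 'p1', 'p7', 'p4']
```

Only the pooled posts are shown to assessors, which is what lets researchers avoid annotating the entire retrieved corpus.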
When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections
This is the peer reviewed version of the following article: David E. Losada, Javier Parapar and Alvaro Barreiro (2019) When to Stop Making Relevance Judgments? A Study of Stopping Methods for Building Information Retrieval Test Collections. Journal of the Association for Information Science and Technology, 70 (1), 49-60, which has been published in final form at https://doi.org/10.1002/asi.24077. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
In information retrieval evaluation, pooling is a well-known technique to extract a sample of documents to be assessed for relevance. Given the pooled documents, a number of studies have proposed different prioritization methods to adjudicate documents for judgment. These methods follow different strategies to reduce the assessment effort. However, there is no clear guidance on how many relevance judgments are required for creating a reliable test collection. In this article we investigate and further develop methods to determine when to stop making relevance judgments. We propose a highly diversified set of stopping methods and provide a comprehensive analysis of the usefulness of the resulting test collections. Some of the stopping methods introduced here combine innovative estimates of recall with time series models used in financial trading. Experimental results on several representative collections show that some stopping methods can reduce up to 95% of the assessment effort and still produce a robust test collection. We demonstrate that the reduced set of judgments can be reliably employed to compare search systems using disparate effectiveness metrics such as Average Precision, NDCG, P@100, and Rank Biased Precision.
With all these measures, the correlations found between full pool rankings and reduced pool rankings are very high. This work received financial support from the (i) "Ministerio de Economía y Competitividad" of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R, (ii) Xunta de Galicia (project GPC 2016/035), and (iii) Xunta de Galicia "Consellería de Cultura, Educación e Ordenación Universitaria" and the European Regional Development Fund (ERDF) through the following 2016-2019 accreditations: ED431G/01 ("Centro singular de investigación de Galicia") and ED431G/08S
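The paper's stopping methods combine recall estimates with time-series models; as a much simpler illustration of the general idea (the window size and cutoff here are arbitrary, not the paper's methods), one can stop judging when the rate of newly found relevant documents over a sliding window falls below a threshold:

```python
# Toy stopping rule: halt assessment when relevant documents become
# rare in the most recent window of judgments.
def stop_point(judgments, window=20, min_rate=0.05):
    """judgments: 0/1 relevance labels in adjudication order.
    Returns how many judgments would be made before stopping."""
    for i in range(window, len(judgments) + 1):
        if sum(judgments[i - window:i]) / window < min_rate:
            return i
    return len(judgments)

labels = [1] * 10 + [0] * 50   # relevance dries up after rank 10
print(stop_point(labels))      # 30: stops after 30 of 60 judgments
```

The design choice mirrored here is that stopping decisions use only the judgments already made, so the rule can run online during assessment.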
Design and implementation of an application for collecting and sampling web pages
Early Information Retrieval systems worked on collections of homogeneous quality, such as legal documents and medical articles. With the advent of the web, traditional information retrieval techniques proved ineffective because they were unable to distinguish the quality of documents; hence the need to devise algorithms able to select web pages based on both relevance and quality. Among these algorithms, a prominent place is held by link analysis algorithms, which try to infer the quality of web pages from the topological structure of the graph associated with the web. The work described in this report was carried out within a project whose goal is to evaluate the actual effectiveness of such algorithms.
Our work consisted in developing a web application that, given a suitable population of web pages, provides a set of features aimed at collecting judgments on the quality of those pages. The software performs a pre-processing of the results returned by search engines; to this end, three modules were developed: Interrogatore (Querier), which extracts the URLs from the results; Campionatore (Sampler), which, given a reasonable heuristic, filters the results returned by the Querier; and Downloader, which stores the pages to disk.
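As a minimal sketch of the link-analysis idea described above (the graph is a toy example, and the project's evaluated algorithms are not specified here), PageRank-style power iteration infers page quality from the link graph:

```python
# Minimal PageRank sketch: infer page "quality" from the link graph
# via power iteration. Graph and damping factor are illustrative.
def pagerank(links, d=0.85, iters=50):
    """links: dict mapping page -> list of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # c (it receives the most in-links)
```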
A collaborative approach to IR evaluation
In this thesis we investigate two main problems: 1) inferring consensus from disparate inputs to improve the quality of crowd-contributed data; and 2) developing a reliable crowd-aided IR evaluation framework.
With regard to the first contribution, while many statistical label aggregation methods have been proposed, little comparative benchmarking has occurred in the community, making it difficult to determine the state of the art in consensus or to quantify novelty and progress, leaving modern systems to adopt simple control strategies. To aid the progress of statistical consensus and make state-of-the-art methods accessible, we develop a benchmarking framework in SQUARE, an open-source shared-task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods. Through the development of SQUARE we propose a crowd simulation model that emulates real crowd environments to enable rapid and reliable experimentation with collaborative methods under different crowd contributions. We apply the findings of the benchmark to develop reliable crowd-contributed test collections for IR evaluation.
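The usual baseline that such statistical consensus methods are measured against is simple majority voting; a minimal sketch (the data is illustrative, not from the SQUARE benchmarks):

```python
from collections import Counter

# Minimal majority-vote label aggregation, the common baseline that
# statistical consensus methods aim to improve on.
def majority_vote(labels_per_item):
    """labels_per_item: dict item -> list of worker labels."""
    consensus = {}
    for item, labels in labels_per_item.items():
        consensus[item] = Counter(labels).most_common(1)[0][0]
    return consensus

votes = {"d1": [1, 1, 0], "d2": [0, 0, 1, 0], "d3": [1]}
print(majority_vote(votes))  # {'d1': 1, 'd2': 0, 'd3': 1}
```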
As our second contribution, we describe a collaborative model for distributing relevance judging tasks between trusted assessors and crowd judges. Building on prior work's hypothesis that judging disagreements occur on borderline documents, we train a logistic regression model to predict assessor disagreement and prioritize judging tasks by expected disagreement. Judgments are generated from different crowd models and intelligently aggregated. Given a priority queue, a judging budget, and a cost ratio for expert vs. crowd judging, critical judging tasks are assigned to trusted assessors, with the crowd supplying the remaining judgments. Results on two TREC datasets show that a significant judging burden can be confidently shifted to the crowd, achieving high rank correlation, often at lower cost than exclusive use of trusted assessors.
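A rough sketch of the budget-splitting idea (the function and parameter names are hypothetical; the thesis uses disagreement predicted by a trained classifier): send the most disputed tasks to experts while the leftover budget still covers crowd judgments for everything else:

```python
# Illustrative expert/crowd task routing under a fixed judging budget.
def route_tasks(disagreement, budget, expert_cost, crowd_cost):
    """disagreement: dict task -> predicted assessor disagreement."""
    ordered = sorted(disagreement, key=disagreement.get, reverse=True)
    expert = []
    for task in ordered:
        remaining = len(ordered) - len(expert) - 1
        # take an expert judgment only if the rest can still be crowd-judged
        if budget - expert_cost >= remaining * crowd_cost:
            expert.append(task)
            budget -= expert_cost
        else:
            break
    return expert, ordered[len(expert):]

scores = {"t1": 0.9, "t2": 0.7, "t3": 0.2, "t4": 0.1}
print(route_tasks(scores, budget=10, expert_cost=4, crowd_cost=1))
# (['t1', 't2'], ['t3', 't4'])
```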
Stopping methods for technology assisted reviews based on point processes
Technology Assisted Review (TAR), which aims to reduce the effort required to screen collections of documents for relevance, is used to develop systematic reviews of medical evidence and identify documents that must be disclosed in response to legal proceedings. Stopping methods are algorithms which determine when to stop screening documents during the TAR process, helping to ensure that workload is minimised while still achieving a high level of recall. This paper proposes a novel stopping method based on point processes, which are statistical models that can be used to represent the occurrence of random events. The approach uses rate functions to model the occurrence of relevant documents in the ranking and compares four candidates, including one that has not previously been used for this purpose (hyperbolic). Evaluation is carried out using standard datasets (CLEF e-Health, TREC Total Recall, TREC Legal), and this work is the first to explore stopping method robustness by reporting performance on a range of rankings of varying effectiveness. Results show that the proposed method achieves the desired level of recall without requiring an excessive number of documents to be examined in the majority of cases and also compares well against multiple alternative approaches.
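As a greatly simplified sketch of the rate-function idea (the fitting procedure and constants are illustrative, not the paper's point-process estimators): assume relevant documents occur down the ranking with a hyperbolic rate a/(1+i), so the expected count up to rank n is roughly a*ln(1+n); fit a on an initial prefix of judgments and stop once the observed count covers the target share of the predicted total:

```python
import math

# Toy hyperbolic-rate stopping rule for screening a ranked list.
def stop_rank(judgments, total_docs, fit_depth=50, target_recall=0.95):
    """judgments: 0/1 labels for the screened prefix of the ranking."""
    # fit the rate constant on the first fit_depth judgments
    a = sum(judgments[:fit_depth]) / math.log(1 + fit_depth)
    # expected relevant documents in the whole collection of rankings
    predicted_total = a * math.log(1 + total_docs)
    found = 0
    for n, rel in enumerate(judgments, start=1):
        found += rel
        if n >= fit_depth and found >= target_recall * predicted_total:
            return n
    return len(judgments)
```

A dense prefix followed by sparse relevance lets the rule stop early; if the target is never reached, screening continues to the end of the available judgments.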
On-line Metasearch, Pooling, and System Evaluation
This thesis presents a unified method for simultaneous solution of three problems in Information Retrieval: metasearch (the fusion of ranked lists returned by retrieval systems to elicit improved performance), efficient system evaluation (the accurate evaluation of retrieval systems with small numbers of relevance judgments), and pooling or "active sample selection" (the selection of documents for manual judgment in order to develop sample pools of high precision or pools suitable for assessing system quality). The thesis establishes a unified theoretical framework for addressing these three problems and naturally generalizes their solution to the on-line context by incorporating feedback in the form of relevance judgments. The algorithm, Rankhedge for on-line retrieval, metasearch and system evaluation, is the first to address these three problems simultaneously and also to generalize their solution to the on-line context. Optimality of the Rankhedge algorithm is developed via Bayesian and maximum entropy interpretations. Results of the algorithm prove to be significantly superior to previous methods when tested over a range of TREC (Text REtrieval Conference) data. In the absence of feedback, the technique equals or exceeds the performance of benchmark metasearch algorithms such as CombMNZ and Condorcet. The technique then dramatically improves on this performance during the on-line metasearch process. In addition, the technique generates pools of documents which include more relevant documents and produce more accurate system evaluations than previous techniques. The thesis includes an information-theoretic examination of the original Hedge algorithm as well as its adaptation to the context of ranked lists. The work also addresses the concept of information-theoretic similarity within the Rankhedge context and presents a method for decorrelating the predictor set to improve worst case performance.
Finally, an information-theoretically optimal method for probabilistic "active sampling" is presented with possible application to a broad range of practical and theoretical contexts.
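The multiplicative-weights core behind Hedge can be sketched as follows (a toy simplification, not the thesis's Rankhedge, whose losses and selection rule are more principled): systems act as experts, the next document to judge is chosen by weighted preference, and each system's weight decays by beta raised to its loss after each judgment:

```python
# Multiplicative-weights sketch in the spirit of Hedge/Rankhedge
# (simplified; the loss and selection rule here are illustrative).
def hedge_judging(rankings, qrels, beta=0.7, rounds=5):
    """rankings: dict system -> ranked doc list; qrels: doc -> 0/1."""
    weights = {s: 1.0 for s in rankings}
    judged = set()
    for _ in range(rounds):
        # score unjudged docs by weighted reciprocal rank
        scores = {}
        for s, ranking in rankings.items():
            for r, doc in enumerate(ranking):
                if doc not in judged:
                    scores[doc] = scores.get(doc, 0.0) + weights[s] / (r + 1)
        if not scores:
            break
        doc = max(scores, key=scores.get)  # next document to judge
        judged.add(doc)
        rel = qrels.get(doc, 0)
        for s, ranking in rankings.items():
            # loss: ranking a non-relevant doc highly, or a relevant one lowly
            rr = 1.0 / (ranking.index(doc) + 1) if doc in ranking else 0.0
            loss = rr if rel == 0 else 1.0 - rr
            weights[s] *= beta ** loss
    return weights, judged
```

Systems that consistently put relevant documents near the top keep high weights, so the pool increasingly reflects the better systems' rankings, which is the intuition behind using the same machinery for metasearch, pooling, and evaluation.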
Department of Computer Science Activity 1998-2004
This report summarizes much of the research and teaching activity of the Department of Computer Science at Dartmouth College between late 1998 and late 2004. The material for this report was collected as part of the final report for NSF Institutional Infrastructure award EIA-9802068, which funded equipment and technical staff during that six-year period. This equipment and staff supported essentially all of the department's research activity during that period.
Filtering News from Document Streams: Evaluation Aspects and Modeled Stream Utility
Events like hurricanes, earthquakes, or accidents can impact a large number of people. Not only are people in the immediate vicinity of the event affected, but concerns about their well-being are shared by the local government and well-wishers across the world. The latest information about news events could be of use to government and aid agencies in order to make informed decisions on providing necessary support, security, and relief. The general public avails of news updates via dedicated news feeds or broadcasts and, lately, via social media services like Facebook or Twitter. Retrieving the latest information about newsworthy events from the world-wide web is thus of importance to a large section of society.
As new content on a multitude of topics is continuously being published on the web, specific event-related information needs to be filtered from the resulting stream of documents. We present in this thesis a user-centric evaluation measure for evaluating systems that filter news-related information from document streams. Our proposed evaluation measure, Modeled Stream Utility (MSU), models users accessing information from a stream of sentences produced by a news update filtering system. The user model allows for simulating a large number of users with different characteristic stream browsing behaviors. Through simulation, MSU estimates the utility of a system for an average user browsing a stream of sentences. Our results show that system performance is sensitive to a user population's stream browsing behavior and that existing evaluation metrics correspond to very specific types of user behavior.
To evaluate systems that filter sentences from a document stream, we need a set of judged sentences. This judged set is a subset of all the sentences returned by all systems and is typically constructed by pooling together the highest-quality sentences, as determined by the respective system-assigned scores for each sentence. Sentences in the pool are manually assessed, and the resulting set of judged sentences is then used to compute system performance metrics. In this thesis, we investigate the effect of including duplicates of judged sentences in the judged set on system performance evaluation. We also develop an alternative pooling methodology that, given the MSU user model, selects sentences for pooling based on the probability of a sentence being read by modeled users.
Our research lays the foundation for interesting future work on utilizing user models in different aspects of evaluation of stream filtering systems. The MSU measure enables the incorporation of different user models. Furthermore, the applicability of MSU could be extended through calibration based on user behavior.
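The user-simulation idea can be sketched as follows (the reading probability, cost, and utility definition are toy assumptions, not MSU's actual parameters): each simulated user reads sentences in stream order with some probability, gaining utility for relevant sentences read and paying a small cost per sentence; averaging over many users estimates the system's utility for an average user.

```python
import random

# Illustrative user-simulation sketch in the spirit of MSU: simulate
# many users browsing a stream of sentences and average their utility.
def simulate_utility(sentences, p_read=0.7, read_cost=0.1,
                     n_users=1000, seed=0):
    """sentences: list of 0/1 relevance labels in stream order."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_users):
        utility = 0.0
        for rel in sentences:
            if rng.random() < p_read:      # user reads this sentence
                utility += rel - read_cost  # gain minus reading effort
        total += utility
    return total / n_users

stream = [1, 0, 1, 1, 0, 0]
print(simulate_utility(stream))  # close to 0.7 * (3 - 6 * 0.1) = 1.68
```

Varying p_read and read_cost across the simulated population is one way to model users with different characteristic stream browsing behaviors.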
Evaluation Methodologies for Visual Information Retrieval and Annotation
Performance assessment plays a major role in research on Information Retrieval (IR) systems. Starting with the Cranfield experiments in the early 1960s, methodologies for system-based performance assessment emerged and established themselves, resulting in an active research field with a number of successful benchmarking activities. With the rise of the digital age, procedures for text retrieval evaluation were often transferred to multimedia retrieval evaluation without questioning their direct applicability. This thesis investigates the problem of system-based performance assessment of annotation approaches in generic image collections. It addresses three important parts of annotation evaluation, namely user requirements for the retrieval of annotated visual media, performance measures for multi-label evaluation, and visual test collections. Using the example of multi-label image annotation evaluation, I discuss which concepts to employ for indexing, how to obtain a reliable ground truth at moderate cost, and which evaluation measures are appropriate. This is accompanied by a thorough analysis of related work on system-based performance assessment in Visual Information Retrieval (VIR). Traditional performance measures are classified into four dimensions and investigated according to their appropriateness for visual annotation evaluation. Evaluation measures usually assign binary costs to correct and incorrect annotations; this assumption, however, is at odds with the nature of image concepts, since the predicted concepts and the set of true indexed concepts interrelate with each other. This work shows how semantic similarities between visual concepts can be estimated automatically and brought into the evaluation process for a fine-grained evaluation scenario. Outcomes of this thesis include a user model for concept-based image retrieval, a fully assessed image annotation test collection, and a number of novel performance measures for image annotation evaluation.
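The semantic-similarity idea can be sketched as a similarity-weighted precision (the similarity values and the function are toy assumptions, not the thesis's measures): each predicted concept earns its best similarity to any true concept instead of a strict 0/1 match:

```python
# Sketch of similarity-aware annotation scoring (toy similarities):
# a predicted concept gets partial credit for being semantically
# related to a true concept, rather than binary right/wrong cost.
def soft_precision(predicted, truth, sim):
    """sim: dict of frozenset({a, b}) -> similarity in [0, 1]."""
    def s(a, b):
        return 1.0 if a == b else sim.get(frozenset((a, b)), 0.0)
    if not predicted:
        return 0.0
    return sum(max(s(p, t) for t in truth) for p in predicted) / len(predicted)

sim = {frozenset(("cat", "animal")): 0.8}
print(soft_precision(["cat", "car"], ["animal"], sim))  # 0.4
```

Here "cat" earns 0.8 for its relatedness to "animal" while "car" earns nothing, so the annotation is scored between a full hit and a full miss, which is the fine-grained behavior binary measures cannot express.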