
    Querylog-based assessment of retrievability bias in a large newspaper corpus

    Bias in the retrieval of documents can directly influence the information access of a digital library. In the worst case, systematic favoritism for a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability of all documents in a collection. Previous evaluations have been performed on TREC collections using simulated query sets. The question remains, however, how representative this approach is of more realistic settings. To address this question, we investigate the effectiveness of the retrievability measure using a large digitized newspaper corpus, featuring two characteristics that distinguish our experiments from previous studies: (1) compared to TREC collections, our collection contains noise originating from OCR processing, historical spelling and use of language; and (2) instead of simulated queries, we use queries from a real search log.
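The retrievability measure referred to above can be sketched in its standard cumulative form: count, for each document, how many queries surface it within some rank cutoff. This is a generic illustration with toy rankings, not the paper's actual pipeline; all names and data below are made up.

```python
from collections import defaultdict

def retrievability(ranked_results_per_query, cutoff=10):
    """Cumulative retrievability: r(d) = number of queries for which
    document d appears within the top-`cutoff` ranks."""
    r = defaultdict(int)
    for ranking in ranked_results_per_query:
        for doc_id in ranking[:cutoff]:
            r[doc_id] += 1
    return dict(r)

# Toy example: three queries, each with a ranked list of documents.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1"],
    ["d2", "d4"],
]
scores = retrievability(rankings, cutoff=2)
# d2 is reachable via all three queries; d3 never appears within the cutoff.
```

A document with a low (or zero) score under every plausible query set is effectively invisible to users, which is exactly the failure mode the abstract describes.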

    Retrievability in an Integrated Retrieval System: An Extended Study

    Retrievability measures the influence a retrieval system has on the access to information in a given collection of items. This measure can help in evaluating a search system and in drawing insights from that evaluation. In this paper, we investigate retrievability in an integrated search system consisting of items from various categories, particularly focusing on datasets, publications and variables in a real-life Digital Library (DL). The traditional instruments, the Lorenz curve and the Gini coefficient, are employed to visualize the diversity in retrievability scores of the three retrievable document types (datasets, publications, and variables). Our results show a significant popularity bias, with certain items being retrieved more often than others. In particular, certain datasets are more likely to be retrieved than other datasets in the same category. In contrast, the retrievability scores of items from the variable and publication categories are more evenly distributed. We have observed that the distribution of document retrievability is more diverse for datasets than for publications and variables.
    To appear in the International Journal on Digital Libraries (IJDL).
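The Gini coefficient used above to summarize inequality in retrievability scores can be computed directly from a list of per-document scores. This is the standard textbook formula over Lorenz-ordered values, not necessarily the exact implementation used in the paper; the example scores are invented.

```python
def gini(values):
    """Gini coefficient of a non-negative score distribution:
    0 = perfect equality, approaching 1 = maximal inequality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form over the sorted (Lorenz-ordered) values.
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * total) - (n + 1) / n

# Perfectly equal retrievability -> 0; scores concentrated on one item -> high.
equal = gini([5, 5, 5, 5])      # equality
skewed = gini([0, 0, 0, 100])   # strong popularity bias
```

A flatter Lorenz curve (lower Gini) for publications and variables, versus a higher Gini for datasets, is how the diversity finding above would show up numerically.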

    Competent Men and Warm Women: Gender Stereotypes and Backlash in Image Search Results

    There is much concern about algorithms that underlie information services and the view of the world they present. We develop a novel method for examining the content and strength of gender stereotypes in image search, inspired by the trait adjective checklist method. We compare the gender distribution in photos retrieved by Bing for the query “person” and for queries based on 68 character traits (e.g., “intelligent person”) in four regional markets. Photos of men are more often retrieved for “person,” as compared to women. As predicted, photos of women are more often retrieved for warm traits (e.g., “emotional”) whereas agentic traits (e.g., “rational”) are represented by photos of men. A backlash effect, where stereotype-incongruent individuals are penalized, is observed. However, backlash is more prevalent for “competent women” than “warm men.” Results underline the need to understand how and why biases enter search algorithms and at which stages of the engineering process.

    Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

    Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. Where conventional search engines accept a text query and return a set of document results, including images, a reverse image search accepts an image as a query and returns a set of images as results. This paper evaluates how well common reverse image search engines discover abstract images. We conducted an experiment leveraging images from Wikimedia Commons, a website known to be well indexed by Baidu, Bing, Google, and Yandex. We measure how difficult an image is to find again (retrievability), what percentage of images returned are relevant (precision), and the average number of results a visitor must review before finding the submitted image (mean reciprocal rank). When trying to discover the same image again among similar images, Yandex performs best. When searching for pages containing a specific image, Google and Yandex outperform the others when discovering photographs, with precision scores of 0.8191 and 0.8297, respectively. In both of these cases, Google and Yandex perform better with natural images than with abstract ones, achieving a difference in retrievability as high as 54% between images in these categories. These results affect anyone applying common web search engines to search for technical documents that use abstract images.
    Comment: 20 pages; 7 figures; to be published in the proceedings of the Drawings and abstract Imagery: Representation and Analysis (DIRA) Workshop from ECCV 202
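The precision and mean reciprocal rank (MRR) measures named above can be sketched generically; the relevance judgments and result lists in this example are invented for illustration, not taken from the paper.

```python
def precision(results, relevant):
    """Fraction of the returned results that are relevant."""
    if not results:
        return 0.0
    return sum(1 for r in results if r in relevant) / len(results)

def mean_reciprocal_rank(result_lists, targets):
    """Average of 1/rank of the first correct hit per query;
    a query contributes 0 when its target is never found."""
    total = 0.0
    for results, target in zip(result_lists, targets):
        for rank, r in enumerate(results, start=1):
            if r == target:
                total += 1.0 / rank
                break
    return total / len(result_lists)

# Toy run: one result list judged for precision, two queries for MRR.
p = precision(["a", "b", "c", "d"], relevant={"a", "c"})
mrr = mean_reciprocal_rank([["x", "t"], ["t", "y"]], targets=["t", "t"])
```

A higher MRR corresponds directly to the abstract's framing: fewer results a visitor must review before finding the submitted image again.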

    Analyzing the influence of bigrams on retrieval bias and effectiveness

    Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relationship between retrieval effectiveness and retrieval bias. While various factors influencing bias have been examined, there has been no work examining the impact of using bigrams within the index on retrieval bias. Intuitively, how the documents are represented, and what terms they contain, will influence whether they are retrievable or not. In this paper, we investigate how the bias of a system changes depending on whether the documents are represented using unigrams, bigrams or both. Our analysis of three different retrieval models on three TREC collections shows that a bigram-only representation results in the lowest bias compared to a unigram-only representation, but at the expense of retrieval effectiveness. However, combining both representations reduces the overall bias while also increasing effectiveness. These findings suggest that, when configuring and indexing the collection, the bag-of-words approach (unigrams) should be augmented with bigrams to create better and fairer retrieval systems.
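The three document representations compared above can be sketched with a deliberately simple whitespace tokenizer; this is an illustration of the unigram/bigram/combined idea, not the paper's actual indexing pipeline.

```python
def tokenize(text):
    """Naive whitespace tokenizer (illustrative only)."""
    return text.lower().split()

def unigrams(text):
    return tokenize(text)

def bigrams(text):
    toks = tokenize(text)
    return [f"{a} {b}" for a, b in zip(toks, toks[1:])]

def combined(text):
    # Index both term types together, as in the combined representation
    # that the study finds both less biased and more effective.
    return unigrams(text) + bigrams(text)

terms = combined("information retrieval systems")
# ['information', 'retrieval', 'systems',
#  'information retrieval', 'retrieval systems']
```

Intuitively, bigram terms like "information retrieval" make documents reachable via more specific queries, while the unigram terms keep them reachable via broad ones.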

    Investigating User Perception of Gender Bias in Image Search

    There is growing evidence that search engines produce results that are socially biased, reinforcing a view of the world that aligns with prevalent social stereotypes. One means to promote greater transparency of search algorithms - which are typically complex and proprietary - is to raise user awareness of biased result sets. However, to date, little is known concerning how users perceive bias in search results, and the degree to which their perceptions differ and/or might be predicted based on user attributes. One particular area of search that has recently gained attention, and forms the focus of this study, is image retrieval and gender bias. We conduct a controlled experiment via crowdsourcing, using participants recruited from three countries, to measure the extent to which workers perceive a given image result set to be subjective or objective. Demographic information about the workers, along with measures of sexism, is gathered and analysed to investigate whether (gender) biases in the image search results can be detected. Amongst other findings, the results confirm that sexist people are less likely to detect and report gender biases in image search results.

    Fairness and transparency throughout a digital humanities workflow: Challenges and recommendations

    How can we achieve sufficient levels of transparency and fairness for (humanities) research based on historical newspapers? Which concrete measures should be taken by data providers such as libraries, research projects and individual researchers? We approach these questions from the vantage point that digitised newspapers are complex sources with a high degree of heterogeneity caused by a long chain of processing steps, ranging from digitisation policies and copyright restrictions to the evolving performance of enrichment tools such as OCR or article segmentation. Overall, we emphasise the need for careful documentation of data processing and research practices, and for acknowledgement of support from institutions and collaborators.

    The relationship between retrievability bias and retrieval performance

    A long-standing problem in the domain of Information Retrieval (IR) has been the influence of biases within an IR system on the ranked results presented to a user. Retrievability is an IR evaluation measure which provides a means to assess the level of bias present in a system by evaluating how easily documents in the collection can be found by the IR system in place. Retrievability is intrinsically related to retrieval performance because a document needs to be retrieved before it can be judged relevant. It is therefore reasonable to expect that lowering the level of bias present within a system could lead to improvements in retrieval performance. In this thesis, we undertake an investigation of the nature of the relationship between classical retrieval performance and retrievability bias. We explore the interplay between the two as we alter different aspects of the IR system in an attempt to investigate the Fairness Hypothesis: that a system which is fairer (i.e. exerts the least amount of retrievability bias) performs better. To investigate the relationship between retrievability bias and retrieval performance we utilise a set of 6 standard TREC collections (3 news and 3 web) and a suite of standard retrieval models. We investigate this relationship by looking at four main aspects of the retrieval process, using this set of TREC collections to also explore how generalisable the findings are. We begin by investigating how the retrieval model used relates to both bias and performance by issuing a large set of queries to a set of common retrieval models. We find a general trend where using a retrieval model that is evaluated to be fairer (i.e. less biased) leads to improved performance over less fair systems, hinting that providing documents with a more equal opportunity for access can lead to better retrieval performance. 
Following on from our first study, we investigate how bias and performance are affected by tuning the length normalisation of several parameterised retrieval models. We explore the space of the length normalisation parameters of BM25, PL2 and Language Modelling. We find that tuning these parameters often leads to a trade-off between performance and bias, such that minimising bias will often not equate to maximising performance when traditional TREC performance measures are used. However, we find that measures which account for document length and users' stopping strategies tend to evaluate the least biased settings as also the maximum (or near maximum) performing settings, indicating that the Fairness Hypothesis holds. Following this, we investigate the impact that query length has on retrievability bias. We issue various automatically generated query sets to the system to see whether longer or shorter queries tend to influence the level of bias associated with the system. We find that longer queries tend to reduce bias, possibly because longer queries will often lead to more documents being retrieved, but the reductions in bias show diminishing returns: our studies show that beyond two terms, each additional term reduces bias by significantly less. Finally, we build on our work by employing fielded retrieval models. We look at typical fielding, where the field relevance scores are computed individually and then combined, and compare it with an enhanced version of fielding, where fields are weighted and combined before scoring. We see that there are inherent biases against particular documents in the former model, especially in cases where a field is empty, and as such the latter tends both to perform better and to exert lower bias when compared with the former. In this thesis, we have examined several different ways in which performance and bias can be related. 
We conclude that while the Fairness Hypothesis has its merits, it is not a universally applicable idea. We further note that the method used to compute bias does not distinguish between positive and negative biases, and that this influences our results. We do, however, support the idea that reducing the bias of a system by eliminating biases that are known to be negative should result in improvements in system performance.
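The length-normalisation tuning described in this thesis abstract centres, for BM25, on the parameter b. A minimal sketch of the standard BM25 term-scoring formula shows what b controls; this is a generic textbook form (with a Lucene-style smoothed IDF), not necessarily the exact variant used in the thesis, and all numbers are invented.

```python
import math

def bm25_term_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document.
    `b` controls length normalisation: b=0 disables it entirely,
    b=1 applies full normalisation. Tuning b shifts the balance
    between long and short documents, and hence the system's bias."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# With b=0, a long and a short document with equal tf score identically.
long_doc = bm25_term_score(tf=3, df=10, doc_len=1000, avg_doc_len=500, n_docs=10_000, b=0.0)
short_doc = bm25_term_score(tf=3, df=10, doc_len=100, avg_doc_len=500, n_docs=10_000, b=0.0)

# With b=1, the longer document is penalised relative to the shorter one.
long_b1 = bm25_term_score(tf=3, df=10, doc_len=1000, avg_doc_len=500, n_docs=10_000, b=1.0)
short_b1 = bm25_term_score(tf=3, df=10, doc_len=100, avg_doc_len=500, n_docs=10_000, b=1.0)
```

Sweeping b between these extremes and measuring both effectiveness and the Gini coefficient of retrievability scores is the kind of experiment the thesis describes as exposing the performance/bias trade-off.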

    Expanding the Usage of Web Archives by Recommending Archived Webpages Using Only the URI

    Web archives are a window to view past versions of webpages. When a user requests a webpage on the live Web, such as http://tripadvisor.com/where_to_travel/, the webpage may not be found, which results in a HyperText Transfer Protocol (HTTP) 404 response. The user then may search for the webpage in a Web archive, such as the Internet Archive. Unfortunately, if this page had never been archived, the user will not be able to view the page, nor will the user gain any information on other webpages in the archive with similar content, such as the archived webpage http://classy-travel.net. Similarly, if the user requests the webpage http://hokiesports.com/football/ from the Internet Archive, the user will only find the requested webpage, and will not gain any information on other webpages in the archive with similar content, such as the archived webpage http://techsideline.com. In this research, we build a model for selecting and ranking possible recommended webpages at a Web archive. This is to enhance both HTTP 404 and HTTP 200 responses by surfacing webpages in the archive that the user may not know exist. First, we detect semantics in the requested Uniform Resource Identifier (URI). Next, we classify the URI using an ontology, such as DMOZ or any website directory. Finally, we filter and rank candidates based on several features, such as archival quality, webpage popularity, temporal similarity, and content similarity. We measure the performance of each step using different techniques, including calculating the F1 measure for the different tokenization methods and for the classification. We tested the model using human evaluation to determine whether we could classify and find recommendations for a sample of requests from the Internet Archive’s Wayback Machine access log. Overall, when selecting the full categorization, reviewers agreed with 80.3% of the recommendations, which is much higher than “do not agree” and “I do not know”. 
This indicates that reviewers are more likely to agree with the recommendations when the full categorization is selected. When selecting the first level only, reviewers agreed with just 25.5% of the recommendations. This indicates that having deep-level categorization improves the performance of finding relevant recommendations.
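The first step above, detecting semantics in the requested URI, can be sketched as simple URI tokenization: splitting the hostname and path into candidate terms for classification. This is a hypothetical simplification for illustration, not the dissertation's actual tokenization method.

```python
import re
from urllib.parse import urlparse

def tokenize_uri(uri):
    """Extract candidate semantic tokens from a URI: hostname labels
    (minus the top-level domain and 'www') plus path segments split
    on non-alphanumeric characters."""
    parsed = urlparse(uri)
    host = parsed.hostname or ""
    host_parts = [p for p in host.split(".")[:-1] if p != "www"]
    path_parts = re.split(r"[^a-z0-9]+", parsed.path.lower())
    return [t for t in host_parts + path_parts if t]

tokens = tokenize_uri("http://hokiesports.com/football/")
# ['hokiesports', 'football']
```

Tokens like these would then feed the classification step (e.g., mapping "football" toward a sports category in a directory such as DMOZ), after which candidates are filtered and ranked by the listed features.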