938 research outputs found

    Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems

    Full text link
    Abstract. The aim of optimising information retrieval (IR) systems using a risk-sensitive evaluation methodology is to minimise the risk of performing any par-ticular topic less effectively than a given baseline system. Baseline systems in this context determine the reference effectiveness for topics, relative to which the effectiveness of a given IR system in minimising the risk will be measured. How-ever, the comparative risk-sensitive evaluation of a set of diverse IR systems – as attempted by the TREC 2013 Web track – is challenging, as the different systems under evaluation may be based upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises about how to properly measure the risk exhibited by each system. In this paper, we argue that no model of information retrieval alone is representative enough in this respect to be a true reference for the models available in the current state-of-the-art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. In-stead of using a particular system’s effectiveness as the reference effectiveness for topics, we propose several remedies including the use of mean within-topic sys-tem effectiveness as a baseline, which is shown to enable unbiased measurements of the risk-sensitive effectiveness of IR systems.

    Generalized bias-variance evaluation of TREC participated systems

    Get PDF
    Recent research has shown that the improvement of mean retrieval effectiveness (e.g., MAP) may sacrifice the retrieval stability across queries, implying a tradeoff between effectiveness and stability. The evaluation of both effectiveness and stability are often based on a baseline model, which could be weak or biased. In addition, the effectiveness-stability tradeoff has not been systematically or quantitatively evaluated over TREC participated systems. The above two problems, to some extent, limit our awareness of such tradeoff and its impact on developing future IR models. In this paper, motivated by a recently proposed bias-variance based evaluation, we adopt a strong and unbiased “baseline”, which is a virtual target model constructed by the best performance (for each query) among all the participated systems in a retrieval task. We also propose generalized bias variance metrics, based on which a systematic and quantitative evaluation of the effectiveness-stability tradeoff is carried out over the participated systems in the TREC Ad-hoc Track (1993-1999) and Web Track (2010-2012). We observe a clear effectiveness-stability tradeoff, with a trend of becoming more obvious in more recent years. This implies that when we pursue more effective IR systems over years, the stability has become problematic and could have been largely overlooked

    Fairness in Image Search: A Study of Occupational Stereotyping in Image Retrieval and its Debiasing

    Full text link
    Multi-modal search engines have experienced significant growth and widespread use in recent years, making them the second most common internet use. While search engine systems offer a range of services, the image search field has recently become a focal point in the information retrieval community, as the adage goes, "a picture is worth a thousand words". Although popular search engines like Google excel at image search accuracy and agility, there is an ongoing debate over whether their search results can be biased in terms of gender, language, demographics, socio-cultural aspects, and stereotypes. This potential for bias can have a significant impact on individuals' perceptions and influence their perspectives. In this paper, we present our study on bias and fairness in web search, with a focus on keyword-based image search. We first discuss several kinds of biases that exist in search systems and why it is important to mitigate them. We narrow down our study to assessing and mitigating occupational stereotypes in image search, which is a prevalent fairness issue in image retrieval. For the assessment of stereotypes, we take gender as an indicator. We explore various open-source and proprietary APIs for gender identification from images. With these, we examine the extent of gender bias in top-tanked image search results obtained for several occupational keywords. To mitigate the bias, we then propose a fairness-aware re-ranking algorithm that optimizes (a) relevance of the search result with the keyword and (b) fairness w.r.t genders identified. We experiment on 100 top-ranked images obtained for 10 occupational keywords and consider random re-ranking and re-ranking based on relevance as baselines. Our experimental results show that the fairness-aware re-ranking algorithm produces rankings with better fairness scores and competitive relevance scores than the baselines.Comment: 20 Pages, Work uses Proprietary Search Systems from the year 202

    Semi-supervised learning and fairness-aware learning under class imbalance

    Get PDF
    With the advent of Web 2.0 and the rapid technological advances, there is a plethora of data in every field; however, more data does not necessarily imply more information, rather the quality of data (veracity aspect) plays a key role. Data quality is a major issue, since machine learning algorithms are solely based on historical data to derive novel hypotheses. Data may contain noise, outliers, missing values and/or class labels, and skewed data distributions. The latter case, the so-called class-imbalance problem, is quite old and still affects dramatically machine learning algorithms. Class-imbalance causes classification models to learn effectively one particular class (majority) while ignoring other classes (minority). In extend to this issue, machine learning models that are applied in domains of high societal impact have become biased towards groups of people or individuals who are not well represented within the data. Direct and indirect discriminatory behavior is prohibited by international laws; thus, there is an urgency of mitigating discriminatory outcomes from machine learning algorithms. In this thesis, we address the aforementioned issues and propose methods that tackle class imbalance, and mitigate discriminatory outcomes in machine learning algorithms. As part of this thesis, we make the following contributions: • Tackling class-imbalance in semi-supervised learning – The class-imbalance problem is very often encountered in classification. There is a variety of methods that tackle this problem; however, there is a lack of methods that deal with class-imbalance in the semi-supervised learning. We address this problem by employing data augmentation in semi-supervised learning process in order to equalize class distributions. We show that semi-supervised learning coupled with data augmentation methods can overcome class-imbalance propagation and significantly outperform the standard semi-supervised annotation process. • Mitigating unfairness in supervised models – Fairness in supervised learning has received a lot of attention over the last years. A growing body of pre-, in- and postprocessing approaches has been proposed to mitigate algorithmic bias; however, these methods consider error rate as the performance measure of the machine learning algorithm, which causes high error rates on the under-represented class. To deal with this problem, we propose approaches that operate in pre-, in- and post-processing layers while accounting for all classes. Our proposed methods outperform state-of-the-art methods in terms of performance while being able to mitigate unfair outcomes

    FedHAP: Federated Hashing with Global Prototypes for Cross-silo Retrieval

    Full text link
    Deep hashing has been widely applied in large-scale data retrieval due to its superior retrieval efficiency and low storage cost. However, data are often scattered in data silos with privacy concerns, so performing centralized data storage and retrieval is not always possible. Leveraging the concept of federated learning (FL) to perform deep hashing is a recent research trend. However, existing frameworks mostly rely on the aggregation of the local deep hashing models, which are trained by performing similarity learning with local skewed data only. Therefore, they cannot work well for non-IID clients in a real federated environment. To overcome these challenges, we propose a novel federated hashing framework that enables participating clients to jointly train the shared deep hashing model by leveraging the prototypical hash codes for each class. Globally, the transmission of global prototypes with only one prototypical hash code per class will minimize the impact of communication cost and privacy risk. Locally, the use of global prototypes are maximized by jointly training a discriminator network and the local hashing network. Extensive experiments on benchmark datasets are conducted to demonstrate that our method can significantly improve the performance of the deep hashing model in the federated environments with non-IID data distributions

    Recuperação multimodal e interativa de informação orientada por diversidade

    Get PDF
    Orientador: Ricardo da Silva TorresTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Os métodos de Recuperação da Informação, especialmente considerando-se dados multimídia, evoluíram para a integração de múltiplas fontes de evidência na análise de relevância de itens em uma tarefa de busca. Neste contexto, para atenuar a distância semântica entre as propriedades de baixo nível extraídas do conteúdo dos objetos digitais e os conceitos semânticos de alto nível (objetos, categorias, etc.) e tornar estes sistemas adaptativos às diferentes necessidades dos usuários, modelos interativos que consideram o usuário mais próximo do processo de recuperação têm sido propostos, permitindo a sua interação com o sistema, principalmente por meio da realimentação de relevância implícita ou explícita. Analogamente, a promoção de diversidade surgiu como uma alternativa para lidar com consultas ambíguas ou incompletas. Adicionalmente, muitos trabalhos têm tratado a ideia de minimização do esforço requerido do usuário em fornecer julgamentos de relevância, à medida que mantém níveis aceitáveis de eficácia. Esta tese aborda, propõe e analisa experimentalmente métodos de recuperação da informação interativos e multimodais orientados por diversidade. Este trabalho aborda de forma abrangente a literatura acerca da recuperação interativa da informação e discute sobre os avanços recentes, os grandes desafios de pesquisa e oportunidades promissoras de trabalho. Nós propusemos e avaliamos dois métodos de aprimoramento do balanço entre relevância e diversidade, os quais integram múltiplas informações de imagens, tais como: propriedades visuais, metadados textuais, informação geográfica e descritores de credibilidade dos usuários. Por sua vez, como integração de técnicas de recuperação interativa e de promoção de diversidade, visando maximizar a cobertura de múltiplas interpretações/aspectos de busca e acelerar a transferência de informação entre o usuário e o sistema, nós propusemos e avaliamos um método multimodal de aprendizado para ranqueamento utilizando realimentação de relevância sobre resultados diversificados. Nossa análise experimental mostra que o uso conjunto de múltiplas fontes de informação teve impacto positivo nos algoritmos de balanceamento entre relevância e diversidade. Estes resultados sugerem que a integração de filtragem e re-ranqueamento multimodais é eficaz para o aumento da relevância dos resultados e também como mecanismo de potencialização dos métodos de diversificação. Além disso, com uma análise experimental minuciosa, nós investigamos várias questões de pesquisa relacionadas à possibilidade de aumento da diversidade dos resultados e a manutenção ou até mesmo melhoria da sua relevância em sessões interativas. Adicionalmente, nós analisamos como o esforço em diversificar afeta os resultados gerais de uma sessão de busca e como diferentes abordagens de diversificação se comportam para diferentes modalidades de dados. Analisando a eficácia geral e também em cada iteração de realimentação de relevância, nós mostramos que introduzir diversidade nos resultados pode prejudicar resultados iniciais, enquanto que aumenta significativamente a eficácia geral em uma sessão de busca, considerando-se não apenas a relevância e diversidade geral, mas também o quão cedo o usuário é exposto ao mesmo montante de itens relevantes e nível de diversidadeAbstract: Information retrieval methods, especially considering multimedia data, have evolved towards the integration of multiple sources of evidence in the analysis of the relevance of items considering a given user search task. In this context, for attenuating the semantic gap between low-level features extracted from the content of the digital objects and high-level semantic concepts (objects, categories, etc.) and making the systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the required user effort on providing relevance assessments while keeping an acceptable overall effectiveness. This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. This work, comprehensively covers the interactive information retrieval literature and also discusses about recent advances, the great research challenges, and promising research opportunities. We have proposed and evaluated two relevance-diversity trade-off enhancement work-flows, which integrate multiple information from images, such as: visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, for maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal learning-to-rank method trained with relevance feedback over diversified results. Our experimental analysis shows that the joint usage of multiple information sources positively impacted the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal-relevance-based filtering and reranking was effective on improving result relevance and also boosted diversity promotion methods. Beyond it, with a thorough experimental analysis we have investigated several research questions related to the possibility of improving result diversity and keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search session results and how different diversification approaches behave for the different data modalities. By analyzing the overall and per feedback iteration effectiveness, we show that introducing diversity may harm initial results whereas it significantly enhances the overall session effectiveness not only considering the relevance and diversity, but also how early the user is exposed to the same amount of relevant items and diversityDoutoradoCiência da ComputaçãoDoutor em Ciência da ComputaçãoP-4388/2010140977/2012-0CAPESCNP

    Understanding and Mitigating Multi-sided Exposure Bias in Recommender Systems

    Get PDF
    Fairness is a critical system-level objective in recommender systems that has been the subject of extensive recent research. It is especially important in multi-sided recommendation platforms where it may be crucial to optimize utilities not just for the end user, but also for other actors such as item sellers or producers who desire a fair representation of their items. Existing solutions do not properly address various aspects of multi-sided fairness in recommendations as they may either solely have one-sided view (i.e. improving the fairness only for one side), or do not appropriately measure the fairness for each actor involved in the system. In this thesis, I aim at first investigating the impact of unfair recommendations on the system and how these unfair recommendations can negatively affect major actors in the system. Then, I seek to propose solutions to tackle the unfairness of recommendations. I propose a rating transformation technique that works as a pre-processing step before building the recommendation model to alleviate the inherent popularity bias in the input data and consequently to mitigate the exposure unfairness for items and suppliers in the recommendation lists. Also, as another solution, I propose a general graph-based solution that works as a post-processing approach after recommendation generation for mitigating the multi-sided exposure bias in the recommendation results. For evaluation, I introduce several metrics for measuring the exposure fairness for items and suppliers, and show that these metrics better capture the fairness properties in the recommendation results. I perform extensive experiments to evaluate the effectiveness of the proposed solutions. The experiments on different publicly-available datasets and comparison with various baselines confirm the superiority of the proposed solutions in improving the exposure fairness for items and suppliers.Comment: Doctoral thesi
    corecore