Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems
Abstract. The aim of optimising information retrieval (IR) systems using a risk-sensitive evaluation methodology is to minimise the risk of performing any particular topic less effectively than a given baseline system. Baseline systems in this context determine the reference effectiveness for topics, relative to which the effectiveness of a given IR system in minimising the risk will be measured. However, the comparative risk-sensitive evaluation of a set of diverse IR systems, as attempted by the TREC 2013 Web track, is challenging, as the different systems under evaluation may be based upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises about how to properly measure the risk exhibited by each system. In this paper, we argue that no model of information retrieval alone is representative enough in this respect to be a true reference for the models available in the current state-of-the-art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. Instead of using a particular system’s effectiveness as the reference effectiveness for topics, we propose several remedies including the use of mean within-topic system effectiveness as a baseline, which is shown to enable unbiased measurements of the risk-sensitive effectiveness of IR systems.
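To make the idea of a mean within-topic baseline concrete, the following is a minimal sketch, not the authors' implementation. It assumes a URisk-style measure in which per-topic losses against the baseline are weighted by (1 + alpha); the function names, the alpha value, and the toy scores are all hypothetical.

```python
from statistics import mean


def urisk(system, baseline, alpha=1.0):
    """URisk-style score (assumed form): mean per-topic difference from the
    baseline, with losses weighted by (1 + alpha)."""
    total = 0.0
    for s, b in zip(system, baseline):
        delta = s - b
        total += delta if delta >= 0 else (1.0 + alpha) * delta
    return total / len(system)


def mean_within_topic_baseline(scores_by_system):
    """Per-topic mean effectiveness across all systems, used as the baseline."""
    per_topic = zip(*scores_by_system.values())
    return [mean(topic_scores) for topic_scores in per_topic]


# Hypothetical per-topic effectiveness scores (three systems, three topics).
scores = {
    "sysA": [0.5, 0.25, 0.75],
    "sysB": [0.25, 0.5, 0.25],
    "sysC": [0.75, 0.75, 0.5],
}
baseline = mean_within_topic_baseline(scores)
ranking = sorted(scores, key=lambda name: urisk(scores[name], baseline), reverse=True)
print(baseline, ranking)
```

Because the baseline is derived from all systems rather than from one of them, no single retrieval model is privileged as the reference in this sketch.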
Generalized bias-variance evaluation of TREC participated systems
Recent research has shown that improving mean retrieval effectiveness (e.g., MAP) may sacrifice retrieval stability across queries, implying a tradeoff between effectiveness and stability. The evaluation of both effectiveness and stability is often based on a baseline model, which could be weak or biased. In addition, the effectiveness-stability tradeoff has not been systematically or quantitatively evaluated over TREC participating systems. These two problems, to some extent, limit our awareness of such a tradeoff and its impact on developing future IR models. In this paper, motivated by a recently proposed bias-variance based evaluation, we adopt a strong and unbiased “baseline”, which is a virtual target model constructed from the best performance (for each query) among all the participating systems in a retrieval task. We also propose generalized bias-variance metrics, based on which a systematic and quantitative evaluation of the effectiveness-stability tradeoff is carried out over the participating systems in the TREC Ad-hoc Track (1993-1999) and Web Track (2010-2012). We observe a clear effectiveness-stability tradeoff, which has become more pronounced in more recent years. This implies that, as more effective IR systems have been pursued over the years, stability has become problematic and could have been largely overlooked.
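The virtual target model described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's generalized metrics: here "bias" is taken as the mean per-query gap to the target and "variance" as the population variance of that gap, and the run names and scores are hypothetical.

```python
from statistics import mean, pvariance


def virtual_target(scores_by_system):
    """Best score per query among all systems (the virtual target model)."""
    return [max(query_scores) for query_scores in zip(*scores_by_system.values())]


def bias_variance(system, target):
    """Assumed simplification: bias is the mean gap to the target,
    variance is the population variance of that gap across queries."""
    gaps = [t - s for s, t in zip(system, target)]
    return mean(gaps), pvariance(gaps)


# Hypothetical per-query average precision for three runs.
ap = {
    "steady": [0.5, 0.5, 0.5, 0.5],
    "spiky": [1.0, 0.25, 1.0, 0.25],
    "strong": [0.75, 0.75, 0.75, 0.75],
}
target = virtual_target(ap)
for name, run in ap.items():
    bias, variance = bias_variance(run, target)
    print(name, bias, variance)
```

In this toy data the "spiky" run is closer to the target on average than the "steady" run but varies more across queries, which is the kind of effectiveness-stability tradeoff the abstract refers to.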
Fairness in Image Search: A Study of Occupational Stereotyping in Image Retrieval and its Debiasing
Multi-modal search engines have experienced significant growth and widespread
use in recent years, making them the second most common internet use. While
search engine systems offer a range of services, the image search field has
recently become a focal point in the information retrieval community, as the
adage goes, "a picture is worth a thousand words". Although popular search
engines like Google excel at image search accuracy and agility, there is an
ongoing debate over whether their search results can be biased in terms of
gender, language, demographics, socio-cultural aspects, and stereotypes. This
potential for bias can have a significant impact on individuals' perceptions
and influence their perspectives.
In this paper, we present our study on bias and fairness in web search, with
a focus on keyword-based image search. We first discuss several kinds of biases
that exist in search systems and why it is important to mitigate them. We
narrow down our study to assessing and mitigating occupational stereotypes in
image search, which is a prevalent fairness issue in image retrieval. For the
assessment of stereotypes, we take gender as an indicator. We explore various
open-source and proprietary APIs for gender identification from images. With
these, we examine the extent of gender bias in top-ranked image search results
obtained for several occupational keywords. To mitigate the bias, we then
propose a fairness-aware re-ranking algorithm that optimizes (a) relevance of
the search result to the keyword and (b) fairness with respect to the genders identified.
We experiment on 100 top-ranked images obtained for 10 occupational keywords
and consider random re-ranking and re-ranking based on relevance as baselines.
Our experimental results show that the fairness-aware re-ranking algorithm
produces rankings with better fairness scores than the baselines and
competitive relevance scores.
Comment: 20 Pages, Work uses Proprietary Search Systems from the year 202
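As an illustration of re-ranking that trades off relevance against group fairness, here is a minimal greedy sketch. It is not the algorithm from the paper: the scoring rule, the equal-share target, the `lam` weight, and the toy data are all assumptions made for this example.

```python
def fair_rerank(items, lam=0.5):
    """Greedy re-ranking over (item_id, relevance, group) tuples.

    At each position, pick the item maximising
    lam * relevance + (1 - lam) * fairness, where fairness rewards keeping
    each group's share of the ranked prefix close to an equal share
    (an assumed target, chosen for illustration)."""
    groups = {group for _, _, group in items}
    target_share = 1.0 / len(groups)
    remaining = list(items)
    ranked = []
    counts = {}

    def score(item):
        _, relevance, group = item
        # Share this group would have in the prefix if the item were added.
        share = (counts.get(group, 0) + 1) / (len(ranked) + 1)
        fairness = 1.0 - abs(share - target_share)
        return lam * relevance + (1.0 - lam) * fairness

    while remaining:
        best = max(remaining, key=score)
        remaining.remove(best)
        ranked.append(best)
        counts[best[2]] = counts.get(best[2], 0) + 1
    return ranked


# Hypothetical results for one occupational keyword: (id, relevance, detected gender).
images = [
    ("img1", 0.9, "m"),
    ("img2", 0.8, "m"),
    ("img3", 0.7, "m"),
    ("img4", 0.6, "f"),
    ("img5", 0.5, "f"),
]
reranked_ids = [item_id for item_id, _, _ in fair_rerank(images, lam=0.5)]
print(reranked_ids)
```

Setting `lam=1.0` reduces this sketch to plain relevance ordering, which corresponds to the relevance-based baseline mentioned in the abstract.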
Semi-supervised learning and fairness-aware learning under class imbalance
With the advent of Web 2.0 and rapid technological advances, there is a plethora of data in every field; however, more data does not necessarily imply more information. Rather, the quality of the data (the veracity aspect) plays a key role. Data quality is a major issue, since machine learning algorithms rely solely on historical data to derive novel hypotheses. Data may contain noise, outliers, missing values and/or class labels, and skewed data distributions. The latter case, the so-called class-imbalance problem, is quite old and still dramatically affects machine learning algorithms. Class imbalance causes classification models to effectively learn one particular class (the majority) while ignoring the other classes (the minority). Beyond this issue, machine learning models that are applied in domains of high societal impact have become biased against groups of people or individuals who are not well represented within the data. Direct and indirect discriminatory behavior is prohibited by international laws; thus, there is an urgent need to mitigate discriminatory outcomes from machine learning algorithms.
In this thesis, we address the aforementioned issues and propose methods that tackle class imbalance, and mitigate discriminatory outcomes in machine learning algorithms. As part of this thesis, we make the following contributions:
• Tackling class-imbalance in semi-supervised learning – The class-imbalance problem is very often encountered in classification. There is a variety of methods that tackle this problem; however, there is a lack of methods that deal with class imbalance in semi-supervised learning. We address this problem by employing data augmentation in the semi-supervised learning process in order to equalize class distributions. We show that semi-supervised learning coupled with data augmentation methods can overcome class-imbalance propagation and significantly outperform the standard semi-supervised annotation process.
• Mitigating unfairness in supervised models – Fairness in supervised learning has received a lot of attention in recent years. A growing body of pre-, in- and post-processing approaches has been proposed to mitigate algorithmic bias; however, these methods use error rate as the performance measure of the machine learning algorithm, which causes high error rates on the under-represented class. To deal with this problem, we propose approaches that operate in the pre-, in- and post-processing layers while accounting for all classes. Our proposed methods outperform state-of-the-art methods in terms of performance while being able to mitigate unfair outcomes.
FedHAP: Federated Hashing with Global Prototypes for Cross-silo Retrieval
Deep hashing has been widely applied in large-scale data retrieval due to its
superior retrieval efficiency and low storage cost. However, data are often
scattered in data silos with privacy concerns, so performing centralized data
storage and retrieval is not always possible. Leveraging the concept of
federated learning (FL) to perform deep hashing is a recent research trend.
However, existing frameworks mostly rely on the aggregation of the local deep
hashing models, which are trained by performing similarity learning with local
skewed data only. Therefore, they cannot work well for non-IID clients in a
real federated environment. To overcome these challenges, we propose a novel
federated hashing framework that enables participating clients to jointly train
the shared deep hashing model by leveraging the prototypical hash codes for
each class. Globally, the transmission of global prototypes with only one
prototypical hash code per class will minimize the impact of communication cost
and privacy risk. Locally, the use of global prototypes is maximized by
jointly training a discriminator network and the local hashing network.
Extensive experiments on benchmark datasets are conducted to demonstrate that
our method can significantly improve the performance of the deep hashing model
in federated environments with non-IID data distributions.
Recuperação multimodal e interativa de informação orientada por diversidade
Advisor: Ricardo da Silva Torres. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Information retrieval methods, especially for multimedia data, have evolved towards the integration of multiple sources of evidence in analyzing the relevance of items for a given user search task. In this context, to attenuate the semantic gap between low-level features extracted from the content of digital objects and high-level semantic concepts (objects, categories, etc.), and to make systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop, allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the user effort required to provide relevance assessments while keeping an acceptable overall effectiveness.
This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. This work comprehensively covers the interactive information retrieval literature and also discusses recent advances, the major research challenges, and promising research opportunities. We have proposed and evaluated two relevance-diversity trade-off enhancement workflows, which integrate multiple kinds of information from images, such as visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, aimed at maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal learning-to-rank method trained with relevance feedback over diversified results.
Our experimental analysis shows that the joint usage of multiple information sources positively impacted the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal-relevance-based filtering and reranking was effective in improving result relevance and also boosted diversity promotion methods. Beyond that, through a thorough experimental analysis, we have investigated several research questions related to the possibility of improving result diversity while keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search session results and how different diversification approaches behave for different data modalities. By analyzing the overall and per-feedback-iteration effectiveness, we show that introducing diversity may harm initial results, whereas it significantly enhances the overall session effectiveness, considering not only relevance and diversity but also how early the user is exposed to the same amount of relevant items and diversity.
Degree: Doctorate in Computer Science (Doutor em Ciência da Computação). P-4388/2010; 140977/2012-0; CAPES; CNP
Understanding and Mitigating Multi-sided Exposure Bias in Recommender Systems
Fairness is a critical system-level objective in recommender systems that has
been the subject of extensive recent research. It is especially important in
multi-sided recommendation platforms where it may be crucial to optimize
utilities not just for the end user, but also for other actors such as item
sellers or producers who desire a fair representation of their items. Existing
solutions do not properly address various aspects of multi-sided fairness in
recommendations, as they may either take a solely one-sided view (i.e., improving
the fairness only for one side), or do not appropriately measure the fairness
for each actor involved in the system. In this thesis, I first aim to
investigate the impact of unfair recommendations on the system and how these
unfair recommendations can negatively affect major actors in the system. Then,
I seek to propose solutions to tackle the unfairness of recommendations. I
propose a rating transformation technique that works as a pre-processing step
before building the recommendation model to alleviate the inherent popularity
bias in the input data and consequently to mitigate the exposure unfairness for
items and suppliers in the recommendation lists. Also, as another solution, I
propose a general graph-based solution that works as a post-processing approach
after recommendation generation for mitigating the multi-sided exposure bias in
the recommendation results. For evaluation, I introduce several metrics for
measuring the exposure fairness for items and suppliers, and show that these
metrics better capture the fairness properties in the recommendation results. I
perform extensive experiments to evaluate the effectiveness of the proposed
solutions. The experiments on different publicly-available datasets and
comparison with various baselines confirm the superiority of the proposed
solutions in improving the exposure fairness for items and suppliers.
Comment: Doctoral thesi
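One simple way to quantify supplier-side exposure in recommendation lists is sketched below. These are not the metrics introduced in the thesis: the sketch assumes exposure is the fraction of recommendation slots a supplier's items occupy and summarises its inequality with a Gini coefficient; the item-to-supplier mapping and the lists are hypothetical.

```python
from collections import Counter


def supplier_exposure(rec_lists, supplier_of):
    """Fraction of all recommendation slots occupied by each supplier's items.
    Suppliers that never appear receive an exposure of 0."""
    counts = Counter(supplier_of[item] for recs in rec_lists for item in recs)
    total = sum(counts.values())
    return {s: counts.get(s, 0) / total for s in set(supplier_of.values())}


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1) / n


# Hypothetical catalogue (item -> supplier) and top-2 lists for four users.
supplier_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
rec_lists = [["a1", "a2"], ["a1", "b1"], ["a2", "a1"], ["a1", "b1"]]

exposure = supplier_exposure(rec_lists, supplier_of)
inequality = gini(exposure.values())
print(exposure, inequality)
```

A pre-processing or post-processing mitigation step could then be judged by whether it lowers this inequality while keeping recommendation accuracy acceptable.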