Meta-Learning and the Full Model Selection Problem
When working as a data analyst, one of my daily tasks is to select appropriate tools from a set of existing data analysis techniques in my toolbox, including data preprocessing, outlier detection, feature selection, learning algorithms, and evaluation techniques, for a given data project. This was indeed an enjoyable job at the beginning, because to me finding patterns and valuable information in data is always fun. Things became tricky when several projects needed to be completed in a relatively short time.
Naturally, as a computer science graduate, I started to ask myself, "What can be automated here?", because, intuitively, part of my work is more or less a loop that can be programmed. Literally, the loop is "choose, run, test, and choose again... until some criterion or goal is met".
In other words, I use my experience or knowledge about machine learning and data mining to guide and speed up the process of selecting and applying techniques in order to build a relatively good predictive model for a given dataset for some purpose. So the following questions arise:
"Is it possible to design and implement a system that helps a data analyst choose from a set of data mining tools? Or at least one that provides useful recommendations about tools, potentially saving time for a human analyst?"
To answer these questions, I decided to undertake a long-term study of this topic: to think about, define, research, and simulate the problem before coding my dream system. This thesis presents the resulting research, including new methods, algorithms, and theoretical and empirical analyses, from two directions, both of which propose systematic and efficient solutions to the questions above with different resource requirements: the meta-learning-based algorithm/parameter ranking approach and the meta-heuristic search-based full model selection approach.
Some of the results have been published in research papers; thus, this thesis also serves as a coherent collection of those results in a single volume.
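The "choose, run, test, and choose again" loop described above can be sketched as a simple search over candidate pipelines. The candidate names, the scoring function, and the stopping criterion below are illustrative assumptions, not the system actually built in the thesis:

```python
def select_model(candidates, evaluate, goal=0.9, max_trials=10):
    """Greedy "choose, run, test, and choose again" loop.

    candidates : iterable of pipeline identifiers (hypothetical names)
    evaluate   : callable returning a quality score for one candidate
    goal       : stop as soon as a candidate reaches this score
    """
    best, best_score = None, float("-inf")
    for candidate in list(candidates)[:max_trials]:
        score = evaluate(candidate)      # "run and test"
        if score > best_score:           # keep the best so far
            best, best_score = candidate, score
        if best_score >= goal:           # success criterion met -> stop
            break
    return best, best_score

# Toy scorer standing in for cross-validated accuracy (an assumption).
scores = {"knn": 0.81, "tree+feature-selection": 0.92, "svm": 0.88}
best, best_score = select_model(scores, scores.get)
```

In practice the `evaluate` step would train and cross-validate a full pipeline, which is exactly the expensive part that the thesis's meta-learning and meta-heuristic approaches try to spend more wisely.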
Learning List-Level Domain-Invariant Representations for Ranking
Domain adaptation aims to transfer the knowledge learned on (data-rich)
source domains to (low-resource) target domains, and a popular method is
invariant representation learning, which matches and aligns the data
distributions on the feature space. Although this method is studied extensively
and applied on classification and regression problems, its adoption on ranking
problems is sporadic, and the few existing implementations lack theoretical
justifications. This paper revisits invariant representation learning for
ranking. Upon reviewing prior work, we found that they implement what we call
item-level alignment, which aligns the distributions of the items being ranked
from all lists in aggregate but ignores their list structure. However, the list
structure should be leveraged, because it is intrinsic to ranking problems
where the data and the metrics are defined and computed on lists, not the items
by themselves. To close this discrepancy, we propose list-level alignment --
learning domain-invariant representations at the higher level of lists. The
benefits are twofold: it leads to the first domain adaptation generalization
bound for ranking, in turn providing theoretical support for the proposed
method, and it achieves better empirical transfer performance for unsupervised
domain adaptation on ranking tasks, including passage reranking.
Comment: NeurIPS 2023. Comparison to v1: revised presentation and proof of Corollary 4.
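To make the item-level vs. list-level distinction concrete, here is a toy sketch. The per-list mean pooling and the L1 discrepancy between domain means are illustrative stand-ins for the paper's actual representations and alignment objective:

```python
def mean_vec(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def discrepancy(a, b):
    """L1 distance between two mean vectors (a crude alignment objective)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def item_level(domain):
    # Flatten all lists: every item contributes independently,
    # so longer lists dominate the domain statistics.
    return mean_vec([item for lst in domain for item in lst])

def list_level(domain):
    # Pool items within each list first, then average over lists,
    # so each list counts once regardless of its length.
    return mean_vec([mean_vec(lst) for lst in domain])

# Toy domains: each domain is a set of ranked lists of 1-d item features.
src = [[[1.0], [3.0]], [[5.0], [5.0], [5.0], [5.0]]]
tgt = [[[2.0], [2.0]], [[4.0], [6.0]]]

d_item = discrepancy(item_level(src), item_level(tgt))
d_list = discrepancy(list_level(src), list_level(tgt))
```

On this toy data the two views disagree: the domains look identical list by list (`d_list` is zero) yet mismatched item by item (`d_item` is not), because the long source list dominates the flattened item statistics. This is the kind of structure that item-level alignment ignores.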
Understanding and Mitigating Multi-sided Exposure Bias in Recommender Systems
Fairness is a critical system-level objective in recommender systems that has
been the subject of extensive recent research. It is especially important in
multi-sided recommendation platforms where it may be crucial to optimize
utilities not just for the end user, but also for other actors such as item
sellers or producers who desire a fair representation of their items. Existing
solutions do not properly address various aspects of multi-sided fairness in
recommendations, as they either take a one-sided view (i.e., improving
fairness only for one side) or do not appropriately measure the fairness
for each actor involved in the system. In this thesis, I first aim at
investigating the impact of unfair recommendations on the system and how these
unfair recommendations can negatively affect major actors in the system. Then,
I seek to propose solutions to tackle the unfairness of recommendations. I
propose a rating transformation technique that works as a pre-processing step
before building the recommendation model to alleviate the inherent popularity
bias in the input data and consequently to mitigate the exposure unfairness for
items and suppliers in the recommendation lists. Also, as another solution, I
propose a general graph-based solution that works as a post-processing approach
after recommendation generation for mitigating the multi-sided exposure bias in
the recommendation results. For evaluation, I introduce several metrics for
measuring the exposure fairness for items and suppliers, and show that these
metrics better capture the fairness properties in the recommendation results. I
perform extensive experiments to evaluate the effectiveness of the proposed
solutions. The experiments on different publicly-available datasets and
comparison with various baselines confirm the superiority of the proposed
solutions in improving the exposure fairness for items and suppliers.
Comment: Doctoral thesis.
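The thesis's specific fairness metrics are not spelled out in the abstract; as a generic illustration, a position-discounted exposure measure aggregated per supplier, of the kind commonly used to quantify multi-sided exposure, might look like:

```python
import math

def exposure(rank):
    """Standard logarithmic position discount: higher-ranked items
    (rank 1 is the top) receive more exposure."""
    return 1.0 / math.log2(rank + 1)

def supplier_exposure(rec_lists, supplier_of):
    """Total discounted exposure each supplier receives across all
    recommendation lists. supplier_of maps item -> supplier id."""
    totals = {}
    for rec in rec_lists:
        for rank, item in enumerate(rec, start=1):
            s = supplier_of[item]
            totals[s] = totals.get(s, 0.0) + exposure(rank)
    return totals

# Toy data: two users' top-3 lists; items a, b belong to supplier S1
# and item c to supplier S2 (all names are hypothetical).
supplier_of = {"a": "S1", "b": "S1", "c": "S2"}
recs = [["a", "b", "c"], ["a", "c", "b"]]
totals = supplier_exposure(recs, supplier_of)
```

Comparing `totals` against, say, each supplier's share of the catalog or of user interest then quantifies exposure unfairness, which the pre- and post-processing solutions above aim to reduce.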