12,532 research outputs found
Fairness of Exposure in Rankings
Rankings are ubiquitous in the online world today. As we have transitioned
from finding books in libraries to ranking products, jobs, job applicants,
opinions and potential romantic partners, there is a substantial precedent that
ranking systems have a responsibility not only to their users but also to the
items being ranked. To address these often conflicting responsibilities, we
propose a conceptual and computational framework that allows the formulation of
fairness constraints on rankings in terms of exposure allocation. As part of
this framework, we develop efficient algorithms for finding rankings that
maximize the utility for the user while provably satisfying a specifiable
notion of fairness. Since fairness goals can be application specific, we show
how a broad range of fairness constraints can be implemented using our
framework, including forms of demographic parity, disparate treatment, and
disparate impact constraints. We illustrate the effect of these constraints by
providing empirical results on two ranking problems.Comment: In Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, London, UK, 201
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA Layer (CCAL) allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three different scenarios (text-to-image,
audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available (the code for
all three methods is released at: https://github.com/CPJKU/cca_layer).Comment: Preliminary version of a paper published in the International Journal
of Multimedia Information Retrieva
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize on what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and difference, and recognize their merits and limitations. For a
head-to-head comparison between the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.Comment: to appear in ACM Computing Survey
Probabilistic Models over Ordered Partitions with Application in Learning to Rank
This paper addresses the general problem of modelling and learning rank data
with ties. We propose a probabilistic generative model, that models the process
as permutations over partitions. This results in super-exponential
combinatorial state space with unknown numbers of partitions and unknown
ordering among them. We approach the problem from the discrete choice theory,
where subsets are chosen in a stagewise manner, reducing the state space per
each stage significantly. Further, we show that with suitable parameterisation,
we can still learn the models in linear time. We evaluate the proposed models
on the problem of learning to rank with the data from the recently held Yahoo!
challenge, and demonstrate that the models are competitive against well-known
rivals.Comment: 19 pages, 2 figure
- …