Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Pseudo-relevance feedback (PRF) is commonly used to boost the performance of
traditional information retrieval (IR) models by using top-ranked documents to
identify and weight new query terms, thereby reducing the effect of
query-document vocabulary mismatches. While neural retrieval models have
recently demonstrated strong results for ad-hoc retrieval, combining them with
PRF is not straightforward due to incompatibilities between existing PRF
approaches and neural architectures. To bridge this gap, we propose an
end-to-end neural PRF framework that can be used with existing neural IR models
by embedding different neural models as building blocks. Extensive experiments
on two standard test collections confirm the effectiveness of the proposed NPRF
framework in improving the performance of two state-of-the-art neural IR
models.
Comment: Full paper in EMNLP 201
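The classical PRF step the abstract builds on can be illustrated with a minimal, non-neural sketch: count the terms of the top-ranked documents and mix the most frequent ones into the query. The function name, the `alpha` mixing weight, and the relative-frequency weighting are illustrative assumptions here, not part of the NPRF architecture.

```python
from collections import Counter

def prf_expand(query_terms, feedback_docs, top_k=10, alpha=0.5):
    """Hypothetical sketch of classical pseudo-relevance feedback:
    weight candidate expansion terms by their relative frequency in
    the top-ranked (feedback) documents, mixed with the original query."""
    counts = Counter()
    for doc in feedback_docs:  # each doc is a list of term strings
        counts.update(doc)
    total = sum(counts.values())
    # relative-frequency weights for the top_k candidate expansion terms
    expansion = {t: c / total for t, c in counts.most_common(top_k)}
    weights = {t: alpha for t in query_terms}  # original terms keep weight alpha
    for t, w in expansion.items():
        weights[t] = weights.get(t, 0.0) + (1 - alpha) * w
    return weights
```

The expanded, weighted query would then be re-issued against the index; NPRF instead learns this interaction end-to-end with neural building blocks.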
A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression
Many machine learning problems can be formulated as predicting labels for a
pair of objects. Problems of that kind are often referred to as pairwise
learning, dyadic prediction or network inference problems. During the last
decade, kernel methods have played a dominant role in pairwise learning. They
still achieve state-of-the-art predictive performance, but a theoretical
analysis of their behavior has been underexplored in the machine learning
literature.
In this work we review and unify existing kernel-based algorithms that are
commonly used in different pairwise learning settings, ranging from matrix
filtering to zero-shot learning. To this end, we focus on closed-form efficient
instantiations of Kronecker kernel ridge regression. We show that independent
task kernel ridge regression, two-step kernel ridge regression and a linear
matrix filter arise naturally as special cases of Kronecker kernel ridge
regression, implying that all these methods implicitly minimize a squared loss.
In addition, we analyze universality, consistency and spectral filtering
properties. Our theoretical results provide valuable insights in assessing the
advantages and limitations of existing pairwise learning methods.
Comment: arXiv admin note: text overlap with arXiv:1606.0427
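The closed-form instantiation referred to above can be sketched with the standard eigendecomposition trick, which solves the Kronecker system without ever materializing the Kronecker product. This is a generic sketch under a column-stacking vec convention, not the paper's exact code; `K_u` and `K_v` denote the (symmetric) kernels over the two object domains.

```python
import numpy as np

def kron_krr_fit(K_u, K_v, Y, lam=1.0):
    """Solve (K_u ⊗ K_v + lam*I) vec(A) = vec(Y) in closed form.
    Equivalently (column-stacking vec): K_v @ A @ K_u + lam * A = Y.
    Cost is cubic in each factor kernel rather than in their product."""
    d, P = np.linalg.eigh(K_u)       # K_u = P diag(d) P^T
    e, Q = np.linalg.eigh(K_v)       # K_v = Q diag(e) Q^T
    Yt = Q.T @ Y @ P                 # rotate targets into the joint eigenbasis
    B = Yt / (np.outer(e, d) + lam)  # element-wise spectral shrinkage
    return Q @ B @ P.T               # dual coefficients A

def kron_krr_fitted_values(K_u, K_v, A):
    """Fitted values F with vec(F) = (K_u ⊗ K_v) vec(A)."""
    return K_v @ A @ K_u
```

The element-wise division is where the spectral-filtering view of these methods becomes visible: each eigenpair (e_i, d_j) is shrunk by the factor e_i d_j / (e_i d_j + lam).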
Generate, Filter, and Fuse: Query Expansion via Multi-Step Keyword Generation for Zero-Shot Neural Rankers
Query expansion has proved effective in improving the recall and
precision of first-stage retrievers, yet its influence on complex,
state-of-the-art cross-encoder rankers remains under-explored. We first show
that directly applying the expansion techniques in the current literature to
state-of-the-art neural rankers can result in deteriorated zero-shot
performance. To address this, we propose GFF, a pipeline that combines a large
language model and a neural ranker, to Generate, Filter, and Fuse query
expansions more effectively in order to improve the zero-shot ranking metrics
such as nDCG@10. Specifically, GFF first calls an instruction-following
language model to generate query-related keywords through a reasoning chain.
Leveraging self-consistency and reciprocal rank weighting, GFF further filters
and combines the ranking results of each expanded query dynamically. By
utilizing this pipeline, we show that GFF can improve the zero-shot nDCG@10 on
BEIR and TREC DL 2019/2020. We also analyze different modelling choices in the
GFF pipeline and shed light on future directions in query expansion for
zero-shot neural rankers.
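The reciprocal rank weighting used in the fusion step can be illustrated with plain reciprocal rank fusion (RRF). The constant `k = 60` and the unweighted sum over lists are common defaults assumed here; GFF's dynamic, self-consistency-based weighting may differ.

```python
def reciprocal_rank_fuse(rankings, k=60):
    """Fuse several ranked lists (e.g. one per expanded query) by
    summing 1 / (k + rank) over every occurrence of a document;
    documents ranked highly in many lists rise in the fused list."""
    scores = {}
    for ranking in rankings:  # each ranking: list of doc ids, best first
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```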
Collaborative Summarization of Topic-Related Videos
Large collections of videos are grouped into clusters by a topic keyword,
such as Eiffel Tower or Surfing, with many important visual concepts repeating
across them. Such a topically close set of videos has mutual influence on each
other, which can be exploited to summarize one of them using information
from the others in the set. We build on this intuition to develop a novel approach
to extract a summary that simultaneously captures both the important
particularities arising in the given video and the generalities identified
from the set of videos. The topic-related videos provide visual context to
identify the important parts of the video being summarized. We achieve this by
developing a collaborative sparse optimization method which can be efficiently
solved by a half-quadratic minimization algorithm. Our work builds upon the
idea of collaborative techniques from information retrieval and natural
language processing, which typically use the attributes of other similar
objects to predict the attribute of a given object. Experiments on two
challenging and diverse datasets demonstrate the efficacy of our approach
over state-of-the-art methods.
Comment: CVPR 201
Multimodal on-the-fly news media exploration
Information is presented to us in many ways, and one of the most popular and
trustworthy sources of information is the news media. Every day, news events from around the
world are broadcasted through digital platforms and comprise a wide range of topics,
divided into different categories and written by a diverse number of authors. These are
presented to us online in the form of text but also in the form of images that help us to
visually contextualize and "witness" the event with our own eyes. This way of
presenting news results in a multimodal news article format.
Most news sites present us on their landing page with the latest and most popular
news, allowing users to search for specific topics. However, given the large number of
articles, especially on topics such as "COVID-19" or "War in Ukraine", enabling users to get
a complete picture of the events and their origins in a dynamic and effective way becomes
a particularly difficult task. Having a complete picture of the events also helps the users
to be less susceptible to biased interpretations.
This thesis investigates zero-shot deep multimodal approaches for the news domain:
given an image or a relevant text of a news article, we are able to analyze and
aggregate related news pieces on the fly. Textual and visual processing with deep
neural methods transforms the text and images into the embeddings needed to reach
the desired topic through context.
We collected the relevant news information, which resulted in approximately 4 million
documents, processed the multimodal information to enable embedding-based searches
and then provided aggregations of news according to topics and visualizations selected by
the user through an interface that enabled the exploration of unfolding events. The outcome
was a zero-shot news pipeline that made multimodal news readily available to browse
in a semantic and efficient way.
CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise
In this paper, we study the problem of learning image classification models
with label noise. Existing approaches depending on human supervision are
generally not scalable as manually identifying correct or incorrect labels is
time-consuming, whereas approaches not relying on human supervision are
scalable but less effective. To reduce the amount of human supervision for
label noise cleaning, we introduce CleanNet, a joint neural embedding network,
which only requires a fraction of the classes being manually verified to
provide the knowledge of label noise that can be transferred to other classes.
We further integrate CleanNet and conventional convolutional neural network
classifier into one framework for image classification learning. We demonstrate
the effectiveness of the proposed algorithm on both the label-noise detection task
and the task of image classification on noisy data, on several large-scale datasets.
Experimental results show that CleanNet can reduce the label-noise detection error
rate on held-out classes, where no human supervision is available, by 41.5% compared
to current weakly supervised methods. It also achieves 47% of the performance gain of
verifying all images with only 3.2% of images verified on an image classification task.
Source code and dataset will be available at kuanghuei.github.io/CleanNetProject.
Comment: Accepted to CVPR 201
Unified Embedding and Metric Learning for Zero-Exemplar Event Detection
Event detection in unconstrained videos is conceived as a content-based video
retrieval with two modalities: textual and visual. Given a text describing a
novel event, the goal is to rank related videos accordingly. This task is
zero-exemplar: no video examples are given for the novel event.
Related works train a bank of concept detectors on external data sources.
These detectors predict confidence scores for test videos, which are ranked and
retrieved accordingly. In contrast, we learn a joint space in which the visual
and textual representations are embedded. The space casts a novel event as a
probability distribution over pre-defined events. It also learns to measure the distance
between an event and its related videos.
Our model is trained end-to-end on the publicly available EventNet dataset. When applied
to the TRECVID Multimedia Event Detection dataset, it outperforms the
state-of-the-art by a considerable margin.
Comment: IEEE CVPR 201
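Once the joint space is learned, the retrieval step described above reduces to ranking videos by similarity to the embedded event description. A minimal sketch, assuming the embeddings already exist upstream and using cosine similarity (the distance actually learned by the model may differ):

```python
import numpy as np

def rank_videos(text_emb, video_embs):
    """Rank videos for a zero-exemplar event query: cosine similarity
    between one text embedding and a matrix of video embeddings, all
    assumed to live in the same learned joint space."""
    t = text_emb / np.linalg.norm(text_emb)
    V = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = V @ t              # cosine similarity per video
    return np.argsort(-sims)  # video indices, most similar first
```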
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
We carried out a reproducibility study of InPars, which is a method for
unsupervised training of neural rankers (Bonifacio et al., 2022). As a
by-product, we developed InPars-light, which is a simple-yet-effective
modification of InPars. Unlike InPars, InPars-light uses 7x-100x smaller
ranking models and only a freely available language model BLOOM, which -- as we
found out -- produced more accurate rankers compared to a proprietary GPT-3
model. On all five English retrieval collections (used in the original InPars
study) we obtained substantial (7%-30%) and statistically significant
improvements over BM25 (in nDCG and MRR) using only a 30M parameter six-layer
MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars
study only a 100x larger monoT5-3B model consistently outperformed BM25,
whereas their smaller monoT5-220M model (which is still 7x larger than our
MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same
three-shot prompting scenario, our 435M parameter DeBERTA v3 ranker was on par
with the 7x larger monoT5-3B (average gain over BM25 of 1.3 vs. 1.32). In fact,
on three out of five datasets, DeBERTA slightly outperformed monoT5-3B.
Finally, these good results were achieved by re-ranking only 100 candidate
documents compared to 1000 used by Bonifacio et al. (2022). We believe that
InPars-light is the first truly cost-effective prompt-based unsupervised recipe
to train and deploy neural ranking models that outperform BM25. Our code and
data are publicly available: https://github.com/searchivarius/inpars_light
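Several abstracts above report nDCG@10; for reference, a short sketch of the metric. This uses the exponential-gain formulation; evaluation toolkits such as trec_eval may use a linear-gain variant.

```python
import math

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k: discounted cumulative gain of the first k results,
    normalized by the DCG of the ideal (relevance-sorted) ordering.
    ranked_rels holds graded relevance labels in ranked order."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0
```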