
    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

    NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval

    Pseudo-relevance feedback (PRF) is commonly used to boost the performance of traditional information retrieval (IR) models by using top-ranked documents to identify and weight new query terms, thereby reducing the effect of query-document vocabulary mismatches. While neural retrieval models have recently demonstrated strong results for ad-hoc retrieval, combining them with PRF is not straightforward due to incompatibilities between existing PRF approaches and neural architectures. To bridge this gap, we propose an end-to-end neural PRF framework that can be used with existing neural IR models by embedding different neural models as building blocks. Extensive experiments on two standard test collections confirm the effectiveness of the proposed NPRF framework in improving the performance of two state-of-the-art neural IR models. Comment: Full paper in EMNLP 201
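The classical (non-neural) PRF idea mentioned in this abstract can be illustrated with a minimal sketch: treat the top-ranked documents from a first retrieval pass as pseudo-relevant, count the terms they contain, and append the most frequent new terms to the query. The helper name `expand_query`, the whitespace tokenizer, and the raw term-frequency weighting are assumptions made for illustration; this is not the NPRF framework, which the abstract describes as neural and end-to-end.

```python
from collections import Counter


def expand_query(query, ranked_docs, top_k=2, n_terms=2):
    """Classical PRF sketch (hypothetical helper, not NPRF).

    Adds the most frequent new terms found in the top_k ranked documents.
    """
    query_terms = query.lower().split()
    seen = set(query_terms)
    counts = Counter()
    for doc in ranked_docs[:top_k]:      # top-k documents act as pseudo-relevant
        for term in doc.lower().split():
            if term not in seen:         # only terms not already in the query
                counts[term] += 1
    new_terms = [term for term, _ in counts.most_common(n_terms)]
    return query_terms + new_terms


# Toy ranked list, best match first (made-up data for illustration).
docs = [
    "jaguar speed cat habitat",
    "jaguar cat diet habitat",
    "jaguar car dealership",
]
expanded = expand_query("jaguar speed", docs)
```

In this toy run the expansion terms come only from the first two documents, so the unrelated third document does not influence the expanded query.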

    A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify existing kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency and spectral filtering properties. Our theoretical results provide valuable insights in assessing the advantages and limitations of existing pairwise learning methods. Comment: arXiv admin note: text overlap with arXiv:1606.0427
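As a rough illustration of what a closed-form, efficient instantiation of Kronecker kernel ridge regression can look like, the sketch below solves the dual system (G ⊗ K + λI) vec(A) = vec(Y) through eigendecompositions of the two object kernels instead of forming the Kronecker product. It assumes a fully observed label matrix `Y` and symmetric positive semidefinite kernels `K` and `G`; the function name and all symbols are chosen here for illustration and are not taken from the paper.

```python
import numpy as np


def kronecker_krr_dual(K, G, Y, lam):
    """Dual coefficients A with K @ A @ G + lam * A = Y (sketch).

    K: (n, n) kernel over the first objects, G: (m, m) kernel over the
    second objects, Y: (n, m) complete label matrix, lam > 0.
    """
    sigma, U = np.linalg.eigh(K)   # K = U diag(sigma) U^T
    s, V = np.linalg.eigh(G)       # G = V diag(s) V^T
    Y_tilde = U.T @ Y @ V          # labels in the joint eigenbasis
    # Each spectral component is shrunk by 1 / (sigma_i * s_j + lam).
    A_tilde = Y_tilde / (np.outer(sigma, s) + lam)
    return U @ A_tilde @ V.T


rng = np.random.default_rng(0)
X_u = rng.normal(size=(5, 3))      # toy features for the first objects
X_v = rng.normal(size=(4, 2))      # toy features for the second objects
K = X_u @ X_u.T                    # linear kernels, for illustration only
G = X_v @ X_v.T
Y = rng.normal(size=(5, 4))
lam = 0.5

A = kronecker_krr_dual(K, G, Y, lam)
F = K @ A @ G                      # fitted values for all training pairs

# Reference solution that builds the full Kronecker system explicitly.
vec_A = np.linalg.solve(np.kron(G, K) + lam * np.eye(20), Y.flatten(order="F"))
A_direct = vec_A.reshape(5, 4, order="F")
```

The eigendecomposition route costs roughly O(n³ + m³) rather than O(n³m³) for the explicit system, which is why closed forms of this kind scale to larger pair sets.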

    Generate, Filter, and Fuse: Query Expansion via Multi-Step Keyword Generation for Zero-Shot Neural Rankers

    Query expansion has proven effective in improving the recall and precision of first-stage retrievers, yet its influence on a complicated, state-of-the-art cross-encoder ranker remains under-explored. We first show that directly applying the expansion techniques in the current literature to state-of-the-art neural rankers can result in deteriorated zero-shot performance. To address this, we propose GFF, a pipeline that includes a large language model and a neural ranker, to Generate, Filter, and Fuse query expansions more effectively in order to improve zero-shot ranking metrics such as nDCG@10. Specifically, GFF first calls an instruction-following language model to generate query-related keywords through a reasoning chain. Leveraging self-consistency and reciprocal rank weighting, GFF further filters and combines the ranking results of each expanded query dynamically. By utilizing this pipeline, we show that GFF can improve the zero-shot nDCG@10 on BEIR and TREC DL 2019/2020. We also analyze different modelling choices in the GFF pipeline and shed light on the future directions in query expansion for zero-shot neural rankers.
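The abstract does not spell out how the ranking results of the expanded queries are weighted and combined. As a stand-in, the sketch below uses standard reciprocal rank fusion (RRF), in which each ranked list contributes 1 / (k + rank) to a document's score. The function name, the constant `k = 60`, and the toy document IDs are assumptions for illustration; GFF's actual reciprocal rank weighting and self-consistency filtering may differ.

```python
from collections import defaultdict


def reciprocal_rank_fuse(rankings, k=60):
    """Fuse ranked lists with reciprocal rank fusion (stand-in, not GFF)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)   # higher ranks add more
    # Sort by descending fused score; break ties by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# One ranked list per expanded query (made-up document IDs).
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fuse(rankings)
```

Here `d2` ends up first because it is ranked highly by all three lists, while `d4`, which appears in only one list, ends up last.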

    Collaborative Summarization of Topic-Related Videos

    Large collections of videos are grouped into clusters by a topic keyword, such as Eiffel Tower or Surfing, with many important visual concepts repeating across them. Videos in such a topically close set influence one another, which can be used to summarize one of them by exploiting information from the others in the set. We build on this intuition to develop a novel approach to extract a summary that simultaneously captures both important particularities arising in the given video and generalities identified from the set of videos. The topic-related videos provide visual context to identify the important parts of the video being summarized. We achieve this by developing a collaborative sparse optimization method which can be efficiently solved by a half-quadratic minimization algorithm. Our work builds upon the idea of collaborative techniques from information retrieval and natural language processing, which typically use the attributes of other similar objects to predict the attribute of a given object. Experiments on two challenging and diverse datasets demonstrate the efficacy of our approach over state-of-the-art methods. Comment: CVPR 201

    Multimodal on-the-fly news media exploration

    Information is presented to us in many ways, and one of the most popular and trustworthy sources of information is the news media. Every day, news events from around the world are broadcast through digital platforms and comprise a wide range of topics, divided into different categories and written by a diverse number of authors. These are presented to us online in the form of text but also in the form of images that help us to visually contextualize and "witness" the event with our own eyes. This way of presenting news results in a multimodal news article format. Most news sites present the latest and most popular news on their landing page and allow users to search for specific topics. However, given the large number of articles, especially on topics such as "COVID-19" or "War in Ukraine", enabling users to get a complete picture of the events and their origins in a dynamic and effective way becomes a particularly difficult task. Having a complete picture of the events also helps users to be less susceptible to biased interpretations. This thesis investigates zero-shot deep multimodal approaches for the news domain: given an image or a relevant text of a news article, we are able to analyze and aggregate related news pieces on-the-fly. Textual and visual processing with deep neural methods transforms the text and images into the embeddings needed to reach the desired topic through context. We collected the relevant news information, which resulted in approximately 4 million documents, processed the multimodal information to enable embedding-based searches, and then provided aggregations of news according to topics and visualizations selected by the user through an interface that enabled the exploration of unfolding events. The outcome was a zero-shot news pipeline that made multimodal news readily available to browse in a semantic and efficient way.

    CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

    In this paper, we study the problem of learning image classification models with label noise. Existing approaches depending on human supervision are generally not scalable as manually identifying correct or incorrect labels is time-consuming, whereas approaches not relying on human supervision are scalable but less effective. To reduce the amount of human supervision for label noise cleaning, we introduce CleanNet, a joint neural embedding network, which only requires a fraction of the classes to be manually verified to provide the knowledge of label noise that can be transferred to other classes. We further integrate CleanNet and a conventional convolutional neural network classifier into one framework for image classification learning. We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the task of image classification on noisy data, using several large-scale datasets. Experimental results show that CleanNet can reduce the label noise detection error rate on held-out classes where no human supervision is available by 41.5% compared to current weakly supervised methods. It also achieves 47% of the performance gain of verifying all images with only 3.2% of images verified on an image classification task. Source code and dataset will be available at kuanghuei.github.io/CleanNetProject. Comment: Accepted to CVPR 201

    Unified Embedding and Metric Learning for Zero-Exemplar Event Detection

    Event detection in unconstrained videos is conceived as content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. This task is zero-exemplar: no video examples are given for the novel event. Related works train a bank of concept detectors on external data sources. These detectors predict confidence scores for test videos, which are ranked and retrieved accordingly. In contrast, we learn a joint space in which the visual and textual representations are embedded. The space casts a novel event as a probability of pre-defined events. Also, it learns to measure the distance between an event and its related videos. Our model is trained end-to-end on the publicly available EventNet. When applied to the TRECVID Multimedia Event Detection dataset, it outperforms the state-of-the-art by a considerable margin. Comment: IEEE CVPR 201

    InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

    We carried out a reproducibility study of InPars, which is a method for unsupervised training of neural rankers (Bonifacio et al., 2022). As a by-product, we developed InPars-light, which is a simple-yet-effective modification of InPars. Unlike InPars, InPars-light uses 7x-100x smaller ranking models and only a freely available language model BLOOM, which -- as we found out -- produced more accurate rankers compared to a proprietary GPT-3 model. On all five English retrieval collections (used in the original InPars study) we obtained substantial (7%-30%) and statistically significant improvements over BM25 (in nDCG and MRR) using only a 30M parameter six-layer MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars study only a 100x larger monoT5-3B model consistently outperformed BM25, whereas their smaller monoT5-220M model (which is still 7x larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same three-shot prompting scenario, our 435M parameter DeBERTA v3 ranker was on par with the 7x larger monoT5-3B (average gain over BM25 of 1.3 vs 1.32): in fact, on three out of five datasets, DeBERTA slightly outperformed monoT5-3B. Finally, these good results were achieved by re-ranking only 100 candidate documents compared to 1000 used by Bonifacio et al. (2022). We believe that InPars-light is the first truly cost-effective prompt-based unsupervised recipe to train and deploy neural ranking models that outperform BM25. Our code and data are publicly available: https://github.com/searchivarius/inpars_light