378 research outputs found
Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models
Is neural IR mostly hype? In a recent SIGIR Forum article, Lin expressed
skepticism that neural ranking models were actually improving ad hoc retrieval
effectiveness in limited data scenarios. He provided anecdotal evidence that
authors of neural IR papers demonstrate "wins" by comparing against weak
baselines. This paper provides a rigorous evaluation of those claims in two
ways: First, we conducted a meta-analysis of papers that have reported
experimental results on the TREC Robust04 test collection. We do not find
evidence of an upward trend in effectiveness over time. In fact, the best
reported results are from a decade ago and no recent neural approach comes
close. Second, we applied five recent neural models to rerank the strong
baselines that Lin used to make his arguments. A significant improvement was
observed for one of the models, demonstrating additivity in gains. While there
appears to be merit to neural IR approaches, at least some of the gains
reported in the literature appear illusory.
Comment: Published in the Proceedings of the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019).
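The second experiment (applying neural models to rerank strong baselines) can be sketched as follows. This is an illustrative sketch, not the authors' code: `neural_score` is a hypothetical stand-in for any of the five trained neural rankers, and the depth-`k` cutoff is an assumption rather than the paper's exact protocol.

```python
# Sketch: reranking a strong first-stage run (e.g. BM25+RM3) with a
# neural model. `neural_score` is a placeholder for a trained ranker
# such as a BERT cross-encoder; here it is a trivial term-overlap stub.

def neural_score(query: str, doc: str) -> float:
    # Placeholder scoring function, NOT a real neural model.
    return float(len(set(query.split()) & set(doc.split())))

def rerank(query, baseline_ranking, k=1000):
    """Rerank the top-k documents of a first-stage run,
    leaving the rest of the ranking untouched."""
    head, tail = baseline_ranking[:k], baseline_ranking[k:]
    rescored = sorted(head, key=lambda doc: neural_score(query, doc),
                      reverse=True)
    return rescored + tail
```

The point of the design is that any gain over the baseline must come from reordering an already-strong candidate list, which is what makes an observed improvement evidence of additivity.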
Training Curricula for Open Domain Answer Re-Ranking
In precision-oriented tasks like answer ranking, it is more important to rank
many relevant answers highly than to retrieve all relevant answers. It follows
that a good ranking strategy would be to learn how to identify the easiest
correct answers first (i.e., assign a high ranking score to answers that have
characteristics that usually indicate relevance, and a low ranking score to
those with characteristics that do not), before incorporating more complex
logic to handle difficult cases (e.g., semantic matching or reasoning). In this
work, we apply this idea to the training of neural answer rankers using
curriculum learning. We propose several heuristics to estimate the difficulty
of a given training sample. We show that the proposed heuristics can be used to
build a training curriculum that down-weights difficult samples early in the
training process. As the training process progresses, our approach gradually
shifts to weighting all samples equally, regardless of difficulty. We present a
comprehensive evaluation of our proposed idea on three answer ranking datasets.
Results show that our approach leads to superior performance of two leading
neural ranking architectures, namely BERT and ConvKNRM, using both pointwise
and pairwise losses. When applied to a BERT-based ranker, our method yields up
to a 4% improvement in MRR and a 9% improvement in P@1 (compared to the model
trained without a curriculum). This results in models that can achieve
comparable performance to more expensive state-of-the-art techniques.
Comment: Accepted at SIGIR 2020 (long paper).
Separating the dynamical effects of climate change and ozone depletion. Part I: Southern Hemisphere stratosphere
A version of the Canadian Middle Atmosphere Model that is coupled to an ocean is used to investigate the separate effects of climate change and ozone depletion on the dynamics of the Southern Hemisphere (SH) stratosphere. This is achieved by performing three sets of simulations extending from 1960 to 2099:
1) greenhouse gases (GHGs) fixed at 1960 levels and ozone depleting substances (ODSs) varying in time,
2) ODSs fixed at 1960 levels and GHGs varying in time, and
3) both GHGs and ODSs varying in time.
The response of various dynamical quantities to the GHG and ODS forcings is shown to be additive; that is, trends computed from the sum of the first two simulations are equal to trends from the third. Additivity is shown to hold for the zonal mean zonal wind and temperature, the mass flux into and out of the stratosphere, and the latitudinally averaged wave drag in SH spring and summer, as well as for final warming dates. Ozone depletion and recovery cause seasonal changes in lower-stratosphere mass flux, with reduced polar downwelling in the past followed by increased downwelling in the future in SH spring, and the reverse in SH summer. These seasonal changes are attributed to changes in wave drag caused by ozone-induced changes in the zonal mean zonal winds. Climate change, on the other hand, causes a steady decrease in wave drag during SH spring, which delays the breakdown of the vortex, resulting in increased wave drag in summer.
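The additivity test amounts to checking that the trend from the combined-forcing run equals the sum of the trends from the two single-forcing runs. A minimal sketch with synthetic data standing in for model output (e.g. zonal mean zonal wind):

```python
import numpy as np

# Synthetic stand-ins for the three simulations, 1960-2099.
years = np.arange(1960, 2100)
t = years - 1960

ods_only = 0.03 * t            # response to ODS forcing alone
ghg_only = -0.01 * t           # response to GHG forcing alone
combined = ods_only + ghg_only # both forcings varying in time

def trend(series):
    """Least-squares linear trend (units per year)."""
    return np.polyfit(t, series, 1)[0]

# Additivity: trend(run 1) + trend(run 2) == trend(run 3).
assert np.isclose(trend(ods_only) + trend(ghg_only), trend(combined))
```

In the actual study this comparison is made for each dynamical quantity (winds, temperature, mass flux, wave drag, final warming dates) rather than for a single synthetic series.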
Replication of collaborative filtering generative adversarial networks on recommender systems
CFGAN and its family of models (TagRec, MTPR, and CRGAN) learn to generate personalized and fake-but-realistic preferences for top-N recommendations by solely using previous interactions. The work discusses the impact of certain differences between the CFGAN framework and the model used in the original evaluation. The absence of random noise and the use of real user profiles as condition vectors leave the generator prone to learning a degenerate solution in which the output vector is identical to the input vector, thereby behaving essentially as a simple auto-encoder. This work further expands the experimental analysis, comparing CFGAN against a selection of simple, well-known, and properly optimized baselines, and observing that CFGAN is not consistently competitive against them despite its high computational cost.
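The degenerate solution described above can be detected with a simple diagnostic: if the generator's output for each user is (nearly) identical to the real profile it was conditioned on, the model is acting as an auto-encoder. A hypothetical sketch of such a check (`degeneracy_score` is an assumed name, not part of CFGAN):

```python
import numpy as np

def degeneracy_score(profiles, generated):
    """Mean cosine similarity between each condition vector (real user
    profile) and the generator's output for it. Values near 1 indicate
    the degenerate identity solution: the generator is reproducing its
    input rather than generating novel preferences."""
    num = (profiles * generated).sum(axis=1)
    denom = (np.linalg.norm(profiles, axis=1)
             * np.linalg.norm(generated, axis=1))
    return float((num / denom).mean())

profiles = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
# A generator that has collapsed to the identity scores ~1.
assert np.isclose(degeneracy_score(profiles, profiles.copy()), 1.0)
```

Injecting random noise into the generator input, as in a standard conditional GAN, is the usual way to rule this failure mode out.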
Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems
In recent years, algorithm research in the area of recommender systems has
shifted from matrix factorization techniques and their latent factor models to
neural approaches. However, given the proven power of latent factor models,
some newer neural approaches incorporate them within more complex network
architectures. One specific idea, recently put forward by several researchers,
is to consider potential correlations between the latent factors, i.e.,
embeddings, by applying convolutions over the user-item interaction map.
However, contrary to what is claimed in these articles, such interaction maps
do not share the properties of images where Convolutional Neural Networks
(CNNs) are particularly useful. In this work, we show through analytical
considerations and empirical evaluations that the claimed gains reported in the
literature cannot be attributed to the ability of CNNs to model embedding
correlations, as argued in the original papers. Moreover, additional
performance evaluations show that all of the examined recent CNN-based models
are outperformed by existing non-neural machine learning techniques or
traditional nearest-neighbor approaches. On a more general level, our work
points to major methodological issues in recommender systems research.
Comment: Source code available here:
https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluatio
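One way to illustrate the analytical point (this is my sketch of the argument, not the paper's code): the "interaction map" these models convolve over is the outer product of a user embedding and an item embedding, and the ordering of latent dimensions is arbitrary, so the spatial locality that CNN filters exploit in images carries no meaning here.

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
user_emb = rng.normal(size=d)
item_emb = rng.normal(size=d)

# The user-item "interaction map": outer product of the two embeddings.
interaction_map = np.outer(user_emb, item_emb)    # shape (d, d)
assert interaction_map.shape == (d, d)

# Permuting the latent dimensions represents the same factor model, but
# shuffles the rows and columns of the map. Neighboring cells are
# therefore arbitrary latent-factor pairs, unlike neighboring pixels.
perm = rng.permutation(d)
permuted_map = np.outer(user_emb[perm], item_emb[perm])
assert np.allclose(permuted_map, interaction_map[np.ix_(perm, perm)])
```

Since any row/column permutation of the map is an equally valid representation, a convolution filter over it cannot be capturing genuine correlations between specific embedding dimensions.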
Replication of recommender systems with impressions
Impressions are a novel data type in Recommender Systems containing the previously-exposed items, i.e., what was shown on-screen. Due to their novelty, the current literature lacks a characterization of impressions, and replications of previous experiments. Also, previous research works have mainly used impressions in industrial contexts or recommender systems competitions, such as the ACM RecSys Challenges. This work is part of an ongoing study about impressions in recommender systems. It presents an evaluation of impressions recommenders on current open datasets, not only comparing the recommendation quality of impressions recommenders against strong baselines, but also determining whether previous progress claims can be replicated.
Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity
Recent work has revealed many intriguing empirical phenomena in neural
network training, despite the poorly understood and highly complex loss
landscapes and training dynamics. One of these phenomena, Linear Mode
Connectivity (LMC), has gained considerable attention due to the intriguing
observation that different solutions can be connected by a linear path in the
parameter space while maintaining near-constant training and test losses. In
this work, we introduce a stronger notion of linear connectivity, Layerwise
Linear Feature Connectivity (LLFC), which says that the feature maps of every
layer in different trained networks are also linearly connected. We provide
comprehensive empirical evidence for LLFC across a wide range of settings,
demonstrating that whenever two trained networks satisfy LMC (via either
spawning or permutation methods), they also satisfy LLFC in nearly all the
layers. Furthermore, we delve deeper into the underlying factors contributing
to LLFC, which reveal new insights into the spawning and permutation
approaches. The study of LLFC transcends and advances our understanding of LMC
by adopting a feature-learning perspective.
Comment: 25 pages, 23 figures.
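The LLFC property can be made concrete with a minimal sketch: interpolate the weights of two networks and compare the resulting feature map against the interpolation of the two networks' feature maps. For a single linear layer the two coincide exactly; the paper's empirical finding is that this (approximately) continues to hold layerwise in trained nonlinear networks satisfying LMC. The shapes and names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))     # a batch of inputs
W1 = rng.normal(size=(5, 3))    # weights of trained network A (one layer)
W2 = rng.normal(size=(5, 3))    # weights of trained network B (one layer)

alpha = 0.3
# Feature map of the weight-interpolated network ...
features_interp_weights = x @ (alpha * W1 + (1 - alpha) * W2)
# ... versus the interpolation of the two feature maps (LLFC).
interp_of_features = alpha * (x @ W1) + (1 - alpha) * (x @ W2)

assert np.allclose(features_interp_weights, interp_of_features)
```

For nonlinear layers equality is not automatic, which is why LLFC holding across nearly all layers of LMC-connected networks is a nontrivial empirical observation.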