
    Overview of NewsREEL’16: Multi-dimensional evaluation of real-time stream-recommendation algorithms

    Successful news recommendation requires coping with dynamic item sets and contextual item relevance while meeting non-functional requirements such as response time. The CLEF NewsREEL challenge is a campaign-style evaluation lab that allows participants to tackle news recommendation and to optimize and evaluate their recommender algorithms both online and offline. In this paper, we summarize the objectives and challenges of NewsREEL 2016. We cover two contrasting perspectives on the challenge: that of the operator (the business providing recommendations) and that of the challenge participant (the researchers developing recommender algorithms). At the intersection of these perspectives, new insights can be gained on how to effectively evaluate real-time stream recommendation algorithms.

    Nonparametric Uncertainty Quantification for Single Deterministic Neural Network

    This paper proposes a fast and scalable method for uncertainty quantification of machine learning models' predictions. First, we show a principled way to measure the uncertainty of a classifier's predictions based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution. Importantly, the proposed approach makes it possible to explicitly disentangle aleatoric and epistemic uncertainties. The resulting method works directly in the feature space. However, one can apply it to any neural network by considering an embedding of the data induced by the network. We demonstrate the strong performance of the method in uncertainty estimation tasks on text classification problems and a variety of real-world image datasets, such as MNIST, SVHN, CIFAR-100 and several versions of ImageNet. Comment: NeurIPS 2022 paper
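As a rough illustration of the Nadaraya-Watson estimate of the conditional label distribution mentioned in this abstract, the sketch below computes kernel-weighted class frequencies around a query point. It is a minimal sketch, not the paper's method: the Gaussian kernel, the `bandwidth` parameter, the function name `nw_label_distribution`, and the use of the total kernel mass as a crude indicator of how much training data lies near the query are all assumptions made here for illustration. The abstract does not specify the uncertainty measures the authors actually use.

```python
import math


def nw_label_distribution(x, train_x, train_y, num_classes, bandwidth=1.0):
    """Nadaraya-Watson estimate of p(y | x) with a Gaussian kernel.

    Returns (probs, mass), where probs[c] is the kernel-weighted share of
    class c among the training points and mass is the summed kernel weight.
    Hypothetical helper: the kernel choice and the returned mass are
    illustrative assumptions, not taken from the paper.
    """
    weights = []
    for xi in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))

    mass = sum(weights)
    if mass == 0.0:
        # No training point carries any weight: fall back to a uniform guess.
        return [1.0 / num_classes] * num_classes, 0.0

    probs = [0.0] * num_classes
    for w, y in zip(weights, train_y):
        probs[y] += w / mass
    return probs, mass


# Toy data: two points of class 0 near the origin, one point of class 1 far away.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [0, 0, 1]

near_probs, near_mass = nw_label_distribution([0.0, 0.0], train_x, train_y, 2)
far_probs, far_mass = nw_label_distribution([100.0, 100.0], train_x, train_y, 2)
```

In this toy setup the query at the origin is dominated by class 0, while the query far from all training points receives essentially no kernel mass and falls back to a uniform distribution. To apply the same idea to a neural network, as the abstract suggests, `x` and `train_x` would be embeddings produced by the network rather than raw inputs.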

    Continuous evaluation of large-scale information access systems: a case for living labs

    A/B testing is increasingly being adopted for the evaluation of commercial information access systems with a large user base, since it makes it possible to observe the efficiency and effectiveness of these systems under real conditions. Unfortunately, unless university-based researchers closely collaborate with industry or develop their own infrastructure or user base, they cannot validate their ideas in live settings with real users. Without online testing opportunities open to the research community, academic researchers are unable to employ online evaluation on a larger scale. This means that they do not get feedback on their ideas and cannot advance their research further. Businesses, on the other hand, miss the opportunity to achieve higher customer satisfaction through improved systems. In addition, users miss the chance to benefit from an improved information access system. In this chapter, we introduce two evaluation initiatives at CLEF, NewsREEL and Living Labs for IR (LL4IR), that aim to address this growing “evaluation gap” between academia and industry. We explain the challenges and discuss our experiences organizing these living labs.

    Reinforced KGs reasoning for explainable sequential recommendation

    We explore the semantically rich structured information derived from the knowledge graphs (KGs) associated with user-item interactions and aim to infer the motivations behind each successful purchase. Existing work on KG-based explainable recommendation focuses purely on path reasoning over current user-item interactions, which generally leaves it unable to infer users’ subsequent preferences. Considering this, we attempt to model KG-based explainable recommendation in sequential settings. Specifically, we propose a novel architecture called Reinforced Sequential Learning with Gated Recurrent Unit (RSL-GRU), which is composed of a Reinforced Path Reasoning Network (RPRN) component and a GRU component. RSL-GRU takes users’ sequential behaviors and their associated KGs in chronological order as input and outputs potential top-N items for each user with appropriate reasoning paths from a global perspective. Our RPRN features a remarkable path reasoning capacity, which is regulated by a user-conditioned derivatively action pruning strategy, a soft reward strategy based on an improved multi-hop scoring function, and a policy-guided sequential path reasoning algorithm. Experimental results on four of Amazon’s large-scale datasets show that our method achieves excellent results compared with several state-of-the-art alternatives.