Search CORE

14,049 research outputs found

An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric

Author: Alistair Moffat
Collins-Thompson Kevyn
Sakai Tetsuya
Voorhees Ellen M.
Yang Hui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/08/2018
Field of study

Many evaluation metrics have been defined to evaluate the effectiveness ad-hoc retrieval and search result diversification systems. However, it is often unclear which evaluation metric should be used to analyze the performance of retrieval systems given a specific task. Axiomatic analysis is an informative mechanism to understand the fundamentals of metrics and their suitability for particular scenarios. In this paper, we define a constraint-based axiomatic framework to study the suitability of existing metrics in search result diversification scenarios. The analysis informed the definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known Rank-Biased Precision metric -- that takes into account redundancy and the user effort associated to the inspection of documents in the ranking. Our experiments over standard diversity evaluation campaigns show that the proposed metric captures quality criteria reflected by different metrics, being suitable in the absence of knowledge about particular features of the scenario under study.Comment: Original version: 10 pages. Preprint of full paper to appear at SIGIR'18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, July 8-12, 2018, Ann Arbor, MI, USA. ACM, New York, NY, US

arXiv.org e-Print Archive

Crossref

Answer Summarization for Technical Queries: Benchmark and New Approach

Author: Chengran Yang
Han DongGyun
He Junda
Lo David
Shi Jieke
Shi Yucen
Thung Ferdian
Xu Bowen
Yang Zhou
Zhang Ting
Zhou Xin
Publication venue
Publication date: 22/09/2022
Field of study

Prior studies have demonstrated that approaches to generate an answer summary for a given technical query in Software Question and Answer (SQA) sites are desired. We find that existing approaches are assessed solely through user studies. There is a need for a benchmark with ground truth summaries to complement assessment through user studies. Unfortunately, such a benchmark is non-existent for answer summarization for technical queries from SQA sites. To fill the gap, we manually construct a high-quality benchmark to enable automatic evaluation of answer summarization for technical queries for SQA sites. Using the benchmark, we comprehensively evaluate the performance of existing approaches and find that there is still a big room for improvement. Motivated by the results, we propose a new approach TechSumBot with three key modules:1) Usefulness Ranking module, 2) Centrality Estimation module, and 3) Redundancy Removal module. We evaluate TechSumBot in both automatic (i.e., using our benchmark) and manual (i.e., via a user study) manners. The results from both evaluations consistently demonstrate that TechSumBot outperforms the best performing baseline approaches from both SE and NLP domains by a large margin, i.e., 10.83%-14.90%, 32.75%-36.59%, and 12.61%-17.54%, in terms of ROUGE-1, ROUGE-2, and ROUGE-L on automatic evaluation, and 5.79%-9.23% and 17.03%-17.68%, in terms of average usefulness and diversity score on human evaluation. This highlights that the automatic evaluation of our benchmark can uncover findings similar to the ones found through user studies. More importantly, automatic evaluation has a much lower cost, especially when it is used to assess a new approach. Additionally, we also conducted an ablation study, which demonstrates that each module in TechSumBot contributes to boosting the overall performance of TechSumBot.Comment: Accepted by ASE 202

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

Royal Holloway - Pure

Answer Summarization for Technical Queries:Benchmark and New Approach

Author: Han Donggyun
He Junda
Lo David
Shi Jieke
Shi Yucen
Thung Ferdian
Xu Bowen
Yang Chengran
Yang Zhou
Zhang Ting
Zhou Xin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/01/2023
Field of study

Royal Holloway - Pure

Choosing effective methods for design diversity - How to progress from intuition to science

Author: Popov P. T.
Romanovsky A.
Strigini L.
Publication venue
Publication date: 01/01/1999
Field of study

Design diversity is a popular defence against design faults in safety critical systems. Design diversity is at times pursued by simply isolating the development teams of the different versions, but it is presumably better to "force" diversity, by appropriate prescriptions to the teams. There are many ways of forcing diversity. Yet, managers who have to choose a cost-effective combination of these have little guidance except their own intuition. We argue the need for more scientifically based recommendations, and outline the problems with producing them. We focus on what we think is the standard basis for most recommendations: the belief that, in order to produce failure diversity among versions, project decisions should aim at causing "diversity" among the faults in the versions. We attempt to clarify what these beliefs mean, in which cases they may be justified and how they can be checked or disproved experimentally

CiteSeerX

City Research Online

Information-theoretic measures of music listening behaviour

Author: Boland Daniel
Murray-Smith Roderick
Publication venue
Publication date: 01/01/2014
Field of study

We present an information-theoretic approach to the mea- surement of users’ music listening behaviour and selection of music features. Existing ethnographic studies of mu- sic use have guided the design of music retrieval systems however are typically qualitative and exploratory in nature. We introduce the SPUD dataset, comprising 10, 000 hand- made playlists, with user and audio stream metadata. With this, we illustrate the use of entropy for analysing music listening behaviour, e.g. identifying when a user changed music retrieval system. We then develop an approach to identifying music features that reflect users’ criteria for playlist curation, rejecting features that are independent of user behaviour. The dataset and the code used to produce it are made available. The techniques described support a quantitative yet user-centred approach to the evaluation of music features and retrieval systems, without assuming objective ground truth labels

CiteSeerX

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Enlighten

HitFraud: A Broad Learning Approach for Collective Fraud Detection in Heterogeneous Information Networks

Author: Cao Bokai
Mao Mia
Viidu Siim
Yu Philip S.
Publication venue
Publication date: 19/09/2017
Field of study

On electronic game platforms, different payment transactions have different levels of risk. Risk is generally higher for digital goods in e-commerce. However, it differs based on product and its popularity, the offer type (packaged game, virtual currency to a game or subscription service), storefront and geography. Existing fraud policies and models make decisions independently for each transaction based on transaction attributes, payment velocities, user characteristics, and other relevant information. However, suspicious transactions may still evade detection and hence we propose a broad learning approach leveraging a graph based perspective to uncover relationships among suspicious transactions, i.e., inter-transaction dependency. Our focus is to detect suspicious transactions by capturing common fraudulent behaviors that would not be considered suspicious when being considered in isolation. In this paper, we present HitFraud that leverages heterogeneous information networks for collective fraud detection by exploring correlated and fast evolving fraudulent behaviors. First, a heterogeneous information network is designed to link entities of interest in the transaction database via different semantics. Then, graph based features are efficiently discovered from the network exploiting the concept of meta-paths, and decisions on frauds are made collectively on test instances. Experiments on real-world payment transaction data from Electronic Arts demonstrate that the prediction performance is effectively boosted by HitFraud with fast convergence where the computation of meta-path based features is largely optimized. Notably, recall can be improved up to 7.93% and F-score 4.62% compared to baselines.Comment: ICDM 201

arXiv.org e-Print Archive

Crossref

Information-theoretic measures of music listening behaviour

Author: Boland Daniel
Murray-Smith Roderick
Publication venue
Publication date: 01/01/2014
Field of study

Enlighten