Closing the loop: assisting archival appraisal and information retrieval in one sweep
In this article, we examine the similarities between the concept of appraisal, a process that takes place within archives, and the concept of relevance judgement, a process fundamental to the evaluation of information retrieval systems. More specifically, we revisit selection criteria proposed as a result of archival research and of work within the digital curation communities, and compare them to relevance criteria as discussed within information retrieval's literature-based discovery research. We illustrate how closely these criteria relate to each other and discuss how understanding the relationships between these disciplines could form a basis for proposing automated selection for archival processes and for initiating multi-objective learning with respect to information retrieval.
Understanding customers' holistic perception of switches in automotive human–machine interfaces
For successful new product development, it is necessary to understand the customers' holistic experience of the product beyond traditional task-completion and acceptance measures. This paper describes research in which ninety-eight UK owners of luxury saloons assessed the feel of push-switches in five luxury saloon cars both in context (in-car) and out of context (on a bench). A combination of hedonic data (i.e. a measure of 'liking'), qualitative data and semantic differential data was collected. It was found that customers are clearly able to differentiate between switches based on the degree of liking for the samples' perceived haptic qualities, and that the assessment environment had a statistically significant effect, although that effect was not universal. A factor analysis showed that perceived characteristics of switch haptics can be explained by three independent factors defined as 'Image', 'Build Quality', and 'Clickiness'. Preliminary steps have also been taken towards identifying whether existing theoretical frameworks for user experience may be applicable to automotive human–machine interfaces.
Analysis of change in users' assessment of search results over time
We present the first systematic study of the influence of time on user judgements for rankings and relevance grades of web search engine results. The goal of this study is to evaluate the change in user assessment of search results and to explore how users' judgements change. To this end, we conducted a large-scale user study with 86 participants who evaluated two different queries and four diverse result sets twice, with an interval of two months. To analyse the results, we investigate whether two patterns of user behaviour from the theory of categorical thinking hold for the evaluation of search results: (1) coarseness and (2) locality. To quantify these patterns, we devised two new measures of change in user judgements and distinguish between local changes (when users swap between close ranks and relevance values) and non-local changes. Two types of judgements were considered in this study: (1) relevance on a 4-point scale, and (2) ranking on a 10-point scale without ties. We found that users tend to change their judgements of the results over time in about 50% of cases for relevance and in 85% of cases for ranking. However, the majority of these changes were local.
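The local vs. non-local distinction above can be sketched in code. This is a minimal illustration, not the paper's actual measures: it assumes "local" simply means two paired judgements differ by at most one step on the grading scale, and all names and data below are hypothetical.

```python
# Hypothetical sketch: classify paired judgements from two sessions as
# unchanged, local, or non-local. "Local" is assumed to mean a difference
# of at most `local_threshold` steps on the scale (an assumption, not the
# paper's definition).

def classify_changes(first, second, local_threshold=1):
    """Compare paired integer judgements (e.g. relevance 0-3 or ranks 1-10).

    Returns a tuple (unchanged, local, non_local) counting how many items
    kept the same grade, moved by at most the threshold, or moved further.
    """
    unchanged = local = non_local = 0
    for a, b in zip(first, second):
        diff = abs(a - b)
        if diff == 0:
            unchanged += 1
        elif diff <= local_threshold:
            local += 1
        else:
            non_local += 1
    return unchanged, local, non_local

# Example: made-up relevance grades on a 4-point scale from two sessions.
session1 = [3, 2, 0, 1, 3]
session2 = [3, 1, 0, 3, 2]
print(classify_changes(session1, session2))  # (2, 2, 1)
```

With these made-up grades, two items are unchanged, two moved by one grade (local), and one moved by two grades (non-local), mirroring the paper's finding that most observed changes cluster near the original judgement.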
Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments
The creation of relevance assessments by human assessors (often nowadays crowdworkers) is a vital step when building IR test collections. Prior works have investigated assessor quality and behaviour, though less attention has been paid to the impact of a document's presentation modality on assessor efficiency and effectiveness. Given the rise of voice-based interfaces, we investigate whether it is feasible for assessors to judge the relevance of text documents via a voice-based interface. We ran a user study (n = 49) on a crowdsourcing platform where participants judged the relevance of short and long documents sampled from the TREC Deep Learning corpus, presented to them either in the text or the voice modality. We found that: (i) participants are equally accurate in their judgements across both the text and voice modalities; (ii) with increased document length it takes participants significantly longer (for documents of length > 120 words, almost twice as much time) to make relevance judgements in the voice condition; and (iii) the ability of assessors to ignore stimuli that are not relevant (i.e., inhibition) impacts the assessment quality in the voice modality: assessors with higher inhibition are significantly more accurate than those with lower inhibition. Our results indicate that we can reliably leverage the voice modality as a means to effectively collect relevance labels from crowdworkers.
Comment: Accepted at SIGIR 202