13,073 research outputs found

    Closing the loop: assisting archival appraisal and information retrieval in one sweep

    In this article, we examine the similarities between the concept of appraisal, a process that takes place within the archives, and the concept of relevance judgement, a process fundamental to the evaluation of information retrieval systems. More specifically, we revisit selection criteria proposed as a result of archival research and work within the digital curation communities, and compare them to relevance criteria as discussed in information retrieval's literature-based discovery. We illustrate how closely these criteria relate to each other and discuss how understanding the relationships between these disciplines could form a basis for proposing automated selection for archival processes and for initiating multi-objective learning with respect to information retrieval.

    Understanding customers' holistic perception of switches in automotive human–machine interfaces

    For successful new product development, it is necessary to understand the customers' holistic experience of the product beyond traditional task-completion and acceptance measures. This paper describes research in which ninety-eight UK owners of luxury saloons assessed the feel of push-switches in five luxury saloon cars, both in context (in-car) and out of context (on a bench). A combination of hedonic data (i.e. a measure of ‘liking’), qualitative data and semantic differential data was collected. It was found that customers are clearly able to differentiate between switches based on the degree of liking for the samples' perceived haptic qualities, and that the assessment environment had a statistically significant effect, although this effect was not universal. A factor analysis showed that the perceived characteristics of switch haptics can be explained by three independent factors, defined as ‘Image’, ‘Build Quality’ and ‘Clickiness’. Preliminary steps have also been taken towards identifying whether existing theoretical frameworks for user experience may be applicable to automotive human–machine interfaces.
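
    A minimal sketch of the kind of factor analysis described above, assuming hypothetical semantic differential ratings; the attribute names, rating scale, and data are illustrative stand-ins, not the study's actual instrument or results.

        import numpy as np
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(0)
        # Hypothetical semantic differential scales, rated 1-7 by 98 participants
        attributes = ["premium", "solid", "precise", "crisp", "damped", "responsive"]
        ratings = rng.integers(1, 8, size=(98, len(attributes))).astype(float)

        # Extract three factors, analogous to the study's 'Image',
        # 'Build Quality' and 'Clickiness' interpretation
        fa = FactorAnalysis(n_components=3, rotation="varimax")
        fa.fit(ratings)

        # Loadings show how strongly each perceived attribute maps onto each factor
        for name, loadings in zip(attributes, fa.components_.T):
            print(f"{name:>10}: " + "  ".join(f"{l:+.2f}" for l in loadings))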

    Analysis of change in users' assessment of search results over time

    We present the first systematic study of the influence of time on user judgements of the rankings and relevance grades of web search engine results. The goal of this study is to evaluate the change in user assessment of search results and to explore how users' judgements change. To this end, we conducted a large-scale user study with 86 participants who evaluated two different queries and four diverse result sets twice, with an interval of two months. To analyse the results, we investigate whether two types of patterns of user behaviour from the theory of categorical thinking hold for the evaluation of search results: (1) coarseness and (2) locality. To quantify these patterns, we devised two new measures of change in user judgements and distinguish between local changes (when users swap between close ranks and relevance values) and non-local changes. Two types of judgements were considered in this study: (1) relevance on a 4-point scale, and (2) ranking on a 10-point scale without ties. We found that users tend to change their judgements of the results over time in about 50% of cases for relevance and in 85% of cases for ranking. However, the majority of these changes were local.
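
    One way to make the local vs. non-local distinction concrete is sketched below; the locality threshold and function name are illustrative assumptions, not the paper's exact measures.

        def change_counts(round1, round2, locality_threshold=1):
            """Count unchanged, local, and non-local changes between two
            assessment rounds over the same items. A change is 'local' if
            the grade (or rank) moves by at most locality_threshold."""
            unchanged = local = non_local = 0
            for a, b in zip(round1, round2):
                if a == b:
                    unchanged += 1
                elif abs(a - b) <= locality_threshold:
                    local += 1
                else:
                    non_local += 1
            return unchanged, local, non_local

        # Example: 4-point relevance grades for five documents, judged twice
        first = [3, 2, 0, 1, 3]
        second = [3, 1, 0, 3, 2]
        print(change_counts(first, second))  # -> (2, 2, 1)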

    Conceptualising and interpreting reliability

    Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments

    The creation of relevance assessments by human assessors (often nowadays crowdworkers) is a vital step when building IR test collections. Prior works have investigated assessor quality & behaviour, though little is known about the impact of a document's presentation modality on assessor efficiency and effectiveness. Given the rise of voice-based interfaces, we investigate whether it is feasible for assessors to judge the relevance of text documents via a voice-based interface. We ran a user study (n = 49) on a crowdsourcing platform where participants judged the relevance of short and long documents sampled from the TREC Deep Learning corpus, presented to them either in the text or the voice modality. We found that: (i) participants are equally accurate in their judgements across both the text and voice modalities; (ii) with increased document length it takes participants significantly longer to make relevance judgements in the voice condition (for documents longer than 120 words, almost twice as long); and (iii) the ability of assessors to ignore stimuli that are not relevant (i.e., inhibition) impacts assessment quality in the voice modality: assessors with higher inhibition are significantly more accurate than those with lower inhibition. Our results indicate that we can reliably leverage the voice modality as a means to effectively collect relevance labels from crowdworkers.
    Comment: Accepted at SIGIR 2023
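
    A minimal sketch of how per-modality assessor accuracy against gold labels might be computed; the judgement triples below are hypothetical, not the study's data.

        from collections import defaultdict

        # (modality, assessor judgement, gold label) triples -- hypothetical
        judgements = [
            ("text", 1, 1), ("text", 0, 0), ("text", 1, 0),
            ("voice", 1, 1), ("voice", 0, 1), ("voice", 0, 0),
        ]

        correct = defaultdict(int)
        total = defaultdict(int)
        for modality, judged, gold in judgements:
            total[modality] += 1
            correct[modality] += judged == gold

        for modality in sorted(total):
            print(f"{modality}: accuracy = {correct[modality] / total[modality]:.2f}")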