    A Charge to Keep

    Relevance thresholds in system evaluations

    We introduce and explore the concept of an individual's relevance threshold as a way of reconciling differences in outcomes between batch and user experiments.

    User performance versus precision measures for simple search tasks

    Several recent studies have demonstrated that the type of improvements in information retrieval system effectiveness reported in forums such as SIGIR and TREC do not translate into a benefit for users. Two of the studies used an instance recall task, and a third used a question answering task, so perhaps it is unsurprising that the precision-based measures of IR system effectiveness on one-shot query evaluation do not correlate with user performance on these tasks. In this study, we evaluate two different information retrieval tasks on TREC Web-track data: a precision-based user task, measured by the length of time that users need to find a single document that is relevant to a TREC topic; and a simple recall-based task, represented by the total number of relevant documents that users can identify within five minutes. Users employ search engines with controlled mean average precision (MAP) of between 55% and 95%. Our results show that there is no significant relationship between system effectiveness measured by MAP and the precision-based task. A significant but weak relationship is present for the precision at one document returned metric. A weak relationship is present between MAP and the simple recall-based task.
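
For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of how precision at k and mean average precision are conventionally computed from binary relevance labels. The function names and the toy rankings are illustrative assumptions, not taken from the paper, and the sketch assumes every relevant document for a topic appears somewhere in the ranked list.

```python
from typing import Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (1 = relevant, 0 = not)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(rels[:k]) / k


def average_precision(rels: Sequence[int]) -> float:
    """Mean of precision values taken at the rank of each relevant result.

    Assumption: all relevant documents for the topic are in `rels`,
    so the normaliser is simply the number of 1s in the list.
    """
    total_relevant = sum(rels)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / total_relevant


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Average of per-topic average precision over a set of topics."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical ranked lists for two topics (1 = relevant, 0 = not relevant).
topic_a = [1, 0, 1, 0]
topic_b = [0, 1]
p_at_1 = precision_at_k(topic_a, 1)
map_score = mean_average_precision([topic_a, topic_b])
```

Precision at 1 looks only at the first result, whereas MAP rewards placing every relevant document early, which is one reason the two can relate differently to a task that ends as soon as a single relevant document is found.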

    Examining the pseudo-standard web search engine results page

    Nearly every web search engine presents its results in an identical format: a ranked list of web page summaries. Each summary comprises a title; some sentence fragments usually containing words used in the query; and URL information about the page. In this study we present data from our pilot experiments with eye tracking equipment to examine how users interact with this standard list of results as presented by the Australian sensis.com.au web search service. In particular, we observe: different behaviours for navigational and informational queries; that users generally scan the list top to bottom; and that eyes rarely wander from the left of the page. We also attempt to correlate the number of bold words (query words) in a summary with the amount of time spent reading the summary. Unfortunately, there is no substantial correlation, and so studies relying heavily on this assumption in the literature should be treated with caution.

    A successful Iowa shed roof poultry house

    After putting it to practical test on the Iowa State College poultry farm, the Iowa Agricultural Experiment Station recommends the permanent shed roof poultry house described in this bulletin. This house was planned and built by the Poultry and Agricultural Engineering sections of the station for housing 75 or more laying hens kept under Iowa conditions. It has given good results in practical use at the poultry farm and on this basis it is recommended to those who have need of a poultry house of this type

    On crowdsourcing relevance magnitudes for information retrieval evaluation

    Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents for information retrieval evaluation, carrying out a large-scale user study across 18 TREC topics and collecting over 50,000 magnitude estimation judgments using crowdsourcing. Our analysis shows that magnitude estimation judgments can be reliably collected using crowdsourcing, are competitive in terms of assessor cost, and are, on average, rank-aligned with ordinal judgments made by expert relevance assessors. We explore the application of magnitude estimation for IR evaluation, calibrating two gain-based effectiveness metrics, nDCG and ERR, directly from user-reported perceptions of relevance. A comparison of TREC system effectiveness rankings based on binary, ordinal, and magnitude estimation relevance shows substantial variation; in particular, the top systems ranked using magnitude estimation and ordinal judgments differ substantially. Analysis of the magnitude estimation scores shows that this effect is due in part to varying perceptions of relevance: different users have different perceptions of the impact of relative differences in document relevance. These results have direct implications for IR evaluation, suggesting that current assumptions about a single view of relevance being sufficient to represent a population of users are unlikely to hold.
    Maddalena, Eddy; Mizzaro, Stefano; Scholer, Falk; Turpin, Andrew
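
To illustrate why the choice of gain values matters for a gain-based metric such as nDCG, here is a minimal sketch that scores one ranking under two gain profiles. The gain numbers are invented for illustration and are not data from the study. The sketch also makes two simplifying assumptions: it uses the common log2 rank discount, and it builds the ideal ranking by reordering only the listed documents rather than all judged documents for the topic.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG normalised by the DCG of the same gains sorted in descending order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# One ranking of three documents, scored under two hypothetical gain profiles.
# Ordinal labels: 0 = not relevant, 1 = relevant, 2 = highly relevant.
ordinal_gains = [1, 2, 0]
# Hypothetical magnitude estimates from a user who perceives the second
# document as far more relevant than the first.
magnitude_gains = [10, 100, 0]

ndcg_ordinal = ndcg(ordinal_gains)
ndcg_magnitude = ndcg(magnitude_gains)
```

Under the magnitude profile the ranking is penalised more heavily for placing the strongly preferred document second, so the same system output receives a lower score than under ordinal gains. Differences of this kind, accumulated over topics and users, are one way system rankings can diverge between the two judgment types.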

    Self-Organizing Service Ecosystems: Exploring a New Concept for Service Science

    The rapid advancements in digital technologies have positioned digital transformation as a central topic of interest to information systems (IS) researchers. However, our understanding of the nature, extent and dynamics of digital service ecosystems remains limited. This short paper contributes to IS and service science research by introducing the conceptualization of the self-organizing service ecosystem as an analytical lens for understanding digital transformative phenomena in service ecosystems. To achieve this, we draw on the most recent narrative of value co-creation from service-dominant logic and on key definitions from the theory of self-organization. This paper also discusses future research directions, emphasizing the role and impact of technology in self-organizing service ecosystems.

    Structure–Function Mapping: Variability and Conviction in Tracing Retinal Nerve Fiber Bundles and Comparison to a Computational Model

    Purpose: We evaluated variability and conviction in tracing paths of retinal nerve fiber bundles (RNFBs) in retinal images, and compared traced paths to a computational model that produces anatomically-customized structure–function maps. Methods: Ten retinal images were overlaid with 24-2 visual field locations. Eight clinicians and 6 naïve observers traced RNFBs from each location to the optic nerve head (ONH), recording their best estimate and certain range of insertion. Three clinicians and 2 naïve observers traced RNFBs in 3 images, 3 times, 7 to 19 days apart. The model predicted 10° ONH sectors relating to each location. Variability and repeatability in best estimates, certain range width, and differences between best estimates and model-predictions were evaluated. Results: Median between-observer variability in best estimates was 27° (interquartile range [IQR] 20°–38°) for clinicians and 33° (IQR 22°–50°) for naïve observers. Median certain range width was 30° (IQR 14°–45°) for clinicians and 75° (IQR 45°–180°) for naïve observers. Median repeatability was 10° (IQR 5°–20°) for clinicians and 15° (IQR 10°–29°) for naïve observers. All measures were worse further from the ONH. Systematic differences between model predictions and best estimates were negligible; median absolute differences were 17° (IQR 9°–30°) for clinicians and 20° (IQR 10°–36°) for naïve observers. Larger departures from the model coincided with greater variability in tracing. Conclusions: Concordance between the model and RNFB tracing was good, and greatest where tracing variability was lowest. When RNFB tracing is used for structure–function mapping, variability should be considered.