27,498 research outputs found

On crowdsourcing relevance magnitudes for information retrieval evaluation

Author: Maddalena E
Mizzaro S
Scholer F
Turpin A
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents for information retrieval evaluation, carrying out a large-scale user study across 18 TREC topics and collecting over 50,000 magnitude estimation judgments using crowdsourcing. Our analysis shows that magnitude estimation judgments can be reliably collected using crowdsourcing, are competitive in terms of assessor cost, and are, on average, rank-aligned with ordinal judgments made by expert relevance assessors. We explore the application of magnitude estimation for IR evaluation, calibrating two gain-based effectiveness metrics, nDCG and ERR, directly from user-reported perceptions of relevance. A comparison of TREC system effectiveness rankings based on binary, ordinal, and magnitude estimation relevance shows substantial variation; in particular, the top systems ranked using magnitude estimation and ordinal judgments differ substantially. Analysis of the magnitude estimation scores shows that this effect is due in part to varying perceptions of relevance: different users have different perceptions of the impact of relative differences in document relevance. These results have direct implications for IR evaluation, suggesting that current assumptions about a single view of relevance being sufficient to represent a population of users are unlikely to hold.

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

RMIT Research Repository

Judgments of effort exerted by others are influenced by received rewards

Author: Kagan Igor
Pannach Franziska
Pooresmaeili Arezoo
Rollwage Max
Stinson Caedyn
Toelch Ulf
Publication date: 01/01/2020

Estimating invested effort is a core dimension for evaluating one's own and others’ actions, and views on the relationship between effort and rewards are deeply ingrained in various societal attitudes. Internal representations of effort, however, are inherently noisy, e.g. due to the variability of sensorimotor and visceral responses to physical exertion. The uncertainty in effort judgments is further aggravated when there is no direct access to the internal representations of exertion, such as when estimating the effort of another person. Bayesian cue integration suggests that this uncertainty can be resolved by incorporating additional cues that are predictive of effort, e.g. received rewards. We hypothesized that judgments about the effort spent on a task will be influenced by the magnitude of received rewards. Additionally, we surmised that such influence might further depend on individual beliefs regarding the relationship between hard work and prosperity, as exemplified by a conservative work ethic. To test these predictions, participants performed an effortful task interleaved with a partner and were informed about the obtained reward before rating either their own or the partner’s effort. We show that higher rewards led to higher estimations of exerted effort in self-judgments, and this effect was even more pronounced for other-judgments. In both types of judgment, computational modelling revealed that reward information and sensorimotor markers of exertion were combined in a Bayes-optimal manner in order to reduce uncertainty. Remarkably, the extent to which rewards influenced effort judgments was associated with conservative world-views, indicating links between this phenomenon and general beliefs about the relationship between effort and earnings in society.

Institutional Repository of the Freie Universität Berlin

Southampton (e-Prints Soton)

MPG.PuRe

Crowdsourcing Relevance: Two Studies on Assessment

Author: Maddalena Eddy
Publication venue: Università degli Studi di Udine
Publication date: 03/04/2017

Crowdsourcing has become an alternative approach to collecting relevance judgments at large scale. In this thesis, we focus on some specific aspects related to time, scale, and agreement. First, we address the time factor in gathering relevance labels: we study how much time the judges need to assess documents. We conduct a series of four experiments which unexpectedly reveal that introducing time limitations improves the quality of the results. Furthermore, we discuss strategies for determining the right amount of time to give workers for relevance assessment, in order both to guarantee the high quality of the gathered results and to save time and money. Then we explore the application of magnitude estimation, a psychophysical scaling technique for the measurement of sensation, to relevance assessment. We conduct a large-scale user study across 18 TREC topics, collecting more than 50,000 magnitude estimation judgments, which turn out to be rank-aligned overall with ordinal judgments made by expert relevance assessors. We discuss the benefits, the reliability of the judgments collected, and the competitiveness in terms of assessor cost. We also report some preliminary results on the agreement among judges. Often, the results of crowdsourcing experiments are affected by noise that can be ascribed to a lack of agreement among workers. This aspect should be considered, as it can affect the reliability of the gathered relevance labels as well as the overall repeatability of the experiments.

Archivio istituzionale della ricerca - Università degli Studi di Udine

Unbiased Comparative Evaluation of Ranking Functions

Author: Owen A. B.
Pavlu V.
Peng Ye D. D.
Sparck-Jones K.
Voorhees E. M.
Yuan C.
Zhao P.
Publication date: 25/04/2016

Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise since it enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing k systems against a baseline, and ranking k systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they generally cut the number of required relevance judgments at least in half.

arXiv.org e-Print Archive

Crossref

To aid or not to aid: Foreign aid and productivity in cross-country regressions

Author: Pablo Selaya

The paper empirically reexamines the robustness of competing theories of foreign aid effectiveness. By shifting the focus from the effects of aid on income to the effects of aid on productivity, it is possible to put three existing theories of foreign aid effectiveness to the test. The results provide support for the hypotheses that (i) aid has a positive effect in fostering growth of average productivity, (ii) aid does not operate with diminishing returns, and (iii) the magnitude of the total effect depends on climate-related circumstances. The results support the policy recommendation previously made in the literature to seriously reconsider the conditionality rule for foreign aid disbursements.

Keywords: foreign aid, cross-country, conditionality

Research Papers in Economics

Dim galaxies and outer halos of galaxies missed by 2MASS? The near-infrared luminosity function and density

Author: Andreon
Andreon
Andreon
Bertin
Blanton
Bromley
Cole
Cole
Cowie
de Propris
Folkes
Garilli
Jarrett
Kochanek
Kron
Lin
Loveday
Pahre
S. Andreon
Sandage
Vellotani
Wright
Publication venue: 'EDP Sciences'
Publication date: 10/11/2001

By using high-resolution and deep Ks band observations of early-type galaxies of the nearby Universe and of a cluster at z=0.3, we show that the two luminosity functions (LFs) of the local universe derived from 2MASS data miss a fair fraction of the flux of the galaxies (more than 20 to 30%) and a whole population of galaxies of central brightness fainter than the isophote used for detection, but bright enough to be included in the published LFs. In particular, the fraction of lost flux increases as the galaxy surface brightness becomes fainter. Therefore, the LF slopes and characteristic luminosity derived so far, as well as the luminosity density, are underestimated. Other published near-infrared LFs miss flux in general, including the LF of the distant field computed in a 3 arcsec aperture.

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

Do the Measurements of Financial Market Inflation Expectations Yield Relevant Macroeconomic Information?

Author: Martin Fukač

Monthly data concerning the inflation expectations of financial analysts in the Czech Republic exhibit a tendency for bias and ineffectiveness. This paper analyses, from a macroeconomic perspective, whether the surveyed data include any relevant macroeconomic information, specifically, whether the surveyed expectations correspond to market expectations considered in macroeconomic analysis and models. Using a methodology based on a simple Fisher rule, it is found that the difference between the surveyed and market inflation expectations is not statistically significant. From this perspective, it is concluded that the surveyed inflation expectations bear economically relevant information.

Keywords: market inflation expectations, surveyed inflation expectations, Fisher rule

Research Papers in Economics

Assessing the Magnitude of the Concentration Parameter in a Simultaneous Equations Model

Author: C. L. Skeels
D. S. Poskitt

Poskitt and Skeels (2003) provide a new approximation to the sampling distribution of the IV estimator in a simultaneous equations model. This approximation is appropriate when the concentration parameter associated with the reduced form model is small. A basic purpose of this paper is to provide the practitioner with a method of ascertaining when the concentration parameter is small, and hence when the use of the Poskitt and Skeels (2003) approximation is appropriate. Existing procedures tend to focus on the notion of correlation and hypothesis testing. Approaching the problem from a different perspective leads us to advocate a different statistic for use in this problem. We provide exact and approximate distribution theory for the proposed statistic and show that it satisfies various optimality criteria not satisfied by some of its competitors. Rather than adopting a testing approach, we suggest the use of p-values as a calibration device.

Keywords: concentration parameter, simultaneous equations model, alienation coefficient, Wilks-lambda distribution, admissible invariant test

Research Papers in Economics