
    Unbiased Comparative Evaluation of Ranking Functions

    Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise, since it enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing k systems against a baseline, and ranking k systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they generally cut the number of required relevance judgments at least in half. Comment: Under review; 10 pages.
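
    A minimal sketch of the core idea, not the paper's specific estimators: if each document is judged with a known inclusion probability, a Horvitz-Thompson-style estimator that reweights judged documents by the inverse of that probability remains unbiased for any metric expressible as a weighted sum of relevances. The setup below (document count, relevance rate, discount weights) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a ranked list of 1000 documents with hidden binary
# relevance; the metric is a rank-discounted weighted sum of relevances.
n_docs = 1000
rel = rng.random(n_docs) < 0.1                   # true (hidden) relevance
weights = 1.0 / np.log2(np.arange(n_docs) + 2)   # DCG-style rank discounts

true_metric = float(weights @ rel)

# Judge each document with a known inclusion probability; here the
# probabilities follow the metric weights, so heavily weighted documents
# are judged more often (a simple variance-reducing design).
p = np.minimum(1.0, weights / weights.max())
judged = rng.random(n_docs) < p

# Horvitz-Thompson estimator: reweight judged documents by 1 / p.
# Unbiased because E[1{judged_d} / p_d] = 1 for every document d.
estimate = float(np.sum(weights[judged] * rel[judged] / p[judged]))

print(f"true metric: {true_metric:.3f}")
print(f"HT estimate: {estimate:.3f} from {int(judged.sum())} judgments")
```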

    Monte Carlo methods in PageRank computation: When one iteration is sufficient

    PageRank is one of the principal criteria according to which Google ranks Web pages. PageRank can be interpreted as the frequency with which a random surfer visits a Web page, and thus it reflects the page's popularity. Google computes the PageRank using the power iteration method, which requires about one week of intensive computations. In the present work we propose and analyze Monte Carlo type methods for the PageRank computation. The probabilistic Monte Carlo methods have several advantages over the deterministic power iteration method: they provide a good estimate of the PageRank for relatively important pages after just one iteration; they have a natural parallel implementation; and they allow the PageRank to be updated continuously as the structure of the Web changes.
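
    A toy sketch of one of the simplest Monte Carlo variants in this line of work: simulate many terminating random-surfer walks and estimate each page's PageRank by the fraction of walks that end there. The graph and parameters below are made up for illustration.

```python
import random
from collections import defaultdict

def mc_pagerank(out_links, n_runs=100, c=0.85):
    """Estimate PageRank by simulating random-surfer walks.

    Start n_runs walks at every page; at each step the surfer stops with
    probability 1 - c, otherwise follows a random out-link (or jumps to a
    random page from a dangling node).  The fraction of walks ending at a
    page estimates its PageRank.
    """
    pages = list(out_links)
    end_counts = defaultdict(int)
    total = 0
    for start in pages:
        for _ in range(n_runs):
            node = start
            while random.random() < c:
                links = out_links[node]
                node = random.choice(links) if links else random.choice(pages)
            end_counts[node] += 1
            total += 1
    return {p: end_counts[p] / total for p in pages}

# Tiny illustrative graph (hypothetical): adjacency as out-link lists.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(mc_pagerank(graph))
```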

    Estimation of Parameters in DNA Mixture Analysis

    In Cowell et al. (2007), a Bayesian network for the analysis of mixed traces of DNA was presented, using gamma distributions to model peak sizes in the electropherogram. It was demonstrated that the analysis is sensitive to the choice of a variance factor, which should therefore be adapted to each new trace analysed. In the present paper we discuss how the variance parameter can be estimated by maximum likelihood to achieve this. The unknown proportions of DNA from each contributor can similarly be estimated by maximum likelihood, jointly with the variance parameter. Furthermore, we discuss how to incorporate prior knowledge about the parameters in a Bayesian analysis. The proposed estimation methods are illustrated through a few examples of applications for calculating evidential value in casework and for mixture deconvolution.
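
    A hedged illustration of the estimation step (the parameterization below is a simplification, not the exact model of Cowell et al.): if peak heights are modelled as gamma-distributed with known means and a common variance factor, that factor can be estimated by numerically maximizing the log-likelihood. The data and parameterization are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

# Hypothetical data: observed peak heights and their model-predicted means.
heights = np.array([820.0, 1190.0, 640.0, 1010.0, 755.0])
means   = np.array([800.0, 1200.0, 600.0, 1000.0, 800.0])

def neg_log_lik(sigma2):
    # Assumed parameterization: H_i ~ Gamma(shape = mu_i / sigma2,
    # scale = sigma2), so E[H_i] = mu_i and Var[H_i] = mu_i * sigma2.
    return -np.sum(gamma.logpdf(heights, a=means / sigma2, scale=sigma2))

res = minimize_scalar(neg_log_lik, bounds=(1e-3, 1e4), method="bounded")
print(f"MLE of variance factor: {res.x:.2f}")
```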

    Probabilistic performance estimators for computational chemistry methods: Systematic Improvement Probability and Ranking Probability Matrix. I. Theory

    The comparison of benchmark error sets is an essential tool for the evaluation of theories in computational chemistry. The standard ranking of methods by their Mean Unsigned Error (MUE) is unsatisfactory for several reasons linked to the non-normality of the error distributions and the presence of underlying trends. Complementary statistics have recently been proposed to mitigate such deficiencies, such as quantiles of the absolute error distribution or the mean prediction uncertainty. We introduce here a new score, the systematic improvement probability (SIP), based on the direct system-wise comparison of absolute errors. Independently of the chosen scoring rule, the uncertainty of the statistics due to the incompleteness of the benchmark data sets is also generally overlooked, yet this uncertainty is essential for appreciating the robustness of rankings. In the present article, we develop two indicators based on robust statistics to address this problem: P_inv, the inversion probability between two values of a statistic, and P_r, the ranking probability matrix. We also demonstrate the essential contribution of the correlations between error sets to these score comparisons.
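
    A rough sketch of how such scores can be computed (illustrative data, not the paper's exact procedure): the SIP is the fraction of systems on which one method's absolute error beats the other's, and an inversion probability for the MUE ranking can be approximated with a paired bootstrap, which preserves the correlation between the two error sets.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical paired benchmark errors of two methods on the same systems.
e1 = rng.normal(0.0, 1.0, 200)
e2 = 0.6 * e1 + rng.normal(0.3, 0.8, 200)   # deliberately correlated with e1

# Systematic improvement probability: fraction of systems where
# method 1's absolute error beats method 2's.
sip = np.mean(np.abs(e1) < np.abs(e2))

# Inversion probability for the MUE ranking, via a paired bootstrap
# that resamples systems (keeping e1/e2 pairs together).
n_boot = 10_000
idx = rng.integers(0, len(e1), size=(n_boot, len(e1)))
mue1 = np.abs(e1)[idx].mean(axis=1)
mue2 = np.abs(e2)[idx].mean(axis=1)
observed = np.abs(e1).mean() < np.abs(e2).mean()
p_inv = np.mean((mue1 < mue2) != observed)

print(f"SIP = {sip:.3f},  P_inv(MUE ranking) = {p_inv:.4f}")
```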

    A practical guide and software for analysing pairwise comparison experiments

    Most popular strategies for capturing subjective judgments from humans involve the construction of a unidimensional relative measurement scale representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of the experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison outcomes into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods, and introduces publicly available software in Matlab. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing, and introducing a prior which reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment. Comment: Code available at https://github.com/mantiuk/pwcm.
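
    A minimal sketch of the scaling step, assuming a Thurstone Case V observer model fitted by maximum likelihood; this mirrors the general approach of such toolboxes but is not the pwcm code itself, and the count matrix below is invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical count matrix: C[i, j] = number of times condition i was
# preferred over condition j in a pairwise-comparison experiment.
C = np.array([[ 0, 12,  3],
              [ 8,  0, 14],
              [17,  6,  0]], dtype=float)
n = C.shape[0]

def neg_log_lik(q):
    # Thurstone Case V: P(i preferred over j) = Phi(q_i - q_j).
    q = np.concatenate([[0.0], q])       # fix q_0 = 0 to anchor the scale
    diff = q[:, None] - q[None, :]
    p = norm.cdf(diff).clip(1e-9, 1 - 1e-9)
    return -np.sum(C * np.log(p))

res = minimize(neg_log_lik, x0=np.zeros(n - 1), method="BFGS")
scale = np.concatenate([[0.0], res.x])
print("quality scale (probit units, relative to condition 0):",
      scale.round(3))
```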

    Distribution of the very first PopIII stars and their relation to bright z~6 quasars

    We discuss the link between dark matter halos hosting the first PopIII stars and the rare, massive halos that are generally considered to host bright quasars at high redshift z~6. The main question we intend to answer is whether the super-massive black holes powering these QSOs grew out of the seeds planted by the first intermediate-mass black holes created in the universe. This question involves a dynamical range of 10^13 in mass, and we address it by combining N-body simulations of structure formation, to identify the most massive halos at z~6, with a Monte Carlo method based on linear theory, to obtain the locations and formation times of the first-light halos within the whole simulation box. We show that the descendants of the first ~10^6 Msun virialized halos do not, on average, end up in the most massive halos at z~6, but rather live in a large variety of environments. The oldest PopIII progenitors of the most massive halos at z~6 form instead from density peaks that are on average one and a half standard deviations more common than the first PopIII star formed in the volume occupied by one bright high-z QSO. The intermediate-mass black hole seeds planted by the very first PopIII stars at z>40 can easily grow to masses m_BH > 10^9.5 Msun by z=6, assuming Eddington accretion with radiative efficiency ε ~ 0.1. Quenching of the black hole accretion is therefore crucial to avoid an overabundance of supermassive black holes at lower redshift. This can be obtained if the mass accretion is limited to a fraction η ~ 6*10^-3 of the total baryon mass of the halo hosting the black hole. The resulting high-end slope of the black hole mass function at z=6 is α ~ -3.7, a value within the 1σ error bar for the bright-end slope of the observed quasar luminosity function at z=6. Comment: 30 pages, 9 figures, ApJ accepted.
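
    The quoted growth figure can be checked with a back-of-the-envelope calculation: Eddington-limited accretion with radiative efficiency ε gives exponential mass growth with e-folding time ε/(1-ε) times the Eddington time (~0.45 Gyr). The seed mass and cosmic ages below are illustrative, approximate values for a flat ΛCDM cosmology.

```python
import numpy as np

# Eddington-limited growth: M(t) = M0 * exp((1 - eps) / eps * t / t_Edd),
# with the Eddington time t_Edd = sigma_T * c / (4 pi G m_p) ~ 0.45 Gyr.
t_edd_gyr = 0.45
eps = 0.1
m_seed = 100.0          # Msun, assumed PopIII remnant seed mass

# Approximate cosmic times: z ~ 40 -> ~0.065 Gyr, z = 6 -> ~0.94 Gyr,
# so roughly 0.87 Gyr of growth is available.
dt_gyr = 0.94 - 0.065

m_final = m_seed * np.exp((1 - eps) / eps * dt_gyr / t_edd_gyr)
print(f"final mass ~ 10^{np.log10(m_final):.1f} Msun")   # ~10^9.6 Msun
```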