26,340 research outputs found

    Compare statistical significance tests for information retrieval evaluation

    Get PDF
    Preprint of our Journal of the Association for Information Science and Technology (JASIST) paper[Abstract] Statistical significance tests can provide evidence that the observed difference in performance between two methods is not due to chance. In Information Retrieval, some studies have examined the validity and suitability of such tests for comparing search systems.We argue here that current methods for assessing the reliability of statistical tests suffer from some methodological weaknesses, and we propose a novel way to study significance tests for retrieval evaluation. Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests. A key strength of this approach is that we assess statistical tests under perfect knowledge about the truth or falseness of the null hypothesis. This new method for studying the power of significance tests in Information Retrieval evaluation is formal and innovative. Following this type of analysis, we found that both the sign test and Wilcoxon signed test have more power than the permutation test and the t-test. The sign test and Wilcoxon signed test also have a good behavior in terms of type I errors. The bootstrap test shows few type I errors, but it has less power than the other methods tested.Ministerio de Econom´ıa y Competitividad; TIN2015-64282-RXunta de Galicia; GPC 2016/035Xunta de Galicia; ED431G/01Xunta de Galicia; ED431G/0

    Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors

    Full text link
    Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise because of the selection of topics. According to recent surveys on SIGIR, CIKM, ECIR and TOIS papers, the t-test is the most popular choice among IR researchers. However, previous work has suggested computer intensive tests like the bootstrap or the permutation test, based mainly on theoretical arguments. On empirical grounds, others have suggested non-parametric alternatives such as the Wilcoxon test. Indeed, the question of which tests we should use has accompanied IR and related fields for decades now. Previous theoretical studies on this matter were limited in that we know that test assumptions are not met in IR experiments, and empirical studies were limited in that we do not have the necessary control over the null hypotheses to compute actual Type I and Type II error rates under realistic conditions. Therefore, not only is it unclear which test to use, but also how much trust we should put in them. In contrast to past studies, in this paper we employ a recent simulation methodology from TREC data to go around these limitations. Our study comprises over 500 million p-values computed for a range of tests, systems, effectiveness measures, topic set sizes and effect sizes, and for both the 2-tail and 1-tail cases. Having such a large supply of IR evaluation data with full knowledge of the null hypotheses, we are finally in a position to evaluate how well statistical significance tests really behave with IR data, and make sound recommendations for practitioners.Comment: 10 pages, 6 figures, SIGIR 201

    Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

    Get PDF
    Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application which needs translation functionality of a relatively low level of sophistication since current models for information retrieval (IR) are still based on a bag-of-words. The Web provides a vast resource for the automatic construction of parallel corpora which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.Comment: 37 page

    On the Modeling of Musical Solos as Complex Networks

    Full text link
    Notes in a musical piece are building blocks employed in non-random ways to create melodies. It is the "interaction" among a limited amount of notes that allows constructing the variety of musical compositions that have been written in centuries and within different cultures. Networks are a modeling tool that is commonly employed to represent a set of entities interacting in some way. Thus, notes composing a melody can be seen as nodes of a network that are connected whenever these are played in sequence. The outcome of such a process results in a directed graph. By using complex network theory, some main metrics of musical graphs can be measured, which characterize the related musical pieces. In this paper, we define a framework to represent melodies as networks. Then, we provide an analysis on a set of guitar solos performed by main musicians. Results of this study indicate that the presented model can have an impact on audio and multimedia applications such as music classification, identification, e-learning, automatic music generation, multimedia entertainment.Comment: to appear in Information Science, Elsevier. Please cite the paper including such information. arXiv admin note: text overlap with arXiv:1603.0497
    corecore