2,585 research outputs found

    Repeatability Corner Cases in Document Ranking: The Impact of Score Ties

    Full text link
    Document ranking experiments should be repeatable. However, the interaction between multi-threaded indexing and score ties during retrieval may yield non-deterministic rankings, making repeatability not as trivial as one might imagine. In the context of the open-source Lucene search engine, score ties are broken by internal document ids, which are assigned at index time. Due to multi-threaded indexing, which makes experimentation with large modern document collections practical, internal document ids are not assigned consistently between different index instances of the same collection, and thus score ties are broken unpredictably. This short paper examines the effectiveness impact of such score ties, quantifying the variability that can be attributed to this phenomenon. The obvious solution to this non-determinism and to ensure repeatable document ranking is to break score ties using external collection document ids. This approach, however, comes with measurable efficiency costs due to the necessity of consulting external identifiers during query evaluation.Comment: Published in the Proceedings of the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019

    Training Curricula for Open Domain Answer Re-Ranking

    Full text link
    In precision-oriented tasks like answer ranking, it is more important to rank many relevant answers highly than to retrieve all relevant answers. It follows that a good ranking strategy would be to learn how to identify the easiest correct answers first (i.e., assign a high ranking score to answers that have characteristics that usually indicate relevance, and a low ranking score to those with characteristics that do not), before incorporating more complex logic to handle difficult cases (e.g., semantic matching or reasoning). In this work, we apply this idea to the training of neural answer rankers using curriculum learning. We propose several heuristics to estimate the difficulty of a given training sample. We show that the proposed heuristics can be used to build a training curriculum that down-weights difficult samples early in the training process. As the training process progresses, our approach gradually shifts to weighting all samples equally, regardless of difficulty. We present a comprehensive evaluation of our proposed idea on three answer ranking datasets. Results show that our approach leads to superior performance of two leading neural ranking architectures, namely BERT and ConvKNRM, using both pointwise and pairwise losses. When applied to a BERT-based ranker, our method yields up to a 4% improvement in MRR and a 9% improvement in P@1 (compared to the model trained without a curriculum). This results in models that can achieve comparable performance to more expensive state-of-the-art techniques.Comment: Accepted at SIGIR 2020 (long

    Proceedings of Mathsport international 2017 conference

    Get PDF
    Proceedings of MathSport International 2017 Conference, held in the Botanical Garden of the University of Padua, June 26-28, 2017. MathSport International organizes biennial conferences dedicated to all topics where mathematics and sport meet. Topics include: performance measures, optimization of sports performance, statistics and probability models, mathematical and physical models in sports, competitive strategies, statistics and probability match outcome models, optimal tournament design and scheduling, decision support systems, analysis of rules and adjudication, econometrics in sport, analysis of sporting technologies, financial valuation in sport, e-sports (gaming), betting and sports

    ir_metadata: An Extensible Metadata Schema for IR Experiments

    Full text link
    The information retrieval (IR) community has a strong tradition of making the computational artifacts and resources available for future reuse, allowing the validation of experimental results. Besides the actual test collections, the underlying run files are often hosted in data archives as part of conferences like TREC, CLEF, or NTCIR. Unfortunately, the run data itself does not provide much information about the underlying experiment. For instance, the single run file is not of much use without the context of the shared task's website or the run data archive. In other domains, like the social sciences, it is good practice to annotate research data with metadata. In this work, we introduce ir_metadata - an extensible metadata schema for TREC run files based on the PRIMAD model. We propose to align the metadata annotations to PRIMAD, which considers components of computational experiments that can affect reproducibility. Furthermore, we outline important components and information that should be reported in the metadata and give evidence from the literature. To demonstrate the usefulness of these metadata annotations, we implement new features in repro_eval that support the outlined metadata schema for the use case of reproducibility studies. Additionally, we curate a dataset with run files derived from experiments with different instantiations of PRIMAD components and annotate these with the corresponding metadata. In the experiments, we cover reproducibility experiments that are identified by the metadata and classified by PRIMAD. With this work, we enable IR researchers to annotate TREC run files and improve the reuse value of experimental artifacts even further.Comment: Resource pape

    Information Retrieval: Recent Advances and Beyond

    Full text link
    In this paper, we provide a detailed overview of the models used for information retrieval in the first and second stages of the typical processing chain. We discuss the current state-of-the-art models, including methods based on terms, semantic retrieval, and neural. Additionally, we delve into the key topics related to the learning process of these models. This way, this survey offers a comprehensive understanding of the field and is of interest for for researchers and practitioners entering/working in the information retrieval domain

    Improving the process of analysis and comparison of results in dependability benchmarks for computer systems

    Full text link
    Tesis por compendioLos dependability benchmarks (o benchmarks de confiabilidad en español), están diseñados para evaluar, mediante la categorización cuantitativa de atributos de confiabilidad y prestaciones, el comportamiento de sistemas en presencia de fallos. En este tipo de benchmarks, donde los sistemas se evalúan en presencia de perturbaciones, no ser capaces de elegir el sistema que mejor se adapta a nuestras necesidades puede, en ocasiones, conllevar graves consecuencias (económicas, de reputación, o incluso de pérdida de vidas). Por esa razón, estos benchmarks deben cumplir ciertas propiedades, como son la no-intrusión, la representatividad, la repetibilidad o la reproducibilidad, que garantizan la robustez y precisión de sus procesos. Sin embargo, a pesar de la importancia que tiene la comparación de sistemas o componentes, existe un problema en el ámbito del dependability benchmarking relacionado con el análisis y la comparación de resultados. Mientras que el principal foco de investigación se ha centrado en el desarrollo y la mejora de procesos para obtener medidas en presencia de fallos, los aspectos relacionados con el análisis y la comparación de resultados quedaron mayormente desatendidos. Esto ha dado lugar a diversos trabajos en este ámbito donde el proceso de análisis y la comparación de resultados entre sistemas se realiza de forma ambigua, mediante argumentación, o ni siquiera queda reflejado. Bajo estas circunstancias, a los usuarios de los benchmarks se les presenta una dificultad a la hora de utilizar estos benchmarks y comparar sus resultados con los obtenidos por otros usuarios. Por tanto, extender la aplicación de los benchmarks de confiabilidad y realizar la explotación cruzada de resultados es una tarea actualmente poco viable. Esta tesis se ha centrado en el desarrollo de una metodología para dar soporte a los desarrolladores y usuarios de benchmarks de confiabilidad a la hora de afrontar los problemas existentes en el análisis y comparación de resultados. Diseñada para asegurar el cumplimiento de las propiedades de estos benchmarks, la metodología integra el proceso de análisis de resultados en el flujo procedimental de los benchmarks de confiabilidad. Inspirada en procedimientos propios del ámbito de la investigación operativa, esta metodología proporciona a los evaluadores los medios necesarios para hacer su proceso de análisis explícito, y más representativo para el contexto dado. Los resultados obtenidos de aplicar esta metodología en varios casos de estudio de distintos dominios de aplicación, mostrará las contribuciones de este trabajo a mejorar el proceso de análisis y comparación de resultados en procesos de evaluación de la confiabilidad para sistemas basados en computador.Dependability benchmarks are designed to assess, by quantifying through quantitative performance and dependability attributes, the behavior of systems in presence of faults. In this type of benchmarks, where systems are assessed in presence of perturbations, not being able to select the most suitable system may have serious implications (economical, reputation or even lost of lives). For that reason, dependability benchmarks are expected to meet certain properties, such as non-intrusiveness, representativeness, repeatability or reproducibility, that guarantee the robustness and accuracy of their process. However, despite the importance that comparing systems or components has, there is a problem present in the field of dependability benchmarking regarding the analysis and comparison of results. While the main focus in this field of research has been on developing and improving experimental procedures to obtain the required measures in presence of faults, the processes involving the analysis and comparison of results were mostly unattended. This has caused many works in this field to analyze and compare results of different systems in an ambiguous way, as the process followed in the analysis is based on argumentation, or not even present. Hence, under these circumstances, benchmark users will have it difficult to use these benchmarks and compare their results with those from others. Therefore extending the application of these dependability benchmarks and perform cross-exploitation of results among works is not likely to happen. This thesis has focused on developing a methodology to assist dependability benchmark performers to tackle the problems present in the analysis and comparison of results of dependability benchmarks. Designed to guarantee the fulfillment of dependability benchmark's properties, this methodology seamlessly integrates the process of analysis of results within the procedural flow of a dependability benchmark. Inspired on procedures taken from the field of operational research, this methodology provides evaluators with the means not only to make their process of analysis explicit to anyone, but also more representative for the context being. The results obtained from the application of this methodology to several case studies in different domains, will show the actual contributions of this work to improving the process of analysis and comparison of results in dependability benchmarking for computer systems.Els dependability benchmarks (o benchmarks de confiabilitat, en valencià), són dissenyats per avaluar, mitjançant la categorització quantitativa d'atributs de confiabilitat i prestacions, el comportament de sistemes en presència de fallades. En aquest tipus de benchmarks, on els sistemes són avaluats en presència de pertorbacions, el no ser capaços de triar el sistema que millor s'adapta a les nostres necessitats pot tenir, de vegades, greus conseqüències (econòmiques, de reputació, o fins i tot pèrdua de vides). Per aquesta raó, aquests benchmarks han de complir certes propietats, com són la no-intrusió, la representativitat, la repetibilitat o la reproductibilitat, que garanteixen la robustesa i precisió dels seus processos. Així i tot, malgrat la importància que té la comparació de sistemes o components, existeix un problema a l'àmbit del dependability benchmarking relacionat amb l'anàlisi i la comparació de resultats. Mentre que el principal focus d'investigació s'ha centrat en el desenvolupament i la millora de processos per a obtenir mesures en presència de fallades, aquells aspectes relacionats amb l'anàlisi i la comparació de resultats es van desatendre majoritàriament. Açò ha donat lloc a diversos treballs en aquest àmbit on els processos d'anàlisi i comparació es realitzen de forma ambigua, mitjançant argumentació, o ni tan sols queden reflectits. Sota aquestes circumstàncies, als usuaris dels benchmarks se'ls presenta una dificultat a l'hora d'utilitzar aquests benchmarks i comparar els seus resultats amb els obtinguts per altres usuaris. Per tant, estendre l'aplicació dels benchmarks de confiabilitat i realitzar l'explotació creuada de resultats és una tasca actualment poc viable. Aquesta tesi s'ha centrat en el desenvolupament d'una metodologia per a donar suport als desenvolupadors i usuaris de benchmarks de confiabilitat a l'hora d'afrontar els problemes existents a l'anàlisi i comparació de resultats. Dissenyada per a assegurar el compliment de les propietats d'aquests benchmarks, la metodologia integra el procés d'anàlisi de resultats en el flux procedimental dels benchmarks de confiabilitat. Inspirada en procediments propis de l'àmbit de la investigació operativa, aquesta metodologia proporciona als avaluadors els mitjans necessaris per a fer el seu procés d'anàlisi explícit, i més representatiu per al context donat. Els resultats obtinguts d'aplicar aquesta metodologia en diversos casos d'estudi de distints dominis d'aplicació, mostrarà les contribucions d'aquest treball a millorar el procés d'anàlisi i comparació de resultats en processos d'avaluació de la confiabilitat per a sistemes basats en computador.Martínez Raga, M. (2018). Improving the process of analysis and comparison of results in dependability benchmarks for computer systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/111945TESISCompendi

    Ranking for Web Data Search Using On-The-Fly Data Integration

    Get PDF
    Ranking - the algorithmic decision on how relevant an information artifact is for a given information need and the sorting of artifacts by their concluded relevancy - is an integral part of every search engine. In this book we investigate how structured Web data can be leveraged for ranking with the goal to improve the effectiveness of search. We propose new solutions for ranking using on-the-fly data integration and experimentally analyze and evaluate them against the latest baselines

    Persistent social isolation reflects identity and social context but not maternal effects or early environment

    Get PDF
    This is the final version of the article. Available from Springer Nature via the DOI in this record.Individuals who are well integrated into society have greater access to resources and tend to live longer. Why some individuals are socially isolated and others are not is therefore puzzling from an evolutionary perspective. Answering this question requires establishing the mix of intrinsic and contextual factors that contribute to social isolation. Using social network data spanning up to half of the median adult lifespan in a gregarious primate, we found that some measures of social isolation were modestly repeatable within individuals, consistent with a trait. By contrast, social isolation was not explained by the identity of an animal’s mother or the group into which it was born. Nevertheless, age, sex and social status each played a role, as did kin dynamics and familiarity. Females with fewer close relatives were more isolated, and the more time males spent in a new group the less isolated they became, independent of their social status. These results show that social isolation results from a combination of intrinsic and environmental factors. From an evolutionary perspective, these findings suggest that social isolation could be adaptive in some contexts and partly maintained by selection.This work was supported by National Institute of Mental Health grants R01-MH089484 and R01-MH096875, and an Incubator Award from the Duke Institute for Brain Sciences. L.J.N.B. was supported by an Early Career Fellowship from the Leverhulme Trust. Data provided by the University of Puerto Rico, and its facilities, are funded by grant number 2 P40 OD012217 from the Office of Research Infrastructure Programs (ORIP) of the National Institutes of Health (NIH)

    The Use of Multi-Criteria Evaluation and Network Analysis in the Area Development Planning Process

    Get PDF
    The purpose of this research was to develop improvements to the area development planning process. These plans are used to improve operations within an installation sub-section by altering the physical layout of facilities. One methodology was developed based on apply network analysis concepts to the generation of alternative facility layouts. A second methodology was developed using multi-criteria evaluation to the selection of evaluated alternative facility layouts. The results of this study are two methodological processes that can be executed at base level requiring minimal additional information. The functional network map, based in network analysis, increases the use of socio-economic data into the generation of alterative facility layouts. The alternative layout scoring process, base in multi-criteria evaluation, returns a quantitative score for each alternative layout and a relative ranking. The use of these methodologies as decision support tools reduces the subjectivity of the current process and increases the repeatability of results
    corecore