
    Entity-Centric Stream Filtering and Ranking: Filtering and Unfilterable Documents

    Cumulative Citation Recommendation (CCR) is defined as follows: given a stream of documents on the one hand and Knowledge Base (KB) entities on the other, filter, rank, and recommend citation-worthy documents. The pipeline found in systems that address this problem involves four stages: filtering, classification, ranking (or scoring), and evaluation. Filtering is only an initial step that reduces the web-scale corpus to a working set of documents that is more manageable for the subsequent stages. Nevertheless, this step has a large impact on the maximum attainable recall. This study analyzes in depth the main factors that affect recall in the filtering stage. We investigate the impact of choices for corpus cleansing, entity profile construction, entity type, document type, and relevance grade. Because a recall failure in this first step of the pipeline cannot be repaired later on, we identify and characterize the citation-worthy documents that do not pass the filtering stage by examining their contents.
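    Why filtering caps recall can be illustrated with a minimal sketch. It assumes a deliberately simple entity profile (a set of name variants) and case-insensitive substring matching; the names `build_profile`, `filter_stream`, and `recall_ceiling` and the toy documents are hypothetical and are not the study's method or data.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


def build_profile(surface_forms):
    """Hypothetical entity profile: the set of lowercased name variants of one KB entity."""
    return {form.lower() for form in surface_forms}


def filter_stream(docs, profile):
    """Keep only documents that mention at least one surface form (case-insensitive substring match)."""
    return [d for d in docs if any(form in d.text.lower() for form in profile)]


def recall_ceiling(passed, citation_worthy_ids):
    """Share of citation-worthy documents that survive filtering.

    Later stages only ever see `passed`, so no classifier or ranker can exceed this recall.
    """
    if not citation_worthy_ids:
        return 0.0
    passed_ids = {d.doc_id for d in passed}
    return len(passed_ids & citation_worthy_ids) / len(citation_worthy_ids)


# Toy stream: d2 is citation-worthy but never names the entity, so the filter drops it.
docs = [
    Doc("d1", "Barack Obama visited Berlin."),
    Doc("d2", "The president spoke about trade."),
    Doc("d3", "Obama's new book was released."),
    Doc("d4", "Weather report for Amsterdam."),
]
profile = build_profile(["Barack Obama", "Obama"])
passed = filter_stream(docs, profile)
print([d.doc_id for d in passed])                            # ['d1', 'd3']
print(round(recall_ceiling(passed, {"d1", "d2", "d3"}), 3))  # 0.667
```

    Documents like d2, which are relevant but never mention the entity by any profile name, are exactly the kind of "unfilterable" documents the study sets out to characterize.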

    Random performance differences between online recommender system algorithms

    In the evaluation of recommender systems, the quality of recommendations made by a newly proposed algorithm is compared to the state of the art, using a given quality measure and dataset. The validity of the evaluation depends on the assumption that it does not exhibit artefacts resulting from the process of collecting the dataset. The main difference between online and offline evaluation is that, in the online setting, the user’s response to a recommendation is observed only once. We used the NewsREEL challenge to gain a deeper understanding of the implications of this difference for comparing different recommender systems. The experiments aim to quantify the expected degree of variation in performance that cannot be attributed to differences between systems. We classify and discuss the non-algorithmic causes of the observed performance differences.
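    Variation that "cannot be attributed to differences between systems" can be illustrated with a simulated A/A test, in which two identical recommenders split live traffic at random. This is a toy sketch assuming a fixed click-through rate and a fair coin-flip traffic split; `simulate_aa_test` and its parameters are hypothetical and do not reproduce the NewsREEL setup.

```python
import random
import statistics


def simulate_aa_test(true_ctr, impressions, trials, seed=0):
    """Simulate an A/A comparison of two *identical* recommenders.

    Each impression goes to arm A or B by a fair coin flip, and the user responds exactly
    once (clicks with probability `true_ctr`), mimicking the online setting. Returns the
    observed CTR difference (A minus B) for each trial.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        shown = [0, 0]
        clicks = [0, 0]
        for _ in range(impressions):
            arm = rng.randrange(2)
            shown[arm] += 1
            if rng.random() < true_ctr:
                clicks[arm] += 1
        ctr = [c / s if s else 0.0 for c, s in zip(clicks, shown)]
        diffs.append(ctr[0] - ctr[1])
    return diffs


diffs = simulate_aa_test(true_ctr=0.01, impressions=5_000, trials=200)
# Both arms run the same algorithm, so any gap is noise, not an algorithmic difference.
print(f"mean difference: {statistics.mean(diffs):+.5f}")
print(f"std of difference: {statistics.stdev(diffs):.5f}")
print(f"largest |difference|: {max(abs(d) for d in diffs):.5f}")
```

    The spread of `diffs` gives a rough noise floor: a measured gap of similar size between two different algorithms would not, on its own, indicate a real difference.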

    Cumulative Citation Recommendation: A Feature-aware Comparisons of Approaches

    In this work, we conduct a feature-aware comparison of approaches to Cumulative Citation Recommendation (CCR), a task that aims to filter and rank a stream of documents according to their relevance to entities in a knowledge base. We conducted experiments starting with a large feature set, identified a powerful subset, and used it to compare classification and learning-to-rank algorithms. With this small set of powerful features, we achieve better performance than the state of the art. Surprisingly, our findings challenge the previously known preference for learning-to-rank over classification: in our study, the classification approach outperforms learning-to-rank on CCR. This indicates that comparing two approaches is problematic because of the interplay between the approaches themselves and the feature sets one chooses to use.

    CWI at TREC 2012, KBA track and Session Track

    We participated in two tracks: the Knowledge Base Acceleration (KBA) Track and the Session Track. In the KBA track, we focused on experimenting with different approaches, as this is the first time the track has been run. We experimented with supervised and unsupervised retrieval models. Our supervised approaches include language models and a string-learning system. Our unsupervised approaches use 1) DBpedia labels and 2) the Google Cross-Lingual Dictionary (GCLD). While the GCLD approach targets the central and relevant bins, all the others target the central bin. The GCLD approach and the string-learning system outperformed the others in their respective target bins. The goal of the Session track submission is to evaluate whether and how a logic framework for representing user interactions with an IR system can improve the approximation of the relevant-term distribution that a second system, one assumed to have access to the session information, then calculates. … the documents in the stream corpora. Three of the seven runs used a Hadoop cluster provided by Sara.nl to process the stream corpora. The other four runs used federated access to the same corpora, distributed among seven workstations.

    CWI and TU Delft at TREC 2013: Contextual Suggestion, Federated Web Search, KBA, and Web Tracks

    This paper provides an overview of the work done at the Centrum Wiskunde & Informatica (CWI) and Delft University of Technology (TU Delft) for different tracks of TREC 2013. We participated in the Contextual Suggestion Track, the Federated Web Search Track, the Knowledge Base Acceleration (KBA) Track, and the Web Ad-hoc Track. In the Contextual Suggestion track, we focused on filtering the entire ClueWeb12 collection to generate recommendations according to the provided user profiles and contexts. For the Federated Web Search track, we exploited both categories from ODP and document relevance to merge result lists. In the KBA track, we focused on the Cumulative Citation Recommendation task, where we applied different features to two classification algorithms. For the Web track, we extended an ad-hoc baseline with a proximity model that promotes documents in which the query terms are positioned closer together.
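    The abstract does not give the exact proximity formulation, so the following is only one common way such a model can work, assumed here for illustration: find the shortest token window that covers every query term and add a bonus to the baseline score that shrinks as that window grows. `min_cover_span`, `proximity_rescore`, and the `weight / span` bonus are hypothetical choices, not the CWI/TU Delft implementation.

```python
def min_cover_span(tokens, query_terms):
    """Length of the shortest token window containing every query term, or None if any is absent."""
    needed = set(query_terms)
    if not needed:
        return None
    counts = {}
    covered = 0
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers all query terms.
        while covered == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            left_tok = tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


def proximity_rescore(baseline_score, tokens, query_terms, weight=1.0):
    """Add a bonus that grows as the query terms appear closer together (hypothetical weight / span)."""
    span = min_cover_span(tokens, query_terms)
    bonus = 0.0 if span is None else weight / span
    return baseline_score + bonus


doc = "cheap hotels near central park in new york city".split()
print(min_cover_span(doc, ["hotels", "park"]))          # 4
print(min_cover_span(doc, ["hotels", "museum"]))        # None
print(proximity_rescore(1.0, doc, ["hotels", "park"]))  # 1.25
```

    The sliding window runs in linear time in the document length, which keeps re-scoring cheap enough to apply on top of a baseline ranking.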

    Overview of NewsREEL’16: Multi-dimensional evaluation of real-time stream-recommendation algorithms

    Successful news recommendation requires facing the challenges of dynamic item sets, contextual item relevance, and fulfilling non-functional requirements, such as response time. The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to tackle news recommendation and to optimize and evaluate their recommender algorithms both online and offline. In this paper, we summarize the objectives and challenges of NewsREEL 2016. We cover two contrasting perspectives on the challenge: that of the operator (the business providing recommendations) and that of the challenge participant (the researchers developing recommender algorithms). At the intersection of these perspectives, new insights can be gained on how to effectively evaluate real-time stream recommendation algorithms.

    Epidemiology of facial fractures: Incidence, prevalence and years lived with disability estimates from the Global Burden of Disease 2017 study

    Background: The Global Burden of Disease Study (GBD) has historically produced estimates of causes of injury such as falls but not of the resulting types of injuries that occur. The objective of this study was to estimate the global incidence, prevalence and years lived with disability (YLDs) due to facial fractures and to estimate the leading injurious causes of facial fracture. Methods: We obtained results from GBD 2017. First, the study estimated the incidence from each injury cause (e.g., falls), and then the proportion of each cause that would result in facial fracture being the most disabling injury. Incidence, prevalence and YLDs of facial fractures were then calculated across causes. Results: Globally, in 2017, there were 7 538 663 (95% uncertainty interval 6 116 489 to 9 4
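    The two-step calculation described under Methods can be sketched as a sum over causes, under the simplifying assumption that "calculated across causes" means a plain sum of each cause's incidence times that cause's facial-fracture proportion. The function name and all toy inputs are hypothetical; GBD's actual estimation (age, sex, location and year strata, uncertainty draws) is far more involved.

```python
def facial_fracture_incidence(cause_incidence, p_facial_fracture):
    """Cases of facial fracture summed across injury causes.

    cause_incidence[c]: new injury cases from cause c (e.g. falls).
    p_facial_fracture[c]: share of those cases in which a facial fracture is the most
    disabling injury. Causes missing from p_facial_fracture contribute nothing.
    """
    return sum(n * p_facial_fracture.get(c, 0.0) for c, n in cause_incidence.items())


# Hypothetical toy inputs (cases per year); these are NOT GBD 2017 estimates.
cause_incidence = {"falls": 1_000_000, "road injuries": 400_000, "interpersonal violence": 200_000}
p_facial = {"falls": 0.02, "road injuries": 0.05, "interpersonal violence": 0.10}
print(facial_fracture_incidence(cause_incidence, p_facial))
```

    Keeping the per-cause products separate before summing is also what allows the leading injurious causes of facial fracture to be ranked.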

    Anemia prevalence in women of reproductive age in low- and middle-income countries between 2000 and 2018

    Anemia is a globally widespread condition in women and is associated with reduced economic productivity and increased mortality worldwide. Here we map annual 2000–2018 geospatial estimates of anemia prevalence in women of reproductive age (15–49 years) across 82 low- and middle-income countries (LMICs), stratify anemia by severity, and aggregate results to policy-relevant administrative and national levels. Additionally, we provide subnational disparity analyses that give a comprehensive overview of anemia prevalence inequalities within these countries, and we predict progress toward the World Health Organization’s Global Nutrition Target (WHO GNT) to reduce anemia by half by 2030. Our results demonstrate widespread moderate improvements in overall anemia prevalence but identify only three LMICs with a high probability of achieving the WHO GNT by 2030 at a national scale, and no LMIC is expected to achieve the target in all of its subnational administrative units. Our maps show where large within-country disparities occur, as well as areas likely to fall short of the WHO GNT, offering precision public health tools so that adequate resource allocation and subsequent interventions can be targeted to the most vulnerable populations.

    Mapping geographical inequalities in childhood diarrhoeal morbidity and mortality in low-income and middle-income countries, 2000–17: analysis for the Global Burden of Disease Study 2017

    Background: Across low-income and middle-income countries (LMICs), one in ten deaths in children younger than 5 years is attributable to diarrhoea. The substantial between-country variation in both diarrhoea incidence and mortality is attributable to interventions that protect children, prevent infection, and treat disease. Identifying subnational regions with the highest burden and mapping associated risk factors can aid in reducing preventable childhood diarrhoea. Methods: We used Bayesian model-based geostatistics and a geolocated dataset comprising 15 072 746 children younger than 5 years from 466 surveys in 94 LMICs, in combination with findings of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2017, to estimate posterior distributions of diarrhoea prevalence, incidence, and mortality from 2000 to 2017. From these data, we estimated the burden of diarrhoea at varying subnational levels (termed units) by spatially aggregating draws, and we investigated the drivers of subnational patterns by creating aggregated risk factor estimates. Findings: The greatest declines in diarrhoeal mortality were seen in south Asia, southeast Asia, and South America, where 54·0% (95% uncertainty interval [UI] 38·1–65·8), 17·4% (7·7–28·4), and 59·5% (34·2–86·9) of units, respectively, recorded decreases in deaths from diarrhoea greater than 10%. Although children in much of Africa remain at high risk of death due to diarrhoea, regions with the most deaths were outside Africa, with the highest mortality units located in Pakistan. Indonesia showed the greatest within-country geographical inequality; some regions had mortality rates nearly four times the average country rate. Reductions in mortality were correlated with improvements in water, sanitation, and hygiene (WASH) or reductions in child growth failure (CGF). Similarly, most high-risk areas had poor WASH, high CGF, or low oral rehydration therapy coverage.
    Interpretation: By co-analysing geospatial trends in diarrhoeal burden and its key risk factors, we could assess candidate drivers of subnational death reduction. Further, by doing a counterfactual analysis of the remaining disease burden using key risk factors, we identified potential intervention strategies for vulnerable populations. In view of the demands for limited resources in LMICs, accurately quantifying the burden of diarrhoea and its drivers is important for precision public health.
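    Estimating burden for subnational units "by spatially aggregating draws" can be sketched generically, assuming pixel-level posterior draws of a rate and pixel populations as weights. The function names, the Gaussian toy posterior, and all numbers are hypothetical; this shows the general pattern, not the study's code.

```python
import random
import statistics


def aggregate_draws(pixel_rate_draws, pixel_population):
    """Population-weighted aggregation of posterior draws from pixels to one subnational unit.

    pixel_rate_draws[i][d] is draw d of the rate in pixel i. Aggregating draw by draw keeps
    each unit-level draw a coherent sample, so uncertainty can be summarized afterwards.
    """
    total_pop = sum(pixel_population)
    n_draws = len(pixel_rate_draws[0])
    return [
        sum(draws[d] * pop for draws, pop in zip(pixel_rate_draws, pixel_population)) / total_pop
        for d in range(n_draws)
    ]


def summarize(unit_draws):
    """Posterior mean and 95% uncertainty interval (2.5th and 97.5th percentiles)."""
    cuts = statistics.quantiles(unit_draws, n=40)  # 39 cut points in steps of 2.5%
    return statistics.mean(unit_draws), cuts[0], cuts[-1]


# Hypothetical toy posterior: 3 pixels, 1,000 draws each, with made-up rates and populations.
rng = random.Random(42)
pixel_means = [0.8, 1.5, 3.0]  # e.g. deaths per 1,000 children (illustrative only)
pixel_population = [5_000, 2_000, 500]
pixel_rate_draws = [[max(0.0, rng.gauss(m, 0.2 * m)) for _ in range(1_000)] for m in pixel_means]

unit_draws = aggregate_draws(pixel_rate_draws, pixel_population)
mean, lower, upper = summarize(unit_draws)
print(f"unit rate: {mean:.3f} (95% UI {lower:.3f} to {upper:.3f})")
```

    Aggregating whole draws first and summarizing last matters: averaging per-pixel interval bounds would generally give the wrong unit-level interval, because percentiles do not aggregate linearly.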