4,565 research outputs found

    Unbiased Comparative Evaluation of Ranking Functions

    Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling has shown intriguing promise, since it enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing k systems against a baseline, and ranking k systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they generally cut the number of required relevance judgments at least in half. (Comment: Under review; 10 pages)
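    The core idea behind sampling-based unbiased evaluation can be pictured with an inverse-propensity (Horvitz-Thompson style) estimator: documents are judged with known inclusion probabilities, and each judged relevant document is weighted by the inverse of its probability. The sketch below is illustrative only, not the estimators or sampling distributions derived in the paper; the function names, the choice of precision@k, and the toy probabilities are assumptions.

```python
import random

def sample_judgments(pool, probs, seed=0):
    """Independently include each document with its inclusion probability probs[doc]."""
    rng = random.Random(seed)
    return {doc for doc in pool if rng.random() < probs[doc]}

def ht_precision_at_k(ranking, k, judged, relevance, probs):
    """Inverse-propensity estimate of precision@k.

    A judged relevant document contributes rel / probs[doc]; unjudged documents
    contribute 0, yet the estimate is unbiased in expectation over the sampling."""
    total = 0.0
    for doc in ranking[:k]:
        if doc in judged:
            total += relevance.get(doc, 0) / probs[doc]
    return total / k

# Toy usage: six documents, non-uniform judging probabilities (all values assumed).
pool = ["d1", "d2", "d3", "d4", "d5", "d6"]
probs = {"d1": 0.9, "d2": 0.9, "d3": 0.5, "d4": 0.5, "d5": 0.2, "d6": 0.2}
relevance = {"d1": 1, "d2": 0, "d3": 1, "d4": 0, "d5": 1, "d6": 0}
judged = sample_judgments(pool, probs)
print(ht_precision_at_k(["d1", "d3", "d5", "d2"], 4, judged, relevance, probs))
```

    Averaged over many draws of the judgment sample, this estimate matches the true precision@k, which is what unbiasedness means in this setting; the paper's contribution lies in choosing estimators and sampling distributions that also minimize variance for specific comparison tasks.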

    Measuring and Managing Answer Quality for Online Data-Intensive Services

    Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts. (Comment: Technical Report)
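    A rough way to picture the sample-and-re-execute measurement is the sketch below: a small fraction of online queries is run twice, the online pass drops results from components that miss a deadline, the mature pass waits for everything, and answer quality is the overlap between the two. This is a toy simulation, not Ubora's implementation: the component names and the SAMPLE_RATE and TIMEOUT_S values are invented, and Python's lru_cache stands in for Ubora's replay of recorded network messages.

```python
import random
import time
from functools import lru_cache

SAMPLE_RATE = 0.25   # fraction of online queries measured (assumed value)
TIMEOUT_S = 0.05     # online deadline per component (assumed value)

@lru_cache(maxsize=None)
def component(name, query):
    """Stand-in for a distributed component call; results are memoized so the
    mature (complete) execution can reuse work already done in the online pass."""
    time.sleep(random.uniform(0.01, 0.1))  # simulated, possibly slow, work
    return f"{name}:{query}"

def online_answer(query, components):
    """Online pass: call each component, then drop any result that came back after
    the deadline (a stand-in for eliding slow components without blocking on them)."""
    parts = []
    for name in components:
        start = time.monotonic()
        result = component(name, query)
        if time.monotonic() - start <= TIMEOUT_S:
            parts.append(result)
    return parts

def mature_answer(query, components):
    """Mature pass: wait for every component (cheap here thanks to memoization)."""
    return [component(name, query) for name in components]

def answer_quality(query, components):
    """Fraction of the complete answer already present in the online answer."""
    fast, full = online_answer(query, components), mature_answer(query, components)
    return len(set(fast) & set(full)) / max(len(full), 1)

# Sample a subset of queries and report their measured answer quality.
queries = [f"q{i}" for i in range(20)]
for q in (q for q in queries if random.random() < SAMPLE_RATE):
    print(q, round(answer_quality(q, ["index", "recommender", "qa"]), 2))
```

    A quality signal like this, rather than a raw timeout rate, is what the paper's adaptive admission controller uses to decide how much load to accept.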

    Estimating Accuracy of Personal Identifiable Information in Integrated Data Systems

    Without a valid assessment of accuracy, there is a risk of data users coming to incorrect conclusions or making bad decisions based on inaccurate data. This dissertation proposes a theoretical method for developing data-accuracy metrics specific to any given person-centric integrated system and shows how a data analyst can use these metrics to estimate the overall accuracy of person-centric data. Estimating the accuracy of Personal Identifiable Information (PII) creates a corresponding need to model and formalize PII, for both the real world and electronic data, in a way that supports rigorous reasoning relative to real-world facts, expert opinions, and aggregate knowledge. This research provides such a foundation by introducing a temporal first-order logic (FOL) language called Person Data First-order Logic (PDFOL). With its syntax and semantics formalized, PDFOL provides a mechanism for expressing data-accuracy metrics, computing measurements using these metrics on person-centric databases, and comparing those measurements with expected values from real-world populations. Specifically, it enables data analysts to model person attributes and inter-person relations from a real-world population or database representations of such, as well as real-world facts, expert opinions, and aggregate knowledge. PDFOL builds on existing first-order logics with the addition of temporal predicates based on time intervals, aggregate functions, and tuple-set comparison operators. It adapts and extends the traditional aggregate functions in three ways: a) allowing an arbitrary number of free variables in a function statement, b) adding groupings, and c) defining new aggregate functions. These features allow PDFOL to model person-centric databases, enabling formal and efficient reasoning about their accuracy. This dissertation also explains how data analysts can use PDFOL statements to formalize and develop accuracy metrics specific to a person-centric database, especially an integrated person-centric database, which in turn can be used to assess the accuracy of that database. Data analysts apply these metrics to person-centric data to compute quality-assessment measurements, Y_D. They then use statistical methods to compare these measurements with the real-world measurements, Y_R, under the hypothesis that the two should be very similar if the person-centric data is an accurate and complete representation of the real-world population. Finally, I show that estimated accuracy using metrics based on PDFOL can be a good predictor of database accuracy. Specifically, I evaluated the performance of selected accuracy metrics by applying them to a person-centric database, mutating the database in various ways to degrade its accuracy, and then re-applying the metrics to see whether they reflect the expected degradation. This research will help data analysts develop accuracy metrics specific to their person-centric data. In addition, PDFOL can provide a foundation for future methods for reasoning about other quality dimensions of PII
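    PDFOL itself is a logic language and is not reproduced here, but the comparison step the abstract describes can be sketched concretely: compute a measurement Y_D as an aggregate over the person-centric records, then test it statistically against the real-world reference Y_R. Everything in the sketch below is a hypothetical illustration, including the function and field names, the example metric (share of records with sex recorded as female), the reference value 0.508, and the use of a one-sample z-test as the comparison method.

```python
from math import sqrt

def measurement_yd(records, predicate):
    """Y_D: fraction of database records satisfying a metric predicate
    (a stand-in for evaluating a PDFOL aggregate over a person-centric table)."""
    matches = sum(1 for r in records if predicate(r))
    return matches / len(records), len(records)

def z_statistic(y_d, n, y_r):
    """One-sample z statistic comparing the observed proportion Y_D (over n records)
    against the real-world reference proportion Y_R."""
    se = sqrt(y_r * (1 - y_r) / n)
    return (y_d - y_r) / se if se else float("inf")

# Toy data: does the recorded sex distribution match the assumed population share?
records = [{"person_id": i, "sex": "F" if i % 2 else "M"} for i in range(1000)]
y_d, n = measurement_yd(records, lambda r: r["sex"] == "F")
y_r = 0.508  # hypothetical real-world share used as the reference value
z = z_statistic(y_d, n, y_r)
print(f"Y_D={y_d:.3f}, Y_R={y_r:.3f}, z={z:.2f}")  # large |z| flags a likely accuracy problem
```

    In the dissertation's terms, a battery of such metrics is derived from PDFOL statements tailored to the integrated system; systematic disagreement between Y_D and Y_R across the battery is the signal that the database's PII is inaccurate or incomplete.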