
    CoBib: An Architecture for a Collaborative Database

    The goal of CoBib is to allow affinity groups to collaborate effectively to maximize the searching and browsing utility of an academic paper database. The CoBib system will facilitate the process of surveying literature in a specific field by using the community's annotations and referrals. I am developing a database architecture for CoBib that provides users within research communities the means to collaboratively index and annotate citations. This extensible architecture is a novel solution that is interoperable with existing data formats and systems and incorporates recommendations gathered from the community for the discovery of new citations. The construction of CoBib raises several important questions regarding data integrity and utility. One such question is how to determine whether two citations refer to the same paper, a problem generally referred to as object coreference. I discuss several current approaches to the object coreference problem. I additionally describe issues related to the back-end and front-end design: how to choose an appropriate database for a large set of records, as well as the best means for users to query that database.
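    The object coreference problem raised above can be made concrete with a small sketch. The following is not CoBib's method but a hypothetical illustration of one common heuristic: treating two citation records as coreferent when their normalized titles and years agree (the `title`/`year` fields and the example records are assumptions for illustration).

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    superficial formatting differences do not block a match."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())

def same_paper(a: dict, b: dict) -> bool:
    """Heuristic coreference check on hypothetical citation records.
    Real systems also compare authors, venues, and DOIs, and tolerate
    fuzzy (edit-distance) title matches."""
    return (normalize_title(a["title"]) == normalize_title(b["title"])
            and a.get("year") == b.get("year"))

# Two differently formatted records for the same (made-up) paper:
rec1 = {"title": "CoBib: An Architecture for a Collaborative Database", "year": 2005}
rec2 = {"title": "CoBib -- an architecture for a collaborative database.", "year": 2005}
print(same_paper(rec1, rec2))  # True
```

    Exact equality after normalization is brittle in practice; most deduplication pipelines replace it with a string-similarity threshold, which this sketch deliberately omits for clarity.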

    Highlighting Disputed Claims on the Web

    We describe Dispute Finder, a browser extension that alerts users when information they read online is disputed by a source they might trust. Dispute Finder examines the text on the page the user is browsing and highlights any phrases that resemble known disputed claims. If a user clicks on a highlighted phrase, Dispute Finder shows them a list of articles that support other points of view. Dispute Finder builds a database of known disputed claims by crawling web sites that already maintain lists of disputed claims and by allowing users to enter claims that they believe are disputed. It identifies snippets that make known disputed claims by running a simple textual entailment algorithm inside the browser extension, referring to a cached local copy of the claim database. In this paper, we explain the design of Dispute Finder and the trade-offs between the various design decisions that we explored.
    Figure 1: Dispute Finder highlights text snippets that make disputed claims.
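    The "simple textual entailment algorithm" is not spelled out in the abstract; as a rough, hypothetical stand-in, the sketch below matches page sentences against a cached claim database using token overlap. The claim data and the 0.8 threshold are illustrative assumptions, not Dispute Finder's actual cache format or matcher.

```python
# Hypothetical local cache of disputed claims, each with paraphrases
# (a stand-in for Dispute Finder's crawled claim database).
CLAIM_DB = {
    "claim-42": ["coffee stunts your growth",
                 "drinking coffee makes you shorter"],
}

def tokens(text: str) -> set:
    return set(text.lower().split())

def covers(snippet: str, paraphrase: str, threshold: float = 0.8) -> bool:
    """Crude entailment proxy: the snippet 'makes' the claim if it covers
    most of the paraphrase's tokens. A real matcher would be far more
    robust; this only illustrates the client-side flow."""
    p = tokens(paraphrase)
    return len(p & tokens(snippet)) / len(p) >= threshold

def find_disputed(page_text: str):
    """Scan page sentences against the cached claims and yield
    (sentence, claim_id) pairs for the extension to highlight."""
    for sentence in page_text.split("."):
        for claim_id, paraphrases in CLAIM_DB.items():
            if any(covers(sentence, p) for p in paraphrases):
                yield sentence.strip(), claim_id

for snippet, claim in find_disputed("Everyone knows coffee stunts your growth."):
    print(snippet, "->", claim)  # highlights the sentence, linked to claim-42
```

    Running the matcher against a local cache, as the abstract describes, keeps page text on the client and avoids a network round trip per phrase.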

    Dynamic Filter: Adaptive Query Processing with the Crowd

    Hybrid human-machine query processing systems, such as crowd-powered database systems, aim to broaden the scope of questions users can ask about their data by incorporating human computation to support queries that may be subjective and/or require visual or semantic interpretation. A common type of query filters data by several criteria, some of which need human computation to be evaluated: for example, filtering a set of hotels for those that both (1) have great views from the rooms and (2) have a fitness center. Criteria can differ in the amount of human effort required to decide whether data satisfy them, owing to each criterion's subjectivity and difficulty. There is potential to reduce crowdsourcing costs by ordering the evaluation of the criteria so that criteria needing more human computation are not processed for data that have already failed the less costly criteria. Unfortunately, for queries specified on the fly, the information about subjectivity and difficulty is unknown a priori. To overcome this challenge, we present Dynamic Filter, an adaptive query processing algorithm that dynamically changes the order in which criteria are evaluated based on observations made while the query is running. Using crowdsourced data from a popular crowdsourcing platform, we show that Dynamic Filter can effectively adapt the processing order and approach the performance of a "clairvoyant" algorithm.
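    The ordering idea in the abstract mirrors classic selection-predicate ordering: evaluate cheap, highly selective criteria first, and skip costlier ones for items that have already failed. As a hedged sketch (not the paper's algorithm), the code below re-derives the order per item from running cost and pass-rate estimates; `evaluate(item, criterion)` stands in for issuing crowd tasks and is an assumed interface.

```python
class CriterionStats:
    """Running estimates of a criterion's average cost and pass rate,
    learned while the query executes (hypothetical bookkeeping)."""
    def __init__(self):
        self.cost_total, self.passed, self.seen = 0.0, 0, 0

    def record(self, cost: float, passed: bool):
        self.cost_total += cost
        self.passed += int(passed)
        self.seen += 1

    def avg_cost(self) -> float:
        return self.cost_total / self.seen if self.seen else 1.0

    def pass_rate(self) -> float:
        # Laplace smoothing keeps unobserved criteria away from 0 and 1.
        return (self.passed + 1) / (self.seen + 2)

def rank(s: CriterionStats) -> float:
    """Classic predicate-ordering rank: low cost and a low pass rate
    (high selectivity) mean the criterion should run early."""
    return s.avg_cost() / (1.0 - s.pass_rate())

def dynamic_filter(items, criteria, evaluate):
    """Filter items, reordering criteria per item from observed stats.
    `evaluate(item, c)` returns (passed, cost), e.g., from crowd answers."""
    stats = {c: CriterionStats() for c in criteria}
    results = []
    for item in items:
        order = sorted(criteria, key=lambda c: rank(stats[c]))
        for c in order:
            passed, cost = evaluate(item, c)
            stats[c].record(cost, passed)
            if not passed:
                break  # short-circuit: skip costlier criteria
        else:
            results.append(item)
    return results
```

    A clairvoyant baseline would sort by the true costs and selectivities up front; the adaptive version can only converge toward that order as observations accumulate, which is the gap the paper measures.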

    Crowdsourced Enumeration Queries

    Hybrid human/computer database systems promise to greatly expand the usefulness of query processing by incorporating the crowd for data gathering and other tasks. Such systems raise many implementation questions. Perhaps the most fundamental is that the closed-world assumption underlying relational query semantics does not hold in such systems; as a consequence, the meaning of even simple queries can be called into question. Furthermore, query progress monitoring becomes difficult due to non-uniformities in the arrival of crowdsourced data and peculiarities of how people work in crowdsourcing systems. To address these issues, we develop statistical tools that enable users and systems developers to reason about query completeness. These tools can also help drive query execution and crowdsourcing strategies. We evaluate our techniques using experiments on a popular crowdsourcing platform.
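    Statistical tools for reasoning about completeness are, in spirit, species-estimation techniques: the pattern of duplicate answers reveals how much of the open world remains unseen. As an assumed illustration (the paper develops estimators tuned to crowd behavior, which this classical formula is not), the sketch below computes the Chao1 richness estimate from answer frequencies.

```python
from collections import Counter

def chao1_estimate(answers):
    """Chao1 species-richness estimate: from crowdsourced answers (with
    duplicates), estimate the total number of distinct answers, including
    those not yet seen. f1/f2 are the counts of answers seen exactly
    once/twice; many singletons imply many unseen answers remain."""
    freq = Counter(Counter(answers).values())  # freq[k] = #answers seen k times
    observed = sum(freq.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 == 0:
        return observed + f1 * (f1 - 1) / 2.0  # bias-corrected fallback
    return observed + f1 * f1 / (2.0 * f2)

# Example: 6 distinct answers observed, 4 of them singletons.
answers = ["CA", "NY", "CA", "TX", "WA", "NY", "OR", "FL"]
print(chao1_estimate(answers))  # 6 + 4*4 / (2*2) = 10.0
```

    An estimate that keeps climbing with the observed count suggests the enumeration is far from complete, while a plateau signals diminishing returns from further crowd tasks, which is the kind of signal the abstract proposes for driving query execution strategy.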