11,156 research outputs found

    Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)

    Get PDF
    This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually-exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying a dynamic programming approach of quadratic complexity. In this paper we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach

    Integrating and Ranking Uncertain Scientific Data

    Get PDF
    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve predicting the well-known). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates

    Combining quantifications for flexible query result ranking

    Get PDF
    Databases contain data and database systems governing such databases are often intended to allow a user to query these data. On one hand, these data may be subject to imperfections, on the other hand, users may employ imperfect query preference specifications to query such databases. All of these imperfections lead to each query answer being accompanied by a collection of quantifications indicating how well (part of) a group of data complies with (part of) the user's query. A fundamental question is how to present the user with the query answers complying best to his or her query preferences. The work presented in this paper first determines the difficulties to overcome in reaching such presentation. Mainly, a useful presentation needs the ranking of the query answers based on the aforementioned quantifications, but it seems advisable to not combine quantifications with different interpretations. Thus, the work presented in this paper continues to introduce and examine a novel technique to determine a query answer ranking. Finally, a few aspects of this technique, among which its computational efficiency, are discussed
    • …
    corecore