4 research outputs found

    Probabilistic Query Models for Transaction Data

    We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying problem: the query selectivity problem (i.e., finding exact counts for tuples in a database) and the query generalization problem (i.e., computing the probability that a tuple will occur in new data). We show that frequent itemsets are useful for reducing the original data to a compressed representation and introduce a way to store them using an ADTree data structure. In an extension of our earlier work on this topic, we propose several new schemes for query answering based on the compressed representation that avoid direct scans of the data at query time. Experimental results on real-world transaction data sets provide insights into various tradeoffs involving the offline time for model-building, the online time for query-answering, the memory footprint of the compressed data, and the accuracy of..

    Probabilistic Query Models for Transaction Data

    We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying problem: the query selectivity estimation problem (i.e., finding exact counts for tuples in a data set) and the query generalization problem (i.e., computing the probability that a tuple will occur in new data). We show that frequent itemsets are useful for reducing the original data to a compressed representation and introduce a method to store them using an ADTree data structure. In an extension of our earlier work on this topic, we propose several new schemes for query answering based on the compressed representation that avoid direct scans of the data at query time. Experimental results on real-world transaction data sets provide insights into various tradeoffs involving the offline time for model-building, the online time for query-answering, the memory footprint of the compressed data, and the accuracy of the estimate provided to the query.
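
    The offline/online split described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: a plain dictionary stands in for the ADTree, itemsets are mined by brute-force enumeration, and queries whose itemset was not stored fall back to an independence estimate built from single-item counts, a much cruder model than the Bayesian networks, Markov random fields, or mixture models the abstract names. The function names, the `min_support` and `max_size` parameters, and the toy transactions are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Offline step: count every itemset of up to max_size items and keep
    those seen in at least min_support transactions.
    Assumption: brute-force enumeration, only suitable for tiny examples."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


def estimate_count(query_items, frequent, n_transactions):
    """Online step: estimate how many transactions contain every query item,
    using only the stored itemset counts (no scan of the raw data)."""
    query = frozenset(query_items)
    if query in frequent:
        # The stored count is exact for itemsets that were kept.
        return float(frequent[query])
    # Assumption: fall back to treating items as independent.
    # Items that are not frequent themselves contribute a count of zero.
    estimate = float(n_transactions)
    for item in query:
        estimate *= frequent.get(frozenset([item]), 0) / n_transactions
    return estimate


# Hypothetical toy data set.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

frequent = mine_frequent_itemsets(transactions, min_support=2)
stored = estimate_count(["bread", "milk"], frequent, len(transactions))
fallback = estimate_count(["milk", "butter"], frequent, len(transactions))
print(stored, fallback)
```

    In this toy run the first query is answered from a stored count, while the second is only an approximation: the independence fallback gives about 1.6 where the true count is 1. That gap is the kind of accuracy tradeoff the abstract refers to.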

    ABSTRACT Probabilistic Query Models for Transaction Data

    pavlovd @ ics.uci.ed