4 research outputs found

    Top-k similarity join over multi-valued objects

    The join query is a fundamental tool in many modern application areas, including location-based services, geographic information systems (GIS), and finance and capital-markets analysis. Given two sets of objects U and V, a top-k similarity join returns the k most similar pairs of objects from U × V. Top-k similarity joins have been extensively studied and used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space, and the similarity between two objects is usually measured by a distance metric such as Euclidean distance. However, in many applications such as decision making and e-business, an object may be described by multiple values (instances), and the conventional model is not applicable since it does not address the distribution of object instances. In this thesis, we study top-k similarity join queries over multi-valued objects. We formalize the problem of top-k similarity join over multi-valued objects using quantile-based distance metrics, which are applied to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-, statistic- and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
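The abstract does not spell out the quantile-based metric, so the sketch below assumes one common formulation: each multi-valued object is a list of (instance, weight) pairs whose weights sum to 1, each instance pair (u_i, v_j) carries weight w_i·w_j, and the ϕ-quantile distance is the smallest instance-pair distance at which the accumulated pair weight reaches ϕ. It then answers the top-k join by brute force over U × V, with no pruning at all. The names `phi_quantile_distance` and `topk_join_bruteforce` are hypothetical; this is an illustration of the problem definition, not the thesis' algorithm.

```python
import heapq
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, ...]
# Assumed representation: a list of (instance point, instance weight), weights summing to 1.
MultiObject = List[Tuple[Point, float]]


def phi_quantile_distance(a: MultiObject, b: MultiObject, phi: float) -> float:
    """Smallest instance-pair distance d whose accumulated pair weight reaches phi.

    Assumes both objects are non-empty.
    """
    # Every instance pair contributes its Euclidean distance with weight w_a * w_b.
    pairs = sorted(
        (math.dist(p, q), wp * wq) for (p, wp), (q, wq) in product(a, b)
    )
    cumulative = 0.0
    for d, w in pairs:
        cumulative += w
        if cumulative >= phi - 1e-12:  # small tolerance for floating-point sums
            return d
    return pairs[-1][0]  # reached only if the weights sum to slightly less than phi


def topk_join_bruteforce(U: List[MultiObject], V: List[MultiObject], k: int, phi: float):
    """Return the k (distance, i, j) triples with the smallest phi-quantile distance over U x V."""
    scored = (
        (phi_quantile_distance(u, v, phi), i, j)
        for i, u in enumerate(U)
        for j, v in enumerate(V)
    )
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    U = [[((0, 0), 0.5), ((10, 0), 0.5)]]
    V = [[((1, 0), 1.0)], [((0, 3), 0.5), ((0, 4), 0.5)]]
    print(topk_join_bruteforce(U, V, k=2, phi=0.5))
```

Under this formulation, ϕ = 1 yields the largest instance-pair distance, while ϕ = 0.5 requires only half of the pair weight to fall within the returned distance, so a single far-away instance does not dominate the score.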

    Top-k Similarity Join over Multi-valued Objects

    Abstract. Top-k similarity joins have been extensively studied and used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects U and V, a top-k similarity join returns the k most similar pairs of objects from U × V. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space, and the similarity between two objects is usually measured by a distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable since it does not address the distribution of object instances. In this paper, we study top-k similarity join queries over multi-valued objects. We apply a quantile-based distance to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-, statistic- and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
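A filtering-refinement framework pairs a cheap lower bound (filter) with the exact distance (refinement). The sketch below uses one simple distance-based bound of our own choosing, not necessarily one from the paper: every instance of an object lies in its minimum bounding rectangle (MBR), so the minimum distance between two MBRs never exceeds any instance-pair distance, and therefore never exceeds the ϕ-quantile distance. Candidate pairs are visited in increasing lower-bound order, and the scan stops once the bound reaches the current k-th best exact distance. It assumes the same (instance, weight) representation and ϕ-quantile definition as a plain brute-force join; all function names are hypothetical.

```python
import heapq
import math
from itertools import product


def phi_quantile_distance(a, b, phi):
    """Smallest instance-pair distance whose accumulated pair weight reaches phi (assumed definition)."""
    pairs = sorted((math.dist(p, q), wp * wq) for (p, wp), (q, wq) in product(a, b))
    cumulative = 0.0
    for d, w in pairs:
        cumulative += w
        if cumulative >= phi - 1e-12:  # tolerance for floating-point sums
            return d
    return pairs[-1][0]


def mbr(obj):
    """Axis-aligned bounding box (lo corner, hi corner) of an object's instances."""
    pts = [p for p, _ in obj]
    dims = range(len(pts[0]))
    return (tuple(min(p[d] for p in pts) for d in dims),
            tuple(max(p[d] for p in pts) for d in dims))


def mindist(box_a, box_b):
    """Minimum Euclidean distance between two boxes; a lower bound on any instance-pair distance."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    gaps = [max(0.0, blo[d] - ahi[d], alo[d] - bhi[d]) for d in range(len(alo))]
    return math.sqrt(sum(g * g for g in gaps))


def topk_join_filter_refine(U, V, k, phi):
    if k <= 0:
        return []
    boxes_u = [mbr(u) for u in U]
    boxes_v = [mbr(v) for v in V]
    # Filtering: order all candidate pairs by their cheap MBR lower bound.
    candidates = sorted(
        (mindist(bu, bv), i, j)
        for i, bu in enumerate(boxes_u)
        for j, bv in enumerate(boxes_v)
    )
    best = []  # max-heap via negated distances: the current k best (-distance, i, j)
    for lb, i, j in candidates:
        if len(best) == k and lb >= -best[0][0]:
            break  # no remaining pair can be strictly closer than the current k-th result
        d = phi_quantile_distance(U[i], V[j], phi)  # refinement: exact distance
        if len(best) < k:
            heapq.heappush(best, (-d, i, j))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i, j))
    return sorted((-neg_d, i, j) for neg_d, i, j in best)
```

With k = 1 on a small input, a pair whose MBR lower bound already exceeds the first exact result is never refined, which is where the savings come from.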

    Efficient top-k similarity join processing over multi-valued objects

    © 2013, Springer Science+Business Media New York. Top-k similarity joins have been extensively studied and used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects U and V, a top-k similarity join returns the k most similar pairs of objects from U × V. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space, and the similarity is measured by a simple distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable since it does not address the distribution of object instances. In this paper, we study top-k similarity join over multi-valued objects. We apply two types of quantile-based distance measures, ϕ-quantile distance and ϕ-quantile group-base distance, to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-, statistic- and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
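To make the idea of weight-based pruning concrete, here is one illustrative test of our own; the abstract does not describe the actual rule, and the group-base distance is not covered. It assumes the ϕ-quantile distance is the smallest instance-pair distance whose accumulated pair weight reaches ϕ, with each object's instance weights summing to 1. Any instance u_i that could be closer than τ to some instance of v must lie within τ of v's bounding box, and pairs involving u_i carry total weight w_i·1. So if the weight of such "close" instances of u is below ϕ, the ϕ-quantile distance of (u, v) is at least τ, and the pair can be discarded when τ is the current k-th best distance. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]
MultiObject = List[Tuple[Point, float]]  # (instance, weight); weights assumed to sum to 1


def mbr(obj: MultiObject):
    """Axis-aligned bounding box (lo corner, hi corner) of an object's instances."""
    pts = [p for p, _ in obj]
    dims = range(len(pts[0]))
    return (tuple(min(p[d] for p in pts) for d in dims),
            tuple(max(p[d] for p in pts) for d in dims))


def point_box_mindist(p: Point, box) -> float:
    """Minimum Euclidean distance from point p to any point of the box."""
    lo, hi = box
    return math.sqrt(sum(max(0.0, lo[d] - p[d], p[d] - hi[d]) ** 2 for d in range(len(p))))


def can_prune_by_weight(u: MultiObject, v: MultiObject, phi: float, tau: float) -> bool:
    """True if the phi-quantile distance of (u, v) is provably >= tau."""
    box_v = mbr(v)
    # Only instances of u within distance < tau of v's box can form a pair closer than tau.
    close_weight = sum(w for p, w in u if point_box_mindist(p, box_v) < tau)
    # Since v's weights sum to 1, pairs closer than tau carry total weight <= close_weight;
    # if that is below phi, no distance below tau can accumulate phi of the pair weight.
    return close_weight < phi


if __name__ == "__main__":
    u = [((0, 0), 0.5), ((10, 0), 0.5)]
    v = [((0, 3), 0.5), ((0, 4), 0.5)]
    print(can_prune_by_weight(u, v, phi=0.5, tau=1.0))
    print(can_prune_by_weight(u, v, phi=0.5, tau=5.0))
```

The test only needs one box per candidate object and one pass over the other object's instances, so it is much cheaper than computing and sorting all instance-pair distances.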

    Efficient processing of spatial queries over uncertain database

    Uncertainty is inherent in many important applications, and many fundamental queries have been re-investigated in the context of uncertain data models. Efficient algorithms are in strong demand for analyzing spatial uncertain data. This thesis studies four fundamental problems in analyzing spatial uncertain data and proposes efficient query processing algorithms for them: (1) finding the top k influential facilities, (2) identifying the top k dominating objects, (3) range search on uncertain trajectories, and (4) top k similarity join.

    Firstly, we study the problem of finding the top k most influential facilities over uncertain objects. We propose a new ranking model that captures the influence of facilities on the uncertain objects. Effective and efficient algorithms are proposed following the filtering-verification paradigm, utilizing two uncertain-object indexing techniques. To effectively support uncertain objects with a large number of instances, we further develop randomized algorithms with accuracy guarantees.

    Secondly, we study the top k dominating query on uncertain data, an essential method in multi-criteria decision analysis when an explicit scoring function is not available. We formally introduce the top k dominating model and propose effective and efficient algorithms to identify the top k dominating objects. Novel pruning techniques that utilize spatial indexing and statistical information are proposed to reduce CPU and I/O costs.

    Thirdly, we investigate range search on uncertain trajectories, assuming that uncertain trajectories are modeled by Markov chains. We propose a general framework for range search on uncertain trajectories following the filtering-refinement paradigm, in which summaries of uncertain trajectories are constructed to facilitate the filtering process. Statistics-based and partition-based filtering techniques are developed to enhance the filtering capability.

    Finally, we investigate the problem of top k similarity join over multi-valued objects. We apply two types of quantile-based distance measures to explore the relative instance distribution among the multiple instances of objects. Following a filtering-refinement framework, efficient and effective techniques to process top k similarity joins over multi-valued objects are developed. Novel distance-, statistic- and weight-based pruning techniques are proposed to speed up the computations.
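For the range-search problem on Markov-chain trajectories, the exact probability that an object enters a query region can be computed by propagating its state distribution step by step. This is the expensive refinement that summaries and filters try to avoid. The sketch assumes a discrete, time-homogeneous first-order chain over numbered locations (for example grid cells), a query region given as a set of states, and a query asking whether the object is inside the region at least once during timestamps ts..te. The function names are hypothetical, and the thesis' own summaries and filters are not reproduced here.

```python
from typing import List, Set

Matrix = List[List[float]]  # P[i][j] = probability of moving from state i to state j


def step(dist: List[float], P: Matrix) -> List[float]:
    """One Markov-chain transition: returns dist @ P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def prob_visit_range(init: List[float], P: Matrix, region: Set[int], ts: int, te: int) -> float:
    """Probability that the chain is in `region` at least once for some t in [ts, te].

    `init` is the state distribution at time 0.
    """
    dist = list(init)
    hit = 0.0
    for t in range(te + 1):
        if t >= ts:
            # Mass inside the region at time t belongs to paths that satisfy the query now.
            hit += sum(dist[s] for s in region)
            for s in region:
                dist[s] = 0.0  # already counted: drop these paths so they are not counted twice
        if t < te:
            dist = step(dist, P)
    return hit


if __name__ == "__main__":
    # A 3-state chain that advances with probability 0.5; state 2 is absorbing.
    P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
    print(prob_visit_range([1.0, 0.0, 0.0], P, {2}, ts=0, te=3))
```

Removing already-counted mass turns "at least once" into a sum of first-hit probabilities. Each step costs O(n²) for n states, which is why pruning objects before this refinement matters when trajectories have many states or long time windows.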