11 research outputs found

    Top-k Query Processing in Probabilistic Databases with Non-materialized Views

    No full text

    FoXtrot: Distributed structural and value XML filtering

    Publish/subscribe systems have emerged in recent years as a promising paradigm for offering various popular notification services. In this context, many XML filtering systems have been proposed to efficiently identify XML data that matches user interests expressed as queries in an XML query language like XPath. However, in order to offer XML filtering functionality on an Internet-scale, we need to deploy such a service in a distributed environment, avoiding bottlenecks that can deteriorate performance. In this work, we design and implement FoXtrot, a system for filtering XML data that combines the strengths of automata for efficient filtering and distributed hash tables for building a fully distributed system. Apart from structural-matching, performed using automata, we also discuss different methods for evaluating value-based predicates. We perform an extensive experimental evaluation of our system, FoXtrot, on a local cluster and on the PlanetLab network and demonstrate that it can index millions of user queries, achieving a high indexing and filtering throughput. At the same time, FoXtrot exhibits very good load-balancing properties and improves its performance as we increase the size of the network. © 2012 ACM

    RDFS reasoning and query answering on top of DHTs

    We study the problem of distributed RDFS reasoning and query answering on top of distributed hash tables. Scalable, distributed RDFS reasoning is an essential functionality for providing the scalability and performance that large-scale Semantic Web applications require. Our goal in this paper is to compare and evaluate two well-known approaches to RDFS reasoning, namely backward and forward chaining, on top of distributed hash tables. We show how to implement both algorithms on top of the distributed hash table Bamboo and prove their correctness. We also study the time-space trade-off exhibited by the algorithms analytically, and experimentally by evaluating our algorithms on PlanetLab. © 2008 Springer Berlin Heidelberg

    Querying and Learning in Probabilistic Databases

    Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (a.k.a. "data provenance") to trace the dependencies between these answers and the input tuples that led to their derivation. Although the literature on PDBs spans more than 25 years of research, the key role of lineage for establishing a closed and complete representation model of relational operations over this kind of probabilistic data was discovered only fairly recently. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, the latter of which remains a #P-hard problem also in the context of PDBs. In this chapter, we provide a review of the key concepts of PDBs with a particular focus on our own recent research results related to this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.

    From "Selena Gomez" to "Marlon Brando": Understanding explorative entity search

    Consider a user who submits a search query "Shakira" having a specific search goal in mind (such as her age) but at the same time willing to explore information for other entities related to her, such as comparable singers. In previous work, a system called Spark was developed to provide such a search experience. Given a query submitted to the Yahoo search engine, Spark provides related entity suggestions for the query, exploiting, among other sources, public knowledge bases from the Semantic Web. We refer to this search scenario as explorative entity search. The effectiveness and efficiency of the approach have been demonstrated in previous work. The way users interact with these related entity suggestions and whether this interaction can be predicted have, however, not been studied. In this paper, we perform a large-scale analysis of how users interact with the entity results returned by Spark. We characterize the users, queries and sessions that appear to promote an explorative behavior. Based on this analysis, we develop a set of query and user-based features that reflect the click behavior of users and explore their effectiveness in the context of a prediction task.

    TriAD: A Distributed Shared-nothing RDF Engine Based on Asynchronous Message Passing

    We investigate a new approach to the design of distributed, shared-nothing RDF engines. Our engine, coined “TriAD”, combines join-ahead pruning via a novel form of RDF graph summarization with a locality-based, horizontal partitioning of RDF triples into a grid-like, distributed index structure. The multi-threaded and distributed execution of joins in TriAD is facilitated by an asynchronous Message Passing protocol which allows us to run multiple join operators along a query plan in a fully parallel, asynchronous fashion. We believe that our architecture provides a so far unique approach to join-ahead pruning in a distributed environment, as the more classical form of sideways information passing would not permit for executing distributed joins in an asynchronous way. Our experiments over the LUBM, BTC and WSDTS benchmarks demonstrate that TriAD consistently outperforms centralized RDF engines by up to two orders of magnitude, while gaining a factor of more than three compared to the currently fastest, distributed engines. To our knowledge, we are thus able to report the so far fastest query response times for the above benchmarks using a mid-range server and regular Ethernet setup

    Atlas: Storing, updating and querying RDF(S) data on top of DHTs

    The RDF(S) data model has been proposed for encoding metadata about Web resources. As more and more Web resources are annotated using RDF(S), there is an urgent need for efficiently dealing with this large volume of data. In this paper, we present Atlas, a peer-to-peer system for storing, updating and querying RDF(S) data. The Atlas system has been built using the distributed hash table Bamboo. Atlas was developed in the context of project OntoGrid, where it was used as a distributed repository for RDF(S) metadata describing Grid services and resources. The development of Atlas continues in other projects in which our group currently participates. This paper gives an overview of the most recent version of Atlas and discusses a representative application. © 2010 Elsevier B.V. All rights reserved

    Effectively delivering XML information in periodic broadcast environments

    Existing data placement algorithms for wireless data broadcast generally make assumptions that the clients’ queries are already known and the distribution of access frequencies of their queries can be obtained a priori. Unfortunately, these assumptions are not realistic in most real life applications because new mobile clients may join in anytime and clients may be reluctant to disclose their queries (due to privacy concerns). In this paper, we study the data placement problem of periodic XML data broadcast in mobile wireless environments. This is an important issue, particularly when XML becomes prevalent in today’s ubiquitous Web and mobile computing devices. Taking advantage of the structured characteristics of XML data, we are able to generate effective broadcast programs based purely on XML data on the server without any knowledge of the clients’ access patterns. This not only makes our work distinguished from previous studies, but also enables it to have broader applicability. We discuss structural sharing in XML data which forms the basis of our novel data placement algorithm. The proposed placement algorithm is validated through a set of experiments and the results show that our algorithm can effectively place XML data on air and significantly improve the overall access efficiency.
    Yongrui Qin, Quan Z. Sheng, Muntazir Mehdi, Hua Wang, and Dong Xi