48 research outputs found

    Association Rules for Web Data Mining in WHOWEDA

    The authors discuss association rules that can be discovered from Web data. The association rules are discussed within the scope of our WHOWEDA (warehouse of Web data) project. WHOWEDA is supported by a Web data model and a set of algebraic operators. The Web data model allows a uniform and integrated view of Web data gathered using a user's query graph. A user's query graph describes the query by example (what the user perceives as the query), and the Web coupling query gathers instances of such a query graph from the Web and stores them in the form of subgraphs (called Web tuples) in a Web table. We discuss association rules within this domain. An association rule defines an association between the node and link attributes of Web tuples within a Web table. There are two different classes of association rules that can be developed from data in a Web table. Node-to-node associations are rules that relate the content (defined by metadata attributes) of two or more nodes within a Web tuple. Link associations are rules that show the connectivity of different URLs. Distinguishing the two types of associations provides a view of the structure of the Web data. The goal of performing Web association mining on Web data is to better organize searching patterns through hyperlinked document
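
As a rough illustration of what a node-to-node association rule measures, the sketch below computes the standard support and confidence of a rule over a toy Web table. It is not the WHOWEDA implementation: representing each Web tuple as a set of `attribute:value` strings, and the names `support` and `confidence`, are assumptions made only for this example.

```python
from typing import FrozenSet, Iterable, Sequence


def support(web_table: Sequence[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of Web tuples that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not web_table:
        return 0.0
    hits = sum(1 for web_tuple in web_table if items <= web_tuple)
    return hits / len(web_table)


def confidence(
    web_table: Sequence[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent and consequent) / support(antecedent).
    """
    antecedent = frozenset(antecedent)
    base = support(web_table, antecedent)
    if base == 0.0:
        return 0.0
    return support(web_table, antecedent | frozenset(consequent)) / base


# Hypothetical Web table: each Web tuple is reduced to a set of
# node metadata attributes (illustrative values, not real data).
web_table = [
    frozenset({"title:database", "domain:edu"}),
    frozenset({"title:database", "domain:edu"}),
    frozenset({"title:database", "domain:com"}),
    frozenset({"title:mining", "domain:edu"}),
]
```

With this toy table, the rule `title:database -> domain:edu` holds in two of the three tuples that mention `title:database`, and in two of the four tuples overall.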

    A Framework for Cooperative Deductive Database Systems

    In this paper, we address the problems of design, management, and integration of deductive database systems in a loosely coupled architecture, which together constitute a cooperative deductive database system. We then address one important aspect of designing a cooperative deductive database system, namely, the allocation of rules across the deductive database systems. We identify communication cost as the primary consideration in the allocation of rules. The problem of optimal allocation of rules has been shown to be NP-complete, so exact solutions have prohibitive execution times for large knowledge bases. We propose a naïve algorithm for rule allocation and study its performance experimentally. We also show that this naïve algorithm can be used for reallocation of rules after the rule base is updated

    Self Maintenance of Multiple Views in Data Warehousing

    Materialized views (MVs) at the data warehouse (DW) can be kept up to date in response to changes in data sources without accessing the data sources for additional information. This process is usually referred to as "self maintenance of views". A number of algorithms have been proposed for self maintenance of views that keep some additional information in the DW in the form of auxiliary views (AVs). In this paper we propose an algorithm for self maintainability of multiple MVs using the above approach. Our algorithm generates a simple maintenance query to incrementally maintain an MV along with its AV at the DW. The algorithm maintains these views while minimizing the number and the size of the AVs. Our approach provides better insight into view maintenance issues by exploiting the dependencies and constraints that might exist in the data sources and multiple MVs at the DW.
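
To make the idea of self maintenance concrete, here is a minimal sketch, assuming a single materialized view that stores the average value per key. It is not the algorithm proposed in the paper. The class name `SelfMaintainedAvgView` and its methods are hypothetical; the auxiliary data (a count and a sum per key) plays the role of an auxiliary view, so source deltas can be applied without querying the sources again.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (group key, value); an assumed source row shape


class SelfMaintainedAvgView:
    """Average-per-key view maintained only from insert/delete deltas."""

    def __init__(self) -> None:
        # Auxiliary data: key -> [row count, running sum].
        self._aux: Dict[str, List[float]] = {}

    def apply_delta(
        self, inserted: Iterable[Row] = (), deleted: Iterable[Row] = ()
    ) -> None:
        """Apply source changes using only the delta and the auxiliary data."""
        for key, value in inserted:
            entry = self._aux.setdefault(key, [0, 0.0])
            entry[0] += 1
            entry[1] += value
        for key, value in deleted:
            if key not in self._aux:
                raise KeyError(f"delete for unknown key: {key!r}")
            entry = self._aux[key]
            entry[0] -= 1
            entry[1] -= value
            if entry[0] == 0:
                # No rows left for this key, so it leaves the view.
                del self._aux[key]

    def view(self) -> Dict[str, float]:
        """Derive the materialized view (average per key) from the auxiliary data."""
        return {key: total / count for key, (count, total) in self._aux.items()}
```

The average alone is not self-maintainable, because a deletion cannot be applied to an average without knowing how many rows contributed to it. Keeping the count and sum alongside it is the small amount of extra state that removes the need to go back to the source.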

    Ratio threshold queries over distributed data sources

    In this paper we consider triggers over distributed data from various sources, such as: "Notify when sales of luxury goods constitute more than 20% of the overall sales". In such queries, the client wishes to be notified whenever the ratio of two aggregates over distributed data crosses the specified threshold. The challenge lies in executing the queries with the minimal amount of communication necessary for update propagation. We address the challenge by proposing schemes for converting the client threshold condition into conditions on individual distributed data sources such that (1) violation of the client threshold occurs only if one or more source conditions are violated (zero false negatives), and (2) the number of source violations when the client threshold is not violated is small (minimal false positives). Using performance evaluation, we show that our algorithms result in up to an order of magnitude fewer false positives compared to the approaches in the literature