582 research outputs found

    Tree-Mining: Understanding Applications and Challenges

    Tree mining is an essential set of techniques and software technologies for multi-level, multi-angled operations on databases. This manuscript explores several applications of the sub-techniques of tree mining. It investigates the major applications and challenges of the different types and techniques of tree mining, a topic that has so far received only patchy and scant investigation. To that end, the author reviews some of the latest and most pertinent research articles of the last two decades.

    EFFICIENT APPROACH FOR VIEW SELECTION FOR DATA WAREHOUSE USING TREE MINING AND EVOLUTIONARY COMPUTATION

    Selecting a proper set of views to materialize plays an important role in database performance. There are many view selection methods, which use different techniques and frameworks to select an efficient set of views for materialization. In this paper, we present a new efficient, scalable method for view selection under given storage constraints using a tree mining approach and evolutionary optimization. The tree mining algorithm is designed to determine the exact frequency of (sub)queries in the historical SQL dataset. The query cost model maximizes the performance benefit of the final view set, which is derived from the frequent view set produced by the tree mining algorithm. The performance benefit of a query is defined as a function of query frequency, query creation cost, and query maintenance cost. The experimental results show that the proposed method succeeds in recommending a solution that is fairly close to the optimal solution.
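
The cost model in this abstract can be illustrated with a rough sketch: each candidate view gets a benefit that grows with query frequency and creation cost and shrinks with maintenance cost, and views are then chosen under a storage budget. The benefit formula, the `View` fields, and the greedy selection step are all assumptions made for illustration. The paper uses evolutionary optimization, whose details the abstract does not give, so the greedy step here is only a simple stand-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical candidate view; all fields are illustrative."""
    name: str
    frequency: int           # how often the (sub)query occurs in the query log
    creation_cost: float     # cost of computing the query without the view
    maintenance_cost: float  # cost of keeping the materialized view up to date
    size: float              # storage the materialized view would occupy


def benefit(view: View) -> float:
    # Assumed form: work saved across all occurrences, minus upkeep.
    return view.frequency * view.creation_cost - view.maintenance_cost


def select_views(candidates, storage_budget):
    """Greedy stand-in for the optimizer: best benefit per unit of storage first."""
    useful = [v for v in candidates if benefit(v) > 0 and v.size > 0]
    ranked = sorted(useful, key=lambda v: benefit(v) / v.size, reverse=True)
    chosen, used = [], 0.0
    for view in ranked:
        if used + view.size <= storage_budget:
            chosen.append(view)
            used += view.size
    return chosen
```

A greedy pass like this is easy to follow but can miss the best combination of views, which is one reason a search-based optimizer may be preferred.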

    Asynchronous Collective Tree Exploration by Tree-Mining

    We investigate the problem of collaborative tree exploration with complete communication introduced by [FGKP06], in which a group of $k$ agents is assigned to collectively go through all edges of an unknown tree in an efficient manner and then return to the origin. The agents have unrestricted communication and computation capabilities. The algorithm's runtime is typically compared to the cost of offline traversal, which is at least $\max\{2n/k, 2D\}$ where $n$ is the number of nodes and $D$ is the tree depth. Since its introduction, two types of guarantee have emerged on the topic: the first is of the form $r(k)(n/k + D)$, where $r(k)$ is called the competitive ratio, and the other is of the form $2n/k + f(k, D)$, where $f(k, D)$ is called the competitive overhead. In this paper, we present the first algorithm with linear-in-$D$ competitive overhead, thereby reconciling both approaches. Specifically, our bound is in $2n/k + O(k^{\log_2 k} D)$ and thus leads to a competitive ratio in $O(k/\exp(0.8\sqrt{\ln k}))$. This is the first improvement over the $O(k/\ln k)$-competitive algorithm known since the introduction of the problem in 2004. Our algorithm is obtained for an asynchronous generalization of collective tree exploration (ACTE). It is an instance of a general class of locally-greedy exploration algorithms that we define. We show that the additive overhead analysis of locally-greedy algorithms can be seen through the lens of a 2-player game that we call the tree-mining game and that could be of independent interest.
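
To make the offline benchmark in this abstract concrete, the sketch below evaluates the stated lower bound, the maximum of 2n/k and 2D, for a small tree. Intuitively, the 2D term reflects that some agent must reach the deepest node and come back, and the 2n/k term reflects that the total traversal work is shared among k agents. The parent-array representation (root marked with -1) and the function names are assumptions for illustration, not part of the paper.

```python
def tree_depth(parent):
    """Depth D of a rooted tree given as a parent array (root has parent -1)."""
    deepest = 0
    for node in range(len(parent)):
        steps = 0
        # Walk up to the root, counting edges on the way.
        while parent[node] != -1:
            node = parent[node]
            steps += 1
        deepest = max(deepest, steps)
    return deepest


def offline_lower_bound(parent, k):
    """Lower bound max(2n/k, 2D) on offline traversal cost, as stated in the abstract."""
    n = len(parent)
    return max(2 * n / k, 2 * tree_depth(parent))
```

On a path the depth term dominates, while on a star the 2n/k term dominates, which is why guarantees in this area are stated in terms of both n/k and D.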

    Tree mining application to matching of heterogeneous knowledge

    Matching of heterogeneous knowledge sources is of increasing importance in areas such as scientific knowledge management, e-commerce, enterprise application integration, and many emerging Semantic Web applications. Given the desire for knowledge sharing and reuse in these fields, knowledge coming from different organizations in the same domain commonly needs to be matched. We propose a knowledge matching method based on our previously developed tree mining algorithms for extracting frequently occurring subtrees from a tree-structured database such as XML. Using this method, the common structure among the different representations can be extracted automatically. Our focus is on knowledge matching at the structural level, and we use a set of example XML schema documents from the same domain to evaluate the method. We discuss some important issues that arise when applying tree mining algorithms to the detection of common document structures. The experiments demonstrate the usefulness of the approach.
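
The idea of extracting structure shared by several XML documents can be sketched in a much simplified form. The code below counts root-to-node tag paths rather than general subtrees, so it is not the authors' tree mining algorithm; it only shows the notion of support across documents. The function names and the `min_support` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Set of root-to-node tag paths (e.g. 'book/title') in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, f"{path}/{child.tag}"))
    return paths


def frequent_paths(documents, min_support):
    """Paths that occur in at least min_support of the given documents."""
    counts = Counter()
    for doc in documents:
        # Each document contributes at most once per path.
        counts.update(tag_paths(doc))
    return {path for path, count in counts.items() if count >= min_support}
```

Paths ignore sibling relationships, so two documents can share every path and still differ in structure; subtree-based mining is what captures that difference.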

    Graph-based task libraries for robots: generalization and autocompletion

    In this paper, we consider an autonomous robot that persists over time performing tasks and the problem of providing one additional task to the robot's task library. We present an approach to generalize tasks, represented as parameterized graphs with sequences, conditionals, and looping constructs of sensing and actuation primitives. Our approach performs graph-structure task generalization, while maintaining task executability and parameter value distributions. We present an algorithm that, given the initial steps of a new task, proposes an autocompletion based on a recognized past similar task. Our generalization and autocompletion contributions are effective on different real robots. We show concrete examples of the robot primitives and task graphs, as well as results, with Baxter. In experiments with multiple tasks, we show a significant reduction in the number of new task steps to be provided.

    Mining Shared Decision Trees between Datasets

    This thesis studies the problem of mining models, patterns and structures (MPS) shared by two datasets (applications): a well-understood dataset, denoted WD, and a poorly understood one, denoted PD. Combined with users' familiarity with WD, the shared MPS can help users better understand PD, since they capture similarities between WD and PD. Moreover, knowledge of such similarities enables users to focus their attention on analyzing the unique behavior of PD. Technically, this thesis focuses on the shared decision tree mining problem. To provide a view of the similarities between WD and PD, this thesis proposes to mine a high-quality shared decision tree satisfying two properties: the tree has (1) highly similar data distributions and (2) high classification accuracy in both datasets. This thesis proposes an algorithm, SDT-Miner, for mining such a shared decision tree. The algorithm differs significantly from traditional decision tree mining, since it addresses the challenges caused by the presence of two datasets, by the data distribution similarity requirement, and by the tree accuracy requirement. The effectiveness of the algorithm is verified by experiments.