
    Average Case Analysis of Leaf-Centric Binary Tree Sources

    We study the average size of the minimal directed acyclic graph (DAG) with respect to so-called leaf-centric binary tree sources as studied by Zhang, Yang, and Kieffer. A leaf-centric binary tree source induces for every $n \geq 2$ a probability distribution on all binary trees with $n$ leaves. We generalize a result shown by Flajolet, Gourdon, Martinez and Devroye according to which the average size of the minimal DAG of a binary tree that is produced by the binary search tree model is $\Theta(n / \log n)$.
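
As a rough illustration of the quantity this abstract analyses, the following sketch counts the nodes of the minimal DAG of one binary tree by giving every distinct subtree a single identifier. The tree encoding (`None` for a leaf, a pair for an inner node) and the function name `minimal_dag_size` are assumptions made for this example and are not taken from the paper, which may also measure DAG size differently.

```python
def minimal_dag_size(tree):
    """Count the distinct subtrees of `tree`, i.e. the nodes of its minimal DAG.

    Assumed encoding: a leaf is None, an inner node is a pair (left, right).
    """
    ids = {}  # canonical key of a subtree -> identifier of its DAG node

    def visit(t):
        if t is None:
            key = ("leaf",)
        else:
            # Two subtrees are identical exactly when their children
            # map to the same pair of DAG node identifiers.
            key = (visit(t[0]), visit(t[1]))
        if key not in ids:
            ids[key] = len(ids)
        return ids[key]

    visit(tree)
    return len(ids)


balanced = ((None, None), (None, None))  # 4 leaves, 7 tree nodes
comb = (((None, None), None), None)      # 4 leaves, 7 tree nodes
print(minimal_dag_size(balanced), minimal_dag_size(comb))
```

Both example trees have seven nodes, yet the balanced one shares more repeated subtrees and so has the smaller minimal DAG.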

    QuateXelero : an accelerated exact network motif detection algorithm

    Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks’ structure and function. However, this task is computationally very demanding, because it is closely tied to the graph isomorphism problem, which is in NP but is not yet known to be in P or to be NP-complete. Accordingly, this research aims to reduce the number of calls to the NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure at its heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks confirm the overall superiority of the proposed algorithm, QuateXelero, over two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fast at constructing the central data structure of the algorithm from scratch based on the input network.
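
For orientation, here is a minimal sketch of the ESU enumeration step that the abstract names as the basis of QuateXelero. It only lists the connected induced subgraphs with k vertices; the quaternary tree and the isomorphism classification described in the abstract are not reproduced. The function name `esu`, the adjacency-dictionary input, and the assumption that vertices are comparable integers in an undirected graph without self-loops are choices made for this example.

```python
def esu(adj, k):
    """Enumerate the vertex sets of all connected induced k-vertex subgraphs.

    `adj` maps each vertex to the set of its neighbours.
    """
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        # Closed neighbourhood of the current subgraph: the vertices in it
        # plus every vertex adjacent to one of them.
        closed = set(sub)
        for s in sub:
            closed |= adj[s]
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Exclusive neighbours of w: larger than the root and not yet
            # reachable from the current subgraph.
            exclusive = {u for u in adj[w] if u > root and u not in closed}
            extend(sub | {w}, ext | exclusive, root)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sorted(sorted(s) for s in esu(path, 3)))
```

Each vertex set is produced once because a subgraph is only ever grown from its smallest vertex.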

    A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs

    Cyber security is one of the most significant technical challenges of the current time. Detecting adversarial activities and preventing the theft of intellectual property and customer data are high priorities for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving networks spanning institutional and national boundaries. Many cyber attacks can be described as subgraph patterns, with prominent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphs in a continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a "Lazy Search" algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named "Relative Selectivity" that is used to select between different query processing strategies. Our experiments, performed on real online news, a network traffic stream, and a synthetic social network benchmark, demonstrate 10-100x speedups over selectivity-agnostic approaches. Comment: in 18th International Conference on Extending Database Technology (EDBT) (2015)

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
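
To make the idea of matching keywords and paths together more concrete, the sketch below builds a toy path summary in which every distinct label path of an XML document is stored once, together with the words found directly under elements on that path. This is only an illustration in the spirit of a DataGuide with attached content; it is not the CADG data structure from the paper, and the names `build_path_keyword_index` and `matches` are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_keyword_index(xml_text):
    """Map each distinct label path to the lower-cased words under it."""
    index = defaultdict(set)

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        # Only the text directly inside the element is indexed here;
        # tail text and attributes are ignored to keep the sketch short.
        index[path].update((elem.text or "").lower().split())
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return index


def matches(index, path, keyword):
    """Answer a combined path and keyword query with a single lookup."""
    return keyword.lower() in index.get(path, set())


doc = ("<lib><book><title>Graph Theory</title></book>"
       "<book><title>XML Indexing</title></book></lib>")
idx = build_path_keyword_index(doc)
print(sorted(idx))
print(matches(idx, "/lib/book/title", "xml"))
```

A query such as "title elements containing xml" is answered from one structure, with no separate join between a path index and a keyword index.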

    Queryable Compression for Massively Streaming Social Networks

    The social networks of today are a set of massive, dynamically changing graph structures. Each of these graphs contains a set of nodes (individuals) and a set of edges among the nodes (relationships). The choice of representation of a graph determines what information is easy to obtain from it. However, many social network graphs are so large that even their basic representations (e.g. adjacency lists) do not fit in main memory. Hence an ongoing field of study has focused on designing compressed representations of graphs that facilitate certain query functions. This work is based on representing dynamic social networks that we call streaming graphs, where edges stream into our compressed representation. The crux of this work is the use of a novel data structure for streaming graphs that is based on an indexed array of compressed binary trees and that builds the graph directly without using any temporary storage structures. We provide fast access methods for edge existence (does an edge exist between two nodes?), neighbor queries (list a node’s neighbors), and streaming operations (add/remove nodes/edges). We test our algorithms on public, anonymized, massive graphs such as Friendster, LiveJournal, Pokec, Twitter, and others. Our empirical evaluation is based on several parameters including time to compress, memory required by the compression algorithm, size of compressed graph, and time to execute queries. Our experimental results show that our current approach outperforms previous approaches in various key respects such as compression time, compression memory, compression ratio, and query execution times, and hence is the best overall to date.
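
The abstract does not describe the layout of its compressed binary trees, so the following is only a hypothetical sketch of the query interface it lists (edge insertion, edge existence, neighbor listing). It assumes one uncompressed binary trie per source node, keyed by the bits of the target node's identifier; the class name `StreamingGraph`, the `bits` parameter, and the trie layout are all assumptions for illustration, and no compression is performed.

```python
class StreamingGraph:
    """Toy per-node binary tries over target identifiers (assumed layout)."""

    def __init__(self, bits=8):
        self.bits = bits  # fixed identifier width; targets must be < 2**bits
        self.tries = {}   # source node -> trie root, a list [zero, one]

    def add_edge(self, u, v):
        node = self.tries.setdefault(u, [None, None])
        for i in reversed(range(self.bits)):
            b = (v >> i) & 1
            if node[b] is None:
                node[b] = [None, None]
            node = node[b]

    def has_edge(self, u, v):
        node = self.tries.get(u)
        for i in reversed(range(self.bits)):
            if node is None:
                return False
            node = node[(v >> i) & 1]
        return node is not None

    def neighbors(self, u):
        out = []

        def walk(node, depth, value):
            if node is None:
                return
            if depth == self.bits:
                out.append(value)
                return
            walk(node[0], depth + 1, value << 1)
            walk(node[1], depth + 1, (value << 1) | 1)

        walk(self.tries.get(u), 0, 0)
        return out


g = StreamingGraph()
g.add_edge(1, 5)
g.add_edge(1, 3)
print(g.has_edge(1, 5), g.has_edge(1, 4), g.neighbors(1))
```

Edges are inserted one at a time as they arrive, with no intermediate adjacency list, which mirrors the streaming setting the abstract describes.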