Average Case Analysis of Leaf-Centric Binary Tree Sources
We study the average size of the minimal directed acyclic graph (DAG) with
respect to so-called leaf-centric binary tree sources as studied by Zhang,
Yang, and Kieffer. A leaf-centric binary tree source induces for every n a probability distribution on all binary trees with n leaves. We
generalize a result shown by Flajolet, Gourdon, Martinez and Devroye according
to which the average size of the minimal DAG of a binary tree that is produced
by the binary search tree model is
QuateXelero : an accelerated exact network motif detection algorithm
Finding motifs in biological, social, technological, and other types of networks has become a widespread method for learning more about these networks’ structure and function. However, this task is computationally very demanding, because it is closely tied to graph isomorphism, a problem in NP that is not yet known to be in P or to be NP-complete. Accordingly, this work aims to reduce the number of calls to the NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure at its heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks confirm the overall superiority of QuateXelero over two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fast at constructing the central data structure of the algorithm from scratch based on the input network
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges of current
times. Detecting adversarial activities and preventing theft of intellectual
property and customer data are high priorities for corporations and government
agencies around the world. Cyber defenders need to analyze massive-scale,
high-resolution network flows to identify, categorize, and mitigate attacks
involving networks spanning institutional and national boundaries. Many of the
cyber attacks can be described as subgraph patterns, with prominent examples
being insider infiltrations (path queries), denial of service (parallel paths)
and malicious spreads (tree queries). This motivates us to explore subgraph
matching on streaming graphs in a continuous setting. The novelty of our work
lies in using the subgraph distributional statistics collected from the
streaming graph to determine the query processing strategy. We introduce a
"Lazy Search" algorithm where the search strategy is decided on a
vertex-to-vertex basis depending on the likelihood of a match in the vertex
neighborhood. We also propose a metric named "Relative Selectivity" that is
used to select between different query processing strategies. Our experiments
performed on real online news, network traffic stream and a synthetic social
network benchmark demonstrate 10-100x speedups over selectivity agnostic
approaches. Comment: in 18th International Conference on Extending Database Technology
(EDBT) (2015)
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the well-known
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
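
The idea of matching keywords and paths in one step can be illustrated with a small sketch. This is not the CADG itself: it is a simplified, assumed structure that records each distinct root-to-element label path once (as a DataGuide-style path summary does) and attaches to it the keywords found in the text directly under that path. The function names `build_path_keyword_index` and `matches` are hypothetical, and tail text and attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_keyword_index(xml_text):
    """Map each distinct label path to the set of keywords found directly under it.

    Illustrative only: a combined path/keyword lookup table, not the
    CADG data structure described in the paper.
    """
    root = ET.fromstring(xml_text)
    index = defaultdict(set)

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        index[path]  # register the path even if it carries no text
        for word in (elem.text or "").lower().split():
            index[path].add(word)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return dict(index)


def matches(index, path, keyword):
    """Answer a combined structure-and-content query with a single lookup."""
    return keyword.lower() in index.get(path, set())


doc = (
    "<lib>"
    "<book><title>Graph Theory</title></book>"
    "<book><title>XML Indexing</title></book>"
    "</lib>"
)
index = build_path_keyword_index(doc)
print(sorted(index))
print(matches(index, "/lib/book/title", "xml"))
print(matches(index, "/lib/book", "xml"))
```

Because the keyword sets are stored per path, a query such as "title elements under book that contain xml" needs no separate join between a path result and a keyword result, which is the effect the abstract attributes to simultaneous matching.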
Queryable Compression for Massively Streaming Social Networks
The social networks of today are a set of massive, dynamically changing graph structures. Each of these graphs contains a set of nodes (individuals) and a set of edges among the nodes (relationships). The choice of representation of a graph determines what information is easy to obtain from it. However, many social network graphs are so large that even their basic representations (e.g. adjacency lists) do not fit in main memory. Hence an ongoing field of study has focused on designing compressed representations of graphs that facilitate certain query functions. This work is based on representing dynamic social networks that we call streaming graphs, where edges stream into our compressed representation. The crux of this work is the use of a novel data structure for streaming graphs that is based on an indexed array of compressed binary trees and that builds the graph directly without using any temporary storage structures. We provide fast access methods for edge existence (does an edge exist between two nodes?), neighbor queries (list a node’s neighbors), and streaming operations (add/remove nodes/edges). We test our algorithms on public, anonymized, massive graphs such as Friendster, LiveJournal, Pokec, Twitter, and others. Our empirical evaluation is based on several parameters including time to compress, memory required by the compression algorithm, size of compressed graph, and time to execute queries. Our experimental results show that our current approach outperforms previous approaches in various key respects such as compression time, compression memory, compression ratio, and query execution times, and is hence the best overall to date
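
The query interface named in this abstract (edge existence, neighbor queries, streaming insertions and removals) can be sketched as follows. This is an uncompressed stand-in, not the paper's structure: each slot of the indexed array holds a plain sorted list where the paper uses a compressed binary tree. The class name `StreamingGraph`, its method names, and the assumption of directed edges over non-negative integer node IDs are all illustrative choices.

```python
from bisect import bisect_left, insort


class StreamingGraph:
    """Indexed array of sorted neighbor lists (assumed stand-in for compressed trees)."""

    def __init__(self):
        self.adj = []  # adj[u] is the sorted list of u's out-neighbors

    def _ensure(self, u):
        # Grow the indexed array so that node u has a slot.
        while len(self.adj) <= u:
            self.adj.append([])

    def has_edge(self, u, v):
        # Edge existence via binary search in u's sorted neighbor list.
        if u >= len(self.adj):
            return False
        row = self.adj[u]
        i = bisect_left(row, v)
        return i < len(row) and row[i] == v

    def add_edge(self, u, v):
        # Streaming insertion: edges arrive one at a time, duplicates are ignored.
        self._ensure(max(u, v))
        if not self.has_edge(u, v):
            insort(self.adj[u], v)

    def remove_edge(self, u, v):
        if self.has_edge(u, v):
            row = self.adj[u]
            del row[bisect_left(row, v)]

    def neighbors(self, u):
        return list(self.adj[u]) if u < len(self.adj) else []


g = StreamingGraph()
for u, v in [(0, 2), (0, 1), (2, 0), (0, 2)]:
    g.add_edge(u, v)
print(g.has_edge(0, 2), g.has_edge(1, 0))
print(g.neighbors(0))
```

Edges are inserted directly into the final structure as they arrive, mirroring the abstract's point about building the graph without temporary storage; the compression that makes this fit in memory for graphs of Friendster scale is what the sketch leaves out.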