3 research outputs found

    How to select the largest k elements from evolving data?

    In this paper we investigate the top-$k$-selection problem, i.e., determining the largest, second largest, ..., and the $k$-th largest elements, in the dynamic data model. In this model the order of elements evolves dynamically over time. In each time step the algorithm can only probe the changes of data by comparing a pair of elements. Previously only two special cases were studied [2]: finding the largest element and the median; and sorting all elements. This paper systematically deals with $k\in [n]$ and solves the problem almost completely. Specifically, we identify a critical point $k^*$ such that the top-$k$-selection problem can be solved error-free with probability $1-o(1)$ if and only if $k=o(k^*)$. A lower bound of the error when $k=\Omega(k^*)$ is also determined, which actually is tight under some condition. On the other hand, it is shown that the top-$k$-set problem, which means finding the largest $k$ elements without sorting them, can be solved error-free for all $k\in [n]$. Additionally, we extend the dynamic data model and show that most of these results still hold. Comment: 23 pages, 2 figures
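
The probing model described in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not say how the order evolves, so the code assumes one random swap of adjacent ranks after each comparison, and the selection routine is a naive repeated-maximum scan, not the algorithm from the paper. The names `EvolvingOrder`, `swaps_per_step`, and `top_k_by_probing` are hypothetical.

```python
import random


class EvolvingOrder:
    """Hidden ranking that may change a little after every comparison.

    Assumption (not from the abstract): the order evolves by swapping
    randomly chosen adjacent ranks. Requires n >= 2 when swaps_per_step > 0.
    """

    def __init__(self, n, swaps_per_step=1, seed=0):
        self.rng = random.Random(seed)
        self.order = list(range(n))  # order[0] is the current largest element
        self.rng.shuffle(self.order)
        self.swaps_per_step = swaps_per_step

    def compare(self, a, b):
        """Return True if a currently ranks above b, then let the order evolve."""
        result = self.order.index(a) < self.order.index(b)
        for _ in range(self.swaps_per_step):
            i = self.rng.randrange(len(self.order) - 1)
            self.order[i], self.order[i + 1] = self.order[i + 1], self.order[i]
        return result

    def true_top_k(self, k):
        """Ground truth at this moment; a real algorithm cannot call this."""
        return self.order[:k]


def top_k_by_probing(data, n, k):
    """Naive baseline: k rounds of a linear maximum scan, one probe per step."""
    remaining = list(range(n))
    selected = []
    for _ in range(k):
        best = remaining[0]
        for x in remaining[1:]:
            if data.compare(x, best):
                best = x
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    data = EvolvingOrder(20, swaps_per_step=1, seed=1)
    estimate = top_k_by_probing(data, 20, 5)
    print("estimate:", estimate)
    print("truth now:", data.true_top_k(5))
```

With `swaps_per_step=0` the order is static and the estimate matches the true top-$k$ exactly. With evolution switched on, the two lists can drift apart, which is the kind of error the abstract's bounds are about.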

    Behavior Query Discovery in System-Generated Temporal Graphs

    Computer system monitoring generates huge amounts of logs that record the interaction of system entities. How to query such data to better understand system behaviors and identify potential system risks and malicious behaviors becomes a challenging task for system administrators due to the dynamics and heterogeneity of the data. System monitoring data are essentially heterogeneous temporal graphs with nodes being system entities and edges being their interactions over time. Given the complexity of such graphs, it becomes time-consuming for system administrators to manually formulate useful queries in order to examine abnormal activities, attacks, and vulnerabilities in computer systems. In this work, we investigate how to query temporal graphs and treat query formulation as a discriminative temporal graph pattern mining problem. We introduce TGMiner to mine discriminative patterns from system logs, and these patterns can be taken as templates for building more complex queries. TGMiner leverages temporal information in graphs to prune graph patterns that share similar growth trends without compromising pattern quality. Experimental results on real system data show that TGMiner is 6-32 times faster than baseline methods. The discovered patterns were verified by system experts; they achieved high precision (97%) and recall (91%). Comment: The full version of the paper "Behavior Query Discovery in System-Generated Temporal Graphs", to appear in VLDB'1

    Scalable and Robust Management of Dynamic Graph Data

    Most real-world networks evolve over time. This evolution can be modeled as a series of graphs that represent a network at different points in time. Our G* system enables efficient storage and querying of these graph snapshots by taking advantage of the commonalities among them. We are extending G* for highly scalable and robust operation. This paper shows that the classic challenges of data distribution and replication are imbued with renewed significance given continuously generated graph snapshots. Our data distribution technique adjusts the set of worker servers for storing each graph snapshot in a manner optimized for popular queries. Our data replication approach maintains each snapshot replica on a different number of workers, making available the most efficient replica configurations for different types of queries.
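
The idea of storing snapshots by exploiting what they have in common can be sketched as follows. This is an illustration only, not the G* design: it assumes a single in-memory store that keeps one record per distinct (vertex, neighbor set) pair and lets every snapshot point at the shared records. The class `SnapshotStore` and its methods are hypothetical names.

```python
class SnapshotStore:
    """Stores graph snapshots, keeping one copy of each distinct vertex state.

    Assumption: a vertex state is the pair (vertex, set of neighbors). A
    vertex that is unchanged between snapshots is stored only once.
    """

    def __init__(self):
        self._pool = {}       # (vertex, frozenset(neighbors)) -> shared record
        self._snapshots = {}  # snapshot id -> {vertex: shared record}

    def add_snapshot(self, snap_id, adjacency):
        """Register a snapshot given as {vertex: iterable of neighbors}."""
        view = {}
        for vertex, neighbors in adjacency.items():
            key = (vertex, frozenset(neighbors))
            # Reuse the existing record if an earlier snapshot already has it.
            view[vertex] = self._pool.setdefault(key, key)
        self._snapshots[snap_id] = view

    def neighbors(self, snap_id, vertex):
        """Return the neighbor set of a vertex in the given snapshot."""
        return set(self._snapshots[snap_id][vertex][1])

    def stored_records(self):
        """Number of distinct vertex states kept across all snapshots."""
        return len(self._pool)


if __name__ == "__main__":
    store = SnapshotStore()
    store.add_snapshot("t1", {"a": {"b"}, "b": {"a"}, "c": set()})
    store.add_snapshot("t2", {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}})
    # Vertex "a" is identical in both snapshots, so 5 records are kept, not 6.
    print(store.stored_records())
```

The sketch ignores distribution and replication across workers, which are the actual subject of the abstract; it only shows why overlapping snapshots need less storage than independent copies.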