
How to select the largest k elements from evolving data?

Author: Huang Qin
Liu Xingwu
Sun Xiaoming
Zhang Jialin
Publication date: 28/12/2014

In this paper we investigate the top-$k$-selection problem, i.e., determining the largest, second largest, ..., and $k$-th largest elements, in the dynamic data model. In this model the order of elements evolves dynamically over time. In each time step the algorithm can only probe the changes of data by comparing a pair of elements. Previously only two special cases were studied [2]: finding the largest element and the median, and sorting all elements. This paper systematically deals with $k \in [n]$ and solves the problem almost completely. Specifically, we identify a critical point $k^*$ such that the top-$k$-selection problem can be solved error-free with probability $1-o(1)$ if and only if $k = o(k^*)$. A lower bound on the error when $k = \Omega(k^*)$ is also determined, which is tight under a certain condition. On the other hand, it is shown that the top-$k$-set problem, which means finding the largest $k$ elements without sorting them, can be solved error-free for all $k \in [n]$. Additionally, we extend the dynamic data model and show that most of these results still hold.

Comment: 23 pages, 2 figures

arXiv.org e-Print Archive
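To make the dynamic data model concrete, here is a minimal Python sketch under assumptions the abstract does not spell out: after every comparison the hidden total order is changed by one uniformly random adjacent swap (a common formulation of evolving data; the paper's exact model and its extension may differ). `DynamicArray` and `track_max` are hypothetical names. `track_max` is a naive baseline that keeps a candidate and challenges it with a random element; it is not the paper's algorithm and carries no error-free guarantee.

```python
import random


class DynamicArray:
    """Hypothetical simulator of evolving data.

    Assumption: after every comparison, `swaps_per_step` uniformly random
    adjacent pairs in the hidden total order are exchanged.
    """

    def __init__(self, n, swaps_per_step=1, seed=None):
        self.rng = random.Random(seed)
        # order[r] is the element currently holding rank r (rank 0 = largest).
        self.order = list(range(n))
        self.rng.shuffle(self.order)
        self.swaps_per_step = swaps_per_step
        self.time = 0

    def rank(self, x):
        """True current rank of x; for evaluation only, not for algorithms."""
        return self.order.index(x)

    def compare(self, a, b):
        """Return True if a currently ranks above b, then let the data evolve."""
        result = self.order.index(a) < self.order.index(b)
        self._evolve()
        return result

    def _evolve(self):
        for _ in range(self.swaps_per_step):
            i = self.rng.randrange(len(self.order) - 1)
            self.order[i], self.order[i + 1] = self.order[i + 1], self.order[i]
        self.time += 1


def track_max(data, elements, steps, seed=None):
    """Naive baseline: challenge the current candidate with a random element."""
    rng = random.Random(seed)
    candidate = elements[0]
    for _ in range(steps):
        challenger = rng.choice(elements)
        if data.compare(challenger, candidate):  # one comparison per time step
            candidate = challenger
    return candidate


data = DynamicArray(n=50, swaps_per_step=1, seed=0)
best = track_max(data, list(range(50)), steps=2000, seed=1)
print(data.time, data.rank(best))  # elapsed steps and final true rank of the answer
```

Because the order keeps changing while the algorithm runs, the answer's true rank is checked only at the end via `rank`, which a real algorithm in this model would not be allowed to call.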

Behavior Query Discovery in System-Generated Temporal Graphs

Author: Jiang Guofei
Li Zhichun
Qian Zhiyun
Singh Ambuj K.
Wu Zhenyu
Xiao Xusheng
Yan Xifeng
Zong Bo
Publication date: 19/11/2015

Computer system monitoring generates huge amounts of logs that record the interactions of system entities. Querying such data to better understand system behaviors and to identify potential system risks and malicious behaviors is a challenging task for system administrators due to the dynamics and heterogeneity of the data. System monitoring data are essentially heterogeneous temporal graphs, with nodes being system entities and edges being their interactions over time. Given the complexity of such graphs, it is time-consuming for system administrators to manually formulate useful queries to examine abnormal activities, attacks, and vulnerabilities in computer systems. In this work, we investigate how to query temporal graphs and treat query formulation as a discriminative temporal graph pattern mining problem. We introduce TGMiner to mine discriminative patterns from system logs; these patterns can be taken as templates for building more complex queries. TGMiner leverages temporal information in graphs to prune graph patterns that share similar growth trends without compromising pattern quality. Experimental results on real system data show that TGMiner is 6-32 times faster than baseline methods. The discovered patterns were verified by system experts; they achieved high precision (97%) and recall (91%).

Comment: The full version of the paper "Behavior Query Discovery in System-Generated Temporal Graphs", to appear in VLDB'1

arXiv.org e-Print Archive
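The graph view of system logs described above can be illustrated with a small Python sketch. It assumes a temporal graph is a list of timestamped, labeled edges plus a node-type map; that a pattern matches when its edges map injectively onto graph edges with matching labels and node types, in strictly increasing time order; and that a pattern's discriminative power is scored as its support in target-behavior graphs minus its support in background graphs. These definitions and all names (`Edge`, `matches`, `support`, `discriminative_score`, the sample nodes) are illustrative assumptions, not TGMiner's actual matching semantics, scoring function, or pruning strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    label: str  # interaction type, e.g. "read" or "send"
    t: int      # timestamp


def matches(pattern_edges, pattern_types, edges, node_types):
    """Return True if the temporal pattern occurs in the graph.

    pattern_edges: list of (u, v, label) over pattern variables, in the
        temporal order their matches must follow.
    pattern_types: pattern variable -> required node type.
    edges: list of Edge; node_types: graph node -> node type.
    """
    edges = sorted(edges, key=lambda e: e.t)

    def extend(i, mapping, last_t):
        if i == len(pattern_edges):
            return True  # every pattern edge has been matched
        u, v, label = pattern_edges[i]
        for e in edges:
            # The i-th pattern edge must match a strictly later graph edge.
            if e.t <= last_t or e.label != label:
                continue
            new = dict(mapping)
            ok = True
            for var, node in ((u, e.src), (v, e.dst)):
                if var in new:
                    ok = new[var] == node  # variable already bound
                elif node in new.values() or node_types.get(node) != pattern_types[var]:
                    ok = False  # mapping must be injective and type-preserving
                else:
                    new[var] = node
                if not ok:
                    break
            if ok and extend(i + 1, new, e.t):
                return True
        return False

    return extend(0, {}, float("-inf"))


def support(pattern_edges, pattern_types, graphs):
    """Fraction of (edges, node_types) graphs that contain the pattern."""
    hits = sum(matches(pattern_edges, pattern_types, e, nt) for e, nt in graphs)
    return hits / len(graphs)


def discriminative_score(pattern_edges, pattern_types, positives, negatives):
    """Hypothetical score: support in target graphs minus support in background graphs."""
    return (support(pattern_edges, pattern_types, positives)
            - support(pattern_edges, pattern_types, negatives))


# Toy example: a process reads a file and only afterwards sends data to a socket.
types = {"bash": "process", "passwd": "file", "sock1": "socket"}
pos = [Edge("bash", "passwd", "read", 1), Edge("bash", "sock1", "send", 2)]
neg = [Edge("bash", "sock1", "send", 1), Edge("bash", "passwd", "read", 2)]
pattern = [("P", "F", "read"), ("P", "S", "send")]
ptypes = {"P": "process", "F": "file", "S": "socket"}
print(discriminative_score(pattern, ptypes, [(pos, types)], [(neg, types)]))  # 1.0
```

The same edges in the opposite time order do not match, which is why temporal order makes the pattern discriminative here. This brute-force backtracking is exponential in pattern size; per the abstract, TGMiner's speedup comes from pruning patterns that share similar growth trends, so far fewer candidates need such checks.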

Scalable and Robust Management of Dynamic Graph Data

Author: Alan G. Labouseur
Jeong-hyon Hwang
Paul W. Olsen

Most real-world networks evolve over time. This evolution can be modeled as a series of graphs that represent a network at different points in time. Our G* system enables efficient storage and querying of these graph snapshots by taking advantage of the commonalities among them. We are extending G* for highly scalable and robust operation. This paper shows that the classic challenges of data distribution and replication are imbued with renewed significance given continuously generated graph snapshots. Our data distribution technique adjusts the set of worker servers for storing each graph snapshot in a manner optimized for popular queries. Our data replication approach maintains each snapshot replica on a different number of workers, making available the most efficient replica configurations for different types of queries.

CiteSeerX
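One way to exploit the commonalities among snapshots is to store each distinct vertex state once and let every snapshot that contains it refer to the same copy. The Python sketch below assumes a vertex state is its ID plus its sorted neighbor list, keyed by a SHA-256 digest; `SnapshotStore` and its methods are hypothetical, and this is not a description of G*'s actual storage layout, data distribution, or replication scheme.

```python
import hashlib
import json


class SnapshotStore:
    """Hypothetical snapshot store: each vertex state (ID + sorted neighbors)
    is stored once and shared by every snapshot in which it is unchanged."""

    def __init__(self):
        self.vertex_states = {}  # digest -> (vertex, tuple of neighbors)
        self.snapshots = {}      # snapshot time -> {vertex: digest}

    @staticmethod
    def _digest(vertex, neighbors):
        payload = json.dumps([vertex, neighbors]).encode()
        return hashlib.sha256(payload).hexdigest()

    def add_snapshot(self, time, adjacency):
        """Index one snapshot given as {vertex: iterable of neighbors}."""
        index = {}
        for vertex, nbrs in adjacency.items():
            nbrs = sorted(nbrs)
            d = self._digest(vertex, nbrs)
            # Only previously unseen vertex states consume new storage.
            self.vertex_states.setdefault(d, (vertex, tuple(nbrs)))
            index[vertex] = d
        self.snapshots[time] = index

    def neighbors(self, time, vertex):
        """Neighbors of `vertex` in the snapshot taken at `time`."""
        return list(self.vertex_states[self.snapshots[time][vertex]][1])

    def stored_states(self):
        return len(self.vertex_states)


store = SnapshotStore()
store.add_snapshot(1, {"a": ["b"], "b": ["a", "c"], "c": ["b"]})
# Snapshot 2 adds the edge c-d; vertices a and b are unchanged.
store.add_snapshot(2, {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]})
print(store.stored_states())  # 5 shared states instead of 3 + 4 = 7 copies
```

Only vertices whose neighborhoods change create new states; each snapshot still keeps one digest per vertex, which a more compact design could also share across snapshots.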