3 research outputs found
How to select the largest k elements from evolving data?
In this paper we investigate the top-k-selection problem, i.e., determining
the largest, second largest, ..., and the k-th largest elements, in the
dynamic data model. In this model the order of elements evolves dynamically
over time. In each time step the algorithm can only probe the changes of data
by comparing a pair of elements. Previously only two special cases were
studied [2]: finding the largest element and the median; and sorting all
elements. This paper systematically deals with and solves the
problem almost completely. Specifically, we identify a critical point
such that the top-k-selection problem can be solved error-free with
probability if and only if . A lower bound of the error when
is also determined, which actually is tight under some
condition. On the other hand, it is shown that the top-k-set problem, which
means finding the largest k elements without sorting them, can be solved
error-free for all . Additionally, we extend the dynamic data model
and show that most of these results still hold.
Comment: 23 pages, 2 figures
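The dynamic data model described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not say how the order evolves, so the code assumes one random swap of adjacent ranks per time step, and the tracker is a naive cyclic bubble pass rather than the paper's algorithm. The names `EvolvingOrder`, `compare_then_evolve`, and `track_top_k` are hypothetical.

```python
import random


class EvolvingOrder:
    """Hidden ranking that drifts over time; only pairwise comparisons are visible."""

    def __init__(self, n, swaps_per_step=1, seed=0):
        self._rng = random.Random(seed)
        self._order = list(range(n))  # _order[0] is the current largest element
        self._rng.shuffle(self._order)
        self._rank = {e: i for i, e in enumerate(self._order)}
        self._swaps = swaps_per_step

    def compare_then_evolve(self, a, b):
        """Return True if a currently ranks above b, then let the order drift."""
        answer = self._rank[a] < self._rank[b]
        # Assumed evolution rule: swap a random pair of adjacent ranks.
        for _ in range(self._swaps):
            i = self._rng.randrange(len(self._order) - 1)
            x, y = self._order[i], self._order[i + 1]
            self._order[i], self._order[i + 1] = y, x
            self._rank[x], self._rank[y] = i + 1, i
        return answer

    def true_top(self, k):
        """Ground truth, used only to evaluate the tracker."""
        return self._order[:k]


def track_top_k(data, n, k, steps):
    """Spend one comparison per time step on a cyclic bubble pass over an estimate."""
    estimate = list(range(n))
    pos = 0
    for _ in range(steps):
        a, b = estimate[pos], estimate[pos + 1]
        if not data.compare_then_evolve(a, b):
            estimate[pos], estimate[pos + 1] = b, a  # keep the larger one first
        pos = (pos + 1) % (n - 1)
    return estimate[:k]
```

When the order does not drift (`swaps_per_step=0`), (n-1)^2 steps are enough for the estimate to match the true order exactly. With drift, the returned list is only an approximation of the true top k.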
Behavior Query Discovery in System-Generated Temporal Graphs
Computer system monitoring generates huge amounts of logs that record the
interaction of system entities. How to query such data to better understand
system behaviors and identify potential system risks and malicious behaviors
becomes a challenging task for system administrators due to the dynamics and
heterogeneity of the data. System monitoring data are essentially heterogeneous
temporal graphs with nodes being system entities and edges being their
interactions over time. Given the complexity of such graphs, it becomes
time-consuming for system administrators to manually formulate useful queries
in order to examine abnormal activities, attacks, and vulnerabilities in
computer systems.
In this work, we investigate how to query temporal graphs and treat query
formulation as a discriminative temporal graph pattern mining problem. We
introduce TGMiner to mine discriminative patterns from system logs, and these
patterns can be taken as templates for building more complex queries. TGMiner
leverages temporal information in graphs to prune graph patterns that share
similar growth trends without compromising pattern quality. Experimental results
on real system data show that TGMiner is 6-32 times faster than baseline
methods. The discovered patterns were verified by system experts; they achieved
high precision (97%) and recall (91%).
Comment: The full version of the paper "Behavior Query Discovery in
System-Generated Temporal Graphs", to appear in VLDB'1
Scalable and Robust Management of Dynamic Graph Data
Most real-world networks evolve over time. This evolution can be modeled as a series of graphs that represent a network at different points in time. Our G* system enables efficient storage and querying of these graph snapshots by taking advantage of the commonalities among them. We are extending G* for highly scalable and robust operation. This paper shows that the classic challenges of data distribution and replication are imbued with renewed significance given continuously generated graph snapshots. Our data distribution technique adjusts the set of worker servers for storing each graph snapshot in a manner optimized for popular queries. Our data replication approach maintains each snapshot replica on a different number of workers, making available the most efficient replica configurations for different types of queries.