22 research outputs found

    Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL

    Get PDF
    Abstract. Social semantic data are becoming a reality, but apparently their streaming nature has been ignored so far. Streams, being unbounded sequences of time-varying data elements, should not be treated as persistent data to be stored “forever ” and queried on demand, but rather as transient data to be consumed on the fly by queries which are registered once and for all and keep analyzing such streams, producing answers triggered by the streaming data and not by explicit invocation. In this paper, we propose an approach to continuous queries and realtime analysis of social semantic data with C-SPARQL, an extension of SPARQL for querying RDF streams

    Stream Mining for Network Management

    Get PDF
    Network management is an important issue in maintaining the Internet as an important social infrastructure. Finding excessive consumption of network bandwidth caused by P2P mass flows is especially important. Finding Internet viruses is also an important security issue. Although stream mining techniques seem to be promising techniques to find P2P and Internet viruses, vast network flows prevent the simple application of such techniques. A mining technique which works well with extremely limited memory is required. Also it should have a real-time analysis capability. In this paper, we propose a cache based mining method to realize such a technique. By analyzing the characteristics of the proposed method with real Internet backbone flow data, we show the advantages of the proposed method, i.e. less memory consumption while realizing realtime analysis capability. We also show the fact that we can use the proposed method to find mass flow information from Internet backbone flow data

    Streaming Temporal Graphs: Subgraph Matching

    Full text link
    We investigate solutions to subgraph matching within a temporal stream of data. We present a high-level language for describing temporal subgraphs of interest, the Streaming Analytics Language (SAL). SAL programs are translated into C++ code that is run in parallel on a cluster. We call this implementation of SAL the Streaming Analytics Machine (SAM). SAL programs are succinct, requiring about 20 times fewer lines of code than using the SAM library directly, or writing an implementation using Apache Flink. To benchmark SAM we calculate finding temporal triangles within streaming netflow data. Also, we compare SAM to an implementation written for Flink. We find that SAM is able to scale to 128 nodes or 2560 cores, while Apache Flink has max throughput with 32 nodes and degrades thereafter. Apache Flink has an advantage when triangles are rare, with max aggregate throughput for Flink at 32 nodes greater than the max achievable rate of SAM. In our experiments, when triangle occurrence was faster than five per second per node, SAM performed better. Both frameworks may miss results due to latencies in network communication. SAM consistently reported an average of 93.7% of expected results while Flink decreases from 83.7% to 52.1% as we increase to the maximum size of the cluster. Overall, SAM can obtain rates of 91.8 billion netflows per day.Comment: Big Data 201

    An evaluation of streaming algorithms for distinct counting over a sliding window

    Get PDF
    Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggregation task in database query processing, query optimization, and network monitoring. On a stream of elements, it is commonly needed to compute an aggregate over only the most recent elements, leading to the problem of distinct counting over a “sliding window” of the stream. We present a detailed experimental study of the performance of different algorithms for distinct counting over a sliding window. We observe that the performance of an algorithm depends on the basic method used, as well as aspects such as the hash function, the mix of query and updates, and the method used to boost accuracy. We compare the performance of prominent algorithms and evaluate the influence of these factors, leading to practical recommendations for implementation. To the best of our knowledge, this is the first detailed experimental study of distinct counting over a sliding window
    corecore