216,460 research outputs found

    TopExNet: Entity-Centric Network Topic Exploration in News Streams

    Full text link
    The recent introduction of entity-centric implicit network representations of unstructured text offers novel ways for exploring entity relations in document collections and streams efficiently and interactively. Here, we present TopExNet as a tool for exploring entity-centric network topics in streams of news articles. The application is available as a web service at https://topexnet.ifi.uni-heidelberg.de/ .Comment: Published in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 201

    Processing count queries over event streams at multiple time granularities

    Get PDF
    Management and analysis of streaming data has become crucial with its applications in web, sensor data, network tra c data, and stock market. Data streams consist of mostly numeric data but what is more interesting is the events derived from the numerical data that need to be monitored. The events obtained from streaming data form event streams. Event streams have similar properties to data streams, i.e., they are seen only once in a fixed order as a continuous stream. Events appearing in the event stream have time stamps associated with them in a certain time granularity, such as second, minute, or hour. One type of frequently asked queries over event streams is count queries, i.e., the frequency of an event occurrence over time. Count queries can be answered over event streams easily, however, users may ask queries over di erent time granularities as well. For example, a broker may ask how many times a stock increased in the same time frame, where the time frames specified could be hour, day, or both. This is crucial especially in the case of event streams where only a window of an event stream is available at a certain time instead of the whole stream. In this paper, we propose a technique for predicting the frequencies of event occurrences in event streams at multiple time granularities. The proposed approximation method e ciently estimates the count of events with a high accuracy in an event stream at any time granularity by examining the distance distributions of event occurrences. The proposed method has been implemented and tested on di erent real data sets and the results obtained are presented to show its e ectiveness

    On Graph Stream Clustering with Side Information

    Full text link
    Graph clustering becomes an important problem due to emerging applications involving the web, social networks and bio-informatics. Recently, many such applications generate data in the form of streams. Clustering massive, dynamic graph streams is significantly challenging because of the complex structures of graphs and computational difficulties of continuous data. Meanwhile, a large volume of side information is associated with graphs, which can be of various types. The examples include the properties of users in social network activities, the meta attributes associated with web click graph streams and the location information in mobile communication networks. Such attributes contain extremely useful information and has the potential to improve the clustering process, but are neglected by most recent graph stream mining techniques. In this paper, we define a unified distance measure on both link structures and side attributes for clustering. In addition, we propose a novel optimization framework DMO, which can dynamically optimize the distance metric and make it adapt to the newly received stream data. We further introduce a carefully designed statistics SGS(C) which consume constant storage spaces with the progression of streams. We demonstrate that the statistics maintained are sufficient for the clustering process as well as the distance optimization and can be scalable to massive graphs with side attributes. We will present experiment results to show the advantages of the approach in graph stream clustering with both links and side information over the baselines.Comment: Full version of SIAM SDM 2013 pape

    Expressive Stream Reasoning with Laser

    Full text link
    An increasing number of use cases require a timely extraction of non-trivial knowledge from semantically annotated data streams, especially on the Web and for the Internet of Things (IoT). Often, this extraction requires expressive reasoning, which is challenging to compute on large streams. We propose Laser, a new reasoner that supports a pragmatic, non-trivial fragment of the logic LARS which extends Answer Set Programming (ASP) for streams. At its core, Laser implements a novel evaluation procedure which annotates formulae to avoid the re-computation of duplicates at multiple time points. This procedure, combined with a judicious implementation of the LARS operators, is responsible for significantly better runtimes than the ones of other state-of-the-art systems like C-SPARQL and CQELS, or an implementation of LARS which runs on the ASP solver Clingo. This enables the application of expressive logic-based reasoning to large streams and opens the door to a wider range of stream reasoning use cases.Comment: 19 pages, 5 figures. Extended version of accepted paper at ISWC 201

    Supervised anomaly detection in uncertain pseudoperiodic data streams

    Get PDF
    Uncertain data streams have been widely generated in many Web applications. The uncertainty in data streams makes anomaly detection from sensor data streams far more challenging. In this paper, we present a novel framework that supports anomaly detection in uncertain data streams. The proposed framework adopts an efficient uncertainty pre-processing procedure to identify and eliminate uncertainties in data streams. Based on the corrected data streams, we develop effective period pattern recognition and feature extraction techniques to improve the computational efficiency. We use classification methods for anomaly detection in the corrected data stream. We also empirically show that the proposed approach shows a high accuracy of anomaly detection on a number of real datasets

    Ztreamy: a middleware for publishing semantic streams on the web

    Get PDF
    In order to make the semantic sensor Web a reality, middleware for efficiently publishing semantically-annotated data streams on the Web is needed. Such middleware should be designed to allow third parties to reuse and mash-up data coming from streams. These third parties should even be able to publish their own value-added streams derived from other streams and static data. In this work we present Ztreamy, a scalable middleware platform for the distribution of semantic data streams through HTTP. The platform provides an API for both publishing and consuming streams, as well as built-in filtering services based on data semantics. A key contribution of our proposal with respect to other related systems in the state of the art is its scalability. Our experiments with Ztreamy show that a single server is able, in some configurations, to publish a real-time stream to up to 40.000 simultaneous clients with delivery delays of just a few seconds, largely outperforming other systems in the state of the art.Publicad

    Demonstration of a stream reasoning platform on low-end devices to enable personalized real-time cycling feedback

    Get PDF
    During amateur cycling training, analyzing sensor data in real-time would allow riders to receive immediate feedback on how they are performing, and adapt their training accordingly. In this paper, a solution with Semantic Web technologies is presented that gives such real-time personalized feedback, by integrating the data streams with domain knowledge, rider profiles {\&} other context data. This solution consists of a stream reasoning engine running on a low-end Raspberry Pi device, and a tablet app showing feedback based on the continuous query results. To demonstrate this in a static environment, a virtual training app is presented, allowing a user to simulate an amateur cycling training

    A Quasi-Bayesian Perspective to Online Clustering

    Get PDF
    When faced with high frequency streams of data, clustering raises theoretical and algorithmic pitfalls. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure
    • …
    corecore