5,043 research outputs found

    On the online classification of data streams using weak estimators

    Get PDF
    In this paper, we propose a novel online classifier for complex data streams which are generated from non-stationary stochastic properties. Instead of using a single training model and counters to keep important data statistics, the introduced online classifier scheme provides a real-time self-adjusting learning model. The learning model utilizes the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new element is inserted, without requiring that we have to rebuild its model when changes occur in the data distributions. Finally, and most importantly, the model operates with the understanding that the correct classes of previously-classified patterns become available at a later juncture subsequent to some time instances, thus requiring us to update the training set and the training model. The results obtained from rigorous empirical analysis on multinomial distributions, is remarkable. Indeed, it demonstrates the applicability of our method on synthetic datasets, and proves the advantages of the introduced scheme

    On utilizing weak estimators to achieve the online classification of data streams

    Get PDF
    Author's accepted version (post-print).Available from 03/09/2021.acceptedVersio

    Improved Algorithms for Time Decay Streams

    Get PDF
    In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions. We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(h Delta)+h) points where h is the half-life of the decay function and Delta is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well

    Detection of fast radio transients with multiple stations: a case study using the Very Long Baseline Array

    Full text link
    Recent investigations reveal an important new class of transient radio phenomena that occur on sub-millisecond timescales. Often transient surveys' data volumes are too large to archive exhaustively. Instead, an on-line automatic system must excise impulsive interference and detect candidate events in real-time. This work presents a case study using data from multiple geographically distributed stations to perform simultaneous interference excision and transient detection. We present several algorithms that incorporate dedispersed data from multiple sites, and report experiments with a commensal real-time transient detection system on the Very Long Baseline Array (VLBA). We test the system using observations of pulsar B0329+54. The multiple-station algorithms enhanced sensitivity for detection of individual pulses. These strategies could improve detection performance for a future generation of geographically distributed arrays such as the Australian Square Kilometre Array Pathfinder and the Square Kilometre Array.Comment: 12 pages, 14 figures. Accepted for Ap

    Network Sampling: From Static to Streaming Graphs

    Full text link
    Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling, by highlighting the different objectives, population and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Our experimental results indicate that our proposed family of sampling methods more accurately preserves the underlying properties of the graph for both static and streaming graphs. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms
    • …
    corecore