On the online classification of data streams using weak estimators
In this paper, we propose a novel online classifier for complex data streams generated from non-stationary stochastic properties. Instead of using a single training model and counters to keep important data statistics, the introduced online classifier scheme provides a real-time self-adjusting learning model. The learning model applies the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new element is inserted, without requiring the model to be rebuilt when the data distributions change. Finally, and most importantly, the model operates with the understanding that the correct classes of previously classified patterns become available only after some time instances, thus requiring us to update the training set and the training model. The results obtained from a rigorous empirical analysis on multinomial distributions are remarkable. Indeed, they demonstrate the applicability of our method on synthetic datasets and prove the advantages of the introduced scheme.
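The multiplication-based update mentioned in the abstract can be illustrated with a minimal sketch of the multinomial SLWE rule as it is commonly stated: every estimate is shrunk by a factor lambda, and the freed probability mass is given to the class that was just observed. The function name `slwe_update`, the parameter name `lam`, and the value 0.9 are illustrative assumptions, not taken from the paper.

```python
def slwe_update(p, observed, lam=0.9):
    """Return new class-probability estimates after one observation.

    p        -- current estimates, assumed to sum to 1
    observed -- index of the class seen at this time instant
    lam      -- learning parameter in (0, 1); 0.9 is an illustrative choice
    """
    # Multiply every estimate by lam (the "multiplication-based" step).
    updated = [lam * pj for pj in p]
    # Give the released mass (1 - lam) to the observed class,
    # so the estimates still sum to 1.
    updated[observed] += 1.0 - lam
    return updated


# Usage: track a stream whose dominant class switches halfway through.
estimates = [0.5, 0.5]
for label in [0] * 20 + [1] * 20:
    estimates = slwe_update(estimates, label)
```

Because old observations are discounted geometrically, the estimates follow a change in the source distribution without any explicit reset, which is the property the abstract relies on.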
On utilizing weak estimators to achieve the online classification of data streams
Author's accepted version (post-print). Available from 03/09/2021.
Improved Algorithms for Time Decay Streams
In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions.
We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(hΔ) + h) points, where h is the half-life of the decay function and Δ is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well.
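To make the role of the half-life h concrete, here is a minimal sketch of exponential time decay applied to a running sum: an item of age a carries weight 2^(-a/h). This only illustrates the decay model; it is not the coreset or clustering algorithm of the paper, and the names `decay_weight` and `DecayedSum` are hypothetical.

```python
def decay_weight(age, half_life):
    """Weight of an item that arrived `age` time units ago."""
    return 0.5 ** (age / half_life)


class DecayedSum:
    """Running sum in which each item's weight halves every `half_life` units."""

    def __init__(self, half_life):
        self.half_life = half_life
        self.total = 0.0
        self.last_time = None

    def add(self, value, t):
        # Age everything seen so far by the time elapsed since the last update,
        # then add the new item at full weight.
        if self.last_time is not None:
            self.total *= decay_weight(t - self.last_time, self.half_life)
        self.total += value
        self.last_time = t


# Usage: two unit items, one half-life apart.
s = DecayedSum(half_life=10)
s.add(1.0, t=0)
s.add(1.0, t=10)
```

Exponential decay allows this one-number update because the relative weights of older items never change; polynomial decay does not have that property, which is one reason it needs a more general framework.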
Detection of fast radio transients with multiple stations: a case study using the Very Long Baseline Array
Recent investigations reveal an important new class of transient radio
phenomena that occur on sub-millisecond timescales. Often transient surveys'
data volumes are too large to archive exhaustively. Instead, an on-line
automatic system must excise impulsive interference and detect candidate events
in real-time. This work presents a case study using data from multiple
geographically distributed stations to perform simultaneous interference
excision and transient detection. We present several algorithms that
incorporate dedispersed data from multiple sites, and report experiments with a
commensal real-time transient detection system on the Very Long Baseline Array
(VLBA). We test the system using observations of pulsar B0329+54. The
multiple-station algorithms enhanced sensitivity for detection of individual
pulses. These strategies could improve detection performance for a future
generation of geographically distributed arrays such as the Australian Square
Kilometre Array Pathfinder and the Square Kilometre Array. Comment: 12 pages, 14 figures. Accepted for Ap
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms.
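As one illustration of how a static edge-sampling method can be adapted to a graph stream, the sketch below uses classic reservoir sampling (Algorithm R) to keep a uniform sample of k edges in a single pass. It is a standard technique swapped in for illustration, not necessarily the graph-induction method proposed in the paper; the function name `reservoir_edge_sample` and the toy edge stream are hypothetical.

```python
import random


def reservoir_edge_sample(edge_stream, k, seed=None):
    """Keep a uniform random sample of k edges from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, edge in enumerate(edge_stream):
        if i < k:
            # Fill the reservoir with the first k edges.
            reservoir.append(edge)
        else:
            # Replace a stored edge with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = edge
    return reservoir


# Usage: sample 3 edges from a small path graph presented as a stream.
stream = [(n, n + 1) for n in range(10)]
sample = reservoir_edge_sample(stream, k=3, seed=0)
```

The sampler stores only k edges at any time and never revisits earlier ones, which matches the single-pass, limited-memory setting of the streaming model.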