
    Large Alphabets and Incompressibility

    We briefly survey some concepts related to empirical entropy -- normal numbers, de Bruijn sequences and Markov processes -- and investigate how well it approximates Kolmogorov complexity. Our results suggest $\ell$th-order empirical entropy stops being a reasonable complexity metric for almost all strings of length $m$ over alphabets of size $n$ about when $n^\ell$ surpasses $m$.
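To make the threshold $n^\ell > m$ concrete, here is a minimal Python sketch of $\ell$th-order empirical entropy. It assumes the common definition in which each length-$\ell$ context contributes the zeroth-order entropy of the symbols that follow it, normalized by the string length $m$; the abstract does not spell out the definition, and the function name `empirical_entropy` is illustrative rather than taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy(s, order=0):
    """Return the order-th order empirical entropy of s, in bits per symbol.

    Assumed definition: H_l(s) = (1/m) * sum over contexts w of
    |s_w| * H_0(s_w), where s_w collects the symbols that follow w in s.
    """
    m = len(s)
    if m == 0:
        return 0.0
    # Map each length-`order` context to a count of the symbols following it.
    followers = defaultdict(Counter)
    for i in range(order, m):
        followers[s[i - order:i]][s[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        n_w = sum(counts.values())
        # |s_w| * H_0(s_w) = -sum_c count(c) * log2(count(c) / |s_w|)
        total_bits -= sum(c * log2(c / n_w) for c in counts.values())
    return total_bits / m


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    alphabet = "abcdefghijklmnop"  # n = 16
    s = "".join(rng.choice(alphabet) for _ in range(1000))  # m = 1000
    for order in range(4):
        print(order, len(alphabet) ** order, round(empirical_entropy(s, order), 3))
```

For a random string with $n = 16$ and $m = 1000$, the order-3 value should come out far below the order-0 value, because $16^3 = 4096$ contexts exceed the string length and most contexts occur at most once. That is the regime in which the abstract says the measure stops tracking complexity.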

    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order. Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020).

    Quickest Sequence Phase Detection

    A phase detection sequence is a length-$n$ cyclic sequence, such that the location of any length-$k$ contiguous subsequence can be determined from a noisy observation of that subsequence. In this paper, we derive bounds on the minimal possible $k$ in the limit of $n \to \infty$, and describe some sequence constructions. We further consider multiple phase detection sequences, where the location of any length-$k$ contiguous subsequence of each sequence can be determined simultaneously from a noisy mixture of those subsequences. We study the optimal trade-offs between the lengths of the sequences, and describe some sequence constructions. We compare these phase detection problems to their natural channel coding counterparts, and show a strict separation between the fundamental limits in the multiple sequence case. Both adversarial and probabilistic noise models are addressed. Comment: To appear in the IEEE Transactions on Information Theory.
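As a rough illustration of the single-sequence definition, the Python sketch below treats the noise as symbol substitutions and locates a window by nearest Hamming distance. Both choices are assumptions made for illustration; the abstract does not specify the noise model at this level, and none of this is the paper's construction. The helper names (`cyclic_windows`, `min_window_distance`, `locate`) are hypothetical. In the noiseless case the requirement reduces to all length-$k$ cyclic windows being distinct, which a de Bruijn sequence satisfies.

```python
from itertools import combinations


def cyclic_windows(seq, k):
    """Return the n length-k windows of seq, read cyclically (assumes k <= n)."""
    doubled = seq + seq[:k - 1]
    return [doubled[i:i + k] for i in range(len(seq))]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_window_distance(seq, k):
    """Smallest Hamming distance between any two length-k cyclic windows.

    A value of 0 means two windows coincide, so the phase is ambiguous even
    without noise. Under the substitution-noise assumption, a value d allows
    unique location with up to (d - 1) // 2 substituted symbols.
    """
    return min(hamming(a, b) for a, b in combinations(cyclic_windows(seq, k), 2))


def locate(seq, observed):
    """Return the cyclic start index whose window is closest to `observed`."""
    windows = cyclic_windows(seq, len(observed))
    return min(range(len(windows)), key=lambda i: hamming(windows[i], observed))


if __name__ == "__main__":
    de_bruijn = "00010111"  # binary de Bruijn sequence of order 3
    print(min_window_distance(de_bruijn, 3))  # all 3-windows are distinct
    print(min_window_distance(de_bruijn, 2))  # some 2-windows repeat
    print(locate(de_bruijn, "101"))
```

With this sequence, $k = 3$ is enough to find the phase from a clean observation but leaves no margin for errors, which is why tolerating noise forces larger $k$ and motivates the bounds the abstract refers to.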