Coding of non-stationary sources as a foundation for detecting change points and outliers in binary time-series
An interesting scheme for estimating and adapting distributions in real time for non-stationary data has recently been studied for several tasks in time-series analysis and data mining, namely change point detection, outlier detection, and online compression/sequence prediction. An appealing feature is that, unlike more sophisticated procedures, it is as fast as the related stationary procedures, which are simply modified through discounting or windowing. The discount scheme makes older observations lose their influence on new predictions. The authors of this article recently used a discount scheme to introduce an adaptive version of the Context Tree Weighting compression algorithm. The change point and outlier detection methods mentioned above rely on the changing compression ratio of an online compression algorithm. Here we begin to provide theoretical foundations for the use of these adaptive estimation procedures, which have already shown practical promise.
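As a concrete illustration of the kind of discounted estimator discussed above, the following is a minimal sketch using a Krichevsky-Trofimov style predictor over a binary alphabet; the class name and the discount factor gamma are illustrative assumptions, not details taken from the article.

```python
# Minimal sketch of a discounted, KT-style estimator for a binary source.
# The class name and the discount factor `gamma` are illustrative choices,
# not details taken from the article.

class DiscountedKT:
    def __init__(self, gamma: float = 0.99):
        self.gamma = gamma            # discount applied to old counts
        self.counts = [0.0, 0.0]      # (discounted) counts of symbols 0 and 1

    def predict(self, symbol: int) -> float:
        # KT predictor: add 1/2 to each (discounted) count.
        total = self.counts[0] + self.counts[1]
        return (self.counts[symbol] + 0.5) / (total + 1.0)

    def update(self, symbol: int) -> None:
        # Discounting makes older observations lose their influence on new
        # predictions, so the estimate can track a non-stationary source.
        self.counts[0] *= self.gamma
        self.counts[1] *= self.gamma
        self.counts[symbol] += 1.0

model = DiscountedKT(gamma=0.95)
for bit in [0, 0, 1, 0, 1, 1, 1, 1]:      # a source drifting towards ones
    p = model.predict(bit)                # probability assigned before seeing bit
    model.update(bit)
```

Because old counts are geometrically down-weighted, the predictor tracks a drifting source instead of converging to the long-run average, which is the behaviour the adaptive CTW variant exploits.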
A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm
Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-N input sequence is partitioned into B blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of B, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the B blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient. Its redundancy above Rissanen's lower bound on universal compression performance is small with respect to any context tree source of bounded maximal depth. We improve the compression by using different quantizers for states of the context tree based on the number of symbols corresponding to those states. Numerical results from a prototype implementation suggest that our algorithm offers a better trade-off between compression and throughput than competing universal data compression algorithms.

Comment: Accepted to Journal of Selected Topics in Signal Processing, special issue on Signal Processing for Big Data (expected publication date June 2015). 10 pages double column, 6 figures, and 2 tables. arXiv admin note: substantial text overlap with arXiv:1405.6322. Version: Mar 2015: Corrected a typo.
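The two-pass structure described above can be illustrated with a small sketch: the first pass fits a single shared model to the entire input, and the second pass handles each block independently against that model. Here a fixed-order Markov model with add-one smoothing stands in for the MDL context tree source, and ideal code lengths stand in for parallel arithmetic coding, so this is only an illustration of the structure under those assumptions.

```python
# Sketch of the two-pass idea: pass 1 fits one shared model to the whole
# input, pass 2 scores each block independently using that model.  A
# fixed-order Markov model with add-one smoothing stands in for the MDL
# context tree source; ideal code lengths stand in for arithmetic coding.
import math
from collections import defaultdict

def fit_model(data: bytes, order: int = 1):
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(data)):
        counts[data[i - order:i]][data[i]] += 1
    return counts

def block_code_length(block: bytes, counts, order: int = 1) -> float:
    bits = 8.0 * order                      # send the first symbols literally
    for i in range(order, len(block)):
        ctx_counts = counts.get(block[i - order:i], {})
        total = sum(ctx_counts.values())
        p = (ctx_counts.get(block[i], 0) + 1) / (total + 256)
        bits += -math.log2(p)
    return bits

data = b"abracadabra abracadabra abracadabra abracadabra"
B = 4
size = -(-len(data) // B)                   # ceil(len(data) / B)
blocks = [data[i:i + size] for i in range(0, len(data), size)]

model = fit_model(data)                                    # pass 1: whole input
lengths = [block_code_length(b, model) for b in blocks]    # pass 2: per block
print(sum(lengths), "bits for", len(data), "bytes")
```

Because every block is coded against the same model fitted on the whole input, the per-block scoring step is embarrassingly parallel while the compression quality stays close to that of a single sequential pass.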
Universal Estimation of Directed Information
Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments. The first one is a Shannon–McMillan–Breiman type estimator, similar to those used by Verdú (2005) and Cai, Kulkarni, and Verdú (2006) for estimation of other information measures. We show the almost sure and $L_1$ convergence properties of the estimator for any underlying universal probability assignment. The other three estimators map universal probability assignments to different functionals, each exhibiting relative merits such as smoothness, nonnegativity, and boundedness. We establish the consistency of these estimators in the almost sure and $L_1$ senses, and derive near-optimal rates of convergence in the minimax sense under mild conditions. These estimators carry over directly to estimating other information measures of stationary ergodic finite-alphabet processes, such as entropy rate and mutual information rate, with near-optimal performance, and provide alternatives to classical approaches in the existing literature. Guided by these theoretical results, the proposed estimators are implemented using the context-tree weighting algorithm as the universal probability assignment. Experiments on synthetic and real data are presented, demonstrating the potential of the proposed schemes in practice and the utility of directed information estimation in detecting and measuring causal influence and delay.

Comment: 23 pages, 10 figures, to appear in IEEE Transactions on Information Theory.
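One of the plug-in constructions suggested by this abstract can be sketched as follows: average the log-ratio of two sequential probability assignments, one conditioned on Y's past alone and one also conditioned on X. Order-1 Laplace-smoothed predictors stand in here for the CTW probability assignment used in the paper, so this only illustrates the estimator's structure, not the paper's algorithm.

```python
# Plug-in sketch: directed information rate estimated as the average log-ratio
# of two sequential predictors.  Laplace-smoothed order-1 predictors stand in
# for the universal (CTW) probability assignment used in the paper.
import math
import random
from collections import defaultdict

class SequentialPredictor:
    """Laplace-smoothed predictor of the next symbol given a context key."""
    def __init__(self, alphabet_size: int):
        self.k = alphabet_size
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, symbol, context) -> float:
        ctx = self.counts[context]
        return (ctx[symbol] + 1) / (sum(ctx.values()) + self.k)

    def update(self, symbol, context) -> None:
        self.counts[context][symbol] += 1

def directed_information_rate(xs, ys, alphabet_size: int = 2) -> float:
    q_y  = SequentialPredictor(alphabet_size)   # Q(y_i | y_{i-1})
    q_yx = SequentialPredictor(alphabet_size)   # Q(y_i | y_{i-1}, x_{i-1}, x_i)
    total = 0.0
    for i in range(1, len(ys)):
        c_y  = ys[i - 1]
        c_yx = (ys[i - 1], xs[i - 1], xs[i])
        total += math.log2(q_yx.prob(ys[i], c_yx) / q_y.prob(ys[i], c_y))
        q_y.update(ys[i], c_y)
        q_yx.update(ys[i], c_yx)
    return total / (len(ys) - 1)

# X drives Y with a one-step delay, so the estimate should be clearly positive.
random.seed(0)
xs = [random.randint(0, 1) for _ in range(5000)]
ys = [0] + [x if random.random() < 0.9 else 1 - x for x in xs[:-1]]
print(directed_information_rate(xs, ys))
```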
Data Discovery and Anomaly Detection Using Atypicality: Theory
A central question in the era of 'big data' is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or to classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that in some applications most of the value of the information is in the parts that deviate from the average, the parts that are unusual, atypical. We define what we mean by 'atypical' in an axiomatic way, as data that can be encoded with fewer bits on its own than by using the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real-world data sets.

Comment: 40 pages.
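The coding criterion stated above can be illustrated with a toy sketch: a segment is flagged when coding it "in itself" (here, with a KT code for the segment alone, plus a small header) is cheaper than coding it with the model of the typical data. The typical-data model (an i.i.d. Bernoulli source) and the header cost are placeholder assumptions, not the paper's construction.

```python
# Toy illustration of the atypicality criterion: compare the cost of coding a
# segment on its own (KT code plus a small header) with its cost under the
# typical-data model.  Model and header cost are placeholders.
import math

def kt_code_length(bits) -> float:
    """Code length (bits) of a binary string under the KT estimator."""
    a = b = 0.5                 # counts of 0s and 1s, each plus 1/2
    total = 0.0
    for x in bits:
        p = (a if x == 0 else b) / (a + b)
        total += -math.log2(p)
        if x == 0:
            a += 1.0
        else:
            b += 1.0
    return total

def typical_code_length(bits, p1: float = 0.1) -> float:
    """Code length under the (known) typical-data model: i.i.d. Bernoulli(p1)."""
    return sum(-math.log2(p1 if x == 1 else 1.0 - p1) for x in bits)

def is_atypical(segment, header_bits: float = 8.0) -> bool:
    return kt_code_length(segment) + header_bits < typical_code_length(segment)

typical_segment = [0] * 18 + [1] * 2     # consistent with Bernoulli(0.1)
unusual_segment = [1] * 20               # very unlikely under the typical model
print(is_atypical(typical_segment), is_atypical(unusual_segment))
```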
Large-alphabet sequence modelling - a comparative study
Most raw data is not binary, but is drawn from an often large and structured alphabet. It is sometimes convenient to work with a binarised data sequence, but exploiting the original structure of the data typically improves performance significantly in many practical applications. In this thesis, we study Martin-Löf random sequences, which are maximally incompressible, and provide a topological view on the size of the set of random sequences. We also investigate the relationship between binary data compression techniques and modelling natural language text, with the latter using the raw, unbinarised data sequence over a large alphabet. We perform an experimental comparative study, including an empirical comparison of Kneser-Ney (KN) variants with the regular Context Tree Weighting algorithm (CTW), the phase CTW, and the large-alphabet CTW with different estimators. We also apply the idea of Hutter's adaptive sparse Dirichlet-multinomial coding to the KN method and provide a heuristic that makes the discounting parameter adaptive. KN with this adaptive discounting parameter outperforms the traditional KN method on the Large Calgary corpus.
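To show where such an adaptive discount would enter, here is a minimal interpolated Kneser-Ney bigram sketch with an explicit discount parameter d; the class and the point of adaptation are illustrative assumptions, and the thesis's actual adaptation heuristic is not reproduced.

```python
# Minimal interpolated Kneser-Ney bigram model with an explicit discount d,
# the parameter the thesis proposes to make adaptive.  Illustrative sketch.
from collections import defaultdict

class KneserNeyBigram:
    def __init__(self, d: float = 0.75):
        self.d = d                                    # discount parameter
        self.bigram = defaultdict(lambda: defaultdict(int))
        self.continuation = defaultdict(set)          # word -> contexts it follows
        self.total_bigram_types = 0

    def update(self, prev: str, word: str) -> None:
        if self.bigram[prev][word] == 0:
            self.total_bigram_types += 1
            self.continuation[word].add(prev)
        self.bigram[prev][word] += 1

    def prob(self, prev: str, word: str) -> float:
        ctx = self.bigram[prev]
        ctx_total = sum(ctx.values())
        if ctx_total == 0:
            return 1.0 / max(self.total_bigram_types, 1)
        # Discounted bigram estimate, interpolated with a continuation unigram.
        p_cont = len(self.continuation[word]) / max(self.total_bigram_types, 1)
        lam = self.d * len(ctx) / ctx_total
        return max(ctx[word] - self.d, 0.0) / ctx_total + lam * p_cont

kn = KneserNeyBigram(d=0.75)
words = "the cat sat on the mat the cat sat".split()
for prev, word in zip(words, words[1:]):
    kn.update(prev, word)
print(kn.prob("the", "cat"), kn.prob("the", "mat"))
```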
Top Down Electroweak Dipole Operators
We derive present constraints on, and prospective sensitivity to, the electric dipole moment (EDM) of the top quark implied by searches for the EDMs of the electron and nucleons. Above the electroweak scale, the top-quark EDM arises from two gauge-invariant operators, generated at a higher scale, that also mix with the light-fermion EDMs under renormalization group evolution at two-loop order. Bounds on the EDMs of first-generation fermion systems thus imply bounds on the top-quark EDM. Working in the leading log-squared approximation, we derive the present upper bound on the top-quark EDM, except in regions of finely tuned cancellations that allow it to be up to fifty times larger. Future probes may yield an order of magnitude increase in sensitivity, while inclusion of a prospective proton EDM search may lead to an additional increase in reach.

Comment: 7 pages, 6 figures.
Context tree switching
This paper describes the Context Tree Switching technique, a modification of Context Tree
Weighting for the prediction of binary, stationary, n-Markov sources. By modifying Context
Tree Weighting’s recursive weighting scheme, it is possible to mix over a strictly larger class of
models without increasing the asymptotic time or space complexity of the original algorithm.
We prove that this generalization preserves the desirable theoretical properties of Context Tree
Weighting on stationary n-Markov sources, and show empirically that this new technique leads
to consistent improvements over Context Tree Weighting, as measured on the Calgary Corpus.
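For reference, the recursion being modified is the CTW weighted-probability recursion, in which every node mixes its local KT estimate with the product of its children's weighted probabilities. The batch sketch below computes that mixture for a short binary sequence; a real implementation updates it sequentially, and Context Tree Switching replaces the fixed 1/2-1/2 mixture with a switching prior. The depth, names, and toy sequence are illustrative assumptions.

```python
# Batch sketch of the CTW recursion that Context Tree Switching modifies:
# each node mixes its local KT estimate with the product of its children's
# weighted probabilities.  A real implementation updates this sequentially.
import math
from collections import defaultdict

def kt_block_prob(n0: int, n1: int) -> float:
    """KT probability of any particular string with n0 zeros and n1 ones."""
    p = 1.0
    a = b = 0.5
    for _ in range(n0):
        p *= a / (a + b); a += 1.0
    for _ in range(n1):
        p *= b / (a + b); b += 1.0
    return p

def ctw_prob(seq, depth: int = 3) -> float:
    # For every context s of length <= depth, count the symbols that followed it.
    counts = defaultdict(lambda: [0, 0])
    for i in range(depth, len(seq)):
        for d in range(depth + 1):
            counts[tuple(seq[i - d:i])][seq[i]] += 1

    def weighted(ctx):
        n0, n1 = counts[ctx]
        pe = kt_block_prob(n0, n1)
        if len(ctx) == depth:
            return pe
        # CTW mixes "stop here" with "split on one more context symbol";
        # CTS would replace this fixed 1/2-1/2 mix with a switching prior.
        return 0.5 * pe + 0.5 * weighted((0,) + ctx) * weighted((1,) + ctx)

    return weighted(())

DEPTH = 3
seq = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(math.log2(1.0 / ctw_prob(seq, DEPTH)), "bits for", len(seq) - DEPTH, "symbols")
```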