Computing the entropy of user navigation in the web
Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often tend to get disoriented in terms of the goals of their original query, and thus the discovery of typical user trails could be useful in providing navigation assistance. Herein, we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy, and a large deviation result regarding the length of a trail needed to realize the said entropy. We provide an error analysis for our estimation of the entropy in terms of the divergence between the empirical and actual probabilities. We then indicate applications of our algorithm in the area of web data mining. Finally, we present an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one.
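The entropy in question is the entropy rate of the Markov chain modelling the web topology: the stationary distribution weighting the per-state transition entropies. The abstract's method is online and incremental; the following is only a minimal batch sketch (the function name and the numpy eigen-decomposition approach are illustrative, not the paper's algorithm):

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate H = sum_i pi_i * H(P[i, :]) in bits per step for an
    ergodic Markov chain with row-stochastic transition matrix P."""
    # Stationary distribution pi: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()
    # Per-state transition entropies, with the convention 0 * log 0 = 0.
    logP = np.log2(np.where(P > 0, P, 1.0))
    row_H = -np.sum(P * logP, axis=1)
    return float(pi @ row_H)

# Toy two-page web: page 0 mostly self-links, page 1 is a coin flip.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(entropy_rate(P))  # about 0.56 bits per step
```

A trail drawn from the chain of length n then has probability roughly 2^{-nH}, which is the sense in which a trail "realizes" the entropy.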
Scaling laws for learning high-dimensional Markov forest distributions
The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is structurally consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n, d, k) are given for the algorithm to be structurally consistent. In addition, the extremal structures for learning are identified; we prove that the independent (resp. tree) model is the hardest (resp. easiest) to learn using the proposed algorithm in terms of error rates for structure learning.
Funding: United States. Air Force Office of Scientific Research (Grant FA9559-08-1-1080); United States. Army Research Office. Multidisciplinary University Research Initiative (Grant W911NF-06-1-0076); United States. Army Research Office. Multidisciplinary University Research Initiative (Grant FA9550-06-1-0324); Singapore. Agency for Science, Technology and Research.
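The Chow-Liu construction referenced above picks a maximum-weight spanning structure over pairwise empirical mutual informations; discarding edges whose weight falls below a threshold yields a forest. A minimal sketch with a fixed threshold (the paper's threshold is adaptive, and all names here are illustrative, not the authors' implementation):

```python
import numpy as np
from itertools import combinations

def empirical_mi(x, y):
    """Plug-in estimate of the mutual information (bits) between two
    discrete sample vectors x and y."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log2(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_forest(data, threshold):
    """Kruskal-style maximum-weight spanning forest over pairwise MI,
    keeping only edges whose MI exceeds `threshold` (the pruning step)."""
    d = data.shape[1]
    edges = sorted(
        ((empirical_mi(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True)
    parent = list(range(d))          # union-find over the d variables
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    forest = []
    for w, i, j in edges:
        if w <= threshold:
            break                    # remaining edges are weaker: prune them all
        ri, rj = find(i), find(j)
        if ri != rj:                 # adding the edge keeps the graph acyclic
            parent[ri] = rj
            forest.append((i, j, w))
    return forest
```

On data where one column duplicates another and a third is unrelated, only the edge between the duplicated columns survives the threshold, illustrating how thresholding turns the Chow-Liu tree into a forest.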
Neighborhood radius estimation in Variable-neighborhood Random Fields
We consider random fields defined by finite-region conditional probabilities depending on a neighborhood of the region which changes with the boundary conditions. To predict the symbols within any finite region it is necessary to inspect a random number of neighborhood symbols, which may change depending on their values. By analogy with the one-dimensional setting, we call these neighborhood symbols the context of the region. This framework is a natural extension, to d-dimensional fields, of the notion of variable-length Markov chains introduced by Rissanen (1983) in his classical paper. We define an algorithm to estimate the radius of the smallest ball containing the context, based on a realization of the field. We prove the consistency of this estimator. Our proofs are constructive and yield explicit upper bounds on the probability of wrongly estimating the radius of the context.
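In Rissanen's one-dimensional variable-length Markov chains, the "context" is the suffix of the past actually needed for prediction, and its length varies with the symbols observed. A toy sketch of that lookup (the context set and function are invented for illustration; the paper's d-dimensional radius estimator is considerably more involved):

```python
def find_context(history, contexts):
    """Return the shortest suffix of `history` that belongs to `contexts`,
    scanning from the most recent symbol backwards. In a proper context
    tree the contexts are the leaf labels, so at most one suffix matches."""
    for k in range(1, len(history) + 1):
        suffix = tuple(history[-k:])
        if suffix in contexts:
            return suffix
    return None  # history too short to determine a context

# Toy context set: after a 1, one past symbol suffices; after a 0, two are needed.
contexts = {(1,), (0, 0), (1, 0)}
print(find_context([0, 1, 0, 0], contexts))  # -> (0, 0)
print(find_context([1, 1], contexts))        # -> (1,)
```

The abstract's estimator plays the analogous role in d dimensions: instead of a suffix length, it estimates the radius of the smallest ball of neighborhood symbols that determines the conditional law of the region.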
Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent, and that the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n, d, k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp. tree) model is the hardest (resp. easiest) to learn using the proposed algorithm in terms of error rates for structure learning. Comment: Accepted to the Journal of Machine Learning Research (Feb 2011).