
    Computing the entropy of user navigation in the web

    Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often become disoriented with respect to the goals of their original query, so the discovery of typical user trails could be useful in providing navigation assistance. Herein, we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy, together with a large deviation result regarding the length of trail needed to realize that entropy. We provide an error analysis for our estimate of the entropy in terms of the divergence between the empirical and actual probabilities. We then indicate applications of our algorithm in the area of web data mining. Finally, we present an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one.
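    As a hedged illustration (this is not the paper's online incremental algorithm), the sketch below computes the entropy rate of a first-order Markov chain from its transition matrix and lifts a second-order chain to a first-order one over state pairs; the function names entropy_rate and second_to_first_order are illustrative assumptions.

        import numpy as np

        def entropy_rate(P):
            # Entropy rate H = -sum_i pi_i sum_j P[i, j] * log2(P[i, j]) of an
            # irreducible chain, where pi is the stationary distribution
            # (left eigenvector of P with eigenvalue 1).
            evals, evecs = np.linalg.eig(P.T)
            pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
            pi = pi / pi.sum()
            logP = np.where(P > 0, np.log2(np.where(P > 0, P, 1.0)), 0.0)
            return float(-np.sum(pi[:, None] * P * logP))

        def second_to_first_order(P2, n):
            # Lift a second-order chain P2[i * n + j, k] = P(X_t = k | X_{t-2} = i,
            # X_{t-1} = j) to a first-order chain on the n^2 state pairs (i, j),
            # the kind of reduction the abstract alludes to.
            Q = np.zeros((n * n, n * n))
            for i in range(n):
                for j in range(n):
                    for k in range(n):
                        Q[i * n + j, j * n + k] = P2[i * n + j, k]
            return Q

        P = np.array([[0.9, 0.1],
                      [0.5, 0.5]])
        print(entropy_rate(P))  # approx 0.56 bits per step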

    Scaling laws for learning high-dimensional Markov forest distributions

    The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is structurally consistent and that the error probability of structure learning decays faster than any polynomial in the number of samples under a fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n, d, k) are given for the algorithm to be structurally consistent. In addition, the extremal structures for learning are identified; we prove that the independent (resp. tree) model is the hardest (resp. easiest) to learn using the proposed algorithm in terms of error rates for structure learning.
    Funding: United States. Air Force Office of Scientific Research (Grant FA9559-08-1-1080); United States. Army Research Office, Multidisciplinary University Research Initiative (Grant W911NF-06-1-0076); United States. Army Research Office, Multidisciplinary University Research Initiative (Grant FA9550-06-1-0324); Singapore. Agency for Science, Technology and Research
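    A minimal sketch of the Chow-Liu-plus-pruning idea, under assumptions not spelled out in the abstract: a plug-in mutual information estimate, Kruskal's algorithm for the max-weight spanning tree, and a user-supplied constant threshold standing in for the paper's adaptive, sample-size-dependent threshold.

        import itertools
        import numpy as np

        def empirical_mutual_info(x, y):
            # Plug-in estimate of I(X; Y) in nats from paired discrete samples.
            mi = 0.0
            for a in np.unique(x):
                for b in np.unique(y):
                    pxy = np.mean((x == a) & (y == b))
                    px, py = np.mean(x == a), np.mean(y == b)
                    if pxy > 0:
                        mi += pxy * np.log(pxy / (px * py))
            return mi

        def chow_liu_forest(samples, threshold):
            # Max-weight spanning tree on pairwise mutual informations
            # (Chow-Liu), then prune edges whose weight falls below
            # `threshold`, leaving a forest.
            d = samples.shape[1]
            weights = {(i, j): empirical_mutual_info(samples[:, i], samples[:, j])
                       for i, j in itertools.combinations(range(d), 2)}
            parent = list(range(d))          # union-find for Kruskal
            def find(u):
                while parent[u] != u:
                    parent[u] = parent[parent[u]]
                    u = parent[u]
                return u
            tree = []
            for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    tree.append((i, j, w))
            return [(i, j) for i, j, w in tree if w > threshold]

        rng = np.random.default_rng(0)
        X = rng.integers(0, 2, size=(500, 4))
        X[:, 1] = X[:, 0]                    # make columns 0 and 1 dependent
        print(chow_liu_forest(X, threshold=0.1))  # expect [(0, 1)]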

    Neighborhood radius estimation in Variable-neighborhood Random Fields

    We consider random fields defined by finite-region conditional probabilities depending on a neighborhood of the region which changes with the boundary conditions. To predict the symbols within any finite region it is necessary to inspect a random number of neighborhood symbols, which may change depending on their values. In analogy to the one-dimensional setting, we call these neighborhood symbols the context of the region. This framework is a natural extension, to d-dimensional fields, of the notion of variable-length Markov chains introduced by Rissanen (1983) in his classical paper. We define an algorithm to estimate the radius of the smallest ball containing the context, based on a realization of the field. We prove the consistency of this estimator. Our proofs are constructive and yield explicit upper bounds on the probability of wrongly estimating the radius of the context.
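    The abstract does not give the estimator's definition, so the following is only a simplified sketch of the underlying idea for a binary field on Z^2: grow the conditioning ball until the empirical conditional law of the centre symbol stops changing. The names ball, context_counts, estimate_radius and the tolerance tol are illustrative assumptions, not the authors' construction.

        import numpy as np
        from collections import Counter, defaultdict

        def ball(r):
            # Offsets at L-infinity distance <= r from the origin, origin excluded.
            return [(di, dj) for di in range(-r, r + 1)
                    for dj in range(-r, r + 1) if (di, dj) != (0, 0)]

        def context_counts(field, r, margin):
            # counts[context][symbol]: centre-symbol occurrences per radius-r context.
            counts = defaultdict(Counter)
            h, w = field.shape
            offs = ball(r)
            for i in range(margin, h - margin):
                for j in range(margin, w - margin):
                    ctx = tuple(int(field[i + di, j + dj]) for di, dj in offs)
                    counts[ctx][int(field[i, j])] += 1
            return counts

        def estimate_radius(field, r_max, tol=0.03):
            # Smallest r whose conditional law is (up to tol) unchanged when the
            # conditioning ball grows from radius r to r + 1.
            for r in range(r_max):
                off_in, off_out = ball(r), ball(r + 1)
                idx = [off_out.index(o) for o in off_in]   # inner sites in outer ctx
                c_in = context_counts(field, r, r_max)
                c_out = context_counts(field, r + 1, r_max)
                p_in = {c: k[1] / (k[0] + k[1]) for c, k in c_in.items()}
                total = sum(sum(k.values()) for k in c_out.values())
                # Count-weighted average shift in P(centre = 1); a max-discrepancy
                # test would be dominated by noise in rarely seen contexts.
                shift = sum(sum(k.values())
                            * abs(k[1] / (k[0] + k[1]) - p_in[tuple(c[m] for m in idx)])
                            for c, k in c_out.items()) / total
                if shift <= tol:
                    return r
            return r_max

        rng = np.random.default_rng(0)
        F = rng.integers(0, 2, size=(300, 300))
        print(estimate_radius(F, r_max=2))   # i.i.d. field: expect 0, given enough data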

    Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

    The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent, and that the error probability of structure learning decays faster than any polynomial in the number of samples under a fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n, d, k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp. tree) model is the hardest (resp. easiest) to learn using the proposed algorithm in terms of error rates for structure learning.
    Comment: Accepted to the Journal of Machine Learning Research (Feb 2011)
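    Since this version of the work adds risk consistency, here is a small hedged sketch of the risk side: the held-out average log-likelihood of a learned forest's factorization, with Laplace-smoothed plug-in marginals. The smoothing and the helper names (forest_log_likelihood, p1, p2) are assumptions for illustration, not the paper's estimator.

        import numpy as np

        def forest_log_likelihood(test, train, edges):
            # Average log-likelihood (nats) of `test` under the forest factorization
            #   P(x) = prod_i P(x_i) * prod_{(i,j) in E} P(x_i, x_j) / (P(x_i) P(x_j)),
            # with Laplace-smoothed marginals estimated from `train`.
            d = train.shape[1]
            v = int(max(train.max(), test.max())) + 1      # alphabet size
            def p1(i, a):
                return (np.sum(train[:, i] == a) + 1) / (len(train) + v)
            def p2(i, j, a, b):
                return (np.sum((train[:, i] == a) & (train[:, j] == b)) + 1) / \
                       (len(train) + v * v)
            total = 0.0
            for x in test:
                total += sum(np.log(p1(i, x[i])) for i in range(d))
                total += sum(np.log(p2(i, j, x[i], x[j]) / (p1(i, x[i]) * p1(j, x[j])))
                             for i, j in edges)
            return total / len(test)

    Comparing this quantity for the pruned forest against the unpruned Chow-Liu tree on held-out data gives an empirical handle on the risk consistency the abstract claims.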