
    A note on the gambling team method

    Gerber and Li in \cite{GeLi} formulated, using a Markov chain embedding, a system of equations that describes relations between the generating functions of waiting-time distributions for occurrences of patterns in a sequence of independent repeated experiments when the initial outcomes of the process are known. We show how this system of equations can be obtained using the classical gambling team technique. We also present a form of the solution of the system and give an example showing how the first results of the trials influence the probabilities that a chosen pattern precedes the remaining ones in a realization of the process. Comment: 9 pages
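    As a minimal illustration of the gambling team idea (though not of the Gerber-Li system of equations itself), the classical martingale argument gives the expected waiting time of a single pattern under i.i.d. letters as a sum over its self-overlaps: E[T] = sum over k with prefix_k = suffix_k of prod_{i<=k} 1/p(a_i). The Python sketch below computes this quantity; the helper name expected_waiting_time is illustrative and not taken from the paper.

        # Sketch: expected waiting time of a pattern under i.i.d. letters,
        # via the classical gambling-team / martingale argument:
        # E[T] = sum over k with pattern[:k] == pattern[-k:] of
        #        prod over the first k letters of 1 / p(letter).
        # The function name is illustrative, not from the paper.
        def expected_waiting_time(pattern, probs):
            total = 0.0
            for k in range(1, len(pattern) + 1):
                if pattern[:k] == pattern[len(pattern) - k:]:
                    weight = 1.0
                    for letter in pattern[:k]:
                        weight /= probs[letter]
                    total += weight
            return total

        # Fair coin examples: "HH" waits 6 tosses on average, "HT" waits 4.
        print(expected_waiting_time("HH", {"H": 0.5, "T": 0.5}))  # 6.0
        print(expected_waiting_time("HT", {"H": 0.5, "T": 0.5}))  # 4.0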

    Computing the entropy of user navigation in the web

    Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often tend to get disoriented in terms of the goals of their original query, and thus the discovery of typical user trails could be useful in providing navigation assistance. Herein, we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy and a large deviation result regarding the length of trail needed to realize the said entropy. We provide an error analysis for our estimation of the entropy in terms of the divergence between the empirical and actual probabilities. We then indicate applications of our algorithm in the area of web data mining. Finally, we present an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one.
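    For reference, the quantity estimated incrementally in the paper is, in the batch setting, the standard entropy rate of a first-order Markov chain, H = -sum_i pi_i sum_j P_ij log2 P_ij, where pi is the stationary distribution of the transition matrix P. The sketch below computes this directly with numpy; it is a baseline illustration under that assumption, not the paper's online incremental algorithm.

        # Sketch: batch entropy rate of a first-order Markov chain,
        # H = -sum_i pi_i sum_j P_ij log2 P_ij, with pi the stationary
        # distribution of P. Illustrative baseline only, not the paper's
        # online incremental method.
        import numpy as np

        def stationary_distribution(P):
            # Left eigenvector of P for eigenvalue 1, normalised to sum to 1.
            vals, vecs = np.linalg.eig(P.T)
            pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
            return pi / pi.sum()

        def entropy_rate(P):
            pi = stationary_distribution(P)
            logP = np.log2(np.where(P > 0, P, 1.0))  # 0 log 0 treated as 0
            return -np.sum(pi[:, None] * P * logP)

        # Toy two-page web with "sticky" navigation between pages.
        P = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
        print(entropy_rate(P))  # bits per navigation step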

    A fine grained heuristic to capture web navigation patterns

    In previous work we have proposed a statistical model to capture the user behaviour when browsing the web. The user navigation information obtained from web logs is modelled as a hypertext probabilistic grammar (HPG), which is within the class of regular probabilistic grammars. The set of highest-probability strings generated by the grammar corresponds to the user preferred navigation trails. We have previously conducted experiments with a Breadth-First Search algorithm (BFS) to perform the exhaustive computation of all the strings with probability above a specified cut-point, which we call the rules. Although the algorithm's running time varies linearly with the number of grammar states, it has the drawbacks of returning a large number of rules when the cut-point is small and a small set of very short rules when the cut-point is high. In this work, we present a new heuristic that implements an iterative deepening search wherein the set of rules is incrementally augmented by first exploring trails with high probability. A stopping parameter is provided which measures the distance between the current rule-set and its corresponding maximal set obtained by the BFS algorithm. When the stopping parameter takes the value zero, the heuristic corresponds to the BFS algorithm, and as the parameter takes values closer to one, the number of rules obtained decreases accordingly. Experiments were conducted with both real and synthetic data, and the results show that for a given cut-point the number of rules induced increases smoothly as the stopping parameter decreases. Therefore, by setting the value of the stopping parameter, the analyst can determine the number and quality of rules to be induced; the quality of a rule is measured by both its length and probability.
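    A minimal sketch of the exhaustive BFS step described above: enumerate every trail whose probability exceeds the cut-point over a toy grammar given as start and transition probabilities. The data layout and names are assumptions made for illustration, not the authors' HPG implementation or the iterative-deepening heuristic itself.

        # Sketch: breadth-first enumeration of all trails with probability
        # above a cut-point, for a toy hypertext probabilistic grammar given
        # as start-state and transition probabilities. Illustrative only;
        # not the authors' HPG implementation or their iterative-deepening
        # heuristic.
        from collections import deque

        def trails_above_cutpoint(start_probs, transitions, cutpoint):
            rules = []
            # Seed with single-state trails already above the cut-point;
            # probabilities only shrink as trails grow, so pruning is safe.
            queue = deque(((s,), p) for s, p in start_probs.items() if p > cutpoint)
            while queue:
                trail, prob = queue.popleft()
                rules.append((trail, prob))
                for nxt, p in transitions.get(trail[-1], {}).items():
                    if prob * p > cutpoint:
                        queue.append((trail + (nxt,), prob * p))
            return rules

        # Toy grammar over three pages A, B, C.
        start = {"A": 0.6, "B": 0.4}
        trans = {"A": {"B": 0.5, "C": 0.5}, "B": {"C": 1.0}, "C": {}}
        for trail, prob in trails_above_cutpoint(start, trans, 0.25):
            print(" -> ".join(trail), round(prob, 3))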

    Universal finitary codes with exponential tails

    In 1977, Keane and Smorodinsky showed that there exists a finitary homomorphism from any finite-alphabet Bernoulli process to any other finite-alphabet Bernoulli process of strictly lower entropy. In 1996, Serafin proved the existence of a finitary homomorphism with finite expected coding length. In this paper, we construct such a homomorphism in which the coding length has exponential tails. Our construction is source-universal, in the sense that it does not use any information about the source distribution other than the alphabet size and a bound on the entropy gap between the source and target distributions. We also indicate how our methods can be extended to prove a source-specific version of the result for Markov chains. Comment: 33 pages