121,613 research outputs found

    "A Two-Stage Prediction Model for Web Page Transition"

    Get PDF
    Utilizing data from a log file, a two-stage model for step-ahead web page prediction that permits adaptive page customization in real-time is proposed. The first stage predicts the next page of a viewer based on a variant of a Markov transition matrix computed from page sequences of other visitors who read the same pages as that viewer did thus far. The second stage re-analyzes the incorrect exit/continuation predictions of the first stage through data mining, incorporating the visitor's viewing behavior observed from the log file. The two-stage process takes advantage of a robust, theory-driven nature of statistical modeling for extracting the overall feature of the data, and a flexible, data-driven nature of data mining to capture any idiosyncrasies and complications unresolved in the first stage. The empirical result with a test site implies that the first stage alone is sufficiently accurate (50.3%) in predicting page transitions. Prediction of site exit was even better with 100% of the exit and 90.8% of the continuation predictions being correct. The result was compared against other models for predictive accuracy.

    Using Markov Chains for link prediction in adaptive web sites

    Get PDF
    The large number of Web pages on many Web sites has raised navigational problems. Markov chains have recently been used to model user navigational behavior on the World Wide Web (WWW). In this paper, we propose a method for constructing a Markov model of a Web site based on past visitor behavior. We use the Markov model to make link predictions that assist new users to navigate the Web site. An algorithm for transition probability matrix compression has been used to cluster Web pages with similar transition behaviors and compress the transition matrix to an optimal size for efficient probability calculation in link prediction. A maximal forward path method is used to further improve the efficiency of link prediction. Link prediction has been implemented in an online system called ONE (Online Navigation Explorer) to assist users' navigation in the adaptive Web site

    Web Site Personalization based on Link Analysis and Navigational Patterns

    Get PDF
    The continuous growth in the size and use of the World Wide Web imposes new methods of design and development of on-line information services. The need for predicting the users’ needs in order to improve the usability and user retention of a web site is more than evident and can be addressed by personalizing it. Recommendation algorithms aim at proposing “next” pages to users based on their current visit and the past users’ navigational patterns. In the vast majority of related algorithms, however, only the usage data are used to produce recommendations, disregarding the structural properties of the web graph. Thus important – in terms of PageRank authority score – pages may be underrated. In this work we present UPR, a PageRank-style algorithm which combines usage data and link analysis techniques for assigning probabilities to the web pages based on their importance in the web site’s navigational graph. We propose the application of a localized version of UPR (l-UPR) to personalized navigational sub-graphs for online web page ranking and recommendation. Moreover, we propose a hybrid probabilistic predictive model based on Markov models and link analysis for assigning prior probabilities in a hybrid probabilistic model. We prove, through experimentation, that this approach results in more objective and representative predictions than the ones produced from the pure usage-based approaches

    Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions

    Full text link
    Markov models have been widely used to represent and analyse user web navigation data. In previous work we have proposed a method to dynamically extend the order of a Markov chain model and a complimentary method for assessing the predictive power of such a variable length Markov chain. Herein, we review these two methods and propose a novel method for measuring the ability of a variable length Markov model to summarise user web navigation sessions up to a given length. While the summarisation ability of a model is important to enable the identification of user navigation patterns, the ability to make predictions is important in order to foresee the next link choice of a user after following a given trail so as, for example, to personalise a web site. We present an extensive experimental evaluation providing strong evidence that prediction accuracy increases linearly with summarisation ability

    Generating dynamic higher-order Markov models in web usage mining

    Get PDF
    Markov models have been widely used for modelling users’ web navigation behaviour. In previous work we have presented a dynamic clustering-based Markov model that accurately represents second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes into account higher-order conditional probabilities. The method makes use of the state cloning concept together with a clustering technique to separate the navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real world data sets. The results show that some pages require a long history to understand the users choice of link, while others require only a short history. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter

    Retrospective Higher-Order Markov Processes for User Trails

    Full text link
    Users form information trails as they browse the web, checkin with a geolocation, rate items, or consume media. A common problem is to predict what a user might do next for the purposes of guidance, recommendation, or prefetching. First-order and higher-order Markov chains have been widely used methods to study such sequences of data. First-order Markov chains are easy to estimate, but lack accuracy when history matters. Higher-order Markov chains, in contrast, have too many parameters and suffer from overfitting the training data. Fitting these parameters with regularization and smoothing only offers mild improvements. In this paper we propose the retrospective higher-order Markov process (RHOMP) as a low-parameter model for such sequences. This model is a special case of a higher-order Markov chain where the transitions depend retrospectively on a single history state instead of an arbitrary combination of history states. There are two immediate computational advantages: the number of parameters is linear in the order of the Markov chain and the model can be fit to large state spaces. Furthermore, by providing a specific structure to the higher-order chain, RHOMPs improve the model accuracy by efficiently utilizing history states without risks of overfitting the data. We demonstrate how to estimate a RHOMP from data and we demonstrate the effectiveness of our method on various real application datasets spanning geolocation data, review sequences, and business locations. The RHOMP model uniformly outperforms higher-order Markov chains, Kneser-Ney regularization, and tensor factorizations in terms of prediction accuracy

    edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

    Full text link
    Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, \textbf{edge2vec}\ significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.Comment: 10 page

    Social Dynamics of Digg

    Get PDF
    Online social media provide multiple ways to find interesting content. One important method is highlighting content recommended by user's friends. We examine this process on one such site, the news aggregator Digg. With a stochastic model of user behavior, we distinguish the effects of the content visibility and interestingness to users. We find a wide range of interest and distinguish stories primarily of interest to a users' friends from those of interest to the entire user community. We show how this model predicts a story's eventual popularity from users' early reactions to it, and estimate the prediction reliability. This modeling framework can help evaluate alternative design choices for displaying content on the site.Comment: arXiv admin note: text overlap with arXiv:1010.023
    • …
    corecore